Tuesday, March 27, 2012

Dimensional Design Question

I have done some reading on dimensional modeling and am a bit confused as to
how I create a dimension in our data warehouse using our transactional source.
I have a customer table which contains the customer id, customer name,
region codes and industry codes. Naturally I also have a region table and
industry table which has the region/industry descriptions.
I have read that the data warehouse should be designed using a star schema.
If that is the case, should I be creating a customer dimension with the
customer id, customer name, region desc, and industry name? Or should I
have a customer dimension, an industry dimension, and a region dimension?
It depends on the kind of your analysis.
If Region and Industry are attributes of the customer and not
independent attributes of the fact you want to measure, they should be
normalized into the Customer dimension. If you want to track historycal
changes of Region/Industry of the customer, you should use SCD Type II
(Slowly Changing Dimension).
Only in particular scenarios you should consider a snowflake schema
instead of a star schema, but this is an exception and in my experience
you can safely avoid it in most of the models you have to manage.
I suggest you to read Kimball's books
(http://www.kimballgroup.com/html/books.html) as a reference for DW
modeling.
Marco Russo
http://www.sqlbi.eu
http://www.sqljunkies.com/weblog/sqlbi
JP wrote:
> I have done some reading on dimensional modeling and am a bit confused as to
> how I create a dimension in our data warehouse using our transactional source.
> I have a customer table which contains the customer id, customer name,
> region codes and industry codes. Naturally I also have a region table and
> industry table which has the region/industry descriptions.
> I have read that the data warehouse should be designed using a star schema.
> If that is the case, should I be creating a customer dimension with the
> customer id, customer name, region desc, and industry name? Or should I
> have a customer dimension, an industry dimension, and a region dimension?
|||thanks marco. i do have the msft data warehouse toolkit but it doesn't
address this issue directly.
the region and industry are attributes of the customer so sounds like i
should have them "normalized" within the customer attribute. the fact table
is based on customer but i want to be able to roll-up the data to report by
industry, region, sub-industry, etc.
one item i left out was sub-industry. each sub-industry belongs to an
industry. so my customer dimension should look like the following?
customerkey (surrogate)
customerid (business key from source system)
region
industry
subindustry
"Marco Russo" wrote:

> It depends on the kind of your analysis.
> If Region and Industry are attributes of the customer and not
> independent attributes of the fact you want to measure, they should be
> normalized into the Customer dimension. If you want to track historycal
> changes of Region/Industry of the customer, you should use SCD Type II
> (Slowly Changing Dimension).
> Only in particular scenarios you should consider a snowflake schema
> instead of a star schema, but this is an exception and in my experience
> you can safely avoid it in most of the models you have to manage.
> I suggest you to read Kimball's books
> (http://www.kimballgroup.com/html/books.html) as a reference for DW
> modeling.
> Marco Russo
> http://www.sqlbi.eu
> http://www.sqljunkies.com/weblog/sqlbi
>
> JP wrote:
>
|||Yes, you're right.
Flatten all customer attributes (Region, Industry, Subindustry, ...)
into Customer dimension.
You will find it's a very good schema to query now and to maintain in
the future, when you will add some other attributes.
Just to make an example: if you would have started with several tables,
imagine how adding a subindustry attribute would have impacted on your
schema and on your ETL implementation...
Marco Russo
http://www.sqlbi.eu
http://www.sqljunkies.com/weblog/sqlbi
JP wrote:[vbcol=seagreen]
> thanks marco. i do have the msft data warehouse toolkit but it doesn't
> address this issue directly.
> the region and industry are attributes of the customer so sounds like i
> should have them "normalized" within the customer attribute. the fact table
> is based on customer but i want to be able to roll-up the data to report by
> industry, region, sub-industry, etc.
> one item i left out was sub-industry. each sub-industry belongs to an
> industry. so my customer dimension should look like the following?
> customerkey (surrogate)
> customerid (business key from source system)
> region
> industry
> subindustry
>
> "Marco Russo" wrote:

No comments:

Post a Comment