Sunday, March 25, 2012

Dimension theory design question.

If I have a sales fact table with 1000 sales to 20 different customers would I have 20 rows in my customer dimension table or 1000 rows? Each sales row has a customer, but many would be duplicates. Do I add a row to the customer dimension for each sale or each customer?

Thanks.

Hello! You will have 20 customer records in your customer dimension table and 1000 records in your sales fact table.

You will have a one to many relation between the customer dimension table and the fact table.

Select Distinct (TSQL) will help you with duplicates in the customer dimension.

HTH

Thomas Ivarsson

|||

Thanks.

To recap: I have an identity integer field for the surrogate key in the dimCustomer table and a business key field. I have two choices for the business key: I can use the CustomerID or the SalesNum from the OLTP. I gather from your response that I should use the CustomerID as the business key. Then when I load the dimension table, if a new sale goes to an existing customer, a new row will NOT be added to the dimCustomer table. When I subsequently load my fact table, I'll use the CustomerID in my sales row to point to the business key in the dimension table and retrieve the surrogate key which will be loaded into the fact table as the foreign key to dimCustomer and part of my aggregate key in the factSales table.

Did I say that right? (There's a whole lot of keys goin on.)

|||

Correct! You will only add customers when the customer business key is not in that dimension table.

Yes, it is good design to use integers, non business keys, as a primary key(dimension table) and foreign key(fact table)

You load the dimension table first and the fact table after and you update the keys in the way you have described.

The fact table will only have the surrogate keys and the dimension table both the business key and the surrogate key.

Regards

Thomas Ivarsson

No comments:

Post a Comment