Tuesday, March 27, 2012

Dimensions from Fact tables

Most of the Fact tables I'm building seem to have a bunch of categorization type columns in addition to the numeric columns that naturally fit into the Fact tables. However it feels "wrong" to create dimensions from a Fact table, so instead I'm getting the ETL to split the tables: for every FactX table I get a DimXInfo table that has the non-numeric/non-key data.

E.g.
FactDeal includes dealID, price, quantity, product key, customer key (and other keys that relate to shared dimensions.
DimDealInfo table includes dealID, ContractNumber, DealType

Then in the SSAS cube DimDealInfo naturally forms a Dimension and FactDeal naturally forms a measure group. And you don't have to create a SSAS Dimension from the Fact table.

Question is does this way of doing it make sense. It makes life easier at cube level but at ETL level you're getting two tables that are one to one relationship which also feels kind of "wrong". Any suggestions ?

Yes, in my case it does make sense. I use it with weblogs, and if I want to parse these into a datacube, I need to make keys for every occurence with the global @.@.IDENTITY variable.
Wich I think is really wrong.|||

Degenerate dimensions are not evil!

Indeed there are some business scenario where they are useful and sometimes the only way to model your data.

Using it is a "simple" choice if you have some alternatives (but if you have alternatives maybe you should not use DD!) or a "must" if you haven't.

In my opinion if it seems your unique solution and you don't have performance or other kind of problem (during ETL and process)...why not?

|||

Hello. This is a link to a good article about a method of what to do with fact or degenerate dimensions. It will be appropriate for SSAS2005:

http://www.intelligententerprise.com/000320/webhouse.jhtml

I think that you are on the right track.

HTH

Thomas Ivarsson

|||

Thanks everyone. I think I'm getting closer to the answer now! Yes it is degenerate dimensions I'm dealing with. In some cases I have followed Ralph Kimball's suggestion of a "Junk Dimension" to group together a few apparently unrelated items and get them out of the fact table. This works for items where there are few distinct values. (I think I should move e.g. DealType to a "Junk Dimension")

I guess the remaining item really is "ContractNumber". There is going to be a separate distinct value for every row in the Fact Table. My reasons for taking it out to a separate Dimension table are 1) Remove all text/character based columns from Fact Table to make it run faster and 2) because I need users to be able to drill right down to individual deal level when they browse a cube and the only way I can see to do that is inlcude it as a dimension.

I'm guessing that my reason 1 is not valid because we're only talking one or maybe a few columns, so probably won't have significant performance hit. But what about reason 2, can you some how drill down to a descriptive field of individual row without having to make that field a dimension ? There was something in Analysis Services 2000 that let you view all the rows that made up a specific value but can't see it in SQL 2005 and besides didn't have a client to handle it, so in the end I added ContractNumber as a dimension so any client can drill down to single row level. (Clients I'm using are : SSRS and Excel 2003).

|||

Hello.

Perhaps you are thinking about drillthrough? You can find them under actions in the cube editor.

From a contract number/id you can probably create artifical levels above the single contract by using the first two/three/four characters as levels. It is better to build these levels in a dimension.

You can create these levels in named calculations by using the TSQL function LEFT().

Regards

Thomas Ivarsson

No comments:

Post a Comment