Friday, February 24, 2012

Different Granularity in the Time Dimension

This is, I think, a commonly asked question. But from what I can see it is one for which a clearly defined answer is hard to find.

What I am trying to achive is essentially the ability to analyse a number of fact tables with the same time dimension using different grains. For the sake of simplicity let us consider the problem one of analysing one fact table down to the level of a week [ie Year -> Quarter - Month ->Week> and the second merely down to the level of a month.

I had an existing cube for which the more fine grained analysis was already working. Based on the example in the AdventureWorks sample I was confident I could work out how to add the new fact table and wire it up to the existing time dimension for analysis down to a monthly level. However, on changing the dimension attributes from the default key to a compound relationship based on Calendar Year + Month I started to see incorrect roll-ups as per other posts I have seen on this. I checked the keys etc for these various levels but nothing seemed to work.

So I pared the problem down to the bare bones in a sample test cube. in it I have two dimensions - dimJob and dimTime. In addition to standard surrogate jeys the former has a single attribut of job ref and the latter attributes of calendar year and calendar quarter. These are linked to a single fact table - factJobCosts with time and job keys, calendar year + quarter attributes and the value which will make up the single measure, cost.

The dimTime is loaded up with data for 2001-2008. The job table is populated with a single row. I then arrange for the fact table to receive data for each month in the years 2006/7 - the same value of 20.0 just to make life simple.

Using the default time key the browsed cube shows the correct aggregations for each of the four quarters in the years 2007 and 2008; in my real system which has a grain of week available the drill down to the week containing the dat value used is also possible. If I change the dimension relationships to use the calendar year the roll-ups remain correct.

But if I change the dimension relationships to use calendar quarter - having set up calendar quarter in dimTime to be keyed on year, quarter then I see aggregates for all seven years.

Why is this happening? I have poured over the AdventureWroksd sample and all the online tutorial adn I am at a lost for which bit I have missed out. My time dimension is slightly different but is essentially the same and I have compared everything with the working example.

I also tried building a very cutdown version of AdventureWorks with only the quota, employee and time dimensions and was interested to see that this had issues with the aggregates at the annual level.

It would appear that there is a level of magic in the AdventureWorks sample that I havent picked up on. Anyone got an idea what that might be?

Any help gratefully received!

Cheers

Paul

Is it possible to post a simple example of the output you are seeing? It sounds like there might be an issue with how your attribute relationships are defined, but I'm not sure I understand your exact problem enough to suggest where the issue might be. I'm also assuming that you have some sort of "adjusted month" concept that makes the weeks fit evenly into the months.

There is no real magic in Adventure Works, it's all just smoke and mirrors

|||

Hi Darren

Thanks for the reply. I have always found the AdventureWorks sample very straightforward and very useful; I have usefully found what I needed somewhere in the tutoral or by combing through the example code but this one has got me stumped

In my simple test scenario, using the default dimension usage/relationship - I see the following results for the selection of job vs years -

A123 2006 160 2007 160

which when drilled down to quarters looks like

A123 2006 Q1 20 Q2 20 Q3 20 Q4 20 2007 Q1 20 Q2 20 Q3 20 Q4 20

This is what I would expect.

But I want to redefine the relationship in the dimension usage to calendar quarter in anticipation of its use in the real world context of the time dimension having a finer grain down to week level.

When I do that - ie when the granularity attribute for the relationship between time and fact table is changed to CalendarQuarter with dimension/measure columns CalendarYear, CalendarQuarter I see a different result set, in which all the years are populated with the 'default' aggregate:

A123 2001 160 2002 160 2003 160 2004 160 2005 160 2006 160 2007 160 2008 160

When I browse this I see that I can drill down to the quarters in 2006 and 2007 to see the correct values at that level and that the other years - which shouldnt be in the result set at all - dont drill down past the year level. That is, the result set is almost but not quite right! Its almost as if the roll-up above the grain for years which have no rows in the fact table.

Just to clarify. The simple example here has a time dimension which has levels of year and quarter, no weeks and months, though in the real world example I will need to include those in order to support other fact tables that need to drill down to a finer grain. My thoughts are that if I can get this to work for calendar quarter - a la AdventureWorks which does the same for the Sales Quota/Employee then I could repeat the process for CalendarMonth.

Regards

Paul

|||

I would have expected the first query to have returned the following if you had 20 in each quarter.

A123 2006 80 2007 80

Not 160 for each year.

It "smells" like an issue with attribute relationships, but I'm not quite sure what the exact explaination is at the moment. Attribute relationships did take a bit of work to get the hang of. I think you almost need to break one to really start getting the hang of them .

Is this roughly what your dimension looks like at present?

Calendar Year

(no relationships)

Calendar Quarter (key attribute with a composite key of Year, Quarter)

Calendar Year

|||Erm. good point re the first query. Maths were never my stroing card. In mitigation I have been staring at this one for so long that the fact that rows were displayed for those years alone was enough for me to conclude that it had worked!

Well I think I am homing in on the problem. Your post pushed me in the direction of the dimension attributes and although I had set up the composite key for CalendarQuarter the relationship with CalendarYear had gone AWOL; dragging CalendarYear onto a new attribute relationship in the attribute tree and a reprocess did the trick. Values for 2006,2007; 20.0 per quarter.

Armed with this I returned to my real work example and, to cut a long story short :-) I'm partially there. In this cube time goes down to week and I have two heirarchies - Y->Q->M->W and Y->M->W. With the correct key and attribute relationships defined for CalendarMonth the roll-ups for each year are now correct and drilldown to month is now correct. But it is now at the month level at which the aggregation is now incorrect . For each year in which there is data in the fact table each month is showing as the aggregate for the year itself. [IE if I put a single entry into the fact table for 2004 of 250.00 I see a monthly aggregate of 250.00 for each of the months in 2004.]

So it looks like I havent got the attributes right for the relationships below month - presumably month->week. I'm in the middle of looking at this but thought I would post my progress...

Thanks for your help

Paul

|||Should also add that the roll-ups from Year->Quarter are also correct. Is just at the month level that we have a problem.|||

What attribute is set as the key?

And how are you handling the week to month rollup? Are you using some sort of adjusted months so that they line up with the week boundaries?

|||

I'm not at work at the moment so I'm doing this from memory...

Do you mean the key attribute for the time dimension? If so the primary key is a surrogate key based on an identity column.

The week -> month rollup is done purely by the attribute relationship between the two. I have presumed that if a relationship exists between them then aggregation up to a monthly level will be driven by that. Even if one week starts say in Jan and ends in Feb then if the monthly attribute for that row is Jan then that week is rolled up into Jan. I have, however, suspected that that might be a naiive assumption ... If a week falls near a month boundary such that one year it is in one month and another year it is another does this hold true?

BTW From what I can recall the Calendar Quarter attribute is keyed on Year, Quarter and Calendar Month on Year, Month.

I'll go over this again when I get in the office tomorrow,

|||

The week -> month rollup is done purely by the attribute relationship between the two. I have presumed that if a relationship exists between them then aggregation up to a monthly level will be driven by that. Even if one week starts say in Jan and ends in Feb then if the monthly attribute for that row is Jan then that week is rolled up into Jan. I have, however, suspected that that might be a naiive assumption ... If a week falls near a month boundary such that one year it is in one month and another year it is another does this hold true?

This relationship is going to cause all sorts of strange issues in your data. What you have essentially created is a many-to-many relationship, which you cannot have within a dimension (you can create these between a dimension and a fact table). This is probably easiest to explain with an example.

Take Sep 1st 2007, If you got through the Year-Month-Day hierarchy then it belongs to the Sept 2007 month. If you go through the Year - Month - Week - Day hierarhcy, then it belongs to the Aug 2007 month. A given member cannot have 2 different parents. When the dimension is built only one of these will be used (you cannot really say which one is used, it depends a bit on the processing order). This means that one of your hierarchies will be under reported.

You could have a Year-Month-Day and Year-Week-Day hierarchy (but you would need to do some manupulations to make sure that a week does not belong to 2 years.)

The key is, that if you assign attribute relationships, then you are saying that there is a many-to-one relationship. One or more of the attribute members are related to only one of the related attribute members.

You could try building relationships Y-M-D and W-D and then setting up the Y-M-D and Y-M-W-D hierarchies. This would make the Y-M-W-D an unnatural hierarchy and some weeks, days would appear to be double counted as they would appear more than once. If this sort of inconsistancy is not aceptable, you would need to adjust the week or month attributes so that they "fit" together.

|||

So it is the creation of a M-M relationship between month and week attributes [arising out of the boundary conditions in different years] that is the issue here?

BTW My time dimension only goes down to the grain of week, but I do have two heirarchies: Year-Quarter-Month-Week and Year-Month-Week. There is no need to define down to a granularity of a day.

Presumably the relationship between quarter and month further influences things. As you say, things need to be arranged such that a week only appears under one quarter. However, as months are always children of the same quarters year on year we essentially have only one problem to solve and that is how to ensure that the attribute for the week is always assigned to the same month in each year. Have I got that right?

|||

Some additional info.

Looking at the dimension table we see that week number 13 is variously in March or April, and is therefore either in the first or second quarter of the year [2001 it's in April, 2004 it's March].

Unfortunately I cannot change this - that is, I cannot move to an alternative heirarchy in which week 13 is always in April, for example - since the fact tables are built from source tables in which this relationship is mandated/assumed.

What I need is a solution that supports alternative parents...

|||

Further info [again!]

I should have mentioned earlier that from what I can see the roll-ups are actually OK when the grain of the fact table goes down to the week level.

The only issue that I have is fopr the case where the fact table is joined at the garin of calendar month. When this occurs the rollups at the annual and quarterly level are correct but the drilldown into month is wrong - I always see data for each of the months even when only one of the months has an entry in the fact table. In additon it is possible to drilldown through the month tho the weeks, something I specifically want to stop by setting the grain to the month level.

The issue would appear to be related to the week level since in my test cube, in which the time dimension is limited to year, quarter and month with no week attributes then I can limit the grain halfway down the heirarchy.

Can't help thinking that someone must have solved this problem as it must be a common situation...!

|||

Well, progress, of sorts, but still not a total solution.

I traced the rollup problem in the months to a redundant relationship in the hierarchy. So I now have what I need in terms of aggregation down to week or down to month as required by specific fact tables.

The only problem that remains is that I can still see the week level in the drilldown limit to the month granularity. Whereas in AdventureWorks Sales Amount Quota can only be viewed at the quarters level of the Calendar hierarchy, and the months are not displayed, my cube shows the weeks, containing the aggregate value for the parent level. You have alluded to issues regarding the parent child relationship between a week and the month to which it belongs and the fact that certain weeks can 'belong' to different months year on year. Would that lead to weeks appearing in the drilldown when they should not? I would have expected that any level below the stated grain would simply be ignored.

Just to recap here is my time dimension as it stands [note there is a need for some cleanup on some of these names!].

Calendar Month [Key: CalendarYear, MonthNumber]

> Calendar Quarter

Calendar Quarter [Key: CalendarYear, CalendarQuarter]

> Calendar Year

Calendar Week [Key: CalendarYear, WeekNumber]

Calendar Year [Key: CalendarYear]

Month Name [Key: CalendarYear, MonthNumber]

> Calendar Quarter

> Month of Year

Month of Year [Key: MonthNumber]

Time [Key: TimeKey]

> Month Number of Year

> Calendar Quarter

> Week Number of Year

> Calendar Year

> Month Name

> Month of Year

Hierarchies:

Year (Calendar Year)

> Quarter (Calendar Quarter)

> > Month (Calendar Month)

> > > Week (Caelndar Week)

and

Year (Calendar Year)

> Month (Calendar Month)

> > Week (Caelndar Week)

|||

Ever started something that you wished you hadnt?!

I can't for the life of me work out why I can't limit the drilldown to the level defined by the granularity attribute; despite the grain being set at calendar month I still see the weeks that make up each month.

In case there was an issue with week to month rollups etc I thought I would go back to first principals. Or rather, I thought I would try to build a simple example based pretty much on something I knew worked. That is, the AdventureWorks sample and the Sales Amount Quota which is limited to Calendar Quarters. [Note that looking at the raw data in the AdventureWorks DW database I'm not convinced that this is an issue: the data looks pretty much like my own with respect to calendar month and week numbers]

I set up a simple three table schema - dimTime, dimJob and factJobCosts.

DimTim was set up by a DTS copy of dimTime from AdventureWorks DW and then stripped down slightly to remove the Spanish and French names and the Fiscal columns.I also removed the semester data as I wanted to model the hierarchy year->quarter since that is in my real database. Either of these may have been a mistake <g> but ...

DimJob is a simple table - basically jobKey, jobReference; factTable was TimeKey, JobKey, CalendarKey, CalendarQuarter, Cost and was populated with a single row mapped to the first timeKey entry in dimTime - ie 1/7/2001 or Q3 2001; cost was 5.0 for job 123.

I created a new cube based on this schema creating time and job dimensions and a fact table based on a view on factJobCosts in which the fact table was related to the time dimension through CalendarYear/Quarter as per FactSalesQuota in AW.

As far as possible, given the absence of semester and fiscal attributes I set up the time attribute as per AW.

The dimension usage was set up to use a granularity attribute of Calendar Month for Time -> JobCosts

Browsing the deployed I would expect to see

A123

CY 2001 -> Q3 CY 2001 -> Total 5.0

But what I see is

A123

CY 2001 -> Q3 CY 2001 -> July 2001 5.0

CY 2001 -> Q3 CY 2001 -> Aug 2001 5.0

CY 2001 -> Q3 CY 2001 -> Sept 2001 5.0

So which bit of the process have I missed out?

|||

Cracked it!

I went into Dimension Usage, examined the properties for the measure group and switched the IgnoreUnrelatedDimensions property from True to False and et voila. Drill down was limited to the monthly level.

A quick search on this turned up the following link which had a similar issue: http://hccmsbi.blogspot.com/2007/07/ignoreunrelateddimensions.html

Now back to the 'production' environment to see if this does the trick there.

No comments:

Post a Comment