Sunday, March 25, 2012

Dimension values without data in a fact table

I have an ODS system and a Data warehouse system
I have a Sales fact table in the ODS system and I have these fields:
SALES
ID_CUSTOMER (PK),
ID_MODEL (PK),
ID_TIME (PK),
SALES,
QUANT_ART,
COST
Then in some records in the fields ID_Time or ID_Model or ID_Customer I
don’t have values (NULL) because in the transactional systems these record
don’t have values (NULL).
The users want to generate aggregate reports with the Sales table...
The question is:
I have to put a “dummy” value in the dimensions Customer, Model and Time
(for example “0”) and put this value in the fact table if the dimensions
fields have NULL values'
Or I have to leave the NULL values?
What is the best choice? Why?Personally -- and this really does boil down to personal preference -- I
believe that one of the key things that should happen during an ETL is
elimination of all "questionable" data. That includes unknown data -- can
you really report on something that's unknown? At the least, generate
well-known tokens to replace the NULLs with. If possible, get rid of those
rows on the way in (of course, that really depends on context) -- perhaps
they can appear in the aggregate data, but not in the line-level data?
There are various ways of dealing with the problem, but personally I am very
firm when designing data warehouses and ensure that, one way or another,
there will be absolutely no NULLs in the database.
Adam Machanic
SQL Server MVP
http://www.sqljunkies.com/weblog/amachanic
--
"CLAUDIO" <CLAUDIO@.discussions.microsoft.com> wrote in message
news:937E414E-AB9A-4E4D-8A58-D8251139550E@.microsoft.com...
> I have an ODS system and a Data warehouse system
> I have a Sales fact table in the ODS system and I have these fields:
> SALES
> ID_CUSTOMER (PK),
> ID_MODEL (PK),
> ID_TIME (PK),
> SALES,
> QUANT_ART,
> COST
>
> Then in some records in the fields ID_Time or ID_Model or ID_Customer I
> don't have values (NULL) because in the transactional systems these record
> don't have values (NULL).
> The users want to generate aggregate reports with the Sales table...
> The question is:
> I have to put a "dummy" value in the dimensions Customer, Model and Time
> (for example "0") and put this value in the fact table if the dimensions
> fields have NULL values'
> Or I have to leave the NULL values?
> What is the best choice? Why?
>

No comments:

Post a Comment