Sunday, March 25, 2012

Dimension Tables

Hi
Currently I'm stuck between the options
A - Using several(20) small 3-5 columns tables to store my slow changing
dimensions.
OR
B - Use 1 big table 5 columns and "!" blank columns not used and use an
identifier column to get the dimension name to identify the specific
dimension in order to use it.
Now maintenance on option A is an overhead. But reading would be fast and I
won't have locking issues at all. I'd save some space with the ability to us
e
int and even tiny int keys (dimension won't exceed 255). As apposed to 20*25
5
which would be max key in option B.
Maintenance for option B is easy since it's one table. (I can't go for a
hybrid option B plus views for option A (Due to report tool unable to use
views).
For option B there is a bit more overhead in the inital coding but it will
only take place once. Temp tables will be used to extract dimensions while
ETL takes place and update the master (B) once it's done.
There's obviously a lot more pro's and con's but I don't have something to
draw a clear line between the two options in terms of what works.
Option B is something that's common in Oracle from what I've seen and A I've
seen in SQL more often.
Anyone with a solid Yes/No suggestion and explanation ?
Thanks for the help.Hi Mal
If the number of tables (which you have said as 20) might increase then
go for a single table.
If the number of columns in a table (which you have said as 3-5) might
increase then go for multiple tables.
Make sure you don't end up changing the schema quite often.
Hope this helps.|||Mal .mullerjannie@.hotmail.com> < (removethis) writes:
> Currently I'm stuck between the options
> A - Using several(20) small 3-5 columns tables to store my slow changing
> dimensions.
> OR
> B - Use 1 big table 5 columns and "!" blank columns not used and use an
> identifier column to get the dimension name to identify the specific
> dimension in order to use it.
>...
> Option B is something that's common in Oracle from what I've seen and A
> I've seen in SQL more often.
> Anyone with a solid Yes/No suggestion and explanation ?
If you want an answer from the OLAP side of things,
microsoft.public.sqlserver.olap or .datawarehouse are better venues.
From a general data-modelling perspective, option A is much to prefer. The
idea in a relational database is that each table describes a unique entity.
Mixing apples and oranges in the same table is rarely a good idea.
It may seem that two entities are similar: all they have is an idea and a
name. But then one day there are new requirements, and one of the entities
needs another attribute that does apply to other entity.
There are also problem with maintaining foreign-key constraints when using
one big table. Table B requires a two-column key, where the first key
column identifies the entity. All referencing table then needs a constant
column, so that you define the FK properly. And that has to be real
column - a computed column won't do.
Again, this is from a general RDBMS perspective. I don't work with OLAP,
and don't know about dimension and fact tables.
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

No comments:

Post a Comment