Data Modeling Conundrum

Posted on January 16, 2011

4



A post on the Kimball forum found at www.rkimball.com has raised a question very similar to one I worked on five years ago when I implemented an Oracle OLTP system capturing Marine Incidents, and the associated data mart for reporting.

The question was this: How to model an incident when an incident may involve zero or many persons and may involve zero or many vessels. Standard dimensional practice says you cannot join one fact table to another fact table, and that fact tables can only be joined via dimensions. To resolve a many-to-many situation such as an incident having many vessels, a bridge table would be used, with a vessel group key in the incident fact table. However, each vessel will have a role e.g. instigator, victim, by-stander, as will a person – captain, owner, passenger, witness. Each vessel may have sustained various levels of damage in the incident, as each person may have sustained an injury. Each of these attributes is not an attribute of the vessel or person, but a relationship between the incident and the person i.e. they are not slowly changing dimensions.

For example, a person may have a boating licence, but it may not be valid for the type of vessel they were piloting. So, we come to the question of how to model this. We could for example have one incident over many rows, with each row containing a vessel and a person. E.g.

Option 1:

The obvious disadvantage with this method is that any query not requiring person or vessel would need to do a SELECT DISTINCT. I’d hate to have to use this.

So should I use a bridge table to keep in line with standard dimensional principles, storing attributes in the bridge table such as Option 2:

I’ve included the DimPersonGrp and DimVesselGrp dimensions to show the foreign key constraints. With these bridge tables storing the attributes as descriptions rather than as foreign keys, the width of these bridge tables would have to have a performance impact.

Perhaps I should just have three fact tables, and join from the Incident fact table to the IncidentVessel fact table on the IncidentID which is a degenerate dimension?

Option 3

Lastly, I could use the bridge table hybrid and store the additional attributes as foreign keys in the bridge tables.

Option 4

However the bridge tables now strongly resemble fact tables. The only difference between the fact table method and the hybrid bridge method is the use of the vessel and person group keys to join the fact table to the bridge table, whereas the fact tables would join on the degenerate dimension IncidentID.

At this point it comes close to just being a matter of semantics, although I have a feeling it would be slightly easier to implement the ETL on the fact table method rather than generating the vessel group and person group dimensions. Not a big deal either way though.

From a strictly dimension modeling perspective, the 2 and 4 are more correct. Like most things, there are rules, but the more advanced you get the more okay it is to break the rules because you are aware of the consequences.

I went with version 3 when I implemented this scenario.

What do you think?

See follow up to this post here: https://jsimonbi.wordpress.com/2011/03/17/data-modeling-conundrum-follow-up/

Please note that the follow up post explains why the bridge table options do not work.

Advertisements
Posted in: Data Modeling