Dealing with Hierarchies Part 2: unknown members in an unified hierarchy dimension
When unifying master data und hierarchy structure you have to take a minute to think about your unknown members and how you distribute your surrogate key.
I believe it’s useful to use a single surrogate key stabilization table for both entities of your source system. By this you avoid the risk of overlapping surrogate keys. An alternative would be for example to use positve numbers for your hierarchy and negative numbers for master data and keep a range empty for your unknown members. I prefer the stabilization table because it’s helpful for known reasons.
In regard to your unknown members of your dimension i use the following concept:
|-1||no leaf information||for measures which match to leaves and have an empty business key|
|-2||unknown leaf information||for measures which match to leaves and have an unknown business key|
|-3||no node information||for measures which match to nodes and have an empty business key|
|-4||unknown leaf information||for measures which match to nodes and have an unknown business key|
Now we need some unknown hierarchy nodes:
|-5||no hierarchy entry||parent node for records outside hierarchy|
|-10||unknown||top node for unknown hierarchy|
All master data records which are not included as a leaf of the hierarchy are included by giving them the parent -5.
The unknown members -1 to -5 all get the parent -10 as the top-level unknown member.
By this you provide a way to analyse why data relates to the unknown hierarchy.
In addition you guarantee that if a leaf is added to your hierarchy that all measures find their way to their nodes, even if your ETL uses delta mechanisms.
With the help of the -2 and -4 unknown members you have an identifier for all measures which need further processing (early arriving facts).
If your hierarchy changes over time and you use scd1 you have to decide what you do with hierarchy nodes which are not longer included in your structure. In general all leaves which formerly used this node as a parent will have new parents are should get the parent -5. But if you have measures directly related to nodes I would recommend to keep these nodes with their old parent.
I’m looking forward to any comments and ideas.