BLOG

Parent-child dimensions, historical values (SCD) and keys

09.05.2009 Hilmar Buchta

SQL Server 2005 | SQL Server 2008

Usually working with historical values in an SSAS cube is not really complicated. Think of a customer whose ZIP code changed over time which is modeled as a slowly changing dimension of type 2:

ID Customer ZIP Current
1 John Doe 10000 false
2 John Doe 10200 false
3 John Doe 13041 true

If we don’t include the ZIP-code in the MDX query, we just see the sales. For example, let’s take a look at the following query:

  1. select {[Date].[Year].[Year]} on 0,
  2. {[Customer].[Customer].[John Doe]} on 1
  3. from [Sample Cube]
  4. where {[Measures].[Sales]}

The result of this statement would look somewhat like this:

Customer 2004 2005 2006
John Doe $3,400 $2,000 $1,999

The sales of the customer are related to the ID (fact to dimension relationship), so we can see the aggregated sales for John Doe in total and we can still filter sales per ZIP code over the time (let’s assume John Doe moved to a new house with a new ZIP twice).

But why don’t we see John Doe three times in this example? Well, if you don’t, it’s because you used the customer name field (not the id) as the key for the attribute Customer. When reading the dimension attributes from the dimension, SSAS sends a query that looks somewhat similar like this:

  1. select distinct Customer as Key, Customer as Name, from DimCustomer

So the key determines if two rows in the source table make the same dimension member or not. Generally speaking it’s good practice not to use the artificial surrogate key as the key for your dimension attribute, especially when you’re working with slowly changing dimensions.

But what about parent-child dimensions? In this situation, the primary key of your dimension table will automatically be the key for your dimension members as each node of the parent-child tree has its unique key. If you want to keep historical values in the parent child dimension (like changes in other attributes or the location within the tree) you’ll end up with different members sharing the same name but having different keys.

Ok, this sounds confusing, so let’s start with an example. I’m using Adventure Works DWH 2K8 as a source. When looking in the base dimension table we’ll find something like this:

image

Of course, this is not the same account (one is for assets, one for expenditures) but for our example let’s pretend that the two accounts with the name „Other Assets“ have been created because of a slowly dimension change, maybe the account with the id 24 has moved from the parent account 2 to the parent account 79. Again, this is not what was intended in the dimension table but for our example, the situation would be exactly the same.

Just like in our example with the customer John Doe above, when selecting „Other Assets“ we want to see all fact values that are associated to one of the two accounts. So we might try this MDX query:

  1. select
  2. [Measures].[Amount] on 0,
  3. [Account].[Accounts].[Other Assets] on 1
  4. from [Adventure Works]

The result looks like this:

image

Now, is this really the value for every amount that is associated with at least one of the two keys (24 and 84) above? Not really. In fact we just saw the values on one of the two rows. The reason is that because of the key column being different we really have two separate members in the dimension, both named „Other Assets“. Each member expression just resolves to one member (no matter if you’re using name or key or MDX functions like strtomember, nametoset etc.). So the expression [Account].[Accounts].[Other Assets] just resolves to one of the two dimension members and therefor the displayed value just reflects the value on that one member.

In order to show the difference, let’s rewrite the query using a filter expression:

  1. select {[Measures].[Amount]} on 0,
  2. filter (
  3.   [Account].[Accounts].members,[Account].[Accounts].currentmember.name=„Other Assets“
  4.   ) on 1
  5. from [Adventure Works]

Now, the result shows both accounts with the given name:

image

Here you can clearly see what was missing in our previous query.

Of course, we could as well aggregate both values (never do this with your real balance sheet though…) using the aggregate function:

  1. with member [Account].[Accounts].[Other MemberAgg] as
  2. Aggregate(
  3. filter (
  4.   [Account].[Accounts].members,[Account].[Accounts].currentmember.name=„Other Assets“
  5.   )
  6. )
  7. select {[Measures].[Amount]} on 0,
  8. [Account].[Accounts].[Other MemberAgg]
  9. on 1
  10. from [Adventure Works]

This kind of problem applies to several situations. Just think of a situation where you want to remove the account „Other Assets“ from your query using the set minus operation:

  1. select {[Measures].[Amount]} on 0,
  2. [Account].[Accounts].members [Account].[Accounts].[Other Assets]
  3. on 1
  4. from [Adventure Works]

Without thinking about different keys you might have been surprised to still find „Other Assets“ in the result of the query above:

image

As expected, we just eliminated one of the two accounts from the list by addressing it with the member expression above. In order to get the desired result we also have to use the filter expression:

  1. select {[Measures].[Amount]} on 0,
  2. [Account].[Accounts].members
  3. filter (
  4.   [Account].[Accounts].members,[Account].[Accounts].currentmember.name=„Other Assets“
  5.   )
  6. on 1
  7. from [Adventure Works]

image

In the next post we’ll continue thinking about those duplicate members and how to work with them efficiently.

Your email address will not be published. Required fields are marked *

Join #teamoraylispeople

Gestalte mit uns
die Welt der Daten