11.10.2009

# Role-playing time dimensions and semi-additive measures

Technical Value

SQL Server 2005 | SQL Server 2008

My last two posts were about semi-additive measures, so this is part 3 of my semi-additive measures mini-series. This post is about role-playing time dimensions combined with semi-additive measures. The good news is that this combination shouldn't occur too often; at least it took me some time to come up with an example. Since I couldn't use the Adventure Works sample database for it, I had to create my own data.

This is my scenario: I measure the number of cars in certain car parks (dimension) over time (dimension). I modeled the time as date and hour (one of my next posts will be about how to split date and time into separate dimensions), so I get hourly snapshots of my car parks. The model looks somewhat like this:

Since my car parks are located all over the world, I want to see the number of cars in each car park both by local time and by GMT, which explains the two links from the fact table to the date dimension. In my SSAS project I defined the DimDate dimension as a time dimension, so we have role-playing time dimensions (the roles are local time and GMT). The number of cars is our semi-additive measure (using the average aggregation function along the time dimension).

To get familiar with the test data I generated, I first started with two separate cubes, one for the local time and one for GMT. In this step we don't have multiple time dimensions per cube, so this is our normal and safe situation.

The result reflects the time shift: in the GMT cube you can see the same values, just shifted by 6 and 9 hours respectively. As each cube has only one time dimension, the aggregation works just the way we would expect. For example, the local time cube takes the average over the local time and sums up the values across the car parks.
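The expected behavior with a single time dimension can be sketched in a few lines of Python. This is only a simplified model of an average-over-time semi-additive measure, not the way SSAS implements it; the function name `average_over_time` and the sample numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical hourly snapshots: (car_park, local_hour, cars).
# The numbers are invented for this sketch, not taken from the cube.
snapshots = [
    ("Denver", 16, 100), ("Denver", 17, 200),
    ("Toronto", 16, 300), ("Toronto", 17, 500),
]

def average_over_time(rows):
    """Sum across car parks for each hour, then average the hourly sums."""
    per_hour = defaultdict(int)
    for _park, hour, cars in rows:
        per_hour[hour] += cars          # additive across car parks
    return sum(per_hour.values()) / len(per_hour)  # average across time

denver_only = [row for row in snapshots if row[0] == "Denver"]
print(average_over_time(denver_only))   # average of 100 and 200
print(average_over_time(snapshots))     # average of 400 and 700
```

For a single car park this is a plain average over the hours; for both car parks the values are first summed per hour and only then averaged.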

Now, what happens if we put both dimensions and our parking fact table into one cube? As we know, the answer depends on which time dimension comes first. With the local date as the first time dimension, my result looked like this:

The aggregation across the local time is correct (average), as expected, while the aggregation across GMT looks strange. If you remember my post about the effects of semi-additive measures on other dimensions, you will not be surprised. What you can see from the picture above is that

• The aggregation across GMT is correct for each car park (it averages the details), although GMT is not treated as a time dimension.
• The aggregation of the total across multiple car parks is wrong (it also averages the details instead of summing them).
• The grand total cannot be easily explained (it is neither the sum nor the average of one of the two axes).

How can that behavior be explained? Let's take a look at the details:

For this table I simply put the local time on the columns and GMT on the rows. I also filtered for the two car parks we analyzed above. What you can see is that you get multiple local time values for each GMT hour. At GMT 0:00 you will see 2239 cars in Denver and 2052 in Toronto, but at different local times (16:00 and 18:00). As our cube takes the local time as the time dimension, it averages the number of cars in both car parks, resulting in 2146 cars, as (2239 + 2052) / 2 = 2145.5 ≈ 2146. This is why we see averages here when aggregating different car parks.
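The effect can be reproduced with a small Python sketch. It is a simplified model of the aggregation, not the SSAS engine: values are summed per member of whichever dimension is treated as the time dimension and then averaged over those members. The `aggregate` helper and the row layout are assumptions for illustration; the two values and their local hours are the ones quoted above, assigned in the order they appear in the text.

```python
from collections import defaultdict

# (car_park, gmt_hour, local_hour, cars) for the GMT 0:00 slice.
rows = [
    ("Denver", 0, 16, 2239),
    ("Toronto", 0, 18, 2052),
]

GMT, LOCAL = 1, 2  # tuple positions of the two role-playing time keys

def aggregate(rows, time_key):
    """Sum per member of the chosen time dimension, then average the sums."""
    per_member = defaultdict(int)
    for row in rows:
        per_member[row[time_key]] += row[3]
    return sum(per_member.values()) / len(per_member)

# Local time is the time dimension: two local hours (16 and 18), each
# holding one car park, so the two values are averaged.
print(aggregate(rows, LOCAL))

# GMT is the time dimension: one GMT hour, so the car parks are summed.
print(aggregate(rows, GMT))
```

With local time as the time dimension the total for GMT 0:00 is the average of the two car parks, whereas treating GMT as the time dimension gives the sum one would expect for that hour.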

So, if we take a look again at the table above, this explains the averages:

The numbers 2239 and 2052 in the first line come from different car parks but also from different local times, and that is why we see the average as the total here.

This also explains the total of 1932 as it is the average across all sums of our GMT data (in the picture above it would be the average of the last line which is not completely visible).

OK, a few remarks:

1. The behavior in our example above is exactly the same if the second time dimension is not a role-playing one but a real dimension (with its own data source and its own database dimension definition). Therefore it makes no difference here how you model your dimensions.
2. In order to get proper aggregates you could use cube scripts (but keep in mind that you can only override existing cube cells, so there are drawbacks, for example with multi-selects in Microsoft Excel).
3. The aggregation problem would not exist if you used separate measure groups for each of the role-playing dimensions (i.e. FactUtilizationLocal and FactUtilizationGMT) or separate cubes.
4. The above data has nothing to do with real car parks; it was randomly generated (using a noised sine curve).
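For readers who want to build similar test data, here is a minimal Python sketch of a noised sine curve. It is not the generator used for this post; the function name `noised_sine_counts` and all parameter values (base level, amplitude, noise range, seed) are assumptions chosen for illustration.

```python
import math
import random

def noised_sine_counts(hours=24, base=2000, amplitude=500, noise=100, seed=42):
    """Return one car count per hour: a daily sine cycle plus uniform noise."""
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    counts = []
    for hour in range(hours):
        daily_cycle = math.sin(2 * math.pi * hour / 24)
        value = base + amplitude * daily_cycle + rng.uniform(-noise, noise)
        counts.append(max(0, round(value)))  # car counts cannot be negative
    return counts

counts = noised_sine_counts()
print(counts)
```

Shifting the hour index by a car park's offset to GMT would then give the second time key for each snapshot row.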
