
Moving Average in DAX / BISM Tabular

21.08.2011 Hilmar Buchta

SQL Server Denali |  PowerPivot

Alberto Ferrari already wrote about calculating moving averages in DAX by using a calculated column. I’d like to present a different approach here by using a calculated measure. The example is a daily moving average over the last 30 days.

For my example, I’m using the PowerPivot workbook which can be downloaded as part of the SSAS Tabular Model Projects from the Denali CTP 3 samples.

In this post, I’m developing the formula step by step. However, if you are in a hurry, you might want to jump directly to the final results below.

With calendar year 2003 on the filter, date on columns and sales amount (from table Internet Sales) in the details, the sample data looks like this:

[Screenshot: pivot table with the daily sales amount for calendar year 2003]

In each row’s context, the expression 'Date'[Date] gives the current context, i.e. the date for this row. But from a calculated measure we cannot refer to this expression (as there is no current row for the Date table); instead we have to use an expression like LastDate('Date'[Date]).

So, in order to get the last thirty days we can use this expression:

    DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)
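Which dates this window covers can be mimicked in plain Python. This is only a sketch and not DAX: the helper `dates_in_period` is a hypothetical stand-in, and it assumes a gapless calendar (the real function only returns dates that exist in the date column). A negative count walks backwards and the end date itself is part of the window.

```python
from datetime import date, timedelta


def dates_in_period(last_date, number_of_days):
    """Plain-Python stand-in (hypothetical) for a day-based date window."""
    span = abs(number_of_days)
    if number_of_days < 0:
        # Walk backwards; last_date is the final day of the window.
        start = last_date - timedelta(days=span - 1)
    else:
        # Walk forwards; last_date is the first day of the window.
        start = last_date
    return [start + timedelta(days=i) for i in range(span)]


window = dates_in_period(date(2003, 1, 31), -30)
print(window[0], window[-1], len(window))  # 2003-01-02 2003-01-31 30
```

For January 31 the window therefore runs from January 2 to January 31, which is 30 days including the end date.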

We can now summarize our internet sales for each of those days by using the summarize function:

    Summarize(
      DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)
      ,'Date'[Date]
      , "SalesAmountSum"
      , Sum('Internet Sales'[Sales Amount])
    )

And finally, we’re using the DAX function AverageX to compute the average of those 30 values:

    Sales Amount (30d avg):=AverageX(
      Summarize(
        DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)
        ,'Date'[Date]
        , "SalesAmountSum"
        , Sum('Internet Sales'[Sales Amount])
      )
      ,[SalesAmountSum]
    )

This is the calculation that we are using in our Internet Sales table as shown in the screenshot below:

[Screenshot: the measure definition in the Internet Sales table]

When adding this calculation to the pivot table from above, the result looks like this:

[Screenshot: pivot table with the daily sales amount and the 30-day moving average]

Looking at the result, it seems that we don’t have any data prior to January 1, 2003: the first value of the moving average is identical to the day value (there are no rows before that date), the second value is the average of the first two days, and so on. This is not quite correct, but I’ll get back to this problem in a second. The screenshot shows the moving average for January 31 being computed as the average of the daily values from January 2 to 31.
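The behavior at the start of the year can be reproduced with a small plain-Python model. It is a sketch with made-up numbers, not the workbook data: `visible_sales` and `moving_average` are hypothetical names, and the model assumes that days without a visible value are skipped rather than counted as zero, which matches what the pivot table shows.

```python
from datetime import date, timedelta


def moving_average(visible_sales, last_date, days=30):
    """Average the daily totals visible in the window ending at last_date."""
    window = [last_date - timedelta(days=i) for i in range(days)]
    # Assumption: days without a visible total are skipped, not treated as 0.
    values = [visible_sales[d] for d in window if d in visible_sales]
    return sum(values) / len(values) if values else None


# Made-up totals: 100, 200, ... Only 2003 is visible because of the year filter.
visible_sales = {
    date(2003, 1, 1) + timedelta(days=i): 100.0 * (i + 1) for i in range(31)
}

print(moving_average(visible_sales, date(2003, 1, 1)))   # 100.0 (one day only)
print(moving_average(visible_sales, date(2003, 1, 2)))   # 150.0 (two days)
print(moving_average(visible_sales, date(2003, 1, 31)))  # 1650.0 (January 2 to 31)
```

Only from January 30 onward does the window contain a full 30 visible days.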

Our calculated measure also works fine when filters are applied. In the following screenshot I used two product categories for the data series:

[Screenshot: moving average with two product categories as data series]

How does our calculated measure work on higher aggregation levels? In order to find out, I’m using the Calendar hierarchy on the rows (instead of the date). For simplicity I removed the semester and quarter levels using Excel’s pivot table options (Show/Hide fields option).

[Screenshot: pivot table with the Calendar hierarchy (year and month) on rows]

As you can see, the calculation still works fine. Here, the monthly aggregate is the moving average for the last day of the specific month. You can see this clearly for January (the value of 14,215.01 also appears in the screenshot above as the value for January 31). If this is the business requirement (which sounds reasonable for a daily average), then the aggregation works fine on a monthly level (otherwise we will have to fine-tune our calculation, which will be the topic of an upcoming post).
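Why the month shows the value of its last day can be sketched in plain Python. The names `measure_for_context` and `moving_average` as well as the numbers are hypothetical; taking the maximum date of the cell plays the role of LastDate('Date'[Date]).

```python
from datetime import date, timedelta


def moving_average(sales, last_date, days=30):
    """Average of the daily totals in the window ending at last_date."""
    window = [last_date - timedelta(days=i) for i in range(days)]
    values = [sales[d] for d in window if d in sales]
    return sum(values) / len(values) if values else None


def measure_for_context(sales, dates_in_context, days=30):
    # Stand-in for LastDate(...): the latest date covered by the current cell.
    return moving_average(sales, max(dates_in_context), days)


# Made-up daily totals 1, 2, 3, ... for January 1 to February 28, 2003.
sales = {date(2003, 1, 1) + timedelta(days=i): float(i + 1) for i in range(59)}

january = [d for d in sales if d.month == 1]
month_value = measure_for_context(sales, january)
day_value = measure_for_context(sales, [date(2003, 1, 31)])
print(month_value, day_value)  # 16.5 16.5
```

A cell that covers the whole month and a cell that covers only January 31 resolve to the same end date, so both return the same moving average.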

But although the aggregation makes sense on a monthly level, if we expand this view to the day level you’ll see that our calculated measure simply returns the sales amount for that day, not the average of the last 30 days anymore:

[Screenshot: Calendar hierarchy expanded to the day level]

How can this be? The problem results from the filter context in which the Sum(…) expression in the following code is evaluated:

    Sales Amount (30d avg):=AverageX(
      Summarize(
        DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)
        ,'Date'[Date]
        , "SalesAmountSum"
        , Sum('Internet Sales'[Sales Amount])
      )
      ,[SalesAmountSum]
    )

Since we evaluate this expression over the given date period, the only context that is overwritten here is 'Date'[Date]. In our hierarchy we’re using different attributes from our dimension (Calendar Year, Month and Day Of Month). As this context is still present, the calculation is also filtered by those attributes. And this explains why the current day’s context is still present for each line. To make things clear: as long as we evaluate this expression outside of a date context, everything is fine, as the following DAX query shows when executed in Management Studio on the Internet Sales perspective of our model (using the tabular database with the same data):
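The effect of the remaining attribute filters can be reproduced outside of DAX. The following plain-Python sketch uses made-up numbers; the miniature `sales` table and the helpers `attributes`, `visible` and `moving_average` are hypothetical and only model the idea that the window replaces the filter on the date, while the filters on the other attributes keep applying.

```python
from datetime import date, timedelta

# Made-up daily totals 10, 20, ... for January 2003.
sales = {date(2003, 1, 1) + timedelta(days=i): 10.0 * (i + 1) for i in range(31)}


def attributes(d):
    """Date attributes used by the hierarchy on rows."""
    return {"Calendar Year": d.year, "Month": d.month, "Day Of Month": d.day}


def visible(d, attribute_filters):
    """A date survives only if it matches every attribute filter still in place."""
    attrs = attributes(d)
    return all(attrs[col] == val for col, val in attribute_filters.items())


def moving_average(last_date, attribute_filters, days=30):
    window = [last_date - timedelta(days=i) for i in range(days)]
    # Only the date filter is replaced by the window; the other filters remain.
    values = [sales[d] for d in window if d in sales and visible(d, attribute_filters)]
    return sum(values) / len(values) if values else None


jan31 = date(2003, 1, 31)
print(moving_average(jan31, {}))                 # 165.0: date on rows, no other filters
print(moving_average(jan31, attributes(jan31)))  # 310.0: hierarchy cell, only that day survives
```

With year, month and day of month all filtering the cell, the 30-day window collapses to a single day, so the "average" is just that day's amount.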

    evaluate (
        Summarize(
            DatesInPeriod('Date'[Date],date(2003,1,1),-5,DAY)
            ,'Date'[Date]
            , "SalesAmountSum"
            , Sum('Internet Sales'[Sales Amount])
        )
    )

Here, I reduced the time period to 5 days and also set a fixed date as LastDate(…) would result in the last date of my date dimension table for which no data is present in the sample data. Here is the result from the query:

[Screenshot: query result]

However, after setting a filter to 2003, no data rows outside of 2003 will be included in the sum. This explains the remark above: it looked like we only have data starting from January 1, 2003. And now we know why: the year 2003 was on the filter (as you can see in the very first screenshot of this post) and therefore it was present when calculating the sum. Now, all we have to do is to get rid of those additional filters, because we’re already filtering our results by Date. The easiest way to do so is to use the Calculate function and apply ALL(…) to every attribute for which we want to remove the filter. As we have several of those attributes (Year, Month, Day, Weekday, …) and we want to remove the filter from all of them except the date attribute, the shortcut function ALLEXCEPT is very useful here.

If you have an MDX background, you will wonder why we don’t get a similar problem when using SSAS in OLAP mode (BISM Multidimensional). The reason is that our OLAP database has attribute relationships, so after setting the date (key) attribute, the other attributes are automatically changed too and we don’t have to take care of this (see my post here). But in the tabular model we don’t have attribute relationships (not even a true key attribute) and therefore we need to eliminate unwanted filters from our calculations.

So here we are with the …

Final results

    Sales Amount (30d avg):=
    AverageX(
      Summarize(
        DatesInPeriod('Date'[Date],LastDate('Date'[Date]),-30,DAY)
        ,'Date'[Date]
        , "SalesAmountSum"
        , Calculate(Sum('Internet Sales'[Sales Amount]), ALLEXCEPT('Date','Date'[Date]))
      )
      ,[SalesAmountSum]
    )

And this is our final pivot table in Excel:

[Screenshot: final pivot table]

To illustrate the moving average, here is the same extract of data in a chart view (Excel):

[Chart: daily sales amount and 30-day moving average]

Although we filtered our data on 2003, the moving average for the first 29 days of 2003 correctly takes the corresponding days of 2002 into account. You will recognize the values for January 30 and 31 from our first approach, as these were the first days for which our first calculation had a sufficient amount of data (a full 30 days).
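The difference between the first and the final calculation can be modeled in a few lines of plain Python. This is a sketch with made-up numbers, not the workbook data: the flag `keep_only_date_filter` is a hypothetical stand-in for wrapping the sum in Calculate(…, ALLEXCEPT('Date','Date'[Date])), i.e. ignoring every Date filter except the one on the date itself.

```python
from datetime import date, timedelta

# Made-up daily totals 1, 2, 3, ... from December 1, 2002 to January 31, 2003.
sales = {date(2002, 12, 1) + timedelta(days=i): float(i + 1) for i in range(62)}


def moving_average(last_date, year_filter=None, keep_only_date_filter=False, days=30):
    window = [last_date - timedelta(days=i) for i in range(days)]
    values = []
    for d in window:
        if d not in sales:
            continue
        # Without the fix, the year filter from the pivot table still applies.
        year_blocks = year_filter is not None and d.year != year_filter
        if year_blocks and not keep_only_date_filter:
            continue
        values.append(sales[d])
    return sum(values) / len(values) if values else None


jan1, jan30 = date(2003, 1, 1), date(2003, 1, 30)
print(moving_average(jan1, year_filter=2003))                               # 32.0
print(moving_average(jan1, year_filter=2003, keep_only_date_filter=True))   # 17.5
print(moving_average(jan30, year_filter=2003))                              # 46.5
print(moving_average(jan30, year_filter=2003, keep_only_date_filter=True))  # 46.5
```

On January 1 the two variants differ because only the second one sees the December 2002 values; from January 30 on, the whole window lies inside 2003 and both return the same number.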
