Calculating Quantiles in MDX

16.06.2008 Hilmar Buchta

SQL Server 2005

Microsoft MDX offers a huge set of mathematical function, which can be further enhanced by using Excel-functions. Excel-functions may be called as Excel!functionname but the performance for this approach is not really good. Besides that, not all of your every-day functions are available through MDX. Before using Excel-functions or writing your own SSAS stored procedures it’s worth taking a look at the built-in capabilities of MDX.

In this post we’re talking about the calulcation of quantiles. Quantiles are used e.g. for computing risk measures like the so called Value-At-Risk (VaR). The Value-At-Risk is usually a 1% or 5% quantile of the P&L distribution over a set of scenarios (typically created by a monte carlo simulation).

In the following MDX-sample we calculate a simple quantile over a set of days (listed in a dimension „Date“). In order to compute the p% quantile of a set of n values we first have to sort the values and take the element at position (n-1)*p%+1. Usually we will not end up with an integer here but we don’t want to complicate things here too much.

So let’s try this code in the Adventure Works cube script:

  2. max(bottomcount([Date].[Date].[Date],
  3. int((Count([Date].[Date].[Date])1)*0.05)+1,
  4. [Measures].[Gross Profit]),[Measures].[Gross Profit]),
  5. NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount], [Measures].[Total Product Cost] },
  6. VISIBLE = 1 ;

We use a bottomcount here to get the smallest p% elements. By using the max-function we can easily get the biggest of those values (which is actually the last if we had sorted the complete set). The expression int(Count([Date].[Date].[Date]) gives the count of days in our date dimension. Be aware that if you have the unknown member enabled for the dimension this element is also counted so you should disable the unknown member or reduce the count by 2 instead of 1.

For Adventure Works I found 1158 days in the Date dimension. Calculating the quantile for single elements or sets of elements executes pretty fast. Of course, query performance depends on the value of p%. For calculating a 90% quantile with the above formula the same query takes longer than a 5% quantile. So it’s better to provide a low p% and switch to topcount instead of sorting too many rows.

The simple test query I used looked like this:

  1. select {[MEASURES].[Quantil5Prc]} on 0
  2. from [Adventure Works]

The query results in the value of 986.6533$ meaning that only in 5% of all days we have a gross profit below this value. Of course, you could put [Product].[Category].[Category] for example on the rows to get a more detailed result.

For further enhancements think of putting the p% value in a separate dimension with given numbers and a reasonable default member. Then you could easily switch the percentage of the quantile or compare different quantiles in one pivot table.

Your email address will not be published. Required fields are marked *

Join #teamoraylispeople

Gestalte mit uns
die Welt der Daten