16.06.2008

# Calculating Quantiles in MDX

Technical Value

SQL Server 2005

Microsoft MDX offers a huge set of mathematical function, which can be further enhanced by using Excel-functions. Excel-functions may be called as Excel!functionname but the performance for this approach is not really good. Besides that, not all of your every-day functions are available through MDX. Before using Excel-functions or writing your own SSAS stored procedures it's worth taking a look at the built-in capabilities of MDX.

In this post we're talking about the calulcation of quantiles. Quantiles are used e.g. for computing risk measures like the so called Value-At-Risk (VaR). The Value-At-Risk is usually a 1% or 5% quantile of the P&L distribution over a set of scenarios (typically created by a monte carlo simulation).

In the following MDX-sample we calculate a simple quantile over a set of days (listed in a dimension "Date"). In order to compute the p% quantile of a set of n values we first have to sort the values and take the element at position (n-1)*p%+1. Usually we will not end up with an integer here but we don't want to complicate things here too much.

So let's try this code in the Adventure Works cube script:

1. CREATE MEMBER CURRENTCUBE.[MEASURES].[Quantil5Prc] AS
2. max(bottomcount([Date].[Date].[Date],
3. int((Count([Date].[Date].[Date])-1)*0.05)+1,
4. [Measures].[Gross Profit]),[Measures].[Gross Profit]),
5. NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount], [Measures].[Total Product Cost] },
6. VISIBLE = 1 ;

We use a bottomcount here to get the smallest p% elements. By using the max-function we can easily get the biggest of those values (which is actually the last if we had sorted the complete set). The expression int(Count([Date].[Date].[Date]) gives the count of days in our date dimension. Be aware that if you have the unknown member enabled for the dimension this element is also counted so you should disable the unknown member or reduce the count by 2 instead of 1.

For Adventure Works I found 1158 days in the Date dimension. Calculating the quantile for single elements or sets of elements executes pretty fast. Of course, query performance depends on the value of p%. For calculating a 90% quantile with the above formula the same query takes longer than a 5% quantile. So it's better to provide a low p% and switch to topcount instead of sorting too many rows.

The simple test query I used looked like this:

1. select {[MEASURES].[Quantil5Prc]} on 0
2. from [Adventure Works]

The query results in the value of 986.6533\$ meaning that only in 5% of all days we have a gross profit below this value. Of course, you could put [Product].[Category].[Category] for example on the rows to get a more detailed result.

For further enhancements think of putting the p% value in a separate dimension with given numbers and a reasonable default member. Then you could easily switch the percentage of the quantile or compare different quantiles in one pivot table.

## Kommentare

Do, 01.07.2010 - 16:16

Is there any way to do subsetting? like 90 260 day version, if your date dimension has a larger population?

Or LSTD? (lower semi std) - The standard deviation of all values below zero?

Do, 01.07.2010 - 16:27

Michael, subsetting can be done by using a set in MDX (instead of the full day level). A set in MDX can also be defined as a range (from/to date). For compatibility with Excel I recommend using dynamic sets, as described here:
http://ms-olap.blogspot.com/2010/02/solution-von-ssas-2008-multi-select…

STD is a standard function in MDX (Stdev/StdevP). You can find the full reference here:
http://msdn.microsoft.com/en-us/library/ms145970.aspx

If you need more flexibility in doing your calculations, you could also use a stored procedure. I've shown this approach here:

http://ms-olap.blogspot.com/2008/07/some-more-thoughts-on-quantiles.html

Hope this helps,
Hilmar

Di, 13.07.2010 - 17:16

Thanks, this was infinitely helpful!

Last major question:

Do you have any suggestions for using MDX for Marginal Value at Risk or Incremental Value at Risk?

Marginal defintion would be the change in Value at Risk if the child (current member) was removed.

Ex.
North America (Parent)
Total Var = 100

US (Child 1)
Total = 90
Marginal = 50
(Change in North America's Total if US's contribution was Removed)

Total = 40
Marginal = 20
(Change in North America's Total if Canada's contribution was Removed)

Mexico (Child 3)
Total = 25
Marginal = 10
(Change in North America's Total if Mexico's contribution was Removed)

Incremental = Essentially very similar, but instead of the child (being removed) - if the child was 1% more what would be the affect on the total Value at Risk for the parent.

Thanks so much for your insight!!!

Michael

Do, 15.07.2010 - 17:55

Hi Michael, Sorry for the late approval of your comment. We had some issues with spam which is now solved thanks to a new gotcha check on the comment page.
For the marginal VAR you could use the MDX functions parent (to get to the parent category) and currentmember to refer to the current element. As my example is about dates this could look somewhat like

descendants(Date.Date.parent, Date.Date)-Date.Date.currentmember

Of course,the actual implementation depends on your model. The minus is the set operator to remove the member that you're actually placed on.

For the incremental VAR you could replace the Measure in the bottomcount function using a condition (if it is the current member than take 1.01 of the value else take only the value).
However, it might be easier to read if implemented in an SSAS stored procedure. In this case you're working on array and removing or rescaling an element is well supported there, so I recommend implementing the VAR calculation as well as other distribution functions like marginal VAR, conditional VAR, incremental VAR as stored procedures.
You can find a sample implementation for such a function (only VAR) in my blog post here:
http://blog.oraylis.de/2008/07/some-more-thoughts-on-quantiles/

Hope this helps.
Hilmar

Teilen auf