18.05.2016

Monte Carlo Simulation with DAX and Power Query

Technical Value

In this blog post I show you how to do a simple Monte Carlo Simulation using DAX or Power Query. I will compare both approaches and evaluate the pros and cons. Last month a colleague of mine wrote about the amazing possibilities of the R integration in SQL Server 2016. In his blog post he showed how easy it is to do a Monte Carlo Simulation with R in SQL Server 2016. The demo case calculates the expected value of leads from our CRM. I don't know why, but I started thinking about the possibility of doing a Monte Carlo Simulation in Power BI with only DAX or Power Query, so without R. There is nothing against R, and my colleague's simulation is easy to transfer to other use cases, but sometimes things arouse my curiosity even if I know that there are smarter ways.

Monte Carlo Simulation
Monte Carlo Simulation is a random sampling method. It can be used to approximate probability distributions. Sometimes you are confronted with problems where it is hard to derive the distribution analytically, because there are so many variable factors or you don't know how the different components interact. The idea of a Monte Carlo Simulation is to try out many different scenarios or variable configurations and see how the output of your model behaves. I will stick to the use case of my colleague Hilmar in his article.
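To make the idea concrete outside of Power BI, here is a minimal Python sketch of the same principle (purely illustrative, not part of the Power BI solution): draw many random numbers, count a "win" whenever the draw falls below the win probability, and let the average converge to the expected value.

```python
import random

def monte_carlo_expected_value(revenue, probability, runs=100_000, seed=42):
    """Estimate expected revenue by simulating many random draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(runs):
        # One run: we win the deal if the draw falls below the win probability.
        if rng.random() < probability:
            total += revenue
    return total / runs

# With enough runs the estimate approaches revenue * probability = 50.
estimate = monte_carlo_expected_value(revenue=100, probability=0.5)
```

The more runs you simulate, the closer the estimate gets to the analytic expected value; that convergence is the whole trick behind the approach.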

The setup
Let's say we want to evaluate the range of our sales amount from new customer leads. For that we build a little demo table. The table is deliberately simple, to keep things easy: all customers generate the same revenue, but the probability of getting the job differs. To make things a bit more interesting I added a scenario column. Let's say our company needs to make a decision that will influence the probability of getting new projects (scenario=1 and scenario=2). Every customer has a different probability in each of the two scenarios. Which one is the better scenario? To evaluate this we do a Monte Carlo Simulation.
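The actual demo table is shown as an image in the original post; the values below are hypothetical stand-ins that just follow the description (same revenue everywhere, different win probabilities per scenario). A sketch in Python, including the analytic expected value per scenario, which the simulation should approach:

```python
# Hypothetical stand-in for the demo table: every customer generates the
# same revenue, but the win probability differs per scenario.
customers = [
    {"Scenario": 1, "Customer": "A", "Revenue": 1000, "Propability": 0.30},
    {"Scenario": 1, "Customer": "B", "Revenue": 1000, "Propability": 0.60},
    {"Scenario": 2, "Customer": "A", "Revenue": 1000, "Propability": 0.45},
    {"Scenario": 2, "Customer": "B", "Revenue": 1000, "Propability": 0.50},
]

# The analytic expected revenue per scenario is sum(probability * revenue).
expected = {}
for row in customers:
    expected[row["Scenario"]] = (
        expected.get(row["Scenario"], 0) + row["Propability"] * row["Revenue"]
    )
```

With these made-up numbers scenario 2 has the higher expected revenue; the point of the simulation is to see not just the expectation but the whole distribution of outcomes.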

Create a random sample with M

First we need random samples. To create them we can use Power Query to build a simple list of x elements, where every element is one Monte Carlo Simulation run. By cross joining this iteration list with the customer table shown above we get a table that has x entries for every row of the customer table. For example, if our customer table has 20 rows and the Monte Carlo Simulation should do 100 iterations, we end up with 20 * 100 = 2,000 rows.

let
    Quelle = List.Generate(()=>1, each _ <= #"Number of Simulation Runs", each _ + 1),
    CreateTable = Table.FromList(Quelle, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenameCol = Table.RenameColumns(CreateTable,{{"Column1", "Iteration"}}),
    Result = Table.AddColumn(RenameCol, "temp", each Customer),
    ResultExpand = Table.ExpandTableColumn(Result, "temp", {"Szenario", "Customer", "Revenue", "Propability"}, {"Szenario", "Customer", "Revenue", "Propability"})
in
    ResultExpand

Now we add a new column that generates a random number between 0 and 1; this is the random draw of the Monte Carlo Simulation. Then we add one more column ([Result]) with a simple if condition: if [Random] < [Propability] then 1 else 0. [Result] shows whether we got the job (1) or not (0).

Now we keep only the rows where [Result] is 1. After that we simply group by [Szenario] and [Iteration] and aggregate [Revenue] with a sum.

In the end we have a table which shows a [Revenue] value for every iteration and scenario.

Complete M pattern:

let
    Quelle = List.Generate(()=>1, each _ <= #"Number of Simulation Runs", each _ + 1),
    CreateTable = Table.FromList(Quelle, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenameCol = Table.RenameColumns(CreateTable,{{"Column1", "Iteration"}}),
    Result = Table.AddColumn(RenameCol, "temp", each Customer),
    #"Erweiterte temp1" = Table.ExpandTableColumn(Result, "temp", {"Szenario", "Customer", "Revenue", "Propability"}, {"Szenario", "Customer", "Revenue", "Propability"}),
    #"Hinzugefügte benutzerdefinierte Spalte" = Table.AddColumn(#"Erweiterte temp1", "Random", each Number.Random()),
    #"Hinzugefügte benutzerdefinierte Spalte1" = Table.AddColumn(#"Hinzugefügte benutzerdefinierte Spalte", "Result", each if [Propability] > [Random] then "1" else "0"),
    #"Gefilterte Zeilen" = Table.SelectRows(#"Hinzugefügte benutzerdefinierte Spalte1", each ([Result] = "1")),
    #"Gruppierte Zeilen" = Table.Group(#"Gefilterte Zeilen", {"Szenario", "Iteration"}, {{"Kum Umfang", each List.Sum([Revenue]), type number}}),
    #"Umbenannte Spalten" = Table.RenameColumns(#"Gruppierte Zeilen",{{"Szenario", "Scenario"}, {"Kum Umfang", "Agg Revenue"}})
in
    #"Umbenannte Spalten"
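For readers who want to see the whole pipeline outside of Power Query, the same steps (generate the iteration list, cross join with the customers, draw a random number per row, keep the wins, group by scenario and iteration) can be sketched in a few lines of Python. The customer rows here are hypothetical placeholders, not the post's real values:

```python
import random
from collections import defaultdict

def simulate(customers, runs=1000, seed=1):
    """Mirror of the Power Query steps: cross join, random draw, filter, group."""
    rng = random.Random(seed)
    agg = defaultdict(float)  # (scenario, iteration) -> summed revenue
    for iteration in range(1, runs + 1):           # the generated iteration list
        for row in customers:                      # the cross join with Customer
            if rng.random() < row["Propability"]:  # keep only rows where Result = 1
                agg[(row["Scenario"], iteration)] += row["Revenue"]
    return agg

# Hypothetical customer rows (the post's real values are not shown here).
customers = [
    {"Scenario": 1, "Customer": "A", "Revenue": 1000, "Propability": 0.3},
    {"Scenario": 2, "Customer": "A", "Revenue": 1000, "Propability": 0.6},
]
result = simulate(customers, runs=500)
```

Each entry of `result` corresponds to one row of the final Power Query table: the aggregated revenue of one iteration in one scenario.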

Monte Carlo DAX driven
It is also possible to do the random sample drawing in DAX. For that you need SSAS Tabular 2016 or Power BI Desktop; the following steps are not possible with the Power BI implementation in Excel, because it does not support calculated tables via DAX. First you have to add a table to the data model which contains a list of numbers from 1 to x (the iterations). You can't create a list or set in DAX itself, so you have to use Power Query to create a dummy list with our iterations.

let
    Quelle = List.Generate(()=>1, each _ <= #"Number of Simulation Runs", each _ + 1),
    #"In Tabelle konvertiert" = Table.FromList(Quelle, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Umbenannte Spalten" = Table.RenameColumns(#"In Tabelle konvertiert",{{"Column1", "Iteration"}})
in
    #"Umbenannte Spalten"

When you have loaded the dummy table and the customer table into the data model, you can use DAX to do the same steps we did before, but in a more compact way.

MonteCarloResultDax =
SUMMARIZE(
    FILTER(
        CROSSJOIN(Customer; DummyTable);
        RAND() < Customer[Propability]
    );
    Customer[Scenario];
    DummyTable[Iteration];
    "Summe Umfang"; SUM(Customer[Revenue])
)
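The calculated table above has the same shape as the Power Query result: one aggregated revenue value per scenario and iteration. Deciding which scenario is better then comes down to comparing the per-scenario distributions, e.g. by their means. A small Python sketch with made-up numbers (the real result set lives in the pbix file):

```python
from statistics import mean

# Hypothetical result set shaped like the calculated table above:
# one aggregated revenue value per (scenario, iteration).
rows = [
    (1, 1, 1000), (1, 2, 2000), (1, 3, 0),
    (2, 1, 2000), (2, 2, 2000), (2, 3, 1000),
]

by_scenario = {}
for scenario, _iteration, revenue in rows:
    by_scenario.setdefault(scenario, []).append(revenue)

means = {s: mean(v) for s, v in by_scenario.items()}
winner = max(means, key=means.get)  # scenario with the highest mean revenue
```

In practice you would not stop at the mean: the whole point of the simulation is that you also see the spread and shape of each scenario's distribution.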

The best thing is that you can now use Power BI to build some nice visuals for your result set. In my example you can see the distribution of both scenarios, the divergence between scenario one and scenario two, and the winner! I also extended it a bit so that you can filter by customer etc. Click here to explore my model.

Performance DAX vs Power Query
Both approaches work and can be used, but which one is faster? In the end the results are the same; the approaches only differ in the engine that does the computation (Power Query or the VertiPaq engine). To evaluate the performance I ran a bigger simulation with 1,000,000 Monte Carlo iterations and measured the time. The result is clear!

• Power Query needed 7 minutes and 30 seconds to create the list, do the cross join, draw the random samples, group everything by scenario and iteration, and load the result into the data model.
• DAX needed 31 seconds to create the dummy list, load it into the data model, do the cross join, draw the random samples, summarize by scenario and iteration, and create the calculated table.

To be honest, I thought it would be a draw or that Power Query would be slightly faster, but the VertiPaq engine behind DAX seems to be a real beast. I'm very impressed. I hope you enjoyed reading this blog post.

Update 1:

Monte Carlo Simulation Power BI

Thank you for your positive feedback!

Comments

Thu, 12.07.2018 - 14:58

Hi there,
the DAX version errors in the MonteCarloResultDax table on:
Part = [NumberOfRows]/CALCULATE([NumberOfRows],ALL(MonteCarloResultDax[Kum Umfang]))

Thanks

Fri, 20.05.2016 - 08:55

Hi Marco,
actually I don't have R benchmarks yet, but I will create some and post them here!

Fri, 20.05.2016 - 08:56

Hi Min,

I added the pbix files (Update 1). Some column names will be in German, but I hope that's fine.

Lukas

Wed, 18.05.2016 - 12:54

Very impressive work. I'm glad you did the comparison between DAX and Power Query. I, like you, would have expected Power Query to be faster. DAX and VertiPaq are really powerful.

Wed, 18.05.2016 - 14:06

Do you have a corresponding benchmark using R for the same data on the same machine? It would be really interesting.
Thanks and good post!
Thanks and good post!

Marco

Thu, 19.05.2016 - 07:21

Great article! Could you share the pbix for this experiment?
Thanks!
Min
