Do you trust your Data Mining results? – Part 3/3
SQL Server 2005 | SQL Server 2008 | SQL Server 2008 R2
This is the third and last post of my Data Mining back testing mini series. For my first posts I made the assumption that no action is taken based on the prediction. Although the mining process marked some of our contracts with a high probability for being cancelled, we did nothing to prevent our customers from doing so.
Of course, this is not the real life scenario. Usually our goal is churn prevention (or some other action based on the mining results). However, any action we take will (hopefully) change the behavior of our customers. Or to be more precise, we expect that less customers are really cancelling their contract. This is always a challenge for our mining process in total as the prevention may “dry out” our mining source data. After a while we might not have any non-influenced customers left with a high probability for cancelling the contract. A test group would help but in many scenarios this is not wanted (because the customers are just too important to risk loosing them just for the sake of some statistical validation). Another method would be to leave the influenced customers out of the training set. This doesn’t disturb our prediction but do we still know our prevention is successful enough?
For our back testing process, we can also add the expected prevention rate to our Monte Carlo procedure. In this case we’re not only trying to validate the model but also the prevention rate. On the other hand, if our test fails we’re not sure if it’s the model or our assumed prevention rate that is actually wrong.
For the following I assume that every customer with a churn probability greater then 35% gets prevention. From the past we know that 90% of our preventions are successful. Here are the results:
Old model (no prevention included):
New model (including prevention):
As expected the curves have shifted to the left. In the new model (including the prevention) the number of customers cancelling the contract is significantly lower than in the old model.
We can still use the same methods as shown in the first post to derive the threshold value for our test. This is the curve for the alpha and beta error (again: for beta for choose an alternative model that goes off by 3%).
Again, here are some values for the threshold T, alpha and beta:
T |
Alpha |
Beta |
1580 |
88.7 |
0.0 |
1600 |
75.6 |
0.1 |
1620 |
56.7 |
0.4 |
1640 |
36.5 |
1.6 |
1660 |
19.5 |
5.1 |
1680 |
8.4 |
13.3 |
1700 |
3.0 |
27.0 |
1720 |
0.9 |
46.0 |
1740 |
0.2 |
65.7 |
For example with T=1680 we get alpha<10% and beta<14%. But as being said above, if our model fails the test we’re now not sure anymore if it is the model or the prevention rate. Again, the best way to avoid this uncertainty would be to use test groups. For example, we could exclude a random sample of 10% of all contracts with a churn score above 35% from prevention. Then we’re able to compare the behavior of the random sample (test group) with the customers that received prevention benefits. This test group enables us to measure the effectiveness of our prevention activity. And of course you could set up a statistical test in the same way we described here, to proof that the test group really supports a certain success rate.
To wrap things up, depending on the products, processes, the way new customers are acquired etc., the real-life data mining problems may be much more difficult than presented in this small blog series. Anyway, the methods described here, like the Monte Carlo algorithm or the statistical hypothesis test, are the main building blocks for achieving a reliable understanding how well the model performs. So, depending on the actual scenario, the above mentioned methods may be combined, adjusted and repeated to get the desired results.
This also ends this mini series about back testing. The basic methods are more or less ubiquitous and may also be used for several similar problems, for example to do the back testing for a financial risk model. By adjusting and combining the methods, a specific business requirement may be modeled into a reliable test. And as long as our model conforms to the test, we can have enough trust in our data mining models to use them as the basis for business decisions.