Profit calculation for churn prevention data mining models (part 3 of 3)
SQL Server 2005  SQL Server 2008  SQL Server 2008R2
The last two posts were about cost optimization for a churn prevention campaign. We analyzed the following four options:
 Trivial option (only using average return rate and average customer value)
 Data Mining (using average customer value but individual return rate based on a mining model)
 Value driven approach (using average return rate but individual customer value)
 Combination of method 2 and 3 (using individual return rate and value)
As been said in the last post, the results may differ a lot from case to case. Especially the effectiveness of option 2 and 3 depend a lot on the available information for data mining (for option 2) and the variance of the customer value (for option 3). If both methods give some improvement, then the combination can be expected to be the best choice.
However, we always looked at a scenario where the trivial approach gave almost no insight. What I mean is, that for the trivial approach there is not much difference between giving every customer a voucher compared to giving no customer a voucher (costs between $12,500 and $13,750 in this example). In other words the line chart showing the costs for the trivial example was almost a horizontal line. Here is the corresponding chart from the first post (click on the image to see a larger version):
Now, let’s change our basic parameter a little bit. The following table shows the old and the new parameters:
Old scenario 
New scenario 

Avg. Customer Profit 
$25 
$50 
Avg. Churn Rate 
50% 
80% 
Voucher costs (prevention costs) 
$10 
$5 
Over all Churn Rate with prevention 
15% 
5% 
Here are the resulting chart (I’ve copied the chart for the old scenario from my last post):
Old scenario 
New scenario 
While we clearly see the effect of the optimization in the old scenario (left chart), as the purple line (combined method) has a significant minimum compared to the blue line (trivial approach), in the new scenario (right chart) the minimum of all three methods is almost identical. The trivial method (blue line) has it’s minimal costs at the right end point of the line but although the other methods perform better in the middle range of the chart, they cannot deliver values that are much lower than the right end point of the blue line (for my sample data, even the combined model performs only about 1.2% better than the trivial model from our first approach in this case).
In general, a good optimization is more easy to achieve, if the trivial method has no clear decision (line is almost horizontally oriented). In the new scenario, the profit for each customer is high compared to the prevention costs and the prevention campaign is extremely efficient. In such a case you cannot expect much optimization from your value or data mining driven approach.
So the first thing to remember from this post is to first check the trivial approach. This is the approach that is almost instantly available for most situations, even if the values are just estimations. If the result is more like the blue line in the left chart (horizontally oriented) it is very likely that you can achieve a significant optimization. If it is more like the blue line in the right chart (either falling or rising) you might only want to check further improvements if the costs for getting the churn score or the individual value are not too high. Otherwise you would risk doing an expensive optimization project with the result, that no optimization is possible.
There is still one open question from the last post and that is about the value for the voucher. Up to this point our success rate for making customers return by using a voucher was a fixed average value based on a test sample. Of course, this does not appear to be sensible: Customer with a high value might even be annoyed by a cheap voucher. Also, customers with a high churn value (likely to go away) might not respond to our voucher campaign in the same way as customers with a low churn rate (customers that are likely to return no matter if there is a voucher or not). So it’s time to add more reality to the model. To do so, we’ll have to analyze the data from our customer test samples (first post) in more detail. We only analyzing by customer value (an advanced model could also analyze by churn score, although a bigger test sample is required then). The following table shows the range for the customer profit, the voucher that was associated with that range and the number of customers who did not return although they received the voucher:
Profit Rage 
Number of customers in sample 
Value of Voucher 
Churn rate (not returning) 
010 
78 
0 
31% 
1015 
55 
3 
21% 
1525 
447 
5 
17% 
2535 
275 
10 
10% 
>35 
145 
15 
7% 
With this extra information, we can calculate the costs for the voucher and the probability to prevent the customer from going away more precisely. I’m still using the sort order from my combined model. Here is the result in the chart view. In order to make the difference easier to see I changed the minimum for the yaxis (don’t be fooled by the different presentation):
The newly created model is named ‘Variable Model’ in this plot (line color cyan). Because of the different bands the line is not as smooth as for the other models. However it turns out, that this approach is the best one, based on my sample data. Also you’ll notice that the end points of the cyan line differ from the other lines’ endpoints because the costs for the vouchers are no longer a constant. Here are the detailed results from all the approaches:

Trivial Model 
Profit Model 
Churn Score Model 
Combined Model 
Variable Model 
Minimal costs 
12,500.00 
11,761.15 
11,817.60 
11,228.95 
10,999.31 
Improvement 
0.00 
738.85 
682.40 
1,271.05 
1,500.69 
Improvement % 
0.0% 
5.9% 
5.5% 
10.2% 
12.0% 
After the three posts now it’s time for a short summary:
Summary
For cost optimization it’s important to first analyze the actual situation. Optimization is not always possible and before starting an expensive project, it’s better to look at the parameters (as shown above, see remarks for the ‘trivial model’). If you decide for optimization, data mining is not the only option. A value driven approach is as important as a mining model in many scenarios, unless all your customers share the same value. And it’s important to know your parameters as good as possible. In my example, the test sample of customers with and without vouchers was very important. Also keep in mind, that the success rate of a voucher (or any other method of prevention) is not a constant, but depends at least on the customer value (usually computed based on the orders of the past) and the likeliness of the customer to turn away (churn score, usually computed by a mining model). The combination of all these parameters is the key to making the optimization methods more efficient.
And I should also add a warning remark that applies to all the approaches we did here: We always try to model the behavior of the customers in the future based on data from the past. The first four approaches are all based on the same modeling idea (only the last model with the variable voucher is based on different preconditions). The only difference is the subset of customers that are addressed (different sort order). However, in all these cases the effect of the optimization may be different in reality and with every model it is important to also validate and constantly refine the model.