
Uplift Modelling as an Addition to Classic Response Modelling

Uplift modelling can support campaign managers in planning and managing campaigns by supplementing the classic response model used in campaign scoring.

Uplift modelling is based on the principal idea that campaign responders fall into two categories: those who would have responded even without the campaign and those who would not have responded without it. Unlike classic scoring, which targets both groups equally, uplift scoring tries to isolate only the second group and, wherever possible, to ignore the first. For this purpose, it uses the response information from the control group, which classic campaign scoring leaves unused.
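Written as a formula, with illustrative notation that is not taken from the article (Y = 1 for a response, T = 1 for the test group and T = 0 for the control group, x for the customer characteristics), the quantity uplift modelling targets is the difference between two response probabilities. A classic response model only estimates the first term; the second term is what requires the control group:

```latex
\text{uplift}(x) = P(Y = 1 \mid x, T = 1) - P(Y = 1 \mid x, T = 0)
```

Customers who would have responded anyway contribute to both terms and therefore cancel out.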

The graphic gives an example: it compares how two customer groups (here: SMEs vs private customers) responded to a campaign, e.g. a mailing. Whereas the mailing only slightly increased the take rate of SME customers, it worked much better for private customers.

This is exactly the point: finding good predictors not for the response itself, but specifically for the uplift!

[Figure: reaction of the customer groups (SMEs vs private customers) to the campaign]

The model can be used in many areas of campaign management, whether the objective is retention, prevention, or cross- and upselling.

Possible Approaches to Uplift Modelling

There are three approaches to uplift modelling. The first models the responses in the test and control group independently of one another and then calculates the difference between the predicted response probabilities. The uplift is thus modelled only indirectly, and accordingly, neither model is optimized for it. The results are rather arbitrary, as there is no guarantee that the same variables are selected in both models. For this reason, the approach is of limited practical value.
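A minimal sketch of this first (two-model) approach, assuming that each "model" is just a smoothed take rate per customer segment. The functions `fit_response_rates` and `two_model_uplift`, the segment names and all counts are hypothetical stand-ins; in practice each group would get a full response model with its own variable selection, which is where the weakness described above comes from.

```python
from collections import defaultdict

def fit_response_rates(records):
    """Hypothetical stand-in for a response model: smoothed take rate per segment.

    records: list of (segment, response) pairs with response in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, records]
    for segment, response in records:
        counts[segment][0] += response
        counts[segment][1] += 1
    # Laplace smoothing keeps rates away from exactly 0 or 1 for small segments
    return {seg: (hits + 1) / (n + 2) for seg, (hits, n) in counts.items()}

def two_model_uplift(test_records, control_records):
    """Approach 1: fit one model per group independently, then subtract predictions."""
    p_test = fit_response_rates(test_records)        # fitted on the test group only
    p_control = fit_response_rates(control_records)  # fitted on the control group only
    return {seg: p_test[seg] - p_control[seg] for seg in p_test.keys() & p_control.keys()}

# Made-up data: 100 records per segment and group.
test = [("sme", 1)] * 6 + [("sme", 0)] * 94 + [("private", 1)] * 16 + [("private", 0)] * 84
control = [("sme", 1)] * 4 + [("sme", 0)] * 96 + [("private", 1)] * 4 + [("private", 0)] * 96

uplift = two_model_uplift(test, control)
for segment in sorted(uplift):
    print(segment, round(uplift[segment], 3))  # private customers show the larger uplift
```

The uplift only appears as a by-product of two separately optimized fits, so nothing in this procedure steers either model towards the variables that explain the difference between the groups.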

The other two approaches therefore build on a single combined data set that includes both groups. Group membership is encoded by an indicator variable (1 = test group, 0 = control group). The second approach uses decision trees with a modified splitting criterion. This criterion evaluates a candidate split variable by the difference between the response distributions of the test and control group in the resulting subgroups. The difference is measured with an information-theoretic quantity, the Kullback-Leibler divergence.

As is usual in practice, this decision-tree-based approach has its pitfalls: it implicitly assumes that the modified splitting criterion reduces to a standard splitting criterion when the control group is empty, as in classic response modelling. This has not been formally proven, but in practice the approach delivers useful results.

The third approach uses logistic regression and targets the interaction effects between the explanatory variables and the indicator variable for membership in the test group. Special emphasis is placed on variable selection: usually, the response variable is replaced by a combination of the response variable and the group variable. This modified outcome variable is 1 if a response has taken place and the record belongs to the test group, or if no response has taken place and the record belongs to the control group; in all other cases, it is 0. The advantage is that, depending on the size of the control group, a much more balanced distribution of the target variable can be achieved, even though realistic conversion or take rates are only around 1 to 3 %. Common variable selection methods (wrapper or embedded methods) are then applied to this modified target. The actual model, however, is estimated using the original target variable. It comprises two parts, the main effects and the interaction effects with the group indicator:

[Figure: model formula]
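A common way to write such a logistic regression with group interactions is sketched below. The symbols are illustrative assumptions, not the article's own notation: x for the explanatory variables, T for the test-group indicator, Y for the original response, and β, γ, δ for the coefficients. The uplift follows by comparing the predicted probabilities for T = 1 and T = 0:

```latex
\operatorname{logit} P(Y = 1 \mid x, T)
  = \underbrace{\beta_0 + \beta^\top x + \gamma\,T}_{\text{main effects}}
  + \underbrace{T\,\delta^\top x}_{\text{interaction effects}}

\text{uplift}(x) = \sigma\!\left(\beta_0 + \gamma + (\beta + \delta)^\top x\right)
  - \sigma\!\left(\beta_0 + \beta^\top x\right),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Only γ and δ can make the uplift depend on the group; if δ is close to zero, the variables x shift the response probability in test and control group alike.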

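The recoding of the target variable used for variable selection can be sketched in a few lines. The function `transformed_outcome` and the take rates below are hypothetical; the sketch only shows why the modified target is far more balanced than the original one when test and control group are of similar size. It also checks a known property of this recoding: with equally sized groups, 2 · P(Z = 1) − 1 equals the difference in take rates.

```python
def transformed_outcome(response, treated):
    """Modified target: 1 for test-group responders and control-group non-responders.

    response: 1 if the customer responded, else 0.
    treated:  1 for the test group, 0 for the control group.
    """
    return 1 if response == treated else 0

def share_of_ones(values):
    """Fraction of records with target value 1."""
    return sum(values) / len(values)

# Hypothetical campaign: 10,000 records per group with made-up take rates.
test = [(1 if i < 200 else 0, 1) for i in range(10_000)]     # 2.0 % take rate
control = [(1 if i < 150 else 0, 0) for i in range(10_000)]  # 1.5 % take rate
records = test + control

y = [response for response, _ in records]
z = [transformed_outcome(response, treated) for response, treated in records]

print(share_of_ones(y))  # 0.0175: original target is highly imbalanced
print(share_of_ones(z))  # 0.5025: modified target is almost balanced

# Only valid because test and control group are equally large here.
uplift = 2 * share_of_ones(z) - 1  # approximately 0.02 - 0.015 = 0.005
```

With unequal group sizes the balance of the modified target shifts accordingly, which is why its benefit depends on the size of the control group.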
Requirements for Uplift Modelling

  1. Group Size and Take Rates: Despite the recoding of the target variable, the requirements on the size of the control group and on the conversion or take rates in the test and control group are considerably higher than for a response model.
  2. Relative Signal Strength: In many cases, the main effects are so strong that only a small explanatory contribution remains for the interaction effects. Uplift modelling then ultimately yields results similar to those of the classic response model.
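Why the size of the control group matters so much can be illustrated with the textbook standard error of a difference between two independent proportions. A minimal sketch, assuming the take rates are estimated as simple proportions and using made-up campaign figures; the function names are hypothetical:

```python
import math

def rate_standard_error(p, n):
    """Standard error of a take rate p estimated from n records (binomial proportion)."""
    return math.sqrt(p * (1 - p) / n)

def uplift_standard_error(p_test, n_test, p_control, n_control):
    """Standard error of the difference between two independent take rates."""
    return math.sqrt(rate_standard_error(p_test, n_test) ** 2
                     + rate_standard_error(p_control, n_control) ** 2)

# Made-up campaign figures: 2.0 % take rate in the test group, 1.5 % in the control group.
p_test, p_control, n_test = 0.02, 0.015, 50_000
uplift = p_test - p_control  # 0.5 percentage points

# Relative error of the test-group take rate alone (what a response model relies on)
print(f"take rate: relative SE {rate_standard_error(p_test, n_test) / p_test:.1%}")

# Relative error of the uplift estimate for growing control groups
for n_control in (1_000, 5_000, 20_000, 50_000):
    se = uplift_standard_error(p_test, n_test, p_control, n_control)
    print(f"control group {n_control:>6,}: relative SE of uplift {se / uplift:.1%}")
```

Even with a control group as large as the test group, the uplift estimate in this example is several times less precise than the take rate itself, and that is before the data are split into segments or tree nodes.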

Conclusion

Uplift modelling can definitely add value in campaign management because it quantifies the incremental effect of a campaign and tries to explain where it comes from. However, it should only be used alongside the established instruments of campaign management, as a source of additional information for target selection and campaign steering.

Stefan Seltmann
Lead Expert
Stefan loves programming, particularly when data engineering and data science are involved. He's turned his hobby into a career and has earned a reputation as a "phone a friend" whenever there's a tricky Python or Spark software development problem.
#CodeFirst, #TestMore, #CodeDoctor