Does Your Predictive Analytics Need A Lift?

You’ll probably already be familiar with the term predictive analytics (you are reading this blog, after all), but just in case you’re not: predictive analytics involves mining data about the behaviors and characteristics of a system in order to make predictions about that system’s future behavior. In marketing automation, we’re often interested in the behavior of users: looking at what a user has done in the past and attempting to predict what that user is going to do next, and what they might respond to.

Specifically, at any given time you’ll have a set of marketing campaigns you’re rolling out for your users, and for each user you’re trying to determine the best campaign to use. The last thing you want to do is to spam your users with all your campaigns (i.e. the blunderbuss approach).

A better strategy is to segment your users into specific target groups that behave similarly, and to use the campaign that best ‘matches’ each group. There are many different characteristics you could use as the basis for this segmentation, and a particularly interesting set are those based on what you predict your users will do next.

Let’s focus in on mobile marketing (it is, after all, what we do here at Swrve). To give two examples, users may be about to abandon your app after a short period of using it, or they may be just about to convert into paying customers, having experienced some aspect of the app that they value. If we could distinguish between these types of users, we’d definitely want to use different messages: one to discourage the former set from leaving, and another to encourage the latter set to commit to their decision to subscribe or make a purchase.

This is what we’re trying to do with predictive analytics. We’re receiving signals from users as they interact with the mobile application, based on their activity within the app, their social interactions and their transactions. Using machine learning tools, we can analyze these signals, and correlate them with users who historically have actually abandoned the app, or have subsequently converted into paying customers.

Having made these associations between prior patterns of behavior and subsequent actions, we can apply this learning to spot users who are about to abandon the app, or about to convert, based on their current behavior. Once spotted, you can apply your marketing experience to craft just the right message, or just the right call to action, to promote the behavior you’re hoping for.

This all sounds great, right? Gather the signals, learn from the historic behavior of users, and use this to predict the future behavior of other users. In practice, it’s quite tricky. The data is noisy, users can change behavior over time, the learning algorithms are complex and temperamental, and the interpretation of the results can often be misleading. And it’s this last point I wanted to briefly talk about.

Measuring Prediction Quality

There are a number of different measures of the “quality” of a prediction algorithm. One of the most common is the lift of the algorithm. This compares the performance of the algorithm to a random selection from the population: it’s the ratio of the number of users correctly classified by the algorithm to the number of users, chosen at random, who would be expected to behave in the targeted way. In practice we use our prediction algorithm to rank the users (i.e. to sort them from most likely to least likely to behave as predicted), take the top 10% (the first, or top, decile), and compare the number of correctly identified users in that decile with the expected number in a random 10% sample of the users. This is the top-decile lift.
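The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not Swrve's implementation; the function name and the synthetic scores/outcomes are made up for the example.

```python
def top_decile_lift(scores, outcomes):
    """Top-decile lift: hits found in the top 10% of users (ranked by
    predicted score) divided by the hits a random 10% sample would be
    expected to find."""
    # Rank users from most likely to least likely, per the model's score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    decile = max(1, len(ranked) // 10)
    hits_in_decile = sum(outcome for _, outcome in ranked[:decile])
    # A random sample of the same size finds hits at the population rate.
    population_rate = sum(outcomes) / len(outcomes)
    expected_random_hits = population_rate * decile
    return hits_in_decile / expected_random_hits

# Synthetic data: 1,000 users, 50 true converters, arranged so that 28
# of them land in the model's top 100.
outcomes = [1] * 28 + [0] * 72 + [1] * 22 + [0] * 878
scores = list(range(1000, 0, -1))  # user 0 is ranked highest
print(top_decile_lift(scores, outcomes))  # → 5.6
```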

Let’s use some numbers to make this a little more concrete. Let’s say we’re looking to predict users who will convert at some point in the near future. We’ve implemented our predictive algorithm, and out of 1,000 users identified the 100 most likely to convert. We then wait a while, and count how many of those 100 actually convert; let’s say that’s 28. So our algorithm correctly identified 28 converters in its top decile.

We’ve observed the actual behavior of all 1,000 users, so we know that the true population conversion rate is 5% (i.e. 50 users in total actually converted within the observed period). If we were to simply select 100 random users, rather than using our clever prediction algorithm (which is the same as saying let’s use the dumbest prediction algorithm we can), we’d expect, on average, about 5% of those 100 to convert, or 5 users. This gives us a baseline against which to measure the quality of our prediction.

By using our algorithm we correctly identified 28 users, while a random sample would be expected to find 5, so our top-decile lift is the ratio of these two results: 28 / 5 = 5.6. That sounds like a pretty decent number; we’re basically saying that our algorithm is 5.6 times better than random choice at identifying users who are about to convert.

This works well as a measure when the true population rate of the behavior we’re looking to predict is relatively low (typically a single-digit percentage). But you might (and you really should) ask the question: what is a good lift value? The lift measure is highly dependent on the true population rate. Let’s look at another simple result.

This time we’ll look at the rate of user churn (users leaving the app and not returning). Our app has a pretty high 90-day churn rate (the percentage of users who have been inactive for more than 90 days) of 88%. We use our clever prediction algorithm to determine the 100 users most likely to churn, count the number that have actually churned, and find we have 94. That looks pretty decent: out of the 100 we identified, 94 actually churned. But then we calculate the top-decile lift: if we sampled 100 users at random, we’d expect to find 88 that would have churned, so our lift is 94 / 88 ≈ 1.07, which means our clever algorithm is only 7% better than a random sampling. That doesn’t sound very clever at all, and it’s purely a consequence of our true population rate being so high.
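The two worked examples boil down to the same one-line ratio; a quick sketch to check the arithmetic (the function name is just for illustration):

```python
def lift(hits_in_sample, sample_size, population_rate):
    # Ratio of the hits we actually found to the hits a random sample
    # of the same size would be expected to find.
    return hits_in_sample / (sample_size * population_rate)

print(lift(28, 100, 0.05))  # conversion example → 5.6
print(lift(94, 100, 0.88))  # churn example ≈ 1.07
```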

So when looking at lift scores for a prediction algorithm, make sure you first understand the true population rate (or at least have a strong expectation of what it is). The maximum top-decile lift is achieved when you perfectly predict all of the users who will behave in the manner you’re predicting. Let’s say the true population rate is r. With a perfect ranking, the top decile captures every one of the targeted users when r is less than 10%, giving a lift of r / (0.1 × r) = 10; when r is greater than 10%, the best the decile can do is consist entirely of targeted users, giving a lift of 0.1 / (0.1 × r) = 1 / r.

For our conversion prediction (r = 5%), this maximum is 10. For our churn prediction, the maximum possible top-decile lift is 1 / 0.88 ≈ 1.14. So maybe our algorithm wasn’t so bad; or maybe, in this case, there’s little point in using predictive algorithms at all, because a random sampling gives you nearly as good a result!
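Since a perfect ranking can do no better than either capture every targeted user or fill the entire decile with them, the ceiling works out to the smaller of 10 and 1 / r. A quick sketch checking both examples (again, the function name is illustrative):

```python
def max_top_decile_lift(population_rate, decile_fraction=0.1):
    # With a perfect ranking, the top decile either captures all targeted
    # users (rate below the decile size) or consists entirely of them.
    return min(1 / decile_fraction, 1 / population_rate)

print(max_top_decile_lift(0.05))  # conversion ceiling → 10.0
print(max_top_decile_lift(0.88))  # churn ceiling ≈ 1.14
```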

The moral of this story? There’s very little point in trying to predict something that is pretty likely to happen.