Research from Dresner Advisory Services reveals that 53 per cent of organisations are using predictive analytics to help them enhance marketing communications, reduce risk, detect fraud and optimise operations.

Airlines use it to set ticket prices, insurers use it to calculate the likelihood of water ingress, credit providers use it to determine spend limits, banks use it to identify account takeovers, retailers use it to predict the next likely purchase, and so on. Clearly, predictive modelling is an incredibly useful tool for determining future behaviour.

There are two types of predictive (or supervised) modelling: classification and regression.

Fundamentally, classification is about predicting a label and regression is about predicting a quantity. Classification models calculate class membership, e.g. determining whether a customer is likely to leave, how someone might respond to an offer or what someone might want to buy next. Regression models, by contrast, predict numbers, such as the lifetime value of a customer or how many months it will take to acquire a certain number of customers.
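
To make the distinction concrete, here is a minimal sketch in Python using scikit-learn (one of the open source packages discussed below). The customer features, churn labels and lifetime values are invented purely for illustration:

```python
# Classification predicts a label; regression predicts a quantity.
# All numbers and labels here are toy data, not real customer records.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Features: [months_as_customer, purchases_last_year]
X = [[3, 1], [24, 10], [6, 2], [36, 15], [2, 0], [48, 20]]

# Classification: predict a class label (will the customer churn?).
churn_labels = ["yes", "no", "yes", "no", "yes", "no"]
classifier = DecisionTreeClassifier().fit(X, churn_labels)
print(classifier.predict([[4, 1]]))   # e.g. ['yes'] -- a label

# Regression: predict a number (expected lifetime value in pounds).
lifetime_value = [120.0, 900.0, 210.0, 1500.0, 80.0, 2100.0]
regressor = DecisionTreeRegressor().fit(X, lifetime_value)
print(regressor.predict([[4, 1]]))    # e.g. [120.] -- a quantity
```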

The most widely used predictive modelling techniques include decision trees, regression analysis, neural networks, Bayesian analysis and k-nearest neighbours (KNN), though there are many more. What is important to understand about all these techniques is that today it is possible to download open source algorithms and packages such as Keras, scikit-learn, KNIME and Orange, or to buy them in from commercial sources such as SPSS Modeler or KXEN.

What this means is that, ultimately, the building-block architecture of most predictive models, commercial or otherwise, is the same. Therefore, what separates one model from another is the data that trains it.
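
As a rough illustration of that point, the sketch below trains the same off-the-shelf algorithm on two different, entirely made-up datasets. The code is identical in both cases, yet the resulting models can give different answers about the same customer:

```python
# Identical model code, different training data, different predictions.
from sklearn.linear_model import LogisticRegression

def build_model(X, y):
    # The 'architecture' is the same off-the-shelf algorithm in both cases.
    return LogisticRegression().fit(X, y)

# Two organisations train the same algorithm on their own customer data.
model_a = build_model([[1, 0], [2, 1], [8, 5], [9, 6]], [0, 0, 1, 1])
model_b = build_model([[1, 5], [2, 6], [8, 0], [9, 1]], [1, 1, 0, 0])

same_customer = [[5, 3]]
print(model_a.predict(same_customer))  # the two models can disagree,
print(model_b.predict(same_customer))  # purely because of their training data
```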

Training data is literally just that. It is the data that data scientists use to ‘teach’ the model so that the predictions it makes are as accurate as possible.

Training a model is no different to teaching a child. If you keep showing a child a ball and saying ‘ball’, eventually the child will understand that the spherical object is a ball. The same is true for a model, except that models have a much greater capacity for learning than humans, so it is possible to train algorithms to find relationships, detect patterns, understand complex problems and make decisions, all at speed.
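
A toy version of that lesson, again using scikit-learn with invented features, might look like this: the model is shown labelled examples until it can name a new object itself.

```python
# Teach a model the 'ball' lesson from labelled examples.
# Features and labels are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

# Features: [roundness (0-1), has_corners (0 or 1)]
objects = [[0.95, 0], [0.90, 0], [0.98, 0], [0.10, 1], [0.15, 1], [0.05, 1]]
names   = ["ball",    "ball",    "ball",    "block",   "block",   "block"]

model = KNeighborsClassifier(n_neighbors=3).fit(objects, names)

# Shown a new spherical object, the model has learned to call it a ball.
print(model.predict([[0.92, 0]]))  # ['ball']
```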

Ultimately, the quality, variety and quantity of the training data will determine the success of the predictive model. Training data has to be correctly labelled and classified so that the model can learn from it, and the more of it there is available, the more accurate the predictions will be. This is why organisations such as Google and Amazon have such a great competitive advantage: they have so much quality data with which to train their algorithms. They are walled gardens, and have never let their valuable data leave their ecosystems or be shared with competitors.
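
One way to see the effect of quantity is to train the same model on progressively larger slices of a dataset and score it on held-out examples. In the sketch below, which uses a public scikit-learn dataset purely for illustration, accuracy tends to climb as more labelled examples are supplied:

```python
# More quality training data generally means better predictions:
# train on growing slices and score on held-out examples.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (20, 80, 320):  # progressively more labelled training examples
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train[:n], y_train[:n])
    print(n, "examples -> accuracy:", round(model.score(X_test, y_test), 3))
```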

The selection and labelling of training data tends to be a long and laborious process.

Each piece of data has to be checked to ensure it is relevant, up to date and annotated correctly. The more effort that is put in at this stage, the more accurate the predictions will be at the end. As with all data-based applications: rubbish in, rubbish out.
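
A simplified sketch of that checking stage might look like the following; the column names, cut-off date and label set are hypothetical, standing in for whatever rules a real project would apply:

```python
# Before training, reject records whose labels are missing, stale
# or not from the expected set. Columns and rules are hypothetical.
import pandas as pd

VALID_LABELS = {"churn", "stay"}

records = pd.DataFrame({
    "last_updated": pd.to_datetime(["2018-11-01", "2016-02-01", "2018-12-01"]),
    "label": ["churn", "stay", "maybe"],
})

fresh    = records["last_updated"] >= "2018-01-01"   # up to date?
labelled = records["label"].isin(VALID_LABELS)       # annotated correctly?

clean = records[fresh & labelled]   # rubbish stays out of the training set
print(f"kept {len(clean)} of {len(records)} records")
```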

It has already been proved that organisations that use predictive models are more successful than those that don't. However, as increasing numbers of organisations turn to predictive modelling, it is clear that competitive advantage will lie in the training data, not in the model itself.

Emma Duckworth is a Data Scientist at Outra.

This article was published by Marketing Industry News, 11th December 2018.
