Six factors for modelling with confidence in 2019
As 2019 draws near there has already been a plethora of predictions about what the New Year will have in store for us.
In the marketing arena, the increased use of big data is a dead cert, with UK investment in analytics set to double, according to new research by OC&C. The problem with data modelling in the new regulatory environment is that marketers must be able to explain how they arrived at certain predictions or decisions, which is why confidence scores are going to become increasingly important in 2019.
For the uninitiated in analytics, a confidence score is simply a figure that indicates the confidence level of a piece of data, i.e. how accurate it is.
Traditionally, confidence scores described the provenance of the data: whether it is known where the data came from and whether it has been fully verified. To take a very simple example, home mover data from Royal Mail would have a high confidence score because the data has a legitimate source and is validated, since the data subjects themselves have filled out a postal redirection form.
By comparison, home mover data compiled from redirects or ‘not known at this address’ notifications would be considered low confidence, as it hasn’t been verified. If an address were flagged by both methods, the confidence score would be even higher, as both systems are in agreement.
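To make the "both systems agree" idea concrete, here is a minimal sketch of one way two sources could be combined. It assumes each source's confidence can be read as the probability that its flag is right, and that the sources are independent, which is a simplification; the function name and the example figures are hypothetical and do not come from Royal Mail or any real scoring system.

```python
def combine_independent(confidences):
    """Combine confidences from independent sources (a "noisy-OR" rule).

    Assumption: each value is the probability that the source is right,
    and the sources err independently of one another.
    """
    remaining_doubt = 1.0
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError("each confidence must be between 0 and 1")
        # Each agreeing source shrinks the doubt that is left.
        remaining_doubt *= 1.0 - c
    return 1.0 - remaining_doubt


# Hypothetical figures: a verified redirection form and an unverified
# 'not known at this address' notification.
redirection_form = 0.9
returned_mail = 0.4
combined = combine_independent([redirection_form, returned_mail])
```

Under these assumptions the combined score is never lower than the strongest single source, which matches the intuition that agreement between systems raises confidence.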
As data has become more complex and unstructured (80 per cent of data is now unstructured), confidence scores have also become more sophisticated, incorporating six key factors that help establish the reliability of the data.
These factors are:
- System integrity – how many systems have the same data value?
- Governance – is the data compliant?
- Correctness – is the data correct?
- Completeness – is the data complete or incomplete?
- Security – is the data safe from breach and loss?
- Timeliness – how old is the data?
These factors help paint a more detailed picture of the data subject, and there are many more factors that can be applied. Additionally, each can be weighted according to its relative importance to the overall score.
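As an illustration of how weighting might work, the sketch below scores each of the six factors between 0 and 1 and takes a weighted average. The weights, key names and example scores are hypothetical, chosen only to show the arithmetic; in practice they would be set by judgement and experience.

```python
# Hypothetical weights: the six factors are from the list above,
# but their relative importance here is purely illustrative.
FACTOR_WEIGHTS = {
    "system_integrity": 0.25,
    "governance": 0.20,
    "correctness": 0.20,
    "completeness": 0.15,
    "security": 0.10,
    "timeliness": 0.10,
}


def confidence_score(factor_scores, weights=FACTOR_WEIGHTS):
    """Return the weighted average of per-factor scores, each in [0, 1]."""
    missing = set(weights) - set(factor_scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= factor_scores[name] <= 1.0:
            raise ValueError(f"score for {name} must be between 0 and 1")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * factor_scores[name] for name in weights)
    # Dividing by the total keeps the result in [0, 1] even if the
    # weights do not add up to exactly 1.
    return weighted_sum / total_weight


# Hypothetical record: compliant and secure, but half complete,
# only partly corroborated and out of date.
record = {
    "system_integrity": 1.0,
    "governance": 1.0,
    "correctness": 0.5,
    "completeness": 0.5,
    "security": 1.0,
    "timeliness": 0.0,
}
score = confidence_score(record)
```

A weighted average is only one possible design. Its advantage is that it is easy to explain to a regulator or a consumer, because each factor's contribution to the final score can be read straight off the weights.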
With models becoming ever more widely used in business, it is crucial for executives to have confidence in the data they are using to ground their decisions.
However, creating confidence scores can often be just as complex as creating the predictive model itself, and they are just as important, if not more so. Doing so requires judgement, statistics and experience. Moreover, accurate confidence scores are vital when providing data that will underpin business processes, and they are an important part of building trust with both consumers and regulators, which is why we believe confidence scores will be a hot topic in 2019.
In the meantime, we wish you a very Happy New Year!