By law, organisations must take accountability for how they are using and processing data. Explainability is key to this.

Explainability of predictive models

Picture this: you are a data scientist reviewing a neural network architecture built by one of your colleagues. The model is highly predictive and would deliver significant uplift to the business. “That’s great”, you say. “How does the model arrive at its decisions?” Before finishing the question, you already know the answer: “I’m not quite sure, it just does.” Unfortunately, this is a common scenario in data science labs the world over, and it is the reason Explainability has risen to the fore.

What exactly is Explainability?

Its definition is quite straightforward – it is the ability to explain in human terms what is going on within the internal mechanics of a machine learning system. Its application, unfortunately, is anything but simple, but more on that later.

Explainability is one of the Five Important Considerations for Data Scientists.

Why is Explainability so important?

There are a number of reasons, but the most significant is the recent change in data protection legislation. Under GDPR, companies are now required to know the provenance of the data they hold and process. Furthermore, consumers have the right to know exactly what their personal information is being used for, and why decisions have been made on the basis of their data. Given that customer data is a primary source of fuel for the algorithms constructed using machine learning, organisations have a legal responsibility to understand these models.

Moreover, GDPR notwithstanding, Explainability is a case of plain and simple ethics. Understanding a model well enough to anticipate any unintended consequences and potential bias that could affect vulnerable customers (or indeed any customer) is morally the right thing to do. Data ethics is increasingly coming under scrutiny, and several think tanks and organisations around the world are creating ethical frameworks to enable data science to move forward in a responsible and answerable manner.

The problem is that models are becoming more complex, with larger numbers of features and more feature engineering, which makes Explainability much harder to achieve. Transparency is important not just legally and ethically but also practically, for instance for software debugging or certification. In recent years, Neural Networks have made great advances in multiple application areas. However, they are notoriously opaque, so it often isn’t clear which features are important or have had an impact on the outcome of a model.

Explainability can be achieved in the following ways:

The first is feature replacement: the model is retrained without a key feature, and the impact of removing that variable on the outcome of the model is analysed. The second is reverse engineering: a simpler model, such as a regression model, is used to predict the outcome of the more complex one. If the simpler model can achieve a relatively high ROC AUC score (the area under the Receiver Operating Characteristic curve), it is possible to evaluate the most important drivers within the model, even if the finer detail of how the features combine remains out of reach.
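The two approaches above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a recommended implementation: the synthetic data, the k-nearest-neighbours classifier standing in for the "complex" model, and every function name (make_data, knn_predict, accuracy, fit_logistic, roc_auc) are assumptions made purely for the example. In practice, a library such as scikit-learn would normally do this work.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_data(n):
    """Hypothetical data: label driven strongly by feature 0, weakly by 1, not by 2."""
    rows, labels = [], []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in range(3)]
        rows.append(x)
        labels.append(1 if 2.0 * x[0] + 0.5 * x[1] > 0 else 0)
    return rows, labels


def knn_predict(train_x, train_y, x, cols, k=5):
    """Stand-in 'opaque' model: majority vote of the k nearest rows, using only `cols`."""
    nearest = sorted(
        (sum((row[c] - x[c]) ** 2 for c in cols), y)
        for row, y in zip(train_x, train_y)
    )[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def accuracy(train_x, train_y, test_x, test_y, cols):
    hits = sum(knn_predict(train_x, train_y, x, cols) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def sigmoid(z):
    # Numerically stable form for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, steps=300, lr=0.5):
    """Plain batch gradient descent for logistic regression (the simple surrogate)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def roc_auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train_x, train_y = make_data(300)
test_x, test_y = make_data(150)
cols = [0, 1, 2]

# Approach 1: refit without each feature in turn and measure the accuracy lost.
# (For k-NN, "retraining" simply means ignoring that column.)
baseline = accuracy(train_x, train_y, test_x, test_y, cols)
drops = {
    c: baseline - accuracy(train_x, train_y, test_x, test_y,
                           [k for k in cols if k != c])
    for c in cols
}

# Approach 2: train a simple model on the complex model's own predictions,
# then check how well it mimics them and read off its coefficients.
bb_train = [knn_predict(train_x, train_y, x, cols) for x in train_x]
bb_test = [knn_predict(train_x, train_y, x, cols) for x in test_x]
weights, bias = fit_logistic(train_x, bb_train)
scores = [bias + sum(w * v for w, v in zip(weights, x)) for x in test_x]
auc = roc_auc(scores, bb_test)

print("accuracy lost per removed feature:", drops)
print("surrogate ROC AUC against the complex model:", round(auc, 3))
print("surrogate coefficients:", [round(w, 2) for w in weights])
```

On this synthetic data, both approaches should point to feature 0 as the main driver: removing it costs the most accuracy, and it carries the largest surrogate coefficient. The surrogate's coefficients are only worth interpreting when its ROC AUC against the complex model's predictions is high, since a surrogate that mimics the model poorly explains little about it.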

Both methods take time, patience and skill, and the temptation will always be to skip this step in order to move on to the more exciting job of building the next impressive predictive model. However, if data science is to truly become the powerhouse that is being predicted, it is crucial that it does so in a responsible way. Explainability will be key to this.

This article was published by AIThority on 7 April 2019.