As the adoption of AI by businesses gathers momentum, addressing the growing issue of algorithmic bias is becoming increasingly important. The key is recognising that it is not just a problem for data scientists, but for society as a whole.

Four years ago, the software engineer Jacky Alciné caused a storm by pointing out to Google that its algorithm had the unsavoury tendency to classify his black friends as gorillas. Following a public outcry over blatant racism, the giant apologised and diligently ‘fixed’ the problem. Last year Amazon got into hot water when it emerged that its experimental AI hiring software heavily favoured men for technical positions. Again, the outcry was followed by a retreat, and the tool was dropped. More dramatically, a Facebook mistranslation got a Palestinian man arrested in Israel. He had posted a photo of himself posing next to a bulldozer, captioned ‘Good morning!’, which Facebook rendered as ‘Attack them!’. He was questioned for several hours before the mistake came to light.

But the GAFA companies (Google, Apple, Facebook and Amazon) are not the only ones struggling with the dangers of AI at scale, and examples of discriminatory data science are easy to find. Take the work coming out of the MIT Media Lab, where Joy Buolamwini showed in early 2018 that three of the latest gender-recognition AIs, from IBM, Microsoft and Megvii, could infer a person’s gender from a photograph with the advertised 99 per cent accuracy… provided the subject was a white man. For darker-skinned women, the error rate climbed to as much as 35 per cent. You can imagine the public relations troubles that followed.

Algorithmic Bias

One could easily think that this is solely a matter of writing smarter code: better translation algorithms, better image recognition software and so on. But closer inspection shows that waving the magic wand of perfect software engineering would not get us very far, because the roots of these issues run deeper than code. AI algorithms learn from given (often external) sources of data, so their behaviour naturally reflects the leanings of the information those sources contain. Amazon’s hiring bot perpetuated the past preferences of the HR department, and Google’s recognition tool had mainly seen pictures of white people. Even an autonomous AI gathering data dynamically could only learn from the environment it was exposed to. And that leads to discrimination, often for a simple reason. Learning is discriminatory by nature, since it works by drawing distinctions from the examples available. It also suffers from the problem of unknown unknowns: a model cannot account for cases it has never encountered.

This effect is known as ‘algorithmic bias’ and is becoming a common issue for data scientists. Google did not go out of its way to be racist, Facebook did not intend to get users arrested, and IBM and the others did not decide to make their software blind to black women. They were ‘victims’ of the environment their AI learned from, and the harm done to people’s lives was collateral damage of this limitation.

As data science becomes increasingly mainstream, algorithmic bias must not become the elephant in the room. It is crucial that organisations implement fair protocols that lead to fair outcomes and decisions, and doing so will become part of the social contract. However, even among people who are well informed on the issue, it is commonly assumed that this work will be done at the data level, confined to data science labs. In reality, it must be part of a broader societal effort. Training models to be consistent and robust is very difficult, but that is not the only reason. Many key considerations also lie outside the lab:

1. Bias is embedded everywhere

There is no shortage of grounds for discrimination: sex, race, national origin, salary, employment, criminal history, colour, religion, age and disability, to name but a few. Banning variables that measure these directly is easy. But they tend to resurface in innocuous places, and data that does not contain them can easily give birth to unfair models. Kleinberg et al. gave a fantastic example. An algorithm selecting prospects for enrolment in a special care programme in US healthcare was enrolling people of colour much later in their illness than white people. These patients tend to spend less on preventive treatment and, as a whole, avoid private establishments. Because health spending was treated as a good proxy for the severity of a condition, the software ended up assisting those who were already in a privileged position.
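The proxy effect can be reproduced in a few lines. This sketch is purely illustrative and does not use the study’s data. It assumes made-up numbers: two hypothetical groups, "A" and "B", with identical levels of need, where group B spends half as much as group A at the same level of need. The selector never reads the group label. Even so, ranking by spending leaves every group B patient out.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str       # protected attribute: never read by the selector
    need: int        # true severity of illness, which the selector cannot observe
    spending: float  # observed past health spending, used as a proxy for need


# Hypothetical assumption: group "B" spends half as much as group "A"
# at the same level of need.
SPENDING_RATE = {"A": 1.0, "B": 0.5}


def make_cohort() -> list[Patient]:
    """Build a synthetic cohort in which both groups have need levels 1 to 10."""
    return [
        Patient(group, need, need * 1000 * SPENDING_RATE[group])
        for group in ("A", "B")
        for need in range(1, 11)
    ]


def select_by_spending(patients: list[Patient], k: int) -> list[Patient]:
    """Enrol the k highest spenders; the group attribute is not used."""
    return sorted(patients, key=lambda p: p.spending, reverse=True)[:k]


def enrolled_share(selected: list[Patient], group: str) -> float:
    """Fraction of enrolled patients who belong to the given group."""
    return sum(p.group == group for p in selected) / len(selected)


cohort = make_cohort()
selected = select_by_spending(cohort, k=5)
print([(p.group, p.need) for p in selected])
# → [('A', 10), ('A', 9), ('A', 8), ('A', 7), ('A', 6)]
print(enrolled_share(selected, "B"))
# → 0.0
```

A group B patient with need 10 is left out while a group A patient with need 6 is enrolled. Removing the group column did not help, because spending carries the group’s disadvantage with it.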

2. Who defines discrimination and who gets to claim it?

Establishing fairness is a highly political endeavour. It is the task of the political and legal system to define the ideal of fairness for a population, and discrimination laws are what set the standards of ethical behaviour for individuals and businesses alike. No data scientist or software engineer should have to decide on these subjects. Take the car insurance industry as an illustration. Women have traditionally received cheaper car insurance quotes than men because, historically, they have had fewer accidents. But isn’t this discriminatory against men who have never had an accident? Arguably, yes. But such cases must be argued.

3. Understanding the legislative landscape

Compounding the problem, many of the companies capable of deploying AI at scale operate globally. They therefore face many different anti-discrimination laws, which makes navigating the legislative terrain a legal nightmare. Take the mortgage industry this time. In the US, the Fair Housing Act prohibits discrimination in mortgage lending on the basis of race, colour, religion, sex, national origin, familial status or disability. A model predicting a person’s propensity to default on the payments must therefore not rely on those characteristics. In the UK, the same model would need to comply with Financial Conduct Authority rules (the FCA succeeded the FSA) and with GDPR. In Australia it would need to adhere to Australian law, and so on. Laws across borders will never be unified, but having an anti-discrimination framework in place for each country would at least be a start.

4. Managing representability

To check whether an algorithm discriminates, one needs a data set of examples to test the hypothesis. Who will be in charge of building it, and in what proportions will different populations be represented in it?
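Here is one possible shape for such a test, sketched under assumptions. It uses a hypothetical held-out set in which every record carries a group label, the true outcome and the model’s decision. For each group, the function reports three figures: the number of examples, the accuracy (the kind of disaggregated figure behind the Buolamwini result) and the share of positive decisions. The names `audit_by_group` and `positive_rate_ratio` are illustrative, not taken from any library. The sections above argue that lawmakers, not the code, should decide which of these gaps counts as discrimination and how large a gap is tolerable.

```python
from collections import defaultdict


def audit_by_group(records):
    """Summarise model behaviour per group.

    records: iterable of (group, y_true, y_pred) tuples with 0/1 labels.
    Returns {group: {"n": ..., "accuracy": ..., "positive_rate": ...}}.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "positive": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(y_true == y_pred)
        c["positive"] += int(y_pred == 1)
    return {
        group: {
            "n": c["n"],  # how well the group is represented in the test set
            "accuracy": c["correct"] / c["n"],
            "positive_rate": c["positive"] / c["n"],  # share given a positive decision
        }
        for group, c in counts.items()
    }


def positive_rate_ratio(report):
    """Lowest group positive rate divided by the highest (1.0 means parity)."""
    rates = [row["positive_rate"] for row in report.values()]
    top = max(rates)
    return min(rates) / top if top else 1.0


# Hypothetical test set: group "Y" is under-represented (4 rows versus 8).
test_set = [
    ("X", 1, 1), ("X", 1, 1), ("X", 0, 0), ("X", 0, 0),
    ("X", 1, 1), ("X", 0, 0), ("X", 1, 1), ("X", 0, 1),
    ("Y", 1, 0), ("Y", 1, 1), ("Y", 0, 0), ("Y", 1, 0),
]

report = audit_by_group(test_set)
for group in sorted(report):
    print(group, report[group])
# X {'n': 8, 'accuracy': 0.875, 'positive_rate': 0.625}
# Y {'n': 4, 'accuracy': 0.5, 'positive_rate': 0.25}
print(positive_rate_ratio(report))
# → 0.4
```

With only four rows, group Y’s figures are too fragile to support any firm conclusion. This is the representability problem in miniature: the audit is only as trustworthy as the test set behind it.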

Call to Action

The issues of ethics and bias in data science are major in scale, and they are going to sneak into everyone’s lives, regardless of the attention they are paid. It is becoming painfully clear that many layers of society must come together to define the future of AI. It would be arrogant to believe that tech can deal with this alone, and we encourage all companies to engage in dialogue with lawmakers and politicians. Data scientists should have their own Hippocratic Oath, but their responsibility should end (approximately) there. Their value lies in creating robust models that achieve their aim of enhanced decision making. The rest of society must help establish clear auditing protocols that public and private companies alike can (and will) use widely.