I’ve really enjoyed reading Microsoft’s guidelines for responsible AI this week, especially as they chime so well with my own thoughts. It’s a clearly written paper with real-world examples, so it’s very accessible. To recap, their 10 recommendations are:
Assess & prepare:
1. Assess merit of developing the product considering organisational values and business objectives
2. Assemble team reflecting diverse perspectives and with clearly defined roles and responsibilities
3. Assess potential product impact by including input from domain experts and potentially impacted groups
Design, build, & document:
4. Evaluate data and system outcomes to minimise the risk of fairness harms
5. Design AI product to mitigate the potential negative impact on society and the environment
6. Incorporate features to enable human control
7. Take measures to safeguard data and AI products
8. Document throughout the development lifecycle to enable transparency
Validate & support:
9. Validate product performance and test for unplanned failures as well as foreseeable misuse unique to AI products
10. Communicate design choices, performance, limitations, and safety risks to end user
Within its limitation of discussing what should be done, rather than recommending how, this paper makes some very important points, which I’d like to repeat and discuss a little further here.
AI applications need to be fair by design. It’s not enough to measure accuracy against test datasets; it takes a good deal of investment and effort to build in fairness. Even the very simplest question of understanding what is meant by fairness requires an alignment of the organisation and its values with the society it is serving and its values. Only then can we design what to measure, and how to measure it, throughout the lifecycle of the AI app. And only after all this work can we decide whether the people we are trying to serve with our AI app are being served fairly.
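To make "design what to measure" concrete, here is a minimal sketch of one common fairness measure, the demographic parity gap (the spread in positive-prediction rates across groups). The data, group labels, and the choice of metric are all hypothetical; real fairness work involves choosing and justifying the right metric for the context, as the paper argues.

```python
# Illustrative sketch only: a demographic parity check on made-up data.
# The predictions, group labels, and metric choice are all hypothetical.
from collections import defaultdict

def selection_rates(predictions, groups):
    """Return the positive-prediction rate for each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(predictions, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical model outputs (1 = approved) for two groups of applicants
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(f"Demographic parity gap: {demographic_parity_gap(preds, groups):.2f}")
# Group A is approved 60% of the time, group B 40%, so the gap is 0.20
```

Tracking a metric like this throughout the lifecycle, not just at release, is what turns a fairness value statement into something measurable.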
Responsible AI needs to acknowledge that humans are in the middle: AI apps need built-in capability for the full range of interactions with the full range of stakeholders. Product Owners will need to design not only how to make predictions, but also the very manual processes of oversight, comprehension, customisation, feedback and inclusivity.
The paper places emphasis on identifying where an AI app might cause harm, and proposes a broad definition of harms that includes indirect harms to privacy and the environment as well as direct harms. Harms are not necessarily exclusions; a business might legitimately underserve a customer segment (if we sell car insurance only to women, it’ll be cheaper for us, so we can offer cheaper premiums). But those being excluded must still be excluded fairly and without harm (offering car insurance only to women was found by the ECJ in 2011 to be discriminatory against men, since the use of actuarial factors based on sex is incompatible with the fundamental principle of equal treatment: https://uk.practicallaw.thomsonreuters.com/7-505-3080).
Harms may arise from unplanned uses, so graceful rejection of such cases must still be designed in. More insidiously, harms may be caused by exploitation of vulnerabilities. Some vulnerabilities are unique to AI (as discussed in my previous blog post, The rise of AI viruses), so InfoSec methods will need to be bolstered with new ones.
To my mind, the most welcome section was the acknowledgement that to develop responsible AI applications that are fair, serve their human stakeholders appropriately and guard against causing harm, the enterprise needs a diverse, representative team that really understands the different viewpoints. This is different from the accepted wisdom of agile software development, which defines the ideal team as one that has all the functional skills to turn its backlog items into done items. The AI dev team needs to step outside its data scientist, business analyst and IT professional roles and really engage with the perspectives of the stakeholders it is trying to serve. This has always been a key responsibility of the team’s Product Owner, but too often it stops there. Diversity brings a greater range of lived experiences, so it is equally applicable to all team roles.
Whilst it is a really good paper, I’d have liked it to go a little further: to look at where reality already falls short, understand why, and recommend improvements.
A couple of examples of disappointing reality: earlier this month (December 2021), I was chastened to read that Clearview AI is on track to win a U.S. patent for its facial recognition technology. To my mind, this is one of the most egregiously irresponsible applications of AI of modern times, second only to the wilfully immoral Facebook–Cambridge Analytica data scandal. Both these scandals have a digital half-life of many generations. They cannot be retrospectively fixed, so they must be prevented. They show that responsible AI must be scrutinised by both internal and external regulators, who must be up to the challenge of scrutinising effectively. That there can be two such examples within a couple of years shows how far ahead of the regulators these skilled AI practitioners are.
AI apps will continue to get it wrong, with alarming frequency and impact, so a further human interaction needs to be built in: redress. A responsible AI app not only needs to know where it’s falling short of stakeholder expectations, but also needs a way of fixing its mistakes. (Have a look at my paper on Engineering ML Ops into Oscar Enterprise AI for my view on how to engineer in this capability.)
Scrutiny is not limited to the predictions made, but extends to the data being analysed. A well-known example of this is the Boston housing prices dataset, which until recently was bundled into many data science libraries as a learning dataset. But as noted in the scikit-learn documentation:
"The Boston housing prices dataset has an ethical problem: the authors of this dataset engineered a non-invertible variable “B” assuming that racial self-segregation had a positive impact on house prices."
Horrifyingly, the dataset was modified to implement systemic bias. Responsible AI must use well-scrutinised data, and provide tools for understanding its data biases. My paper on Engineering Explainability into Oscar Enterprise AI looks into some techniques we’re building to do just this.
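One simple first step in scrutinising a dataset is to check whether any feature acts as a proxy for a sensitive attribute. The sketch below flags features whose correlation with a sensitive attribute exceeds a threshold; the column names, data, and threshold are all synthetic and hypothetical, not taken from the Boston dataset or from Oscar, and real proxy detection needs far more than a correlation scan.

```python
# Illustrative sketch only: flag features that correlate strongly with a
# sensitive attribute, as a crude first check for proxy bias. All column
# names, data values, and the 0.8 threshold are hypothetical.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def proxy_candidates(features, sensitive, threshold=0.8):
    """Names of features whose |correlation| with the sensitive attribute exceeds threshold."""
    return [name for name, col in features.items()
            if abs(pearson_r(col, sensitive)) > threshold]

sensitive = [0, 0, 1, 1, 0, 1, 0, 1]           # hypothetical protected attribute
features = {
    "rooms":    [5, 6, 5, 7, 6, 5, 6, 7],      # weakly related in this toy data
    "zip_code": [0, 0, 1, 1, 0, 1, 0, 1],      # a perfect proxy in this toy data
}
print(proxy_candidates(features, sensitive))   # prints ['zip_code']
```

A scan like this would not by itself have caught the Boston dataset’s engineered “B” variable, but making such checks routine is part of providing tools for understanding data biases.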
Microsoft has written some good rules for enterprises to live by. But like all rules, simply living within them isn’t enough; we must challenge ourselves to look beyond any one set of rules and do better.