Some believers in big data have claimed that, in big data sets, “the numbers speak for themselves.” In other words, the more data machines have to work with, the closer they can get to objectivity in their decision-making.
But data researcher Kate Crawford says that’s not always the case. In fact, big data sets can perpetuate the same biases present in our culture, teaching machines to discriminate when scanning resumes or approving loans, for example.
“Big data is very exciting for lots of reasons. It gives us capacities that we simply didn't have before. But there is also this little trap that we can tend to look at large collections of data as somehow being more objective and more representative when this is not necessarily the case,” says Crawford, a principal researcher at Microsoft Research. “Because they're systematic, we assume that they're somehow more objective than humans. But it was stunning to see that we can have elements of bias in both the data sets that we're using, and in the algorithms themselves.”
Suresh Venkatasubramanian, an associate professor in the School of Computing at the University of Utah in Salt Lake City, says algorithms are complicated and opaque. They learn and pick up on patterns, which often means they can discriminate in strange ways. One good example is the way algorithms are used to sort through the resumes of potential hires.
“An algorithm that scans resumes might say for example, 'Oh, I notice when people use this kind of font, it has a high correlation with being productive, so this is the important feature.' Is it? I don't know, maybe it is, maybe it isn't, but [the algorithm] could do things like that and it's hard to understand why,” Venkatasubramanian says.
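Venkatasubramanian's font example can be reproduced with a toy sketch. Everything below is fabricated for illustration: a sampling quirk in the synthetic "historical" data makes an irrelevant feature (font choice) correlate with the productivity label, and a naive one-rule learner duly picks it as the most predictive feature.

```python
import random

random.seed(0)

# Fabricated resume records. "serif_font" has nothing to do with job
# performance, but in this synthetic historical sample it happens to
# correlate with the "productive" label more strongly than a
# genuinely relevant feature (years of experience) does.
def make_resume():
    productive = random.random() < 0.5
    years_exp = random.gauss(6 if productive else 4, 2)
    # The spurious link: productive hires in this sample mostly used serif fonts.
    serif_font = random.random() < (0.8 if productive else 0.2)
    return {"serif_font": serif_font,
            "many_years_exp": years_exp > 5,
            "productive": productive}

data = [make_resume() for _ in range(1000)]

def accuracy(feature):
    """Fraction of resumes where this one binary feature predicts the label."""
    return sum(r[feature] == r["productive"] for r in data) / len(data)

# A one-rule "learner": keep whichever single feature best predicts the label.
best = max(["serif_font", "many_years_exp"], key=accuracy)
print(best, round(accuracy(best), 2))
```

Nothing causal links the font to performance; the learner simply rewards whatever correlates in the historical sample, which is how such a system ends up treating a font as "the important feature."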
Letting an algorithm make hiring decisions leads to strange biases.
“You are being judged for things that you're probably not even thinking about in your resume, like for example your address. There was one HR department that has been using an algorithmically driven system that gives people extra credit if they live within a close radius of the workplace because the data showed that if you had a longer commute, you were more likely to quit or to be fired within a year,” says Crawford. “So what that also means is that they're just starting to hire people who live nearby, behind which there is a whole range of other discriminatory functions.”
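One way to catch a proxy feature like the address before a model uses it is to measure how unevenly the supposedly neutral feature is distributed across demographic groups. This is a minimal sketch with fabricated group names and counts, not any real HR system's data.

```python
# Hypothetical applicant records: "lives_nearby" looks neutral, but if
# residential patterns differ by group, it quietly encodes group
# membership. A large gap in the rates below is a warning sign.
applicants = [
    # (group, lives_nearby) -- fabricated illustrative counts
    *[("group_a", True)] * 70, *[("group_a", False)] * 30,
    *[("group_b", True)] * 30, *[("group_b", False)] * 70,
]

def nearby_rate(group):
    """Fraction of a group's applicants who live within the radius."""
    flags = [near for g, near in applicants if g == group]
    return sum(flags) / len(flags)

gap = nearby_rate("group_a") - nearby_rate("group_b")
print(f"nearby-rate gap between groups: {gap:.2f}")  # prints 0.40
```

A gap this large means the "close radius" bonus would effectively award extra credit by group, exactly the downstream discrimination Crawford describes.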
Many are just now beginning to wake up to the discriminatory problems associated with algorithms. Experts like Crawford and Venkatasubramanian are starting to look for solutions.
“I'm really interested in what we do about it because I'm concerned about the kind of discrimination we're seeing against entire groups, be they African-American, be they women, be they people who live in rural areas — you name it. And we're seeing a form of group discrimination often occur in these kinds of systems. But there are things we can do about it,” Crawford says. “How do you have sort of internal systems that are checking for discriminatory outcomes? A lot of technology companies are looking into that. Another thing you can do is external audits.”
Another solution, says Venkatasubramanian, is to screen algorithms more rigorously, testing them on subsets of data to see if they produce the same high-quality results for different populations of people. And Crawford says it might be worth training computer scientists differently, too, in order to raise their awareness of the pitfalls of machine learning in regards to race, gender, bias and discrimination.
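Venkatasubramanian's suggestion of testing on subsets of data can be sketched as a per-population audit: run the model on each group's subset and compare the rates of favorable outcomes. The group names and decision lists below are fabricated; the 0.8 cutoff is the "four-fifths rule," a common heuristic from US employment guidance, used here only as an example threshold.

```python
def selection_rate(decisions):
    """Fraction of favorable outcomes (True = hired/approved)."""
    return sum(decisions) / len(decisions)

def disparate_impact_ratio(decisions_by_group):
    """Worst-off group's selection rate divided by the best-off group's."""
    rates = [selection_rate(d) for d in decisions_by_group.values()]
    return min(rates) / max(rates)

# Fabricated model outputs on two population subsets of a test set.
decisions = {"group_a": [True] * 60 + [False] * 40,
             "group_b": [True] * 30 + [False] * 70}

ratio = disparate_impact_ratio(decisions)
print(f"disparate-impact ratio: {ratio:.2f}")  # prints 0.50
if ratio < 0.8:
    print("below the four-fifths threshold: flag for review")
```

The same harness works as the internal check Crawford mentions (run it continuously on production decisions) or as an external audit (run it on a held-out benchmark the auditor controls).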
“Education is incredibly important. I've been educated myself just by looking at this,” says Venkatasubramanian. “Essentially we're trying to formulate a mathematical way of describing bias and describing how to be fair - how algorithms could be fair, and trying to implement that into the algorithms. So there are lots of things we can do. And I think we need a lot more study of this and there is more of a growing interest in the technical side of things and how to do this.”