AI systems to detect 'hate speech' could have 'disproportionate negative impact' on African Americans: Study

A recent Cornell study reveals that these types of AI systems might themselves be inherently biased against minorities.

Many universities have recently unveiled artificial intelligence systems to flag potentially offensive online content.

A new Cornell University study reveals that some artificial intelligence systems created by universities to identify “prejudice” and “hate speech” online might be racially biased themselves and that their implementation could backfire, leading to the over-policing of minority voices online.

According to the study abstract, the machine learning practices behind these AI systems, which are designed to flag offensive online content, may actually “discriminate against the groups who are often the targets of the abuse we are trying to detect.”

For the study, researchers trained a system to flag tweets containing “hate speech,” much as other universities are developing systems for eventual online use. They used several databases of tweets, some of which had been flagged by human evaluators for offensive content.

“The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field they will, therefore, have a disproportionate negative impact on African-American social media users,” the abstract continues.


But the Cornell researchers added a variable not used by other universities. Using a combination of census data, tweet location data, and demographic-specific language, they also trained the system to estimate whether each tweet was “black-aligned” or “white-aligned.” The researchers used five different databases of potential “hate speech” tweets, and all five yielded the same result: tweets likely written by African Americans were much more likely to be flagged as offensive than those likely written by whites.

Along with the possible oversampling of tweets from African Americans, the researchers believe this type of machine discrimination stems from human error on the part of those who do the original annotation and classification from which the machine learns.


“When we as researchers, or the people we pay online to do crowdsourced annotation, look at these tweets and have to decide, ‘Is this hateful or not hateful?’ we may see language written in what linguists consider African American English and be more likely to think that it’s something that is offensive due to our own internal biases,” study author Thomas Davidson said. “We want people annotating data to be aware of the nuances of online speech and to be very careful in what they’re considering hate speech.”

This new information may be crucial for understanding the capacity of these types of systems to do harm and stifle voices online, but initiatives at other universities to integrate these types of systems into existing social media platforms are already well underway.

Professors at the University at Buffalo and Arizona State University have already created a system designed for “automatically detecting prejudice in social media posts.” Their system flags posts as “having the potential to spread misinformation and ill will.”


A similar project at the University of California, Berkeley is using the same type of machine learning to create an “online hate index” that could help social media platforms identify and eliminate “hate speech” online.

Researchers are also using machine learning to weed out “fake news.” A system under development at the University of California, Santa Barbara seeks to help identify whether information shared by individuals is “genuine” or “misleading.” The researchers hope their system will soon be “integrated into browsers on the client side,” to streamline the reporting of “content that causes hate, aversion, and prejudice.”

Follow the author of this article on Twitter: @celinedryan