AI Bias: When Algorithms Go Bad

Earlier this month, researchers from the Massachusetts Institute of Technology and Stanford University reported that three commercial facial-analysis programs from major tech companies showed bias by both skin type and gender. The programs' error rates in determining the gender of light-skinned men were 0.8%, compared with much higher error rates for darker-skinned women, which in some cases reached 20% and 34%.
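A single overall accuracy number can hide gaps like these. The findings above come from breaking error rates out by subgroup. Below is a minimal sketch of that kind of disaggregated evaluation. The function name `error_rates_by_group`, the group labels and the records are all hypothetical, made up for illustration, and are not the study's data.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error rate} from (group, true_label, predicted_label) tuples."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        errors[group] += int(truth != predicted)
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical records for illustration only (NOT the study's data).
records = (
    [("lighter_male", "male", "male")] * 99
    + [("lighter_male", "male", "female")] * 1
    + [("darker_female", "female", "female")] * 70
    + [("darker_female", "female", "male")] * 30
)

# The single aggregate number a vendor might report.
overall = sum(truth != predicted for _, truth, predicted in records) / len(records)
print(f"overall error rate: {overall:.3f}")

# The same predictions, broken out by subgroup.
for group, rate in error_rates_by_group(records).items():
    print(f"{group}: {rate:.3f}")
```

With these made-up records, the overall error rate of 15.5% sits between two very different subgroup rates, 1% and 30%, which only the per-group breakdown reveals.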

This is not the first time an algorithm powering an AI application has delivered an erroneous, to say nothing of embarrassing, result. In 2015, Flickr, a photo-sharing site owned by Yahoo, launched image-recognition software that automatically created tags for photos. The problem? Some of the tags it created were highly offensive, such as “sport” and “jungle gym” for pictures of concentration camps and “ape” for pictures of people, including an African American man. The service also tagged a white woman wearing face paint as an ape.

In 2016, when Microsoft unveiled Tay, a chatbot for Twitter, it took about 24 hours for Tay to pick up misogynistic and racist language from Twitter users and repeat it back to them.

How do such events happen? Algorithms, after all, are formal rules that make predictions based on historical patterns. That would seem to be the antithesis of bias.

Why Algorithms Go Bad

There are several reasons why algorithms deliver unexpected results, said Julia Stoyanovich, a professor in Drexel’s College of Computing & Informatics who studies the ethical development of algorithms and artificial intelligence, but it usually comes down to bias in the original data on which the algorithm is trained, validated and ultimately deployed.

Humans play a role too, she added, through the scoring methods developed for the data set and the decisions about how to weight its different attributes. She offered one example: “For college admissions, someone might say 'I’m going to give an equal importance to SAT scores and to GPAs,' and they may not realize that math SAT scores are lower systematically for women and that English SAT scores are systematically lower for African Americans.”
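Stoyanovich's point about weighting can be made concrete with a toy scoring function. The sketch below assumes a hypothetical composite score that rescales an SAT section score (200 to 800) and a GPA (0 to 4.0) to a common 0-to-1 range and combines them with a weight `w_sat`. The applicants are made-up numbers: group B matches group A on GPA but scores 30 points lower on SAT math, standing in for the kind of systematic gap she describes. Because GPAs are identical, the gap between the groups' average composites is exactly `w_sat` times the rescaled SAT gap.

```python
from statistics import mean

# Hypothetical applicants: (SAT math section score, GPA). Made-up numbers.
# Group B has the same GPAs as group A but SAT math scores 30 points lower.
GROUP_A = [(650, 3.6), (700, 3.8), (600, 3.4)]
GROUP_B = [(620, 3.6), (670, 3.8), (570, 3.4)]


def composite(sat: int, gpa: float, w_sat: float) -> float:
    """Hypothetical weighted score; SAT (200-800) and GPA (0-4) both rescaled to 0..1."""
    w_gpa = 1.0 - w_sat
    return w_sat * (sat - 200) / 600 + w_gpa * gpa / 4.0


def group_gap(w_sat: float) -> float:
    """Mean composite score of group A minus mean composite score of group B."""
    score_a = mean(composite(sat, gpa, w_sat) for sat, gpa in GROUP_A)
    score_b = mean(composite(sat, gpa, w_sat) for sat, gpa in GROUP_B)
    return score_a - score_b


# "Equal importance" (w_sat = 0.5) versus two alternative weightings.
for w in (0.5, 0.25, 0.0):
    print(f"w_sat={w:.2f}  gap={group_gap(w):.4f}")
```

Under these assumptions the gap is 0.025 with equal weights, 0.0125 at `w_sat = 0.25` and zero when SAT is ignored: the weight, a human choice, decides how much of the underlying score gap carries into the final ranking.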

Or humans might give the algorithm faulty data to train on. In a research paper, Eliezer Yudkowsky of the Machine Intelligence Research Institute describes a computer vision system that the US Army set out to build to automatically detect camouflaged enemy tanks. The system was supposed to identify pictures of tanks but was in fact identifying the backgrounds of those images.

Yudkowsky explained that the researchers trained a neural net on 50 photos of camouflaged tanks in trees and 50 photos of trees without tanks. “It turned out that in the researchers’ dataset, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest,” he wrote.
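To see how a model can score perfectly on its training data while learning the wrong thing, here is a toy sketch. It is not the Army system and not a neural network: a one-rule "decision stump" stands in for the learner, and each photo is reduced to two hypothetical features, a sky-brightness flag and a noisy tank-shape cue that is right 80% of the time. The names `Photo`, `make_photos`, `fit_stump` and the feature names are illustrative assumptions. Because weather perfectly separates the training classes, the stump prefers brightness over the genuine but imperfect tank cue.

```python
from typing import Callable, NamedTuple


class Photo(NamedTuple):
    brightness: int  # hypothetical feature: 1 = sunny sky, 0 = cloudy sky
    tank_cue: int    # hypothetical noisy "tank-shaped object" detector output: 1 or 0
    has_tank: bool   # ground-truth label


def make_photos(n: int, has_tank: bool, sunny: Callable[[int], bool], cue_errors: int) -> list[Photo]:
    """Build n photos of one class; the first `cue_errors` photos get a wrong tank_cue."""
    photos = []
    for i in range(n):
        cue = int(has_tank) if i >= cue_errors else int(not has_tank)
        photos.append(Photo(int(sunny(i)), cue, has_tank))
    return photos


FEATURES = ("brightness", "tank_cue")


def accuracy(photos: list[Photo], feature: str, value: int) -> float:
    """Accuracy of the rule 'predict tank exactly when photo.<feature> == value'."""
    hits = sum((getattr(p, feature) == value) == p.has_tank for p in photos)
    return hits / len(photos)


def fit_stump(photos: list[Photo]) -> tuple[str, int]:
    """Choose the single (feature, value) rule with the highest training accuracy."""
    rules = [(f, v) for f in FEATURES for v in (0, 1)]
    return max(rules, key=lambda rule: accuracy(photos, *rule))


# Training set mirrors the story: every tank photo is cloudy, every empty-forest
# photo is sunny. The tank cue is right for 40 of 50 photos in each class.
TRAIN = (make_photos(50, True, lambda i: False, cue_errors=10)
         + make_photos(50, False, lambda i: True, cue_errors=10))
# Test set: weather is independent of whether a tank is present.
TEST = (make_photos(50, True, lambda i: i % 2 == 0, cue_errors=10)
        + make_photos(50, False, lambda i: i % 2 == 0, cue_errors=10))

RULE = fit_stump(TRAIN)
print("learned rule:", RULE)
print("train accuracy:", accuracy(TRAIN, *RULE))
print("test accuracy:", accuracy(TEST, *RULE))
print("test accuracy of tank-cue rule:", accuracy(TEST, "tank_cue", 1))
```

The learned rule is `("brightness", 0)`, in effect "cloudy means tank": 100% accurate on the training set but only 50% on the test set, where weather no longer tracks the label, while the imperfect tank cue still gets 80% right. Evaluating on data whose incidental conditions differ from the training data is what exposes this kind of shortcut.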

Societal Impact

While examples such as African Americans being tagged as apes or concentration camps as gyms are highly offensive, such incidents ultimately don’t have a societal impact, Stoyanovich said. “The algorithms that can do the most damage are both discriminating against groups of individuals systematically and are opaque so an individual cannot know that there’s something going on,” she said. One of the first examples of this kind was documented by Latanya Sweeney, a professor of Government and Technology in Residence at Harvard University, who in 2013 published research on the ads served alongside searches for racially identifiable names.


Sweeney, an African American, Googled her name and received an ad for a service that asked, ‘Would you like to see the criminal record of Latanya Sweeney?’ Sweeney, who doesn’t have a criminal record, began a study comparing how often such ads were served in Google search results for racially identifiable names. She found that these ads appeared for such names significantly more often, in the statistical sense, than for a search on a name like, say, Mary Jones. “This, of course, is terrible because when a potential employer Googles your name, and they receive an ad offering to serve up that person’s criminal records, they might assume that person actually has a criminal record,” Stoyanovich said.
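The article does not say which statistical test Sweeney used. As a hedged illustration of what "statistically significant" means for ad-serving rates, the sketch below applies a standard two-sided two-proportion z-test to hypothetical counts (not Sweeney's data). The function name `two_proportion_z_test` and the counts are assumptions for illustration.

```python
import math


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test using the pooled proportion.

    hits_x / n_x is the share of searches for group-x names that showed the ad.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    # Under the null hypothesis both groups share one ad rate, estimated by pooling.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, NOT Sweeney's data: the ad appears on 60 of 100 searches
# for names in group A and on 40 of 100 searches for names in group B.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these made-up counts, z comes out near 2.83 and the two-sided p-value near 0.005, well under the conventional 0.05 cutoff, so a 60% versus 40% difference on 100 searches each would be unlikely to arise by chance alone.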
