MIT pulls massive AI dataset over racist and misogynistic content


The Massachusetts Institute of Technology has permanently taken down its dataset of 80 million tiny images, a popular image database used to train machine learning systems to identify people and objects, because it used a range of racist, misogynistic and other offensive terms to label photos.

In a letter posted Monday to MIT’s CSAIL website, the three creators of the huge dataset, Antonio Torralba, Rob Fergus and Bill Freeman, apologized and said they had decided to take the dataset offline.

“It has been brought to our attention that the Tiny Images dataset contains some derogatory terms as categories and offensive images. This was a consequence of the automated data collection procedure that relied on nouns from WordNet. We are greatly concerned by this and apologize to those who may have been affected,” they wrote in the letter.

According to the letter, the dataset was created in 2006 and contains 53,464 different nouns, copied directly from WordNet. Those terms were then used to automatically download images of the corresponding noun from Internet search engines at the time, collecting the 80 million images (each at a tiny 32×32 resolution).

MIT pulled a gigantic artificial intelligence dataset from use this week.

(Getty Images)

“Biases, offensive and prejudicial images, and derogatory terminology alienate an important part of our community — precisely those that we are making efforts to include. It also contributes to harmful biases in AI systems trained on such data,” they wrote.

“Additionally, the presence of such prejudicial images hurts efforts to foster a culture of inclusivity in the computer vision community. This is extremely unfortunate and runs counter to the values we strive to uphold.”

Biased datasets can have a major impact on the machine learning technologies and artificial intelligence programs trained on them. Critics inside and outside Silicon Valley have drawn attention to bias against Black people specifically, and people of color in general, in various artificial intelligence systems.

The dataset will not be put back online.