June 18 (Reuters) – Alphabet Inc’s Google (GOOGL.O) told Reuters this week it was developing an alternative to the industry standard method for classifying skin tones, which a growing number of technology researchers and dermatologists say is inadequate for assessing whether products are biased against people of color.
At issue is a six-color scale known as the Fitzpatrick Skin Type (FST), which dermatologists have used since the 1970s. Tech companies now rely on it to categorize people and measure whether products such as facial recognition systems or smartwatch heart rate sensors work equally well across skin tones.
Critics say FST, which includes four categories for “white” skin and one each for “black” and “brown,” ignores diversity among people of color. Researchers from the US Department of Homeland Security, speaking at a federal technology standards conference last October, recommended dropping FST for evaluating facial recognition because it poorly represents the range of skin color in diverse populations.
In response to Reuters questions about FST, Google, for the first time and ahead of its peers, said it was quietly pursuing better measures.
“We are working on alternative, more inclusive measures that could be useful in the development of our products, and will collaborate with scientific and medical experts, as well as with groups working with communities of color,” the company said, declining to provide details on the effort.
The controversy is part of a larger reckoning over racism and diversity in the tech industry, where the workforce is whiter than in industries like finance. Ensuring that technology works well for all skin colors, as well as different ages and genders, is increasingly important as new products, often powered by artificial intelligence (AI), extend into sensitive and regulated areas such as healthcare and law enforcement.
Companies know their products can be faulty for groups that are underrepresented in research and test data. The concern about FST is that its limited scale for darker skin tones could lead to technology that, for example, works for golden brown skin but fails for espresso red tones.
Many product types offer much richer palettes than FST. Last year Crayola released 24 flesh-colored pencils, and this year’s Barbie Fashionistas dolls from Mattel Inc (MAT.O) cover nine tones.
The question is far from academic for Google. When the company announced in February that the cameras on some Android phones could measure pulse rates from a fingertip, it said the readings would on average err by 1.8% regardless of whether users had light or dark skin.
The company later gave similar assurances that skin type would not materially affect the results of a background filtering feature in Meet video conferences, or of an upcoming web tool for identifying skin conditions, informally dubbed Derm Assist.
These conclusions are derived from testing with the six-tone FST.
The late Harvard University dermatologist Dr. Thomas Fitzpatrick invented the scale to personalize ultraviolet radiation treatment for psoriasis, an itchy skin condition. He grouped the skin of “white” people into Roman numerals I through IV by asking how much sunburn or tan they developed after certain periods of sun exposure.
A decade later came type V for “brown” skin and VI for “black”. The scale is still part of the US regulations for testing sunscreen products, and it remains a popular dermatological standard for assessing patients’ cancer risk and more.
Some dermatologists say the scale is a poor and overused measure for care, and is often conflated with race and ethnicity.
“A lot of people would assume I’m type V, which rarely or never burns, but I burn,” said Dr. Susan Taylor, a dermatologist at the University of Pennsylvania who founded the Skin of Color Society in 2004 to promote research on marginalized communities. “Looking at my skin tone and saying I’m Type V doesn’t do me a favor.”
Tech companies, until recently, were indifferent. Unicode, an industry association overseeing emojis, referred to FST in 2014 as the basis for adopting five skin tones beyond yellow, saying the scale was “without negative associations.”
A 2018 study titled “Gender Shades,” which found facial analysis systems more often confused people with darker skin, popularized the use of FST for assessing AI. The research described FST as a “starting point,” but scientists behind similar studies that came later told Reuters they used the scale to stay consistent.
“As a first step for a relatively immature market, it serves to help us identify red flags,” said Inioluwa Deborah Raji, a Mozilla fellow specializing in AI auditing.
In an April study testing AI for detecting deepfakes, researchers at Facebook Inc (FB.O) wrote that FST “clearly does not encompass the diversity of brown and black skin tones.” Still, they released videos of 3,000 people for use in evaluating AI systems, with FST tags attached based on the ratings of eight human reviewers.
The judgment of the reviewers is central. Last year, facial recognition software startup AnyVision gave reviewers celebrity examples: former baseball great Derek Jeter as Type IV, model Tyra Banks as V, and rapper 50 Cent as VI.
AnyVision told Reuters it agreed with Google’s decision to revisit the use of FST, and Facebook said it was open to better measures.
Microsoft Corp (MSFT.O) and smartwatch manufacturers Apple Inc (AAPL.O) and Garmin Ltd (GRMN.O) refer to FST when working on health-related sensors.
But the use of FST could fuel “false assurances” about smartwatch heart rate readings on darker skin, clinicians at the University of California San Diego, inspired by the Black Lives Matter social equality movement, wrote in the journal Sleep last year.
Microsoft has acknowledged the shortcomings of FST. Apple said it tests on humans across skin tones using various measures, FST only sometimes among them. Garmin said that, due to wide-ranging testing, it believes its readings are reliable.
Victor Casale, who founded makeup company Mob Beauty and helped Crayola on the new pencils, said he had developed 40 shades of foundation, each differing from the next by about 3%, enough for most adults to tell apart.
Color accuracy on electronics suggests technology standards should have 12-18 tones, he said, adding, “You can’t just have six.”
Reporting by Paresh Dave; Editing by Jonathan Weber and Lisa Shumaker