An advocacy group has revealed that image generators were trained on photos of Brazilian kids without their consent, drawn from a dataset of billions of scraped images. Research by Human Rights Watch (HRW) shows that popular image generators like Stable Diffusion were trained on photos of children spanning their entire childhood.


The HRW study found that these images came from at least 10 Brazilian states. It reported that the pictures pose a huge “privacy risk to kids,” in part because their use in training enables the production of non-consensual images bearing children’s likeness.

Brazilian kids’ images found among billions used to train AI models

HRW researcher Hye Jung Han exposed the problem after analyzing a small fraction (less than 0.0001%) of LAION-5B, a dataset built from Common Crawl snapshots of the public web. She explained that the dataset does not host the actual photos but contains image-text pairs: links to nearly 6 billion pictures, each paired with the caption posted alongside it since 2008.
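To make that distinction concrete, here is a minimal sketch of how such metadata might be inspected. It assumes pandas and LAION’s commonly documented parquet layout, where each row holds an image URL and its scraped caption (exposed as URL and TEXT columns); the shard filename below is a hypothetical placeholder, not a real file.

```python
# Minimal sketch: inspecting LAION-style metadata (links + captions, not photos).
# Assumptions: pandas is installed; the parquet shard path is a hypothetical
# placeholder; the URL and TEXT column names follow LAION's documented schema.
import pandas as pd

metadata = pd.read_parquet("laion5b_metadata_shard_00000.parquet")  # hypothetical local shard

# Each row points to an image hosted elsewhere on the public web,
# paired with the caption scraped alongside it.
for _, row in metadata.head(3).iterrows():
    print(row["URL"])   # link to the image on its original host
    print(row["TEXT"])  # scraped caption; HRW found children's names and locations in fields like this
```

This structure is also why removing links from the dataset does not erase the underlying photos: they remain on their original hosts, and models already trained on them are unaffected.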

Kids’ pictures from across 10 Brazilian states were found, most of them family photos uploaded to parenting and personal blogs. According to the report, these are pictures that internet users would not easily stumble upon.


Working with HRW, LAION, the German nonprofit that created the dataset, removed links to the images. Concerns remain that the dataset may still reference children’s images from around the world, since removing links alone does not entirely solve the problem.

“This is a larger and very concerning issue and as a volunteer organization, we will do our part to help,” LAION spokesperson Nate Tyler told Ars Technica.

Children’s identities are easily traceable

HRW’s report further revealed that the identities of many Brazilian kids are easily traceable, since their names and locations appear in the captions that built the dataset. It also raised concerns that the children could be targeted by bullies or have their images used to generate explicit content.

“The photos reviewed span the entirety of childhood,” reads part of the report.

“They capture intimate moments of babies being born into the gloved hands of doctors, young children blowing out candles on their birthday cake or dancing in their underwear at home…”

HRW.

Han, however, noted that “all publicly available versions of LAION-5B were taken down,” which for now reduces the risk of the Brazilian kids’ photos being misused.

According to HRW, the dataset will not be made available again until LAION is certain all flagged content has been removed. The decision was made after a Stanford University report “found links in dataset pointing to illegal content on the public web,” including over 3,000 suspected instances of child sexual abuse material.

At least 85 girls in Brazil have also reported classmates harassing them by using AI to generate sexually explicit deepfakes “based on photos taken from their social media content.”

Protecting children’s privacy

According to Ars Technica, LAION-5B was introduced in 2022, reportedly as an attempt to replicate OpenAI’s dataset, and was touted as the biggest “freely available image-text dataset.”

When HRW contacted LAION over the images, the organization responded by saying AI models trained on LAION-5B “could not produce kids’ data verbatim,” although they acknowledged the privacy and security risks.

LAION then began removing the flagged images but also argued that parents and guardians are responsible for removing children’s personal photos from the internet. Han disagreed, saying:

“Children and their parents shouldn’t be made to shoulder responsibility for protecting kids against a technology that’s fundamentally impossible to protect against. It’s not their fault.”

Han.

HRW called for Brazilian lawmakers’ urgent intervention to protect children’s rights from emerging technologies. Per HRW’s recommendations, new laws must prohibit the scraping of children’s data into AI models.

Cryptopolitan reporting by Enacy Mapakame