The European Union's (EU) new AI law will force companies to disclose training data, creating a major debate over intellectual property rights.

The launch of ChatGPT marked an important milestone, opening an explosive era of generative AI (Gen AI) — a class of applications that can rapidly produce text, images, and audio. In just 18 months, the technology has attracted enormous investment and found wide application across many fields.

However, alongside its great benefits, generative AI also raises many legal issues, especially questions about the origin of training data, according to Reuters.

Recognizing these challenges, the European Union (EU) has pioneered the passage of the AI Act, expected to take full effect within the next two years. One of the law's most notable provisions requires organizations deploying general-purpose AI models, such as ChatGPT, to be transparent about training data. Specifically, they must provide a "detailed summary" of the data sources used, including text, images, and audio.

This regulation is expected to partly address concerns about copyright infringement, as many AI companies stand accused of using books, films, and other works of art to train AI without the authors' consent.

However, the move has faced strong opposition from technology companies. They argue that revealing training data is like "revealing a secret formula," putting them at a disadvantage in a fiercely competitive market.

Matthieu Riouf, CEO of Photoroom, a company specializing in AI-powered photo editing, said: "Publicizing AI training data is like forcing a famous chef to reveal his secret recipes." The view is shared by other technology giants such as Google and Meta, which are betting their futures on AI.

The level of detail required in these transparency reports will have a major impact both on small AI startups and on big tech companies like Google and Meta, which have placed the technology at the heart of their future operations.

Over the past year, several prominent tech companies, including Google, OpenAI, and Stability AI, have faced lawsuits from authors claiming their content was improperly used to train AI models. Although US President Joe Biden has issued several executive orders focusing on the security risks of AI, questions about copyright have not been fully resolved. Requirements forcing tech companies to pay rights holders have drawn bipartisan support in Congress.

Facing public pressure, the technology giants have begun offering concessions in the form of content licensing agreements with media organizations. OpenAI, for example, has signed deals with the Financial Times and The Atlantic, while Google has partnered with the social network Reddit.

However, these moves have not been enough to quiet public opinion. OpenAI came under renewed criticism when CTO Mira Murati declined to answer whether the company had used YouTube videos to train Sora, its AI video-generation tool. The controversy over a voice in the latest version of ChatGPT that resembled actress Scarlett Johansson's has further fueled the backlash against OpenAI.

Amid the controversy, Thomas Wolf, co-founder of Hugging Face, spoke out in support of data transparency, though he admitted the view is not shared across the industry.

Meanwhile, European lawmakers themselves hold mixed views. Dragos Tudorache, a Member of the European Parliament and one of the drafters of the AI Act, said that making training data public is necessary to protect the rights of content creators. "They have the right to know whether their work is being used to train AI," he emphasized.

The battle between data transparency and trade secrets in AI is heating up. Experts predict it will be one of the biggest challenges facing policymakers and businesses in the years ahead.