According to Proof News, several of the world's leading technology companies, including Apple, Nvidia and Salesforce, are accused of illegally using YouTube data to train AI models.
Specifically, the non-profit AI research group EleutherAI collected subtitles from well over a hundred thousand YouTube videos and included them in a large data set called the Pile. This is considered a serious violation of YouTube's terms of service.
The Pile contains subtitles from more than 173,000 videos across more than 48,000 YouTube channels, including videos that have since been deleted. Many large technology companies then used the data set to train AI models, from chatbots to natural language processing engines.
Some companies, such as Anthropic and Salesforce, admit to using the Pile, but they argue that it is a “publicly available” data set and deny any allegations of copyright infringement. However, Proof News said the list of companies using the Pile may be longer, because many organizations do not publicly disclose the sources of their AI training data.
Notably, the Pile contains content not only from regular YouTube channels but also from well-known cryptocurrency channels such as Coinbase, Cointelegraph, and Bitcoin Magazine.
The incident once again raises concerns about data copyright in the field of AI. Legal experts believe that disputes over the commercial use of YouTube data will increase, especially as the race to develop AI intensifies.
Proof News said that OpenAI and Google have also collected YouTube text in the past. Google, the company that owns YouTube, affirmed that its actions were within the framework of its agreement with users, while OpenAI neither confirmed nor denied the report.
To date, the law firm BakerHostetler lists at least fifteen lawsuits involving technology companies such as Anthropic, Meta, GitHub, Stability AI, Nvidia and Google. In particular, OpenAI is facing major lawsuits from the parent company of Mother Jones and from The New York Times. Overall, the legal situation around data copyright in AI is becoming increasingly tense, with complex and unpredictable developments.