According to Cointelegraph, the News Media Alliance (NMA) has claimed that artificial intelligence developers rely heavily on illegally scraping copyrighted material from news publications and journalists to train their models. In a 77-page white paper and an accompanying submission to the United States Copyright Office, the NMA stated that the data sets used to train AI models contain significantly more news publisher content than content from other sources. According to the NMA, this results in AI-generated content that infringes publishers' copyrights and puts news outlets in competition with AI models.

The NMA argues that while news publishers make the investments and take the risks, AI developers reap the rewards in users, data, brand creation, and advertising dollars. Publishers also face reduced revenues, fewer employment opportunities, and tarnished relationships with viewers. To address these issues, the NMA recommended that the Copyright Office declare that using a publication's content to monetize AI systems harms publishers. The group also called for various licensing models and transparency measures to restrict the ingestion of copyrighted materials, and recommended that the Copyright Office adopt measures to address the scraping of protected content from third-party websites.

The NMA acknowledged the benefits of generative AI, noting that publications and journalists can use it for proofreading, idea generation, and search engine optimization. AI chatbots such as OpenAI's ChatGPT, Google's Bard, and Anthropic's Claude have seen increased use over the last 12 months. However, the methods used to train these AI models have been criticized, with all facing copyright infringement claims in court. Google has stated that it will assume legal responsibility if its customers are alleged to have infringed copyright by using its generative AI products on Google Cloud and Workspace, but its Bard search tool is not covered by this promise of legal protection. OpenAI and Google did not immediately respond to a request for comment.