Amazon officials have said the company’s cloud division is investigating claims that AI startup Perplexity is scraping content from websites without permission. The investigation follows accusations by several news outlets that Perplexity has been illegally extracting content.
According to a report by WIRED, Perplexity AI is allegedly ignoring the Robots Exclusion Protocol, a web standard commonly known as robots.txt, through which websites specify which pages search engines, AI bots, and other crawlers may access. Forbes has also accused the AI startup of plagiarizing journalists’ work.
Amazon Web Services (AWS) requires customers that use its hosting or servers to comply with robots.txt. While the standard is widely followed, no law obliges anyone to observe it.
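To illustrate how the protocol works in practice, here is a minimal sketch of the check a crawler that chooses to honor robots.txt would run before fetching a page. It uses Python's standard `urllib.robotparser` module; the robots.txt rules, the site `example.com`, and the crawler names `ExampleAIBot` and `ExampleSearchBot` are all hypothetical and do not describe any real site or company's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site; "ExampleAIBot" is a
# made-up crawler name, not a real product's user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical rules above let user_agent fetch url."""
    parser = RobotFileParser()
    # Parse the rules from memory instead of downloading the site's /robots.txt.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler runs this check before each request; nothing in the
# protocol itself stops a crawler that simply skips it.
print(is_allowed("ExampleAIBot", "https://example.com/news/story"))      # False
print(is_allowed("ExampleSearchBot", "https://example.com/news/story"))  # True
print(is_allowed("ExampleSearchBot", "https://example.com/private/x"))   # False
```

The sketch also shows why compliance is voluntary: the site only publishes its rules, and it is up to each crawler to read and obey them.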
Perplexity is allegedly violating web standards
Amazon spokesperson Samantha Mayowa said on Friday that the company was assessing information about Perplexity’s alleged violation of the protocol. WIRED had accused Perplexity of scraping content from news sites that prohibit the practice. Notably, Perplexity relies on AWS for its server needs.
In a statement reported by AP, Mayowa said:
“[Amazon’s] terms of service prohibit abusive and illegal activities, and our customers are responsible for complying with those terms.”
She said the company routinely receives complaints from various sources about alleged misconduct and engages with its customers to understand the reported issues.
Perplexity AI spokesperson Sara Platnick said the firm’s bots do not access websites in a way that violates AWS’s terms of use. Earlier this month, WIRED published an investigative report saying it had fed Perplexity’s AI model headlines from its own site along with prompts about specific topics. The responses closely paraphrased WIRED stories and included false claims attributed to WIRED.
Multiple publications have accused the AI unicorn of scraping content
Forbes has also accused Perplexity of plagiarism through a new feature called Perplexity Pages. The publication said that pages curated by Perplexity show content similar to original posts by news outlets such as Bloomberg and CNBC.
This is notable because Perplexity has built a name for properly attributing content to its creators. Yet Forbes claims that posts which have drawn thousands of views do not mention the original publishers in the text, and that Perplexity credits them only with small, easy-to-miss logos linking to the sources.
The Associated Press also reported that Perplexity invented quotes and attributed them to real people. Perplexity has drawn attention from tech investors and was recently valued at $3 billion.
The startup has raised over $100 million from investors including Amazon founder Jeff Bezos, Meta chief AI scientist Yann LeCun, and Google chief scientist Jeff Dean. Japan’s SoftBank also recently pledged to invest $10 million to $20 million in the startup.
Perplexity CEO Aravind Srinivas said that the question raised by WIRED reflects a “deep and fundamental misunderstanding” of how Perplexity carries out its operations. He said that the IP address the publication found scraping content is operated by a third party that provides web crawling and indexing services.
Cryptopolitan reporting by Aamir Sheikh