On June 21, 2024, Katie Drummond, Wired’s global editorial director, joined CNBC’s “Squawk Box” to discuss the magazine’s recent in-depth investigation into the AI search startup Perplexity. The conversation shed light on some troubling aspects of Perplexity’s operations and raised important questions about the future of AI and its impact on journalism and information accuracy.

The Rise of Perplexity

Perplexity has rapidly gained attention in the tech world, securing significant investments from notable figures like Jeff Bezos and garnering a valuation close to a billion dollars. This meteoric rise, however, has come with a fair share of controversy and skepticism. Wired’s investigation set out to examine what Perplexity actually does, how it operates, and how reliable its search results are.

Investigation Findings: Unclear Mechanisms and Inaccurate Responses

According to Drummond, one of the primary findings of Wired’s investigation was the ambiguity surrounding how Perplexity actually operates. She claimed that the startup, which markets itself as a sophisticated search tool, often delivers responses that are not only inaccurate but potentially misleading. Users, she said, have reported answers that lack logical synthesis, relying heavily on basic aggregation of web data without deeper analysis or understanding.

Ethical and Legal Concerns: Scraping and Data Gathering

A significant portion of the investigation focused on how Perplexity gathers its data. Drummond explained that Perplexity had been found scraping content from Wired’s parent company, Condé Nast, and other publishers without authorization. The sites in question had implemented the robots.txt protocol, a voluntary standard that tells crawlers which parts of a website they should not access, yet Perplexity allegedly used undisclosed IP addresses to bypass these restrictions. If true, this raises serious ethical and legal questions about how AI companies should interact with journalistic content and where the boundaries of data usage lie.
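For context on why the protocol is so easy to sidestep: robots.txt is just a plain-text file served at a site’s root (with directives like User-agent and Disallow), and compliance is entirely voluntary. The Python sketch below, which uses a hypothetical crawler name and an illustrative article URL rather than anything tied to Perplexity, shows the check a well-behaved bot performs before fetching a page. Nothing in the protocol technically stops a crawler that simply skips this step.

```python
# Minimal sketch of how a compliant crawler honors robots.txt.
# The user-agent string and article URL below are illustrative only.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.wired.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt file

user_agent = "ExampleBot"  # hypothetical crawler name
page = "https://www.wired.com/story/some-article/"  # illustrative URL

if rp.can_fetch(user_agent, page):
    print("Allowed: a compliant crawler may fetch this page")
else:
    # Compliance is voluntary: a crawler that ignores this check,
    # or routes requests through undisclosed IPs, faces no technical barrier.
    print("Disallowed by robots.txt: a compliant crawler skips this URL")
```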

Biases and the Quality of AI Responses

The discussion also touched on the inherent biases and the quality of information provided by AI tools like Perplexity, emphasizing the problem of “garbage in, garbage out”: AI outputs are only as good as the inputs they receive. Because Perplexity relies on easily accessible web data, which can often be biased or inaccurate, its responses can reflect those same flaws.

Drummond argued that, currently, the best way to obtain accurate information is to go directly to the source—reading journalism from reputable outlets instead of relying on AI aggregations. This highlights a fundamental challenge in the AI industry: developing systems that can discern and prioritize high-quality, accurate information over the myriad of unreliable sources available online.

The Disconnect Between Technology and Journalism

The conversation also delved into the broader implications of AI for journalism. Drummond pointed out that the technology industry has often been detached from the values and principles that underpin news and journalism, and said that accurate, ethical information dissemination has sometimes become collateral damage in the race to build the most advanced and valuable AI tools.

According to an article by Mark Sullivan for Fast Company, Perplexity’s cofounder and CEO, Aravind Srinivas, responded to Wired’s allegations by saying that Perplexity does not rely solely on its own web crawlers but also uses third-party services for web crawling and indexing. Srinivas said the crawler Wired identified was operated not by Perplexity but by a third-party provider he declined to name, citing a nondisclosure agreement.

Srinivas acknowledged the complexity of immediately stopping the third-party crawler from accessing Wired’s content, stating, “It’s complicated.” He also highlighted that the Robots Exclusion Protocol, established in 1994, is not a legally binding framework and suggested that the rise of AI necessitates a new form of collaboration between content creators and platforms like Perplexity.

Further criticism arose when Wired demonstrated that Perplexity’s answer engine could closely paraphrase its articles, sometimes inaccurately; one notable error was a false claim that a California police officer had committed a crime. Srinivas contended that such results were likely induced by prompts deliberately crafted to provoke them, and that ordinary users would not encounter these issues, though he admitted the system is not immune to errors.

Featured Image via Pixabay