The Tony Blair Institute for Global Change (TBI), a non-profit think tank, recently published research indicating that artificial intelligence could streamline the United Kingdom’s workforce, reduce government costs by billions, and automate more than 40% of worker tasks. 

According to the research, however, these benefits would require the government “to invest in AI technology, upgrade its data systems, train its workforce to use the new tools and cover any redundancy costs associated with early exits from the workforce.”

This would cost approximately $4 billion per year for the next five years and $7 billion per year after that, the researchers write.

But according to outside researchers who've read the paper, the real problem with the research lies in its reliance on ChatGPT.

Oxford University's Mohammad Amir Anwar opined on X that the Tony Blair Institute was “making shit up.” Meanwhile, the University of Washington's Emily Bender told 404 Media's Emanuel Maiberg that the researchers “might as well be shaking a Magic 8 Ball and writing down the answers it displays.”

The problem

TBI researchers set out to build a high-level overview of the entire workforce so they could predict how automation might affect the job market going forward.

They concluded that AI could save the UK billions almost immediately. According to the paper, setting the investment costs against the potential savings “implies the net savings from fully utilising AI in the public sector to be nearly 1.3 per cent of GDP each year, equivalent to £37 billion a year in today's terms.”

The researchers go further, claiming that “this equates to a benefit-cost ratio of 9:1 in aggregate” and that, “after five years we estimate the programme could cumulatively save 0.5 per cent of annual GDP (or £15 billion in today's terms), implying a benefit-cost ratio of 1.8:1 is possible if the technology is rolled out quickly.”

Those numbers are certainly exciting, but it's unclear whether they mean anything.

What's in question is how the researchers reached their conclusions. Rather than conduct an exhaustive study with workers and employers to determine how automation would affect a given position, they used the O*NET dataset to identify 20,000 tasks performed by workers and fed that data to ChatGPT. The team then prompted the AI to determine which tasks were suitable for automation and which tools could be used to automate them.
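In rough outline, the approach amounts to labelling each task one at a time and averaging the labels. The Python sketch below is an illustration under stated assumptions, not TBI's actual code: the task strings are invented examples rather than real O*NET entries, and `classify_task` is a hypothetical stand-in for a ChatGPT prompt that uses a crude keyword rule so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskLabel:
    task: str
    automatable: bool
    suggested_tool: Optional[str]


def classify_task(task: str) -> TaskLabel:
    """Hypothetical stand-in for prompting a language model.

    A real pipeline would send `task` to a model and parse its reply;
    here a crude keyword rule produces a label so the sketch runs offline.
    """
    keywords = ("compile", "record", "schedule", "draft")
    if any(word in task.lower() for word in keywords):
        return TaskLabel(task, True, "language model")
    return TaskLabel(task, False, None)


def automatable_share(tasks: List[str]) -> float:
    """Fraction of tasks that the classifier labels as automatable."""
    labels = [classify_task(t) for t in tasks]
    return sum(label.automatable for label in labels) / len(labels)


# Invented example tasks (hypothetical, not real O*NET entries).
tasks = [
    "Compile statistical reports on caseloads",
    "Schedule appointments for applicants",
    "Inspect building sites for safety hazards",
    "Counsel clients in crisis",
]
print(f"{automatable_share(tasks):.0%}")  # → 50%
```

The headline percentage is just an average over thousands of such labels, so whatever the classifier gets wrong flows straight into the final figure.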

According to the researchers, having human experts review each task would have made their work “intractable,” which in research terms means too difficult to carry out in practice.

By the same logic, it would presumably also be “intractable” for the researchers to check each of ChatGPT's outputs: the team says it used the AI system to categorize nearly 20,000 tasks.

If we assume the AI made mistakes (both the TBI research and ChatGPT maker OpenAI's website acknowledge that the models are prone to error), then we can also assume that the research contains faulty information, and that peer-reviewing it would be just as intractable.
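To give a sense of scale, the short Python sketch below multiplies the roughly 20,000 classifications by a few error rates. The rates are hypothetical placeholders chosen for illustration; no measured error rate for this task is cited here.

```python
TASK_COUNT = 20_000  # approximate number of tasks TBI says it classified


def expected_errors(task_count: int, error_rate: float) -> int:
    """Expected number of mislabelled tasks at a given (hypothetical) error rate."""
    return round(task_count * error_rate)


# Hypothetical error rates, for illustration only.
for rate in (0.01, 0.05, 0.10):
    print(f"{rate:.0%} error rate -> ~{expected_errors(TASK_COUNT, rate):,} mislabelled tasks")
```

Even a modest error rate leaves hundreds or thousands of wrong labels, none of which could practically be checked one by one.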

Automation isn’t easy

So, what's the real number? Technically speaking, ChatGPT couldn't grasp the nuances of automation on a task-by-task basis, because the data needed to do so is very unlikely to be in its training set: creating that data by hand would itself be intractable.

Generative AI systems tend to fail when faced with novel problems they haven't been trained on.

For example, automatic coffee makers have existed for decades, but general automation (teaching an AI system to make coffee anywhere, in any room) is considered an outstanding problem in artificial intelligence and robotics.

Simply put, automation is difficult and requires a nuanced approach to each individual task.

Back in 2017, for example, it was widely assumed that autonomous driving would be solved within a matter of years. Elon Musk even famously predicted that Tesla would operate one million robotaxis by 2020.

But as of July 2024, the vast majority of automakers, startups, and big tech companies that were working on self-driving cars in 2021 have shut down their programs. It turns out that 99% of driving can be automated, but so far no engineering team has figured out how to safely automate the edge cases that make up the last one percent.

While it's easy to imagine any simple task being automated, context matters. ChatGPT may be capable of generating text suggesting that any job can be automated if you throw enough money at the problem, but reality has so far contradicted such claims.
