![图片](https://public.bnbstatic.com/image/pgc/202411/a441d3367744a2f26d3f4e2d31b0adcb.png)
The Web3 community is open, experimental, and often supports projects that push the boundaries of computer science research. One area where we do less well, however, is clarity of thought and communication. This article aims to explain the necessary conditions for deploying AI models in smart contracts.
In simple terms: after reading this article, you should have a straightforward intuition about what is possible as of October 2024 and what remains to be addressed.
Has AI been brought on-chain? If not, what is missing?
Smaller models can already run on-chain, but current limits on memory, computational power, and consensus mechanisms prevent large AI models such as LLMs from being deployed on blockchains with anything like the performance one would expect from traditional cloud servers.
However, many innovations are currently underway to help bridge this gap.
What is the main takeaway from this?
The computational cost of AI is high, and the computational cost of decentralized computing is also high; thus, combining two expensive things... can complicate matters.
Andreas Rossberg, co-founder of WebAssembly, perfectly articulated this:
But in my view, the reason AI on the blockchain is "difficult" is that both technologies are already extremely expensive in resource terms (blockchain because of replication and consensus, AI because LLMs and the like are essentially massive brute-force methods). Hardware designed to run AI is entirely about reducing that cost, but putting it on a blockchain means the costs do not just add up, they multiply; from a resource-usage perspective, this is pretty much the worst case.
Source:
forum.dfinity.org/t/what-makes-ai-on-blockchain-hard-request-for-feedback-on-post/32686/3
![图片](https://public.bnbstatic.com/image/pgc/202411/eb5b2fcd7ede699e5e17584870af5087.png)
Useful background information
To understand this article, there are several concepts worth quickly explaining.
1. Training vs. Inference
When people mention AI, they typically mean either training models or inference (using models, e.g., asking ChatGPT a question). Training is several orders of magnitude more resource-intensive than inference, so I focus here on inference: it is the first major hurdle to clear before tackling the harder challenges of training.
2. CPU vs. GPU
In simple terms, GPUs are processors optimized for AI workloads, running models on the order of 1,000 times faster than general-purpose CPUs. This matters because most AI bottlenecks in the Web2 world can be solved by "using more GPUs." Most blockchains, however, run on CPUs, so that solution is currently unavailable to them; this article explains why.
3. Memory of Smart Contracts
The memory of smart contracts includes storage and heap memory, both of which are important for running AI models, and both are limiting factors today.
4. My narrow definition of AI
I admit my definition of AI here is narrow: I focus on deploying models inside smart contracts. I do not cover the broader AI ecosystem; for example, I have not written about tokenizers or vector databases, which are key to RAG and the wider AI stack (in fact, many teams have already found ways to host vector databases in smart contracts). So yes, my goal is narrow: hosting AI models in smart contracts.
![图片](https://public.bnbstatic.com/image/pgc/202411/90e0f39687cfdbd88b2c61f4091df046.png)
Necessary factors for AI on-chain
Introduction
Three necessary factors for AI to be hosted on smart contracts:
Memory - Models require large amounts of memory, and blockchains offer far less of it than centralized clouds.
Computation - Models require substantial computation (for throughput and speed), and blockchains offer far less of it than centralized clouds.
Hardware - Centralized providers improve AI performance mostly by investing in more hardware; blockchains find this much harder, and many protocols are not designed to scale by adding hardware at all.
![图片](https://public.bnbstatic.com/image/pgc/202411/e4489ec59b87718f94c7ff5b1a123230.png)
1. Memory
What AI models need
The memory requirements for AI inference vary significantly across models; for example, small machine learning (ML) models might need only a few megabytes (MB), while large language models (LLMs) may require many gigabytes (GB) of memory.
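As a rough intuition for where those numbers come from: for inference, the model weights dominate memory use, at roughly (parameter count) × (bytes per parameter). The sketch below uses illustrative parameter counts and precisions of my own choosing, not exact requirements, and ignores activations and KV-cache overhead.

```python
# Back-of-envelope memory estimate for AI inference.
# Weights dominate: memory ~= (parameter count) x (bytes per parameter).
# All figures below are illustrative assumptions, not exact requirements.

def model_memory_bytes(num_params: int, bytes_per_param: float) -> float:
    """Approximate memory to hold model weights (ignores activations/KV cache)."""
    return num_params * bytes_per_param

MB = 1024 ** 2
GB = 1024 ** 3

# A small image classifier: ~2.5M params at float32 (4 bytes each)
small = model_memory_bytes(2_500_000, 4)
# Llama 3 8B quantized to 4 bits (0.5 bytes per parameter)
llama8b_q4 = model_memory_bytes(8_000_000_000, 0.5)
# Llama 3 70B at float16 (2 bytes per parameter)
llama70b_f16 = model_memory_bytes(70_000_000_000, 2)

print(f"small classifier:   {small / MB:.0f} MB")
print(f"Llama 3 8B (4-bit): {llama8b_q4 / GB:.1f} GB")
print(f"Llama 3 70B (fp16): {llama70b_f16 / GB:.0f} GB")
```

This is why the small classifier fits comfortably in MB-scale memory, a quantized 8B LLM needs a few GB, and a 70B model needs on the order of a hundred GB or more.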
Today’s World
I want to give readers a useful overview, but I deliberately do not provide tables or charts comparing different blockchains. From my experience, this can lead to two things:
In the best case, some honest errors, such as, "Hey, Diego, you miscalculated! Our smart contract platform executes 600 instructions per second, not 550."
In the worst case, it will trigger blockchain tribalism, causing the rest to be ignored.
Thus, I will write about what AI needs, about Ethereum (the lingua franca of smart contract platforms), and about ICP (the blockchain I know best), and I encourage readers to undertake the same analysis for other chains!
Ethereum Smart Contracts
The working memory of Ethereum smart contracts is measured in KB, which means Ethereum cannot support most AI models that I know of. There may be some AI models measured in KB, but simply put: Ethereum smart contracts cannot host the models most people mean when they say "AI."
ICP Smart Contracts
ICP smart contracts have 400 GB of stable memory (i.e., storage) and 4 GB of heap memory, so they can support many, but not all, AI models. More specifically:
① ICP smart contracts can run AI models similar to the ML model for image classification used in this demonstration, which only requires about 10 MB of memory, thus fully within the memory resources of ICP.
② ICP smart contracts can host LLM models, see community examples:
Llama 3 8b is running on-chain!
Llama.cpp on the Internet Computer
Models that ICP smart contracts currently cannot run: ICP smart contracts cannot yet run larger versions of Llama, such as the 70B parameter model.
Currently, ICP smart contracts provide 4 GB of heap memory, with more planned, so support for these larger models is already close.
Rule of Thumb #1
Whenever someone says, "X is on-chain AI," you should ask, "How much memory can the smart contracts on X have?"
If the answer is...
In KB: it cannot host any real AI models;
In MB: it can host small models (and there are many useful small models), but not LLMs;
In GB: it can host some smaller LLMs;
In tens of GB: it can host more models, but not the leading LLMs;
In hundreds of GB: it can support almost all LLMs.
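Rule of Thumb #1 can be sketched as a simple classifier. The thresholds below are my own rough encoding of the tiers above, not precise cutoffs:

```python
# Rule of Thumb #1 as code: classify what a chain can host from its
# smart-contract memory limit. Thresholds are a rough reading of the tiers
# in the article, not exact boundaries.

MB = 1024 ** 2
GB = 1024 ** 3

def hosting_tier(memory_bytes: float) -> str:
    if memory_bytes < 1 * MB:
        return "KB range: no real AI models"
    if memory_bytes < 1 * GB:
        return "MB range: small models, but no LLMs"
    if memory_bytes < 10 * GB:
        return "GB range: some smaller LLMs"
    if memory_bytes < 100 * GB:
        return "tens of GB: more models, but not the leading LLMs"
    return "hundreds of GB: almost all LLMs"

print(hosting_tier(128 * 1024))        # a KB-scale chain like Ethereum
print(hosting_tier(400 * 1024 ** 3))   # ICP's 400 GB stable memory
```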
For ICP, most AI models can be hosted on-chain (with some restructuring of the models), but the issue lies in how long users are willing to wait for answers, which leads to the next question: computation.
![图片](https://public.bnbstatic.com/image/pgc/202411/7503472dda030a9227e077fa6b84435e.png)
2. Computation
What AI models need
The computational power required for AI inference is usually measured in floating-point operations per second (FLOPS); the complexity and size of AI models vary greatly, and so does the computation they require. In the context of blockchain protocols, however, it makes more sense to speak of the more general "operations per second," so we will use that term; in practice the two usually fall within the same order of magnitude.
Smaller models may require only a few billion operations per inference, while large language models (LLMs) and other advanced AI models require far more; for example, a quantized (i.e., size-optimized) Llama 3 7B model may need hundreds of billions of operations to perform one inference (answer one user prompt).
From the user’s perspective
From the user's perspective, the time needed for LLM responses varies from a few seconds to hours, days, weeks, or months, depending on the amount of computational resources the smart contract possesses.
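That range follows from simple division: latency is roughly (operations per prompt) ÷ (sustained operations per second). The sketch below uses illustrative throughput figures; the ICP and Ethereum numbers echo those discussed later in this article, and the GPU figure and per-prompt cost are assumptions of mine.

```python
# Rough inference-latency estimate: operations needed per prompt divided by
# the platform's sustained operations per second. All figures illustrative.

def inference_seconds(ops_per_prompt: float, platform_ops_per_sec: float) -> float:
    return ops_per_prompt / platform_ops_per_sec

OPS_PER_PROMPT = 300e9   # assumed: quantized 7B LLM, hundreds of billions of ops

gpu = inference_seconds(OPS_PER_PROMPT, 100e12)  # assumed ~100 TFLOPS GPU
icp = inference_seconds(OPS_PER_PROMPT, 2e9)     # ICP: ~2 billion ops/s
eth = inference_seconds(OPS_PER_PROMPT, 5e6)     # Ethereum: ~5 million ops/s

print(f"GPU: {gpu:.3f} s")
print(f"ICP: {icp / 60:.1f} min")
print(f"ETH: {eth / 3600:.0f} hours")
```

The same prompt that a GPU answers in milliseconds takes minutes at billions of ops/second and many hours at millions, which is exactly the user-experience gap described above.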
Today’s World
Ethereum Smart Contracts
Ethereum smart contracts run on the EVM, which is not optimized for high-performance computing tasks; the computational throughput of Ethereum smart contracts is far below the gigaFLOPS most AI models require.
DFINITY estimates, based on the block gas limit, that Ethereum executes at most about 5 million instructions per second, so Ethereum cannot provide the computational power needed to run complex AI models, especially large language models (LLMs).
ICP Smart Contracts
ICP smart contracts have better computational resources, executing about 2 billion operations per second. Notably, unlike the EVM, which handles only integer operations, ICP smart contracts can execute floating-point as well as integer operations.
Models that ICP smart contracts can run: ICP can run AI models that require up to billions of operations per second and perform inference within the time expected by users (a few seconds or less). This includes many smaller models, such as the image classification model used in this demonstration, which can run efficiently with only a few billion operations per second.
Models that ICP smart contracts cannot yet run as fast as users expect: a quantized Llama 3 7B model needs several hundred billion operations per inference (per answered prompt). At 2 billion operations per second, ICP would theoretically take tens of seconds to minutes to execute one inference request, i.e., to answer one prompt.
Coming soon: The DFINITY research team is exploring ways to enhance the computational capabilities of ICP smart contracts, with potential improvements including the integration of dedicated hardware or optimizing the execution environment to handle higher per-second operational demands.
Rule of Thumb #2
Whenever someone says, "X is on-chain AI," you should ask, "How much computational power can the smart contracts on X provide?"
If the answer is...
Measured in a few million operations per second or fewer: AI inference takes so long that users may conclude it does not work at all.
Measured in hundreds of millions of operations per second: very small models can perform inference within minutes.
Measured in billions: smaller LLMs can perform inference, but in minutes, much more slowly than users expect.
Measured in tens of billions: LLM inference approaches what modern users expect of LLMs.
Measured in trillions of operations per second: it can support nearly all AI models, including state-of-the-art LLMs, with an excellent user experience.
![图片](https://public.bnbstatic.com/image/pgc/202411/16ec50f1b17ed838e0d1d3e9f6b1d34a.png)
3. Hardware Issues (Hint: This is about determinism)
In the Web2 world, increasing computational resources for models typically means using GPUs, as GPUs are faster, which is why there is strong global demand for GPUs.
Why can't blockchains just use GPUs?
Technical reason: GPUs are inherently designed for massive parallelism, so there is no guarantee that all operations execute deterministically, whereas blockchains require deterministic computation to reach consensus. In practice there are ways to make GPUs behave deterministically, but they require careful consideration and configuration. First, let me explain why determinism matters.
A simpler explanation: blockchains work by having multiple computers perform the same computation and then reach consensus on the result via a consensus protocol. Blockchains have a security threshold, usually between 25% and 49%, which determines how many faulty or dishonest nodes they can tolerate while still reaching consensus. With GPUs, however, even honest nodes running the same model may return different answers for an LLM, which breaks the consensus protocol's assumptions.
Example: Imagine a blockchain with three computers, each running an LLM smart contract, and a user asks, "What is an LLM?"
Computer 1: "LLMs, or large language models, are advanced AI models designed to understand and generate human language, typically with a large number of parameters and trained on a vast amount of text data."
Computer 2: "LLMs, or large language models, are powerful AI systems trained on extensive text that can perform tasks such as understanding, generating, and translating human language."
Computer 3: "LLMs, or large language models, are AI models that excel at processing and generating human language through extensive training on large datasets."
Even if all three computers are honest and use the same model, they may return different answers; this non-determinism can arise for various reasons and is problematic, as the consensus protocol cannot determine which answer is correct, contrasting sharply with simpler, deterministic computations such as "1 + 1," where all computers agree on "2."
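The failure mode above can be sketched as a majority vote over replica outputs. The helper below is a toy illustration of mine, not any real consensus protocol: with deterministic computation all replicas match, while honest-but-divergent LLM outputs leave no majority.

```python
# Toy sketch of why divergent replica outputs break result consensus:
# a simple majority vote over the replies returned by each replica.

from collections import Counter

def majority(replies, threshold):
    """Return a reply that at least `threshold` replicas agree on, else None."""
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= threshold else None

# Deterministic computation ("1 + 1"): all replicas agree.
print(majority(["2", "2", "2"], threshold=2))  # "2": consensus reached

# Non-deterministic LLM: three honest replicas, three phrasings, no majority.
replies = [
    "LLMs are advanced AI models ...",
    "LLMs are powerful AI systems ...",
    "LLMs are AI models that excel at ...",
]
print(majority(replies, threshold=2))  # None: no result can be agreed on
```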
Given the above, I should add a detail: even with the model temperature set to 0, non-determinism can arise. The tricky part is that the non-determinism comes from the GPU, not the model itself. The really tricky part is that at temperature 0 the GPU will usually return the same answer, giving a false sense of security. But this determinism cannot be guaranteed, and if it cannot be guaranteed, the blockchain can fail to reach consensus.
To give a fictional but concrete number: if GPUs are deterministic 99.99% of the time, then roughly one prompt in 10,000 may return a divergent answer. Imagine one block in every 10,000 failing to reach consensus: most blockchains cannot tolerate that, and it is dangerous for the protocol.
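One common source of this hardware-level non-determinism is that floating-point addition is not associative: parallel reductions that sum the same values in different orders can produce results that differ in the last bits, and in an LLM such tiny differences can be enough to flip a sampled token. The CPU demo below illustrates the order sensitivity (on a CPU the differences are tiny and reproducible; on a GPU the reduction order itself can vary between runs).

```python
# Floating-point addition is not associative: summing the same values in a
# different order can give a (slightly) different result. Parallel GPU
# reductions do exactly this, which is one source of non-determinism.

import random

random.seed(42)  # fixed seed so this demo is reproducible
values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

forward = sum(values)                 # left-to-right
backward = sum(reversed(values))      # right-to-left

shuffled = values[:]
random.shuffle(shuffled)
scrambled = sum(shuffled)             # arbitrary order

# Mathematically identical sums; numerically they often differ in the
# last bits of the floating-point result.
print(forward == backward == scrambled)   # usually False
print(abs(forward - scrambled))           # tiny, but not guaranteed zero
```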
Key Points
Blockchains rely on replicating computations and reaching consensus on results;
GPUs introduce non-determinism, making it difficult for blockchains to reach consensus;
Therefore, current blockchains cannot utilize GPUs in the same way as Web2 systems.
Possible solutions
This is a new challenge, but several potential solutions are being explored (as of the writing of this article, they are not fully resolved):
Achieving determinism with GPUs: Developing methods to make GPU computations deterministic is possible, though a bit tricky, and has not yet been widely adopted.
Modifying consensus protocols: Adjusting consensus mechanisms to handle non-determinism requires serious protocol work.
Accepting non-determinism and using zero-knowledge proofs: run the LLM on a single machine without replication and prove the result with a zero-knowledge proof. Generating such proofs is several orders of magnitude slower than running the model directly on CPUs or GPUs; it is theoretically feasible but hard to implement and remains an open problem.
The entire AI and blockchain ecosystem (including DFINITY) is actively exploring and researching these three methods to determine the best solutions.
Rule of Thumb #3
If someone claims, "My blockchain runs on GPUs," then one of the following statements is true:
They run GPUs deterministically or apply approximate consensus mechanisms;
Their blockchains lack strong consensus protocols (and are not secure);
They are not being truthful.
![图片](https://public.bnbstatic.com/image/pgc/202411/3cd8e74a8288f516bfc666295977393b.png)
Conclusion
On-chain AI has not been fully realized; although some promising progress has been made in integrating AI inference, there remain significant gaps in memory, computational power, and consensus mechanisms. These challenges are not insurmountable, but they require focused research, development, and innovation. By understanding and addressing these barriers, the dream of combining the power of AI with the security and decentralization of blockchain can become a reality.
Hope this helps everyone!
![图片](https://public.bnbstatic.com/image/pgc/202411/9431cf76970400d33ab9e408d4e94b24.png)