Next step in DeAI: on-chain inference for facial recognition

IC · 2024-07-16T02:45:03.000Z

The replica version e4eeb3, approved by the community in proposal 13094, completes the Cyclotron milestone of the ICP roadmap.
The goal of this milestone is to enable on-chain inference for AI models with millions of parameters, which is the first step towards the larger goal of on-chain training and inference of large AI models.
AI workloads are computationally intensive. Inference on AI models with millions of parameters requires billions of arithmetic operations, such as multiplications and additions. To support on-chain inference, a blockchain therefore needs to be able to process billions of operations per second.
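A rough way to see why millions of parameters turn into billions of operations: in a dense layer each weight is used once per input, but in a convolution each weight is reused at every output position. The sketch below counts multiply-accumulates (MACs) for a hypothetical 3x3 convolution with 64 input and 64 output channels on a 112x112 feature map. The layer shape is an illustrative assumption, not taken from the models benchmarked in this article.

```python
# Hypothetical layer shapes, chosen only to illustrate the arithmetic.

def dense_macs(in_features: int, out_features: int) -> int:
    """Multiply-accumulates for one input vector through a dense layer."""
    # Every weight is used exactly once per input vector.
    return in_features * out_features


def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution (bias ignored)."""
    return c_out * c_in * k * k


def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates: every weight is reused at each output position."""
    return h_out * w_out * conv2d_params(c_in, c_out, k)


params = conv2d_params(64, 64, 3)        # 36,864 weights
macs = conv2d_macs(112, 112, 64, 64, 3)  # 462,422,016 multiply-adds
ops = 2 * macs                           # one multiply plus one add each

print(f"{params:,} params -> {ops:,} arithmetic ops")
# 36,864 params -> 924,844,032 arithmetic ops
```

A single layer with fewer than 40,000 weights already needs close to a billion operations, which is why whole models with millions of parameters land in the billions.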
The Cyclotron milestone increases ICP's computing power by an order of magnitude (approximately 10x). This makes ICP the only blockchain with a working example of a smart contract that performs facial recognition entirely on-chain, alongside other use cases such as image classification and running GPT2 (ported to ICP by DecideAI).
Check out Dominic Williams’ facial recognition demonstration video at the top of this article.
The foundation of on-chain AI computing
The virtual machine is the part of a blockchain that executes smart contract code, which makes it central to AI computing: its functionality and performance directly determine how much AI computation a smart contract can perform.
For example, the EVM, Ethereum's virtual machine, is tailored to DeFi smart contracts and lacks features that AI computation requires, such as floating-point arithmetic. ICP, by contrast, uses WebAssembly as its virtual machine, which supports floating-point numbers and was designed from the start to achieve near-native performance.
The goal of the Cyclotron milestone is to squeeze as much floating-point performance as possible out of the ICP virtual machine.
Optimization 1: Deterministic floating point operations
Most AI libraries and frameworks rely on floating-point operations. On ICP these operations must be deterministic: given the same input operands, they must always produce the same result.
Determinism matters because ICP executes the same code on multiple nodes and then runs its consensus algorithm to agree on the result. If floating-point operations were non-deterministic, nodes could compute different results and disagree, preventing the blockchain from making progress.
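Why evaluation order matters: IEEE 754 arithmetic rounds after every operation, so it is not associative, and regrouping the same additions can change the result. Nodes can only agree if each one evaluates exactly the same operations in exactly the same order. A minimal Python illustration (Python floats are IEEE 754 doubles):

```python
# Python floats are IEEE 754 doubles, so this mirrors f64 arithmetic in Wasm.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.30000000000000004 + 0.3
right = a + (b + c)  # 0.1 + 0.5

print(left, right, left == right)
# 0.6000000000000001 0.6 False


def ordered_sum(values):
    """Add strictly left to right: same inputs in the same order give the same bits."""
    total = 0.0
    for v in values:
        total += v
    return total
```

Fixing the order, as `ordered_sum` does, is what makes the result reproducible; it says nothing about which grouping is "more correct".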
DFINITY engineers found a way to make deterministic floating-point operations faster in Wasmtime, the WebAssembly virtual machine implementation used by ICP. The change is a low-level compiler optimization that generates faster code, and it benefits not only ICP but also other platforms and blockchains that use Wasmtime.
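The article does not say which source of nondeterminism the Wasmtime optimization targets. One well-known source in WebAssembly is NaN results: the specification leaves the sign and payload bits of a NaN produced by an operation partly unspecified, and different CPUs produce different patterns. A common remedy, which Wasmtime offers as a configuration option, is NaN canonicalization: rewrite every NaN result to one fixed bit pattern. The sketch below shows that technique on raw 32-bit patterns; it illustrates the idea and is not DFINITY's compiler change.

```python
CANONICAL_F32_NAN = 0x7FC0_0000  # quiet NaN, sign bit 0, payload 0


def is_f32_nan(bits: int) -> bool:
    """True if the 32-bit pattern encodes a NaN (exponent all ones, mantissa non-zero)."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7F_FFFF
    return exponent == 0xFF and mantissa != 0


def canonicalize_f32(bits: int) -> int:
    """Map every NaN bit pattern to one canonical NaN; leave all other values untouched."""
    return CANONICAL_F32_NAN if is_f32_nan(bits) else bits


# Two NaN patterns that could arise for the same operation on different hardware.
nan_a = 0x7FC0_0001  # quiet NaN carrying a payload
nan_b = 0xFFC0_0000  # quiet NaN with the sign bit set
inf = 0x7F80_0000    # +infinity: exponent all ones but mantissa zero, so NOT a NaN

print(hex(canonicalize_f32(nan_a)), hex(canonicalize_f32(nan_b)), hex(canonicalize_f32(inf)))
# 0x7fc00000 0x7fc00000 0x7f800000
```

The cost of this approach is an extra check after every floating-point operation that can produce a NaN, which is why making it cheap in the compiler matters for performance.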
Optimization 2: Single Instruction, Multiple Data (SIMD)
SIMD is a technology supported by all modern CPUs that lets a single instruction perform multiple arithmetic operations. For example, WebAssembly can perform four floating-point additions in parallel with a single instruction, as shown in the following figure.
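To make the lane-wise semantics concrete, here is a small Python model of WebAssembly's `f32x4.add`: one instruction, four independent single-precision additions. It only emulates the semantics for illustration; the real speedup comes from the CPU executing the lanes in parallel, which this Python code does not do.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an f64) to the nearest f32, overflowing to infinity."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct rejects values that round past the f32 range; IEEE gives +/-inf.
        return math.copysign(math.inf, x)


def f32x4_add(a, b):
    """Model of WebAssembly f32x4.add: four independent f32 additions per instruction."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("a v128 value holds exactly four f32 lanes")
    # Adding two f32 values in f64 and rounding once to f32 matches a true f32 add.
    return tuple(to_f32(to_f32(x) + to_f32(y)) for x, y in zip(a, b))


print(f32x4_add((1.0, 2.0, 3.0, 4.0), (0.5, 0.25, -1.0, 10.0)))
# (1.5, 2.25, 2.0, 14.0)
```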
WebAssembly SIMD also handles integers. For example, a single instruction can perform 16 parallel arithmetic operations on 8-bit integers. Depending on the number type and the operation, this may increase performance by 4x to 16x.
Smart contracts running on ICP can now use deterministic SIMD instructions and benefit from parallel computation. To learn how to compile smart contracts with SIMD, see:
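The 4x to 16x range follows from lane width: a WebAssembly SIMD value (`v128`) is 128 bits wide, so it holds four 32-bit floats but sixteen 8-bit integers. The sketch below computes lane counts and models `i8x16.add`, which wraps on overflow (WebAssembly also has saturating variants such as `i8x16.add_sat_s`). The lane count is an upper bound on the per-instruction gain, not a guaranteed speedup.

```python
V128_BITS = 128  # WebAssembly SIMD values are 128 bits wide


def lanes(lane_bits: int) -> int:
    """How many values of the given width fit in one v128."""
    return V128_BITS // lane_bits


def i8x16_add(a, b):
    """Model of WebAssembly i8x16.add: 16 lane-wise 8-bit additions that wrap on overflow."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("a v128 value holds exactly sixteen 8-bit lanes")
    out = []
    for x, y in zip(a, b):
        s = (x + y) & 0xFF                      # keep only the low 8 bits
        out.append(s - 256 if s >= 128 else s)  # read the lane as signed two's complement
    return out


print({name: lanes(bits) for name, bits in [("f64", 64), ("f32", 32), ("i16", 16), ("i8", 8)]})
# {'f64': 2, 'f32': 4, 'i16': 8, 'i8': 16}
print(i8x16_add([127] + [1] * 15, [1] * 16)[:2])
# [-128, 2]
```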
github.com/dfinity/examples/tree/master/rust/simd
Optimization 3: SIMD support in the AI inference engine
The final piece of the Cyclotron puzzle is WebAssembly SIMD support in AI libraries. DFINITY engineers contributed a WebAssembly SIMD implementation to Tract, the open-source inference engine from Sonos.
The new code implements matrix multiplication and other numerical algorithms with SIMD instructions. Like the Wasmtime optimization, this contribution benefits not only ICP but also the broader developer community.
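The article does not show Tract's kernels. As a hedged illustration of how matrix multiplication is commonly restructured for SIMD, the sketch below computes each row of C = A·B four columns at a time: it broadcasts one element of A, multiplies it by four adjacent elements of B, and accumulates into four partial sums, which is the data flow an `f32x4` multiply followed by an add would implement. It is plain Python modelling that pattern, not Tract's code; `LANES` and both function names are hypothetical.

```python
LANES = 4  # hypothetical: f32x4 updates four output columns per vector step


def matmul_naive(A, B):
    """Reference triple loop; each C[i][j] is accumulated in ascending p order."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0
            for p in range(k):
                total += A[i][p] * B[p][j]
            C[i][j] = total
    return C


def matmul_simd_style(A, B):
    """Broadcast one A element, multiply by LANES consecutive B elements, accumulate."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        j = 0
        while j < m:
            width = min(LANES, m - j)       # shorter tail block when m % LANES != 0
            acc = [0] * width               # one "vector register" of accumulators
            for p in range(k):
                a = A[i][p]                 # splat: the same scalar in every lane
                row = B[p][j:j + width]     # one vector load from row p of B
                for lane in range(width):   # lane-wise multiply-add
                    acc[lane] += a * row[lane]
            C[i][j:j + width] = acc
            j += width
    return C


print(matmul_simd_style([[1, 2]], [[1, 0, 0, 0, 1], [0, 1, 0, 0, 1]]))
# [[1, 2, 0, 0, 3]]
```

Because every output element is still accumulated in ascending `p` order, this layout returns bit-identical results to the naive loop even for floats, which is the property a deterministic SIMD kernel needs.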
Results
Together, these optimizations speed up numerical microbenchmarks by 28x. On end-to-end AI inference workloads, the observed improvements range from 5x to 19x depending on the model, as shown in the figure below.
The source code for the smart contracts containing these AI models is available on GitHub, so anyone can reproduce and verify the results:
Image classification: a MobileNet model that classifies an input image and returns the most likely of 1,000 known labels. The number of Wasm instructions required for a single inference dropped from 24.7 billion to 3.7 billion.
Face detection: an Ultraface model that finds bounding boxes of faces in input images. The number of Wasm instructions required for a single inference dropped from 6.1 billion to 1.2 billion.
Face recognition: a model that computes vector embeddings for input images of faces. The number of Wasm instructions required for a single inference dropped from 77 billion to 9 billion. Because the mainnet execution limit is 40 billion instructions, face recognition previously could not run on mainnet and could only run locally on a patched copy.
GPT2: the GPT2 model that DecideAI converted into a smart contract using their rust-connect-py-ai-to-ic framework. The benchmark details are described on GitHub.
Benchmarks were run with dfx 0.20.1 (baseline) and dfx 0.22.0-beta.0 (Cyclotron).
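The per-model figures can be checked with simple arithmetic. The sketch below uses only the instruction counts and the 40-billion-instruction mainnet limit quoted in this article; GPT2 is omitted because its numbers are on GitHub rather than in the text. An instruction-count reduction is a different measurement from the end-to-end speedups shown in the figure, so the two need not match.

```python
MAINNET_LIMIT = 40e9  # mainnet execution limit quoted in the article (instructions)

# Wasm instructions for one inference: (baseline dfx 0.20.1, Cyclotron dfx 0.22.0-beta.0)
RESULTS = {
    "Image classification (MobileNet)": (24.7e9, 3.7e9),
    "Face detection (Ultraface)": (6.1e9, 1.2e9),
    "Face recognition": (77e9, 9e9),
}


def reduction_factor(before: float, after: float) -> float:
    """How many times fewer instructions the optimized build executes."""
    return before / after


def fits_limit(instructions: float) -> bool:
    """Whether one inference stays within the mainnet instruction limit."""
    return instructions <= MAINNET_LIMIT


for name, (before, after) in RESULTS.items():
    print(f"{name}: {reduction_factor(before, after):.1f}x fewer instructions, "
          f"fits before: {fits_limit(before)}, fits after: {fits_limit(after)}")
# Image classification (MobileNet): 6.7x fewer instructions, fits before: True, fits after: True
# Face detection (Ultraface): 5.1x fewer instructions, fits before: True, fits after: True
# Face recognition: 8.6x fewer instructions, fits before: False, fits after: True
```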
Conclusion
The Cyclotron milestone brings AI computing performance on ICP close to native CPU performance by optimizing floating-point operations and enabling WebAssembly SIMD instructions. It enables on-chain inference for AI models with millions of parameters, covering use cases such as image classification, face recognition, and GPT2.
This is the first step toward running large AI models entirely on-chain to solve the AI trust problem. The next AI milestone in the ICP roadmap aims to go beyond CPU limitations: to perform AI inference and large-model training on-chain, smart contracts need a way to run compute- and memory-intensive workloads on specialized hardware such as GPUs.
Stay tuned for the Gyrotron milestone.