Author: Nickqiao & Wuyue, Geekweb3

In April this year, Vitalik spoke at the Hong Kong Blockchain Summit. In his talk, titled "Reaching the Limits of Protocol Design", he once again highlighted the potential of ZK-SNARKs in the Ethereum Danksharding roadmap and said he expected ASIC chips to play a major role in ZK acceleration.

Previously, Scroll co-founder Zhang Ye also pointed out that the application space for ZK in traditional fields may be larger than in Web3. There is huge demand for ZK in trusted computing, databases, verifiable hardware, content anti-counterfeiting and zkML. If real-time ZK proof generation can be achieved, both Web3 and traditional industries could see a paradigm-level change. However, in terms of efficiency and economic cost, ZK still has a long way to go before it can be adopted at scale.

In fact, as early as 2022, top venture capital firms a16z and Paradigm published reports that clearly stressed the importance of ZK hardware acceleration. Paradigm even asserted that ZK miners' income may one day be comparable to that of Bitcoin or Ethereum miners, and that hardware acceleration solutions based on GPUs, FPGAs, and ASICs will have a huge market. Since then, with the rise of mainstream ZK Rollups such as Scroll and Starknet, hardware acceleration has become a hot concept in the market, and interest has grown further as projects such as Cysic approach launch.

We have reason to believe that, given the huge demand for ZK, ZK mining pools and the SaaS model of real-time ZKP generation can open up a brand-new industry chain. In this new field with great potential, ZK hardware manufacturers with strong backing and a first-mover advantage are likely to become the next Bitmain and dominate the hardware acceleration market.

In the field of hardware acceleration, Cysic may be one of the most watched teams. The team has won important awards at ZPrize, the well-known ZKP technology competition, and began serving as a ZPrize mentor in 2023. The ToB ZK mining pool and ToC ZK-DePIN hardware on its roadmap have attracted top VCs such as Polychain, ABCDE, OKX Ventures and Hashkey, and the team has raised nearly US$20 million in total.

With the Cysic testnet launching at the end of July and its ZK mining pool about to open, discussion of Cysic in various communities is becoming increasingly heated. This article explains Cysic's product principles and business model and gives a simple introduction to how ZK hardware acceleration works, with the aim of lowering the barrier to understanding for more readers.

Understanding the ZK Proof System from the Workflow

A ZK proof system is very complex, but to get a simple picture of its overall structure you can break it down by function and workflow. For a system that generates ZK proofs for ordinary computations, the core process can be summarized as follows:

First, we need to interact with the ZK system through the front end and submit the content to be proved. The front end will convert the content into a format that is easy to be processed by the ZK proof system. After that, the system will generate a ZK Proof through a specific proof system or framework (such as Halo2, Plonk, etc.). This process can be broken down into the following steps:

1. Problem setting: First, we need to determine what is to be proved. For example, the prover declares that he has some data, "I know a solution N to the equation F(x)=w", but he does not want others to see the value of N.

2. Arithmetization and CSP: After the prover submits the content to be proved, the system builds a mathematical model/program that expresses that content equivalently, and then converts it into a format that the proof system can process. Specifically, the aforementioned statement "I know a solution N to the equation F(x)=w" is converted from the original mathematical equation into logic gate circuits and polynomials.

3. After that, the system selects a suitable proof system such as Halo2 or Plonk, and compiles the content generated in the previous steps into a usable ZKP program. The prover uses the ZKP program to generate a proof and submits it to the verifier for verification.
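To make the arithmetization step more concrete, here is a minimal sketch in Python. It assumes an illustrative statement, "I know x such that x^3 + x + 5 = 35", which is our own choice and not taken from any particular proof system. The equation is flattened into simple add/multiply gates, and a witness (the values on every wire) is checked against those gates. The names `GATES`, `build_witness` and `satisfies` are hypothetical, and the sketch provides neither zero knowledge nor succinctness; it only shows what "converting an equation into gate constraints" looks like.

```python
# Toy arithmetization of the statement "I know x such that x^3 + x + 5 = 35".
# Assumption: a small prime field and a hand-written gate list, for illustration only.
P = 2**31 - 1  # toy prime field modulus (illustrative choice)

# Each gate is (operation, left wire, right wire, output wire).
GATES = [
    ("mul", "x", "x", "v1"),       # v1 = x * x
    ("mul", "v1", "x", "v2"),      # v2 = v1 * x
    ("add", "v2", "x", "v3"),      # v3 = v2 + x
    ("add", "v3", "five", "out"),  # out = v3 + 5
]


def apply_gate(op, left, right):
    """Evaluate one gate over the toy field."""
    return (left * right if op == "mul" else left + right) % P


def build_witness(x):
    """Prover side: compute the value of every wire from the secret input x."""
    wires = {"x": x % P, "five": 5}
    for op, a, b, c in GATES:
        wires[c] = apply_gate(op, wires[a], wires[b])
    return wires


def satisfies(wires, public_out):
    """Check that every gate constraint holds and the output matches the public value."""
    if wires["five"] != 5 or wires["out"] != public_out % P:
        return False
    return all(wires[c] == apply_gate(op, wires[a], wires[b]) for op, a, b, c in GATES)


if __name__ == "__main__":
    print(satisfies(build_witness(3), 35))  # x = 3 is a valid solution
    print(satisfies(build_witness(4), 35))  # x = 4 is not
```

In a real system the verifier never sees the wire values. The proof system turns these gate constraints into polynomial identities and the prover commits to them, which is where the heavy computation discussed later comes from.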

ZK systems such as zkEVM, which are frequently used in Ethereum's second layer, essentially compile smart contracts into EVM's underlying opcodes, then convert the format of each opcode into the form of logic gate circuits/polynomial constraints, and then hand it over to the back-end ZK proof system for further processing.

It is worth mentioning that the ZKP scheme most widely used in blockchain today is the zk-SNARK (zero-knowledge succinct non-interactive argument of knowledge), and that most ZK Rollups rely on the succinctness of SNARKs rather than on zero knowledge. Succinctness means that a ZKP takes up very little space: a large amount of content can be compressed into a few hundred bytes, and the verification cost is very low.

In this way, the workload between Prover and Verifier is asymmetric. The cost of generating a ZKP is very high for the Prover, but the verification cost for the Verifier is very low. As long as we make good use of this asymmetry and adopt ZK in the scenario of "single Prover, multiple Verifiers", we can concentrate the overall cost on the Prover side and greatly reduce the cost for Verifiers. This model is extremely beneficial to decentralized verification, which is the idea behind Ethereum Layer 2.

However, this model of shifting the verification cost to the ZK generation end is not a silver bullet. For the ZK Rollup project, the high cost of generating ZKP will eventually be passed on to UX and transaction fees again, which is not conducive to the long-term development of ZK Rollup.

Even though ZK has great potential in trustless and decentralized verification scenarios, it is limited by the bottleneck of generation time. Neither zkEVM, zkVM, ZK Rollup, nor ZK Bridge currently have the economic basis for large-scale adoption.

In response to this, ZK acceleration projects represented by Cysic, Ingonyama, and Irreducible have emerged, trying to reduce the cost of generating ZKP from different directions. Below, we will briefly introduce the main costs and acceleration methods of ZKP generation from a technical perspective, as well as why Cysic has great potential in the ZK acceleration track.

Computational overhead: MSM and NTT

Many people know that it takes a lot of time for a ZKP Prover to generate proofs. In ZK-SNARK protocols, it often happens that the Verifier needs only one second to verify a proof that took the Prover half a day or even a full day to generate. To prove a computation efficiently with ZKP, the computation must first be converted from the form of a classic program into a ZK-friendly form.

There are currently two ways to do this: one is to write circuits using some proof system frameworks, such as Halo2; the other is to use a domain-specific language (DSL), such as Cairo or Circom, to convert the calculation into an intermediate expression for subsequent submission to the proof system. The proof system will generate a ZK proof based on the written circuit or the intermediate expression compiled by the DSL.

The more complex the program operation, the longer it takes to generate the proof. In addition, some operations are inherently ZK-unfriendly and require additional work to implement. For example, SHA or Keccak hash functions are ZKP-unfriendly and using them will result in longer proof generation times. Even operations that are cheap to execute on a classical computer may be ZKP-unfriendly.

Putting aside the computational tasks that are not ZK-friendly, although the ZK proof generation process may vary depending on the chosen proof system, the bottlenecks are essentially similar. In the generation of ZK proofs, there are two computational tasks that consume the most computing resources: MSM (Multi-Scalar Multiplication) and NTT (Number Theoretic Transform). These two computational tasks can account for 80-95% of the proof generation time, depending on the ZKP commitment scheme and specific implementation.

MSM mainly processes multi-scalar multiplication on elliptic curves, while NTT is FFT (Fast Fourier Transform) on finite fields, which is used to accelerate polynomial multiplication. Using different scheme combinations will bring different FFT/MSM load ratios.
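As a rough illustration of what "FFT on finite fields, used to accelerate polynomial multiplication" means, the following Python sketch multiplies two polynomials with a recursive radix-2 NTT. The modulus 998244353 and generator 3 are a common textbook choice and are assumptions made here for illustration; real proof systems use the scalar field of their own curve. The function names `ntt` and `poly_mul` are likewise hypothetical.

```python
# Minimal recursive radix-2 NTT and NTT-based polynomial multiplication.
# Assumption: MOD = 119 * 2^23 + 1 with primitive root 3, a common toy choice.
MOD = 998244353
ROOT = 3


def ntt(values, invert=False):
    """Evaluate a polynomial (coefficient list, power-of-two length) at the n-th roots of unity."""
    n = len(values)
    if n == 1:
        return values[:]
    w_n = pow(ROOT, (MOD - 1) // n, MOD)  # primitive n-th root of unity
    if invert:
        w_n = pow(w_n, -1, MOD)           # inverse root for the inverse transform
    even = ntt(values[0::2], invert)
    odd = ntt(values[1::2], invert)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % MOD
        out[i] = (even[i] + t) % MOD           # butterfly: upper half
        out[i + n // 2] = (even[i] - t) % MOD  # butterfly: lower half
        w = w * w_n % MOD
    return out


def poly_mul(f, g):
    """Multiply two coefficient lists modulo MOD using forward NTT, pointwise product, inverse NTT."""
    result_len = len(f) + len(g) - 1
    size = 1
    while size < result_len:
        size *= 2
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    product = ntt([x * y % MOD for x, y in zip(fa, ga)], invert=True)
    size_inv = pow(size, -1, MOD)  # the inverse transform needs a final division by n
    return [x * size_inv % MOD for x in product][:result_len]


if __name__ == "__main__":
    # (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
    print(poly_mul([1, 2, 3], [4, 5]))  # prints [4, 13, 22, 15]
```

The NTT route costs on the order of n log n field operations instead of n^2 for schoolbook multiplication, which is why it appears in almost every polynomial-based proof system and why its cost matters so much at sizes like 2^20 and above.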

Take STARK as an example: its PCS (Polynomial Commitment Scheme) uses FRI, a hash-based commitment, rather than an elliptic-curve-based scheme such as KZG or IPA, so it involves no MSM computation at all. In the comparison table of proof systems, schemes nearer the top require more FFT operations, and schemes nearer the bottom require more MSM operations.

Optimization

Because MSM operations have predictable memory access patterns, they can be massively parallelized, but they consume a lot of memory. MSM also faces scalability challenges and can remain slow even when parallelized. So although MSM can be accelerated in hardware, doing so requires large amounts of memory and parallel computing resources.
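The parallelism and memory trade-off of MSM can be sketched with the bucket (Pippenger-style) method, shown below in Python on a tiny textbook curve. The curve y^2 = x^3 + 2x + 2 over F_17, the window width `c`, and all function names are illustrative assumptions; production provers use curves such as BN254 or BLS12-381 with heavily optimized field arithmetic. Each window's buckets can be filled independently, which is what makes the method easy to parallelize, while the bucket arrays themselves are what consume memory.

```python
# Toy multi-scalar multiplication (MSM): naive vs. bucket method.
# Assumption: tiny curve y^2 = x^3 + 2x + 2 over F_17, for illustration only.
P_MOD = 17
A = 2
INF = None  # point at infinity


def add(p, q):
    """Add two affine points on the toy curve."""
    if p is INF:
        return q
    if q is INF:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # p + (-p), including doubling a point with y = 0
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def mul(k, p):
    """Scalar multiplication by double-and-add."""
    acc = INF
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc


def msm_naive(scalars, points):
    """Compute sum(k_i * P_i) one term at a time."""
    acc = INF
    for k, p in zip(scalars, points):
        acc = add(acc, mul(k, p))
    return acc


def msm_bucket(scalars, points, c=2, bits=8):
    """Bucket method: split scalars into c-bit windows and group points by window digit."""
    result = INF
    for shift in reversed(range(0, bits, c)):  # most significant window first
        for _ in range(c):
            result = add(result, result)       # shift the running result left by c bits
        buckets = [INF] * (1 << c)             # one bucket per possible digit
        for k, p in zip(scalars, points):
            digit = (k >> shift) & ((1 << c) - 1)
            if digit:
                buckets[digit] = add(buckets[digit], p)
        running, window_sum = INF, INF
        for bucket in reversed(buckets[1:]):   # running sums give sum(digit * bucket[digit])
            running = add(running, bucket)
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


if __name__ == "__main__":
    G = (5, 1)
    points = [mul(i, G) for i in range(1, 6)]
    scalars = [3, 200, 77, 5, 18]
    print(msm_naive(scalars, points) == msm_bucket(scalars, points))
```

With n points and c-bit windows, the bucket method replaces n full scalar multiplications with roughly n additions per window plus a pass over 2^c buckets, so larger windows reduce work but increase the memory held in buckets.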

NTTs often involve random memory access, which makes them hardware-unfriendly and difficult to run on distributed infrastructure. In a distributed environment, the random access pattern means a node will inevitably need data held by other nodes, and once network communication is involved, performance drops sharply.

Therefore, access to stored data and data movement become a major bottleneck, limiting how far NTT operations can be parallelized. Most work on accelerating NTT focuses on managing how computation interacts with storage.
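The data-movement problem can be made visible with a small sketch. The Python below lists which index pairs an in-place radix-2 NTT combines at each stage, then counts how many of those pairs straddle two nodes when the data is split into equal contiguous chunks. The partitioning scheme and the function names are assumptions for illustration; real distributed NTT implementations use more elaborate layouts precisely to reduce this traffic.

```python
# Which elements does each NTT stage combine, and how many pairs cross node boundaries?
# Assumption: in-place radix-2 butterflies and a naive contiguous split across nodes.


def butterfly_pairs(n):
    """Return, for each stage, the list of index pairs combined by a butterfly."""
    stages = []
    half = 1
    while half < n:
        pairs = [
            (start + j, start + j + half)
            for start in range(0, n, 2 * half)
            for j in range(half)
        ]
        stages.append(pairs)
        half *= 2  # the stride doubles at every stage
    return stages


def cross_node_pairs(n, nodes):
    """Count, per stage, the butterflies whose two inputs live on different nodes."""
    chunk = n // nodes  # each node holds a contiguous chunk of this many elements
    return [
        sum(1 for i, j in pairs if i // chunk != j // chunk)
        for pairs in butterfly_pairs(n)
    ]


if __name__ == "__main__":
    print(cross_node_pairs(8, 2))   # prints [0, 0, 4]
    print(cross_node_pairs(16, 4))  # prints [0, 0, 8, 8]
```

Early stages touch only neighbouring elements, but once the stride exceeds the chunk size every butterfly needs data from another node. The same effect appears inside a single device as cache misses and memory-bank conflicts.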

In fact, the simplest way to solve the efficiency bottleneck of MSM and NTT is to eliminate these operations entirely. Some newly proposed algorithms, such as Hyperplonk, modify Plonk to eliminate NTT operations. This makes Hyperplonk easier to accelerate but introduces new bottlenecks, such as the sumcheck protocol, which is computationally expensive. Similarly, STARK does not require MSM, but its FRI protocol introduces a large number of hash computations.

The ultimate goal of ZK hardware acceleration and Cysic

While optimization at the software and algorithm level is important and valuable, there are clear limitations. In order to fully optimize the efficiency of ZKP generation, hardware acceleration must be used, just as ASICs and GPUs eventually dominated the BTC and ETH mining markets.

So the question is: what is the best hardware for accelerating ZKP generation? Several types of hardware can be used for ZK acceleration, such as GPUs, FPGAs and ASICs, and each has its own advantages and disadvantages.

We can compare these types of hardware:

First, let's use a simple example to illustrate the difference between them at the development level. For example, now we want to implement a simple parallel multiplication:

  • On the GPU, using the API provided by the CUDA SDK, we can develop like writing native code, thus gaining the ability of parallel computing;

  • On FPGA, we need to relearn the hardware description language, which is used to control the connections at the hardware level to implement parallel algorithms;

  • On ASIC, the connection arrangement of transistors is fixed directly at the hardware level during the chip design phase and cannot be modified afterwards.

These solutions have their own advantages and disadvantages and are applicable to different development stages of the ZK track. Cysic is committed to becoming the ultimate solution for ZK hardware acceleration, and its step-by-step strategy is:

  1. Develop a GPU-based SDK to provide solutions for ZK applications and integrate GPU resources across the network;

  2. Use the flexibility and balanced characteristics of FPGAs to quickly implement customized ZK hardware acceleration;

  3. Independently develop ASIC-based ZK DePIN hardware;

  4. Use Cysic Network as a SaaS platform/mining pool that integrates all ZK DePIN and GPU computing power, providing computing power and verification solutions for the entire ZK industry.

Let us now look at several sub-tracks in turn to understand how ZK acceleration solutions differ and how Cysic approaches each of them.

ZK Mining Pool and SaaS Platform: Cysic Network

In fact, well-known ZK Rollups such as Scroll and Polygon zkEVM have clearly proposed the concept of a "decentralized Prover" in their roadmaps, which in practice means building a ZK mining pool. This market-oriented approach can reduce the burden on ZK Rollup teams and encourage miners and mining pool operators to keep optimizing their ZK acceleration solutions.

In Cysic's roadmap, a ZK mining pool and SaaS platform called Cysic Network has been clearly proposed. It will not only integrate Cysic's own computing power, but also absorb third-party computing power resources through mining incentives, including idle GPUs and zk DePIN devices in the hands of ordinary users.

The entire verification workflow diagram is as follows:

  1. The ZK project team submits the proof generation task to an agent, whose job is to forward the proof task to the verification network. These agents will be operated by Cysic itself at the beginning; staking will be introduced later, allowing anyone to become an agent;

  2. The Prover accepts proof tasks and uses hardware to generate ZK proofs. Provers need to stake tokens in order to take on proof tasks and receive rewards after completing them;

  3. The validator committee is responsible for checking the validity of the proof generated by the Prover and voting. When a certain number of votes is reached, the proof will be considered valid. Validators join the committee by staking tokens, participate in voting and receive rewards. This process can be combined with the AVS concept of EigenLayer and reuse the existing Restaking facilities.
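The three roles above can be sketched as a toy simulation in Python. Everything in it is hypothetical: the class names, the `min_stake` and `quorum` parameters, and the placeholder strings standing in for real proofs are assumptions made for illustration, since the article does not specify Cysic's actual interfaces, thresholds or reward rules.

```python
# Toy model of the agent -> prover -> validator committee flow described above.
# Assumption: all names, thresholds and the "proof" format are hypothetical.
from dataclasses import dataclass


@dataclass
class Prover:
    name: str
    stake: int

    def prove(self, task):
        # Placeholder for real ZK proof generation on GPU/FPGA/ASIC hardware.
        return f"proof({task})"


@dataclass
class Validator:
    name: str
    stake: int

    def vote(self, task, proof):
        # Placeholder for real proof verification.
        return proof == f"proof({task})"


def run_task(task, prover, committee, quorum, min_stake=1):
    """Forward a task to a staked prover, then accept the proof once enough validators approve."""
    if prover.stake < min_stake:
        raise ValueError("prover must stake tokens before taking tasks")
    proof = prover.prove(task)
    approvals = sum(
        1 for v in committee if v.stake >= min_stake and v.vote(task, proof)
    )
    return approvals >= quorum


if __name__ == "__main__":
    committee = [Validator(f"v{i}", stake=10) for i in range(5)]
    honest = Prover("p1", stake=100)
    print(run_task("block-123", honest, committee, quorum=4))
```

The point of the sketch is the shape of the flow: stake gates participation on both sides, and acceptance depends on a vote threshold rather than on any single validator.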

The detailed interaction process is as follows:

One point in the above process deserves attention. Staking, incentive distribution and the submission of computing tasks all need to run on some dedicated platform, and a blockchain is the natural infrastructure for this.

To this end, Cysic Network has built a dedicated public chain and adopted a unique consensus algorithm called Proof of Compute (PoC). Its basic principle is to select block producers based on the VRF function and Prover's historical performance, such as device availability, number of proof submissions, Proof accuracy, etc. (Note: the blocks here should be used to record the information of each device and distribute Token incentives).
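One way to picture "selecting block producers based on a VRF and historical performance" is weighted random selection driven by a verifiable random seed. The Python sketch below is only that picture: it uses SHA-256 as a stand-in for a real VRF output (a hash is not a VRF), and the scoring formula in `performance_score` is invented for illustration. Neither is taken from Cysic's specification.

```python
# Toy weighted producer selection in the spirit of Proof of Compute.
# Assumptions: SHA-256 stands in for a VRF output; the score formula is hypothetical.
import hashlib


def performance_score(availability, proofs_submitted, accuracy):
    """Combine historical metrics into one weight (illustrative formula only)."""
    return availability * accuracy * proofs_submitted


def select_producer(seed, scores):
    """Pick one prover with probability proportional to its score, deterministically from seed."""
    total = sum(scores.values())
    digest = hashlib.sha256(seed).digest()
    # Map the 256-bit digest to a position in [0, total).
    position = int.from_bytes(digest, "big") / 2**256 * total
    cumulative = 0.0
    for name in sorted(scores):      # fixed order so every node gets the same answer
        cumulative += scores[name]
        if position < cumulative:
            return name
    return sorted(scores)[-1]        # guard against floating-point rounding


if __name__ == "__main__":
    scores = {
        "prover-a": performance_score(0.99, 1200, 0.999),
        "prover-b": performance_score(0.80, 300, 0.990),
    }
    print(select_producer(b"epoch-42", scores))
```

Because the seed and scores are public, any node can recompute the selection, and provers with better availability, volume and accuracy are chosen more often.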

Of course, in addition to the ZK mining pool and SaaS platform, Cysic has also made a lot of arrangements for ZK acceleration solutions based on different hardware. Next, let’s take a look at its achievements in the three routes of GPU, FPGA and ASIC.

GPUs, FPGAs, and ASICs

The core of ZK hardware acceleration is to parallelize some key operations as much as possible. From the perspective of hardware functional characteristics, in order to achieve maximum flexibility and versatility, a large part of the CPU chip area is used to provide control functions and caches at all levels, which leads to its weak parallel computing capabilities.

In a GPU, a much larger share of the chip area is devoted to computation, which lets it support large-scale parallel processing. GPUs are now ubiquitous, and libraries such as Nvidia CUDA let developers exploit GPU parallelism without understanding the underlying hardware. ZK libraries built on the CUDA SDK can accelerate MSM and NTT operations.

FPGAs, on the other hand, consist of an array of many small processing units. Programming an FPGA requires a specialized hardware description language, which is then compiled into a combination of transistor circuits. An FPGA therefore implements a specific algorithm directly in circuitry, without going through an instruction set. This gives FPGAs far more customization and flexibility than GPUs.

The current price of FPGA is only about one-third of that of GPU, and its energy efficiency can be more than ten times higher than that of GPU. This significant energy efficiency advantage is due in part to the fact that the GPU needs to be connected to a host device, which typically consumes a lot of power. It can be said that FPGA can add more computing modules to meet the needs of MSM and NTT without increasing energy consumption. This makes FPGAs particularly suitable for ZK proof scenarios that are computationally intensive and require high data throughput and low response time.

However, the biggest problem with FPGAs is that few developers have experience programming them. For a ZK project, it is extremely difficult to assemble a team with both cryptography expertise and FPGA engineering expertise.

ASIC is equivalent to using hardware to implement a program. Once the design is completed, the hardware cannot be changed. Accordingly, the program that ASIC can execute cannot be changed and can only be used for specific tasks. ASIC also has the hardware acceleration advantages of FPGA in MSM and NTT mentioned above. And because it is a dedicated circuit design, ASIC has the highest performance and the lowest energy consumption among all solutions.

For current mainstream ZK circuits, Cysic hopes to achieve a proof time of 1 to 5 seconds. Only ASICs can reach this goal.

Although these advantages sound very attractive, ZK technology is developing rapidly, and the design and production cycle of ASIC usually takes 1-2 years and costs up to 10-20 million US dollars. Therefore, it is necessary to wait until ZK technology is stable enough before it can be put into large-scale production to avoid the chips produced becoming obsolete quickly.

In this regard, Cysic has made full arrangements in all three fields of GPU, FPGA and ASIC:

At the GPU acceleration level, as new ZK proof systems have emerged, Cysic has adapted to them with its self-developed CUDA acceleration SDK, and by pooling community resources it has linked hundreds of thousands of high-end compute GPUs into Cysic's GPU computing network. Cysic's CUDA SDK is also 50% to 80% faster, or more, than the latest open-source frameworks.

On FPGA, Cysic has completed the implementation of the world's fastest MSM, NTT, Poseidon Merkle tree and other modules through self-developed solutions, covering the most important part of ZK computing, and the solution has been prototyped by multiple top ZK projects.

Cysic's self-developed SolarMSM can complete MSM calculations on a scale of 2^30 in 0.195 seconds, while SolarNTT can complete NTT calculations on a scale of 2^30 in 0.218 seconds, both of which have the highest performance among all currently publicly available FPGA hardware acceleration results.

In the ASIC field, although there is still some distance to the large-scale application of ZK ASIC, Cysic has already laid out this track in advance and launched its independently developed ZK DePIN chips and equipment.

In order to attract C-end users and meet the performance and cost requirements of different ZK project parties, Cysic will launch two ZK hardware products: ZK Air and ZK Pro.

ZK Air is similar in size to a power bank or laptop power adapter. Ordinary users can connect it directly to a laptop, iPad or even a mobile phone through the Type-C interface to provide computing power for specific ZK projects and receive rewards. At present, ZK Air's computing power already exceeds that of consumer-grade graphics cards, and it can accelerate small-scale ZK proof generation tasks.

ZK Pro is similar to a traditional mining machine. Its computing power is comparable to that of a GPU server built from multiple top-tier consumer-grade graphics cards. It can greatly accelerate the generation of ZK proofs and is suitable for large-scale ZK projects such as ZK-Rollup and ZKML (zero-knowledge machine learning).

Through these two devices, Cysic will eventually build a stable and reliable ZK-DePIN network. Currently, these two devices are still under development and are expected to be available in 2025.

In addition, through Cysic Network, C-end users can join the ZK hardware acceleration market with a very low barrier to entry. Coupled with ZK projects' huge demand for computing power, this may set off a wave of market enthusiasm similar to Bitcoin mining, and the ZK computing market may see explosive growth.

References

https://medium.com/amber-group/need-for-speed-zero-knowledge-1e29d4a82fcd

https://figmentcapital.medium.com/accelerating-zero-knowledge-proofs-cfc806de611b