Original title: Exploring circle STARKs

Written by: Vitalik Buterin, Co-founder of Ethereum

Compiled by: Chris, Techub News

To understand this article, you should already know the basic principles of SNARKs and STARKs; if you are not familiar with them, I recommend reading the first few sections of this article to pick up the basics.

In recent years, the trend in STARK protocol design has been to move toward smaller fields. The earliest production STARK implementations used 256-bit fields, i.e. modular arithmetic over large numbers (such as 21888...95617), which makes these protocols compatible with elliptic-curve-based signatures. However, this design is inefficient: in general there is no practical use for those large numbers, and a lot of computing power is wasted; for example, multiplying numbers that are four times smaller takes only about one-ninth of the computing time. To solve this problem, STARKs have moved toward smaller fields: first Goldilocks, then Mersenne31 and BabyBear.

This shift has resulted in faster proving: for example, Starkware can prove 620,000 Poseidon2 hashes per second on a single M3 laptop. This means that, as long as we are willing to trust Poseidon2 as a hash function, the hard part of developing an efficient ZK-EVM is solved. So how do these techniques work? How are proofs built over these smaller fields? How do these protocols compare to solutions like Binius? This post explores the details, with a particular focus on a scheme called Circle STARKs (implemented in Starkware's stwo, Polygon's Plonky3, and my own version in Python), which has the unique property of being compatible with the Mersenne31 field.

Common problems when using small fields

A very important trick when creating hash-based proofs (or any kind of proof) is the ability to indirectly verify properties of a polynomial by checking the result of evaluating it at a random point. This greatly simplifies the proof process, because evaluating at a random point is much cheaper than working with the entire polynomial.

For example, suppose a proof system requires you to generate a commitment to a polynomial A satisfying A(x)^3 + x - A(\omega x) = x^N (a very common type of claim in ZK-SNARK protocols). The protocol can require you to pick a random coordinate r and prove that A(r)^3 + r - A(\omega r) = r^N. Then, working backwards from the claimed evaluation A(r) = c, you prove that Q = \frac{A - c}{X - r} is a polynomial.

If an attacker knows r in advance, they can break the protocol: for example, they could set A(r) to 0, compute the value that A(\omega r) would need to take to satisfy the equation, and then let A be the straight line passing through those two points.

To prevent such attacks, we need to choose r after the attacker provides A (for example via Fiat-Shamir, a technique that strengthens protocols by deriving such parameters from a hash of the data committed so far). The value r must also be drawn from a set large enough that the attacker cannot predict or guess it.

In elliptic curve based protocols and 2019 era STARKs, this problem is easy: all our math is done on 256 bit numbers, so we can just choose r to be a random 256 bit number and be fine. However, in STARKs over smaller fields, we run into a problem: there are only about 2 billion possible values ​​of r to choose from, so an attacker who wants to forge a proof only needs to try 2 billion times, which is a lot of work, but still completely doable for a determined attacker!

There are two solutions to this problem:

  • Conduct multiple random checks

  • Extension fields

The simplest and most effective approach is to perform multiple random checks: instead of checking at one coordinate, repeat the check at, say, four random coordinates. This is possible in principle, but there is an efficiency problem: two distinct polynomials of degree less than d agree on fewer than d points, so an attacker's malicious polynomial can look correct at a random position with probability roughly d/p (where p is the field size). Whether the attacker succeeds is therefore a probabilistic matter, and the number of check rounds has to be increased substantially to reach a comfortable overall security level.
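To make this concrete, here is a minimal sketch of the repeated-random-check idea (the function names and coefficient representation are illustrative assumptions, not code from any particular implementation); the security argument is just the agreement bound above, applied once per round:

```python
import random

def poly_eval(coeffs, x, p):
    # Evaluate a polynomial, given as a list of coefficients
    # (lowest degree first), at the point x, modulo p.
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def random_checks(f_coeffs, g_coeffs, p, rounds=4):
    # If f and g differ as polynomials of degree < d, each round passes
    # with probability at most d/p, so `rounds` independent checks give
    # roughly rounds * log2(p/d) bits of security.
    for _ in range(rounds):
        r = random.randrange(p)
        if poly_eval(f_coeffs, r, p) != poly_eval(g_coeffs, r, p):
            return False
    return True
```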

This leads to the other solution: extension fields, which are similar to complex numbers, but built on top of a finite field. We introduce a new value, denoted \alpha, and declare that it satisfies some relation such as \alpha^2 = \text{some value}. This creates a new mathematical structure in which we can operate on pairs of values (combinations of the form a + b\alpha), not just single numbers. If we are working over a field like Mersenne31 or BabyBear, such an extension gives us many more values to choose from, which restores security. To increase the field size further, we can apply the same technique repeatedly: since we are already using \alpha, we define yet another new value on top of it, which for Mersenne31 again amounts to choosing a value whose square equals some fixed constant.

Code (you can improve it with Karatsuba)
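The original post includes a code snippet at this point that is not reproduced in this translation; as a minimal stand-in sketch, here is schoolbook multiplication in a degree-2 extension of Mersenne31, assuming the relation \alpha^2 = -1 (which is valid over this field, since p ≡ 3 mod 4). It uses four base-field multiplications, which a Karatsuba-style rewrite reduces to three:

```python
M31 = (1 << 31) - 1  # the Mersenne31 prime, 2**31 - 1

def ext_mul(a, b, p=M31):
    # Schoolbook multiplication in F_p[alpha]/(alpha^2 + 1):
    # (a0 + a1*alpha)(b0 + b1*alpha)
    #   = (a0*b0 - a1*b1) + (a0*b1 + a1*b0)*alpha
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % p, (a0 * b1 + a1 * b0) % p)

# Applying the same construction again, on top of such pairs, yields the
# degree-4 extension used in practice for Mersenne31 STARKs.
```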

By extending the field, we now have enough values to choose from to meet our security requirements. If we are dealing with polynomials of degree less than 2^{20}, a single check in the degree-4 extension (a set of roughly 2^{124} values) provides about 104 bits of security. That is enough in most settings; and if we need to reach a higher target, such as 128 bits, we can add some extra computational work to the protocol to make up the difference.

The extension field is only actually used in FRI and the other places where random linear combinations are needed. The bulk of the math is still done over the base field, i.e. arithmetic modulo p. Moreover, almost all of the data being hashed lives in the base field, so only four bytes are needed per value. This retains the efficiency of the small field, while letting us reach into the larger field when security demands it.

Regular FRI

When building a SNARK or STARK, the first step is usually arithmetization: transforming an arbitrary computational problem into an equation in which some of the variables and coefficients are polynomials. Specifically, the equation will usually look like P(X, Y, Z) = 0, where P is a fixed polynomial, some of the inputs are given, and the solver needs to provide the rest (here, X and Y). Once you have such an equation, its solutions correspond to solutions of the underlying computational problem.

To prove that you have a solution, you need to show that the values you supply are actually reasonable polynomials (not fractions, and not objects that look like one polynomial in some places and a different polynomial in others), and that these polynomials stay below a certain maximum degree. To do this, we apply a trick called random linear combination, step by step:

  • Suppose you have the evaluations of a polynomial A, and you want to prove that its degree is less than 2^{20}

  • Consider the polynomials B(x^2) = \frac{A(x) + A(-x)}{2} and C(x^2) = \frac{A(x) - A(-x)}{2x}

  • Let D be a random linear combination of B and C: D = B + rC, where r is a random coefficient

Essentially, B isolates the even coefficients of A, and C isolates the odd coefficients. Given B and C, you can recover the original polynomial: A(x) = B(x^2) + x \cdot C(x^2). If the degree of A is indeed less than 2^{20}, then B and C (as polynomials in their own variable) will have degree less than 2^{19}: taking only the even or only the odd terms halves the degree. Since D is a random linear combination of B and C, D will also have degree less than 2^{19}, so a claim about the degree of A has been reduced to a claim about the degree of D, a polynomial of half the size.
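Here is a minimal sketch of one such halving step over a prime field (the function name and the domain layout are assumptions for illustration):

```python
def fri_fold(evals, domain, r, p):
    # One folding step: given evaluations of A over a domain ordered so
    # that domain[i + n//2] == -domain[i] (mod p), return evaluations of
    # D = B + r*C over the squared, half-size domain.
    n = len(evals)
    half = n // 2
    inv2 = pow(2, -1, p)
    out = []
    for i in range(half):
        x = domain[i]
        a_pos, a_neg = evals[i], evals[i + half]
        b = (a_pos + a_neg) * inv2 % p               # B(x^2): even part of A
        c = (a_pos - a_neg) * pow(2 * x, -1, p) % p  # C(x^2): odd part of A
        out.append((b + r * c) % p)
    return out
```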

First, FRI simplifies verification by reducing the problem of proving that a polynomial has degree less than d to the problem of proving that a polynomial has degree less than d/2. This step can be repeated many times, halving the problem each time.

FRI works by repeating this halving process. If you start with a claim about a polynomial of degree less than d, after one step you have a claim about a polynomial of degree less than d/2, and after enough steps you are down to a constant. If the output of some stage does not have the expected degree, the checks for that round will fail. If someone tries to push through a dataset that does not correspond to a degree < d polynomial, the deviation persists: in the second round the output will, with some probability, fail to have the expected degree, in the third round it is even more likely to fail, and the final check will catch it. This design effectively detects and rejects non-compliant inputs. A dataset that equals a degree < d polynomial in most positions could potentially pass FRI verification; but to construct such a dataset, the attacker would need to know the actual polynomial, so even a slightly flawed proof is convincing evidence that the prover could have generated a "real" one.

Let's look at what's going on here in a little more detail, and at the properties needed to make this all work. At each step, we cut the degree of the polynomial in half, and we also cut the point set (the domain we care about) in half. The former is the key to making FRI (Fast Reed-Solomon Interactive Oracle Proofs of Proximity) work. The latter makes the algorithm run extremely fast: since each round is half the size of the previous one, the total computational cost is O(N) instead of O(N \cdot \log N).

To achieve this gradual reduction of the domain, a two-to-one map is used: each output point has exactly two preimages, which halves the size of the dataset. An important advantage of this two-to-one map is that it is repeatable: the resulting set keeps the same structure. Suppose we start with a multiplicative subgroup: a set S of the form {1, \omega, \omega^2, ...}, in which the square of every element is also in the set. If we square S (i.e. map each element x to x^2), the new set S^2 is a multiplicative subgroup of half the size, with the same properties. We can keep halving until only one value remains, although in practice we usually stop before getting that far.

You can think of this operation as stretching a line (e.g. a segment or arc) along the circumference of a circle so that it wraps around the circumference twice: a point at x degrees moves to 2x degrees. Every point between 0 and 179 degrees has a corresponding point between 180 and 359 degrees that lands in the same place: the point at x degrees coincides with the point at x + 180 degrees. The process is repeatable: you can apply the mapping again and again, each time moving the points on the circumference to their new positions.

For the mapping technique to be effective, the size of the original multiplicative subgroup needs to be divisible by a large power of 2. BabyBear's modulus is p = 15 \cdot 2^{27} + 1, so its multiplicative group (all nonzero values) has size p - 1 = 15 \cdot 2^{27}, which contains subgroups of size 2^k for every k ≤ 27. Subgroups of this kind are well suited to the technique above, because they allow the degree of the polynomial to be reduced efficiently by repeatedly applying the mapping. In BabyBear, you can choose a subgroup of size 2^k (or just use the entire set), apply FRI to reduce the degree of the polynomial step by step down to 15, and check the degree directly at the end. This exploits the properties of the modulus and the multiplicative group, and makes the computation very efficient. Mersenne31's modulus is p = 2^{31} - 1, so its multiplicative group has size 2^{31} - 2. In this case, the size is divisible by 2 only once: after a single halving, the technique above no longer applies, i.e. it is not possible to use FFTs for efficient polynomial degree reduction the way BabyBear can.

The Mersenne31 field is very well suited to arithmetic on existing 32-bit CPUs and GPUs. Because of the shape of its modulus (2^{31} - 1), many operations reduce to cheap bit tricks: when two numbers are added and the result overflows the modulus, the overflow bit can be folded back into the low bits with a shift and an add, because 2^{31} ≡ 1 (mod p), and shifts are extremely cheap. For multiplication, special CPU instructions (often called multiply-high instructions) efficiently return the high half of the product, which can be folded back in the same way. Thanks to these properties, arithmetic in Mersenne31 is roughly 1.3 times faster than in BabyBear. If FRI can be made to work over Mersenne31, it would significantly improve proving efficiency.
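As an illustration, here is the reduction trick in Python (a sketch; real implementations work with 32-bit registers and multiply-high instructions rather than big integers):

```python
M31 = (1 << 31) - 1  # p = 2**31 - 1

def m31_reduce(x):
    # Fold the bits above bit 31 back into the low bits, using the fact
    # that 2**31 = 1 (mod p), then canonicalize p itself to 0.
    while x >= (1 << 31):
        x = (x & M31) + (x >> 31)
    return 0 if x == M31 else x

def m31_add(a, b):
    return m31_reduce(a + b)

def m31_mul(a, b):
    # The product is up to 62 bits; on a CPU its high half would come
    # from a multiply-high instruction before the same folding step.
    return m31_reduce(a * b)
```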

Circle FRI

This is the cleverness of Circle STARKs. Given a prime p, we can find a group of size p + 1 with a similar two-to-one property: the set of points (x, y) satisfying x^2 + y^2 = 1 (mod p). For p = 2^{31} - 1, this circle group has size exactly 2^{31}, a power of two.

These points follow an addition rule that may look familiar to you if you’ve done any trigonometry or complex number multiplication recently.

(x_1, y_1) + (x_2, y_2) = (x_1x_2 - y_1y_2, x_1y_2 + x_2y_1)

The doubling rule is:

2 * (x, y) = (2x^2 - 1, 2xy)
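In code, the group law and the doubling rule look like this (a sketch; the assert uses the toy modulus 31, which is consistent with the worked example below):

```python
def circle_add(p1, p2, m):
    # (x1, y1) + (x2, y2) = (x1*x2 - y1*y2, x1*y2 + x2*y1), all mod m
    (x1, y1), (x2, y2) = p1, p2
    return ((x1 * x2 - y1 * y2) % m, (x1 * y2 + x2 * y1) % m)

def circle_double(pt, m):
    # 2 * (x, y) = (2x^2 - 1, 2xy), all mod m
    x, y = pt
    return ((2 * x * x - 1) % m, (2 * x * y) % m)

assert circle_double((2, 11), 31) == (7, 13)
```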

Now, let's focus on the odd-numbered points on this circle:

First, we project all the points onto a single line. The equivalent of the B(x^2) and C(x^2) formulas we used in regular FRI is:

f_0 (x) = \frac {F (x,y) + F (x,-y)}{2}

We can then take a random linear combination to obtain a one-dimensional polynomial defined over a subset of the x-axis.

From the second round onward, the mapping changes:

f_0 (2x^2-1) = \frac {F (x) + F (-x)}{2}

This map halves the size of the set each time. What happens here is that each x in a sense represents two points, (x, y) and (x, -y), and x → 2x^2 - 1 is the x-coordinate of the point-doubling rule above. We thus convert the x-coordinate of a pair of opposite points on the circle into the x-coordinate of the doubled point.
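A sketch of this first folding round, assuming (following the pattern of regular FRI) an odd part f_1(x) = \frac{F(x,y) - F(x,-y)}{2y} and a domain laid out so that opposite-y points are half an array apart:

```python
def circle_fold_first_round(evals, domain, r, p):
    # Fold F, defined on circle points, down to a function of x alone:
    # pair (x, y) with (x, -y) and take f0 + r*f1.
    n = len(evals)
    half = n // 2
    out = []
    for i in range(half):
        x, y = domain[i]                   # domain[i + half] == (x, -y)
        f_pos, f_neg = evals[i], evals[i + half]
        f0 = (f_pos + f_neg) * pow(2, -1, p) % p
        f1 = (f_pos - f_neg) * pow(2 * y, -1, p) % p
        out.append((f0 + r * f1) % p)      # random linear combination
    return out
```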

For example, if we take the second value from the right on the circle, 2, and apply the map, we get 2 \cdot 2^2 - 1 = 7. Going back to the original circle, (2, 11) is the third point from the right, so after doubling it we get the sixth point from the right, (7, 13).

This could have been done in two dimensions, but operating in one dimension makes the process more efficient.

Circle FFTs

An algorithm closely related to FRI is the FFT, which converts n evaluations of a polynomial of degree less than n into the polynomial's n coefficients. The FFT follows the same process as FRI, except that instead of taking a random linear combination of f_0 and f_1 at each step, it recursively performs a half-sized FFT on each of them, and uses the output of FFT(f_0) as the even coefficients and the output of FFT(f_1) as the odd coefficients.
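A minimal recursive sketch of this evaluations-to-coefficients FFT over a multiplicative subgroup (ordered so that the second half of the domain is the negation of the first half); this is the regular version, not the circle version:

```python
def fft(vals, domain, p):
    # Convert n evaluations over `domain` into n coefficients,
    # assuming domain[i + n//2] == -domain[i] (mod p).
    if len(vals) == 1:
        return list(vals)
    half = len(vals) // 2
    inv2 = pow(2, -1, p)
    f0 = [(vals[i] + vals[i + half]) * inv2 % p for i in range(half)]
    f1 = [(vals[i] - vals[i + half]) * pow(2 * domain[i], -1, p) % p
          for i in range(half)]
    squared = [x * x % p for x in domain[:half]]
    even = fft(f0, squared, p)   # coefficients of the even part
    odd = fft(f1, squared, p)    # coefficients of the odd part
    out = [0] * len(vals)
    out[::2], out[1::2] = even, odd
    return out
```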

The circle group also supports FFTs, constructed in the same way as circle FRI. However, a key difference is that the objects circle FFTs (and circle FRI) operate over are not, strictly speaking, polynomials. Instead, they belong to what is mathematically called a Riemann-Roch space: in this case, polynomials modulo the circle equation x^2 + y^2 - 1 = 0. That is, we treat any multiple of x^2 + y^2 - 1 as zero. Another way to think of it is that we only allow y to the first power: whenever a y^2 term appears, we replace it with 1 - x^2.

This also means that the coefficients output by a circle FFT are not monomial coefficients as in a regular FFT (for example, if a regular FFT outputs [6, 2, 8, 3], we know this means P(x) = 3x^3 + 8x^2 + 2x + 6). Instead, circle FFT coefficients are expressed in a basis specific to the circle FFT: {1, y, x, xy, 2x^2 - 1, 2x^2y - y, 2x^3 - x, 2x^3y - xy, 8x^4 - 8x^2 + 1, ...}

The good news is that as a developer you can almost completely ignore this: STARKs never require you to know the coefficients. Instead, you simply store each polynomial as a set of evaluations over a particular domain. The only place you need the FFT is for the (Riemann-Roch-space analogue of) low-degree extension: given N values, generate k \cdot N values on the same polynomial. In this case, you can use the FFT to produce the coefficients, append (k-1) \cdot N zeros to them, and then use an inverse FFT over the larger domain to get the larger set of evaluations.
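Sketched in code, building on the fft sketch above (the direct re-evaluation loop stands in for the inverse FFT purely for clarity):

```python
def low_degree_extend(evals, small_domain, large_domain, p):
    # Interpolate, zero-pad the coefficients, then re-evaluate on the
    # larger domain: the procedure described above.
    coeffs = fft(evals, small_domain, p)      # evaluations -> coefficients
    coeffs += [0] * (len(large_domain) - len(coeffs))
    out = []
    for x in large_domain:                    # an inverse FFT would do this
        acc = 0                               # in O(N log N) instead
        for c in reversed(coeffs):
            acc = (acc * x + c) % p
        out.append(acc)
    return out
```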

Circle FFTs are not the only special type of FFT. Elliptic curve FFTs (ECFFTs) are more powerful, because they work over any finite field (prime fields, binary fields, etc.). However, the ECFFT is more complicated and less efficient, so since the circle FFT is available for p = 2^{31} - 1, we choose to use that.

From here we’ll dive into some of the more obscure details of how Circle STARKs are implemented differently than regular STARKs.

Quotienting

A common operation in STARK protocols is quotienting at specific points, which may be deliberately or randomly chosen. For example, to prove that P(x) = y, you proceed as follows:

Compute the quotient: given the polynomial P and the claimed value y, compute Q = \frac{P - y}{X - x}, where x is the chosen evaluation point.

Prove the polynomial: show that Q really is a polynomial, not a fractional value. If it is, this establishes that P(x) = y.
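In evaluation form, this is one line (a sketch; the evaluation domain must not contain the point x itself):

```python
def quotient_evals(p_evals, domain, x, y, p):
    # Evaluations of Q = (P - y) / (X - x) over the domain. If P(x) != y,
    # these values do not correspond to any low-degree polynomial, and a
    # later low-degree check (e.g. FRI) on them will fail.
    return [(v - y) * pow(d - x, -1, p) % p
            for v, d in zip(p_evals, domain)]
```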

In addition, in the DEEP-FRI protocol, evaluation points are chosen at random; this allows the number of Merkle branches to be reduced while keeping the FRI protocol secure and efficient.

In circle-group STARKs, however, there is no linear function that passes through a single point, so the traditional quotienting technique has to be replaced by one that exploits the unique geometric properties of the circle group.

In the circle group, you can construct a linear function tangent to the circle at a point (P_x, P_y), but that function passes through the point with multiplicity 2: for a polynomial to be a multiple of it, the polynomial must satisfy a stricter condition than merely being zero at that point. Therefore, you cannot prove an evaluation at just one point. So how do we deal with this?

We accept this constraint and prove evaluations at two points, adding a dummy evaluation at a point we do not actually care about.

A line function is ax + by + c. If you set it equal to 0, you may recognize it as the "standard form" of a line from high-school math.

If we have a polynomial P that equals y_1 at x_1 and y_2 at x_2, we can choose an interpolating function L that agrees with P at those two points. This can be written simply as L = \frac{y_2 - y_1}{x_2 - x_1} \cdot (x - x_1) + y_1.

We then show that P equals y_1 at x_1 and y_2 at x_2 by subtracting L (so that P - L is zero at both points), dividing by the line function that vanishes at the two points, and proving that the quotient Q is a polynomial.
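Here is the one-dimensional analogue of this two-point quotient as a sketch (in an actual circle STARK, the divisor is the line function ax + by + c through the two circle points rather than (X - x_1)(X - x_2)):

```python
def two_point_quotient_evals(p_evals, domain, pt1, pt2, p):
    # Subtract the interpolant L through (x1, y1) and (x2, y2), then
    # divide by (X - x1)(X - x2), which vanishes exactly at those two
    # points (the domain is assumed to avoid x1 and x2).
    (x1, y1), (x2, y2) = pt1, pt2
    slope = (y2 - y1) * pow(x2 - x1, -1, p) % p
    out = []
    for v, x in zip(p_evals, domain):
        l_val = (slope * (x - x1) + y1) % p
        denom = (x - x1) * (x - x2) % p
        out.append((v - l_val) * pow(denom, -1, p) % p)
    return out
```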

Vanishing polynomials

In a STARK, the polynomial equation you are trying to prove usually looks like C(P(x), P(\text{next}(x))) = Z(x) \cdot H(x), where Z(x) is a polynomial that equals zero over the entire original evaluation domain. In a regular STARK, this function is simply x^n - 1. In a circle STARK, the corresponding function is:

Z_1 (x,y) = y

Z_2 (x,y) = x

Z_{n+1}(x, y) = 2 \cdot Z_n(x, y)^2 - 1

Note that you can derive the vanishing polynomial from the folding function: in a regular STARK, you repeatedly use x → x^2, while in a circular STARK, you repeatedly use x → 2x^2 - 1. However, you do it differently for the first round, because the first round is special.
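Evaluating Z_n at a point is a short loop (a sketch following the recurrence above):

```python
def vanishing_eval(n, x, y, p):
    # Z_1(x, y) = y; Z_2(x, y) = x; Z_{k+1}(x, y) = 2*Z_k(x, y)^2 - 1.
    if n == 1:
        return y % p
    val = x % p
    for _ in range(n - 2):
        val = (2 * val * val - 1) % p
    return val
```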

Reverse bit order

In STARKs, polynomial evaluations are typically arranged not in the "natural" order (P(1), P(\omega), P(\omega^2), \ldots, P(\omega^{n-1})), but in what I call reverse bit order:

P(1), P(\omega^{n/2}), P(\omega^{n/4}), P(\omega^{3n/4}), P(\omega^{n/8}), P(\omega^{5n/8}), P(\omega^{3n/8}), P(\omega^{7n/8}), \ldots

If we set n = 16 and look only at which powers of \omega we evaluate at, the list looks like this:

{0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15}

This ordering has the key property that values which are grouped together early in the FRI evaluation process end up next to each other. For example, the first step of FRI groups x with -x. In the case of n = 16, \omega^8 = -1, which means that P(\omega^i) and P(-\omega^i) = P(\omega^{i+8}) are exactly the pairs sitting next to each other. The second step of FRI groups P(\omega^i), P(\omega^{i+4}), P(\omega^{i+8}) and P(\omega^{i+12}); these are exactly the groups of four we see, and so on. Doing this makes FRI more space-efficient, because it lets you provide one Merkle proof for the values that are folded together (or for all 2^k of them, if you fold k rounds at a time).

In circle STARKs, the folding structure is slightly different: in the first step, we pair (x, y) with (x, -y); in the second step, we pair x with -x; and in subsequent steps, we pair p with q, choosing p and q such that Q^i(p) = -Q^i(q), where Q^i denotes applying the map x → 2x^2 - 1 i times. If we think of the points in terms of their positions on the circle, each step pairs the first point with the last point, the second point with the second-to-last point, and so on.

To adjust reverse bit order to reflect this folded structure, we flip every bit except the last, and use the last bit to decide whether to flip the other bits.

The folded reverse bit order for size 16 is as follows:

{0, 15, 8, 7, 4, 11, 12, 3, 2, 13, 10, 5, 6, 9, 14, 1}

If you look at the circle in the previous section, you'll see that the points 0, 15, 8, and 7 (going counterclockwise from the right) are of the form (x, y), (x, -y), (-x, -y), and (-x, y), which is exactly what we need.
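The following sketch, derived from the description above, reproduces this ordering (the assert checks it against the size-16 sequence):

```python
def rev_bits(x, bits):
    # Reverse the binary representation of x over `bits` bits.
    out = 0
    for _ in range(bits):
        out = (out << 1) | (x & 1)
        x >>= 1
    return out

def folded_reverse_bit_order(log_n):
    # Bit-reverse the index without its last bit; if the last bit is set,
    # flip all bits of the result (pairing each point with its mirror).
    n = 1 << log_n
    out = []
    for i in range(n):
        top = rev_bits(i >> 1, log_n)
        out.append(top if i % 2 == 0 else (n - 1) - top)
    return out

assert folded_reverse_bit_order(4) == [0, 15, 8, 7, 4, 11, 12, 3,
                                       2, 13, 10, 5, 6, 9, 14, 1]
```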

Efficiency

Circle STARKs (and 31-bit prime STARKs in general) are very efficient. A computation proven in a circle STARK typically involves several types of work:

1. Native arithmetic: used for "business logic", such as counting.

2. Arithmetic for cryptography: for example, hash functions like Poseidon.

3. Lookup arguments: a general and efficient way to implement many kinds of computation by reading values from tables.

The key measure of efficiency is whether you fully utilize the entire space in the computation trace for useful work, or whether you leave a lot of slack. In SNARKs over large fields, there tends to be a lot of slack: business logic and lookup tables mostly involve computations on small numbers (which tend to be less than N, where N is the total number of computation steps, so in practice less than 2^{25}), yet you still pay the cost of using a field of size 2^{256}. Here, the field size is 2^{31}, so the wasted space is small. Low-arithmetic-complexity hashes designed for SNARKs (such as Poseidon) fully use every bit of every number in the trace, in any field.

Binius has advantages over circle STARKs: it allows you to mix fields of different sizes, enabling more efficient bit packing, and it offers 32-bit additions without the overhead of a lookup table. However, these advantages come (in my opinion) at the cost of greater conceptual complexity, whereas circle STARKs (and regular STARKs based on BabyBear) are conceptually much simpler.

Conclusion: My thoughts on Circle STARKs

For developers, circle STARKs are no more complex than regular STARKs. In an implementation, the three issues described above are essentially the only differences from regular FRI. Although the math behind the "polynomials" that circle FRI operates on is quite subtle and takes time to understand and appreciate, this complexity is hidden well enough that developers will not feel it directly.

Understanding circle FRI and circle FFTs can also be a gateway to understanding other special FFTs: most notably binary-field FFTs, such as those used by Binius and LibSTARK before it, but also more exotic constructions such as elliptic curve FFTs, which use few-to-many maps and combine well with elliptic curve point operations.

Combined with Mersenne31, BabyBear, and binary-field techniques like Binius, it does feel like we are approaching the efficiency limits of the STARK base layer. From here, I expect the optimization directions for STARKs to focus on the following:

Maximize the efficiency of hash functions and signatures: optimize basic cryptographic primitives, such as hash functions and digital signatures, into their most proof-friendly forms, so that they perform as well as possible inside STARK proofs with a minimum of computation.

Recursive constructions to enable more parallelization: apply the STARK proving process recursively, proving proofs of proofs, so that the work can be split up and parallelized. This uses computing resources more effectively and speeds up proof generation.

Arithmetize virtual machines to improve the developer experience: make virtual machines efficient and simple to arithmetize, so that developers can easily create and run computational tasks. This includes optimizing the VM for use inside STARK proofs and improving the experience of building and testing smart contracts and other computations.