零知识证明的先进形式化验证：如何证明零知识内存

Odaily星球日报 · 2024-07-31T13:03:11.000Z

在关于零知识证明的先进形式化验证的系列博客中，我们已经讨论了如何验证 ZK 指令以及对两个 ZK 漏洞的深度剖析。正如在公开报告和代码库中所显示的，通过形式化验证每一条 zkWasm 指令，我们找到并修复了每一个漏洞，从而能够完全验证整个 zkWasm 电路的技术安全性和正确性。尽管我们已展示了验证一条 zkWasm 指令的过程，并介绍了相关的项目初步概念，但熟悉形式化验证读者可能更想了解 zkVM 与其他较小的 ZK 系统、或其他类型的字节码 VM 在验证上的独特之处。在本文中，我们将深入讨论在验证 zkWasm 内存子系统时所遇到的一些技术要点。内存是 zkVM 最为独特的部分，处理好这一点对所有其他 zkVM 的验证都至关重要。形式化验证：虚拟机（VM）对 ZK 虚拟机（zkVM）我们的最终目标是验证 zkWasm 的正确性，其与普通的字节码解释器（VM，例如以太坊节点所使用的 EVM 解释器）的正确性定理相似。亦即，解释器的每一执行步骤都与基于该语言操作语义的合法步骤相对应。如下图所示，如果字节码解释器的数据结构当前状态为 SL，且该状态在 Wasm 机器的高级规范中被标记为状态 SH，那么当解释器步进到状态 SL'时，必须存在一个对应的合法高级状态 SH'，且 Wasm 规范中规定了 SH 必须步进到 SH'。同样地，zkVM 也有一个类似的正确性定理：zkWasm 执行表中新的每一行都与一个基于该语言操作语义的合法步骤相对应。如下图所示，如果执行表中某行数据结构的当前状态是 SR，且该状态在 Wasm 机器的高级规范中表示为状态 SH，那么执行表的下一行状态 SR'必须对应一个高级状态 SH'，且 Wasm 规范中规定了 SH 必须步进到 SH'。由此可见，无论是在 VM 还是 zkVM 中，高级状态和 Wasm 步骤的规范是一致的，因此可以借鉴先前对编程语言解释器或编译器的验证经验。而 zkVM 验证的特殊之处在于其构成系统低级状态的数据结构类型。首先，如我们在之前的博客文章中所述，zk 证明器在本质上是对大素数取模的整数运算，而 Wasm 规范和普通解释器处理的是 32 位或 64 位整数。zkVM 实现的大部分内容都涉及到此，因此，在验证中也需要做相应的处理。然而，这是一个“本地局部”问题：因为需要处理算术运算，每行代码变得更复杂，但代码和证明的整体结构并没有改变。另一个主要的区别是如何处理动态大小的数据结构。在常规的字节码解释器中，内存、数据栈和调用栈都被实现为可变数据结构，同样的，Wasm 规范将内存表示为具有 get/set 方法的数据类型。例如，Geth 的 EVM 解释器有一个`Memory`数据类型，它被实现为表示物理内存的字节数组，并通过`Set 32 `和`GetPtr`方法写入和读取。为了实现一条内存存储指令，Geth 调用`Set 32 `来修改物理内存。 func opMstore(pc *uint 64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) { // pop value of the stack mStart, val := scope.Stack.pop(), scope.Stack.pop() scope.Memory.Set 32(mStart.Uint 64(), val) return nil, nil } 在上述解释器的正确性证明中，我们在对解释器中的具体内存和在规范中的抽象内存进行赋值之后，证明其高级状态和低级状态相互匹配，这相对来说是比较容易的。然而，对于 zkVM 而言，情况将变得更加复杂。 zkWasm 的内存表和内存抽象层在 zkVM 中，执行表上有用于固定大小数据的列（类似于 CPU 中的寄存器），但它不能用来处理动态大小的数据结构，这些数据结构要通过查找辅助表来实现。zkWasm 的执行表有一个 EID 列，该列的取值为 1、 2、 3 ……，并且有内存表和跳转表两个辅助表，分别用于表示内存数据和调用栈。以下是一个提款程序的实现示例： int balance, amount; void main () { balance = 100; amount = 10; balance -= amount; // withdraw } 执行表的内容和结构相当简单。它有 6 个执行步骤（EID 1 到 6），每个步骤都有一行列出其操作码（opcode），如果该指令是内存读取或写入，则还会列出其地址和数据：内存表中的每一行都包含地址、数据、起始 EID 和终止 EID。起始 EID 是写入该数据到该地址的执行步骤的 EID，终止 EID 是下一个将会写入该地址的执行步骤的 EID。（它还包含一个计数，我们稍后详细讨论。）对于 Wasm 内存读取指令电路，其使用查找约束来确保表中存在一个合适的表项，使得读取指令的 EID 在起始到终止的范围内。（类似地，跳转表的每一行对应于调用栈的一帧，每行均标有创建它的调用指令步骤的 EID。）这个内存系统与常规 VM 解释器的区别很大：内存表不是逐步更新的可变内存，而是包含整个执行轨迹中所有内存访问的历史记录。为了简化程序员的工作，zkWasm 提供了一个抽象层，通过两个便捷入口函数来实现。分别是： alloc_memory_table_lookup_write_cell 和 Alloc_memory_table_lookup_read_cell 其参数如下：例如，zkWasm 中实现内存存储指令的代码包含了一次对’write alloc’函数的调用： let memory_table_lookup_heap_write1 = allocator .alloc_memory_table_lookup_write_cell_with_value( "store write res 1", constraint_builder, eid, move |____| constant_from!(LocationType::Heap as u 64), move |meta| load_block_index.expr(meta), // address move |____| constant_from!( 0), // is 32-bit move |____| constant_from!( 1), // (always) enabled ); let store_value_in_heap1 = memory_table_lookup_heap_write1.value_cell; `alloc`函数负责处理表之间的查找约束以及将当前`eid`与内存表条目相关联的算术约束。由此，程序员可以将这些表看作普通内存，并且在代码执行之后 `store_value_in_heap1`的值已被赋给了 `load_block_index` 地址。类似地，内存读取指令使用`read_alloc`函数实现。在上面的示例执行序列中，每条加载指令有一个读取约束，每条存储指令有一个写入约束，每个约束都由内存表中的一个条目所满足。 mtable_lookup_write(row 1.eid, row 1.store_addr, row 1.store_value) ⇐ (row 1.eid= 1 ∧ row 1.store_addr=balance ∧ row 1.store_value= 100 ∧ ...) mtable_lookup_write(row 2.eid, row 2.store_addr, row 2.store_value) ⇐ (row 2.eid= 2 ∧ row 2.store_addr=amount ∧ row 2.store_value= 10 ∧ ...) mtable_lookup_read(row 3.eid, row 3.load_addr, row 3.load_value) ⇐ ( 2 mtable_lookup_read(row 4.eid, row 4.load_addr, row 4.load_value) ⇐ ( 1 mtable_lookup_write(row 6.eid, row 6.store_addr, row 6.store_value) ⇐ (row 6.eid= 6 ∧ row 6.store_addr=balance ∧ row 6.store_value= 90 ∧ ...) 形式化验证的结构应与被验证软件中所使用的抽象相对应，使得证明可以遵循与代码相同的逻辑。对于 zkWasm，这意味着我们需要将内存表电路和“alloc read/write cell”函数作为一个模块来进行验证，其接口则像可变内存。给定这样的接口后，每条指令电路的验证可以以类似于常规解释器的方式进行，而额外的 ZK 复杂性则被封装在内存子系统模块中。在验证中，我们具体实现了“内存表其实可以被看作是一个可变数据结构”这个想法。亦即，编写函数 `memory_at type`，其完整扫描内存表、并构建相应的地址数据映射。（这里变量 `type` 的取值范围为三种不同类型的 Wasm 内存数据：堆、数据栈和全局变量。）而后，我们证明由 alloc 函数所生成的内存约束等价于使用 set 和 get 函数对相应地址数据映射所进行的数据变更。我们可以证明：对于每一 eid，如果以下约束成立 memory_table_lookup_read_cell eid type offset value 则 get (memory_at eid type) offset = Some value 并且，如果以下约束成立 memory_table_lookup_write_cell eid type offset value 则 memory_at (eid+ 1) type = set (memory_at eid type) offset value 在此之后，每条指令的验证可以建立在对地址数据映射的 get 和 set 操作之上，这与非 ZK 字节码解释器相类似。 zkWasm 的内存写入计数机制不过，上述的简化描述并未揭示内存表和跳转表的全部内容。在 zkVM 的框架下，这些表可能会受到攻击者的操控，攻击者可以轻易地通过插入一行数据来操纵内存加载指令，返回任意数值。以提款程序为例，攻击者有机会在提款操作前，通过伪造一个$ 110 的内存写入操作，将虚假数据注入到账户余额中。这一过程可以通过在内存表中添加一行数据，并修改内存表和执行表中现有单元格的数值来实现。这将导致其可以进行“免费”的提款操作，因为账户余额在操作后将仍然保持在$ 100 。为确保内存表（和跳转表）仅包含由实际执行的内存写入（和调用及返回）指令生成的有效条目，zkWasm 采用了一种特殊的计数机制来监控条目数量。具体来说，内存表设有一个专门的列，用以持续追踪内存写入条目的总数。同时，执行表中也包含了一个计数器，用于统计每个指令预期进行的内存写入操作的次数。通过设置一个相等约束，从而确保这两个计数是一致的。这种方法的逻辑十分直观：每当内存进行写入操作，就会被计数一次，而内存表中相应地也应有一条记录。因此，攻击者无法在内存表中插入任何额外的条目。上面的逻辑陈述有点模糊，在机械化证明的过程中，需要使其更加精确。首先，我们需要修正前述内存写入引理的陈述。我们定义函数`mops_at eid type`，对具有给定`eid`和`type`的内存表条目计数（大多数指令将在一个 eid 处创建 0 或 1 个条目）。该定理的完整陈述有一个额外的前提条件，指出没有虚假的内存表条目：如果以下约束成立 (memory_table_lookup_write_cell eid type offset value) 并且以下新增约束成立 (mops_at eid type) = 1 则 (memory_at(eid+ 1) type) = set (memory_at eid type) offset value 这要求我们的验证比前述情况更精确。仅仅从相等约束条件中得出内存表条目总数等于执行中的总内存写入次数并不足以完成验证。为了证明指令的正确性，我们需要知道每条指令对应了正确数目的内存表条目。例如，我们需要排除攻击者是否可能在执行序列中略去某条指令的内存表条目，并为另一条无关指令创建一个恶意的新内存表条目。为了证明这一点，我们采用了由上至下的方式，对给定指令对应的内存表条目数量进行限制，这包括了三个步骤。首先，我们根据指令类型为执行序列中的指令预估了所应该创建的条目数量。我们称从第 i 个步骤到执行结束的预期写入次数为`instructions_mops i`，并称从第 i 条指令到执行结束在内存表中的相应条目数为`cum_mops (eid i)`。通过分析每条指令的查找约束，我们可以证明其所创建的条目不少于预期，从而可以得出所跟踪的每一段 [i... numRows] 所创建的条目不少于预期： Lemma cum_mops_bound': forall n i, 0 ≤ i -> i + Z.of_nat n = etable_numRow -> MTable.cum_mops (etable_values eid_cell i) (max_eid+ 1) ≥ instructions_mops' i n. 其次，如果能证明表中的条目数不多于预期，那么它就恰好具有正确数量的条目，而这一点是显而易见的。 Lemma cum_mops_equal' : forall n i, 0 ≤ i -> i + Z.of_nat n = etable_numRow -> MTable.cum_mops (etable_values eid_cell i) (max_eid+ 1) ≤ instructions_mops' i n -> MTable.cum_mops (etable_values eid_cell i) (max_eid+ 1) = instructions_mops' i n. 现在进行第三步。我们的正确性定理声明：对于任意 n，cum_mops 和 instructions_mops 在表中从第 n 行到末尾的部分总是一致的： Lemma cum_mops_equal : forall n, 0 <= Z.of_nat n < etable_numRow -> MTable.cum_mops (etable_values eid_cell (Z.of_nat n)) (max_eid+ 1) = instructions_mops (Z.of_nat n) 通过对 n 进行归纳总结来完成验证。表中的第一行是 zkWasm 的等式约束，表明内存表中条目的总数是正确的，即 cum_mops 0 = instructions_mops 0 。对于接下来的行，归纳假设告诉我们： cum_mops n = instructions_mops n 并且我们希望证明 cum_mops (n+ 1) = instructions_mops (n+ 1) 注意此处 cum_mops n = mop_at n + cum_mops (n+ 1) 并且 instructions_mops n = instruction_mops n + instructions_mops (n+ 1) 因此，我们可以得到 mops_at n + cum_mops (n+ 1) = instruction_mops n + instructions_mops (n+ 1) 此前，我们已经证明了每条指令将创造不少于预期数量的条目，例如 mops_at n ≥ instruction_mops n. 所以可以得出 cum_mops (n+ 1) ≤ instructions_mops (n+ 1) 这里我们需要应用上述第二个引理。（用类似的引理对跳转表进行验证，可证得每条调用指令都能准确地产生一个跳转表条目，这个证明技术因此普遍适用。然而，我们仍需要进一步的验证工作来证明返回指令的正确性。返回的 eid 与创建调用帧的调用指令的 eid 是不同的，因此我们还需要一个附加的不变性质，用来声明 eid 数值在执行序列中是单向递增的。）如此详细地说明证明过程，是形式化验证的典型特征，也是验证特定代码片段通常比编写它需要更长时间的原因。然而这样做是否值得？在这里的情况下是值得的，因为我们在证明的过程中的确发现了一个跳转表计数机制的关键错误。之前的文章中已经详细描述了这个错误——总结来说，旧版本的代码同时计入了调用和返回指令，而攻击者可以通过在执行序列中添加额外的返回指令，来为伪造的跳转表条目腾出空间。尽管不正确的计数机制可以满足对每条调用和返回指令都计数的直觉，但当我们试图将这种直觉细化为更精确的定理陈述时，问题就会凸显出来。使证明过程模块化从上面的讨论中，我们可以看到在关于每条指令电路的证明和关于执行表的计数列的证明之间存在着一种循环依赖关系。要证明指令电路的正确性，我们需要对其中的内存写入进行推理；即需要知道在特定 EID 处内存表条目的数量、以及需要证明执行表中的内存写入操作计数是正确的；而这又需要证明每条指令至少执行了最少数量的内存写入操作。此外，还有一个需要考虑的因素，zkWasm 项目相当庞大，因此验证工作需要模块化，以便多位验证工程师分工处理。因此，对计数机制的证明解构时需要特别注意其复杂性。例如，对于 LocalGet 指令，有两个定理如下： Theorem opcode_mops_correct_local_get : forall i, 0 <= i -> etable_values eid_cell i > 0 -> opcode_mops_correct LocalGet i. Theorem LocalGetOp_correct : forall i st y xs, 0 <= i -> etable_values enabled_cell i = 1 -> mops_at_correct i -> etable_values (ops_cell LocalGet) i = 1 -> state_rel i st -> wasm_stack st = xs -> (etable_values offset_cell i) > 1 -> nth_error xs (Z.to_nat (etable_values offset_cell i - 1)) = Some y -> state_rel (i+ 1) (update_stack (incr_iid st) (y :: xs)). 第一个定理声明 opcode_mops_correct LocalGet i 展开定义后，意味着该指令在第 i 行至少创建了一个内存表条目（数字 1 是在 zkWasm 的 LocalGet 操作码规范中指定的）。第二个定理是该指令的完整正确性定理，它引用 mops_at_correct i 作为假设，这意味着该指令准确地创建了一个内存表条目。验证工程师可以分别独立地证明这两个定理，然后将它们与关于执行表的证明结合起来，从而证得整个系统的正确性。值得注意的是，所有针对单个指令的证明都可以在读取/写入约束的层面上进行，而无须了解内存表的具体实现。因此，项目分为三个可以独立处理的部分。总结逐行验证 zkVM 的电路与验证其他领域的 ZK 应用并没有本质区别，因为它们都需要对算术约束进行类似的推理。从高层来看，对 zkVM 的验证需要用到许多运用于编程语言解释器和编译器形式化验证的方法。这里主要的区别在于动态大小的虚拟机状态。然而，通过精心构建验证结构来匹配实现中所使用的抽象层，这些差异的影响可以被最小化，从而使得每条指令都可以像对常规解释器那样，基于 get-set 接口来进行独立的模块化验证。

In our blog series on advanced formal verification of zero-knowledge proofs, we have discussed how to verify ZK instructions and a deep dive into two ZK vulnerabilities. By formally verifying every zkWasm instruction, we found and fixed every vulnerability, allowing us to fully verify the technical safety and correctness of the entire zkWasm circuit, as shown in the public report and code repository.
Although we have shown the process of verifying a zkWasm instruction and introduced the initial concepts of the project, readers familiar with formal verification may be interested in understanding the uniqueness of zkVM compared to other smaller ZK systems or other types of bytecode VMs in verification. In this post, we will discuss in depth some technical points encountered when verifying the zkWasm memory subsystem. Memory is the most unique part of zkVM, and getting it right is critical to verifying all other zkVMs.
Formal Verification: Virtual Machine (VM) vs. ZK Virtual Machine (zkVM)
Our ultimate goal is to verify the correctness of zkWasm in a manner similar to the correctness theorems of a normal bytecode interpreter (VM, such as the EVM interpreter used by Ethereum nodes). That is, each step of the interpreter's execution corresponds to a legal step based on the operational semantics of the language. As shown in the figure below, if the bytecode interpreter's data structure is currently in state SL, and this state is marked as state SH in the high-level specification of the Wasm machine, then when the interpreter steps to state SL', there must be a corresponding legal high-level state SH', and the Wasm specification stipulates that SH must step to SH'.
Similarly, zkVM has a similar correctness theorem: each new row in the zkWasm execution table corresponds to a legal step based on the operational semantics of the language. As shown in the figure below, if the current state of the data structure in a row of the execution table is SR, and this state is represented as state SH in the high-level specification of the Wasm machine, then the next row of the execution table, state SR', must correspond to a high-level state SH', and the Wasm specification stipulates that SH must step to SH'.
As can be seen, the specifications of the high-level state and Wasm steps are consistent in both VM and zkVM, so previous verification experience of programming language interpreters or compilers can be used. The special feature of zkVM verification lies in the type of data structures that constitute the low-level state of the system.
First, as we described in a previous blog post, zk provers are essentially integer arithmetic modulo large primes, and the Wasm specification and normal interpreters deal with 32-bit or 64-bit integers. Much of the zkVM implementation deals with this, so the verification needs to handle this as well. However, this is a "local" problem: each line of code becomes more complex because it needs to handle arithmetic operations, but the overall structure of the code and proof does not change.
Another major difference is how dynamically sized data structures are handled. In a regular bytecode interpreter, memory, the data stack, and the call stack are all implemented as mutable data structures. Similarly, the Wasm specification represents memory as a data type with get/set methods. For example, Geth's EVM interpreter has a `Memory` data type that is implemented as a byte array representing physical memory and is written to and read from via `Set 32` and `GetPtr` methods. To implement a memory store instruction, Geth calls `Set 32` to modify the physical memory.
func opMstore(pc *uint 64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
// pop value of the stack
mStart, val := scope.Stack.pop(), scope.Stack.pop()
scope.Memory.Set 32(mStart.Uint 64(), val)
return nil, nil
}
In the correctness proof of the above interpreter, it is relatively easy to prove that the high-level state matches the low-level state after assigning values ​​to the concrete memory in the interpreter and the abstract memory in the specification.
However, with zkVM, the situation becomes more complicated.
zkWasm's memory table and memory abstraction layer
In zkVM, the execution table has columns for fixed-size data (similar to registers in a CPU), but it cannot handle dynamically sized data structures, which are accessed through auxiliary tables. The zkWasm execution table has an EID column with values ​​of 1, 2, 3, etc., and two auxiliary tables, the memory table and the jump table, for representing memory data and the call stack, respectively.
The following is an example implementation of a withdrawal program:
int balance, amount;
void main () {
balance = 100;
  amount = 10;
balance -= amount; // withdraw
}
The contents and structure of the execution table are fairly simple. It has 6 execution steps (EID 1 to 6), each with a row listing its operation code (opcode) and, if the instruction is a memory read or write, its address and data:
Each row in the memory table contains the address, data, start EID, and end EID. The start EID is the EID of the execution step that wrote the data to that address, and the end EID is the EID of the next execution step that will write to that address. (It also contains a count, which we will discuss in more detail later.) For the Wasm memory read instruction circuit, it uses a lookup constraint to ensure that there is a suitable table entry in the table so that the read instruction's EID is in the range from start to end. (Similarly, each row of the jump table corresponds to a frame of the call stack, and each row is labeled with the EID of the call instruction step that created it.)
This memory system is very different from a regular VM interpreter: the memory table is not a mutable memory that is updated gradually, but contains the history of all memory accesses in the entire execution trace. In order to simplify the programmer's work, zkWasm provides an abstraction layer implemented through two convenient entry functions. They are:
alloc_memory_table_lookup_write_cell
and
Alloc_memory_table_lookup_read_cell
Its parameters are as follows:
For example, the code implementing the memory storage instruction in zkWasm contains a call to the 'write alloc' function:
let memory_table_lookup_heap_write1 = allocator
    .alloc_memory_table_lookup_write_cell_with_value(
"store write res 1",
constraint_builder,
eid,
move |____| constant_from!(LocationType::Heap as u 64),
move |meta| load_block_index.expr(meta),   // address
move |____| constant_from!( 0),             // is 32-bit
move |____| constant_from!( 1),             // (always) enabled
    );
let store_value_in_heap1 = memory_table_lookup_heap_write1.value_cell;
The `alloc` function is responsible for handling the lookup constraints between the tables and the arithmetic constraints that associate the current `eid` with the memory table entry. Thus, the programmer can view these tables as normal memory, and after the code executes, the value of `store_value_in_heap1` has been assigned to the `load_block_index` address.
Similarly, memory read instructions are implemented using the `read_alloc` function. In the example execution sequence above, each load instruction has a read constraint and each store instruction has a write constraint, and each constraint is satisfied by an entry in the memory table.
mtable_lookup_write(row 1.eid, row 1.store_addr, row 1.store_value)
                       ⇐ (row 1.eid= 1 ∧ row 1.store_addr=balance ∧ row 1.store_value= 100 ∧ ...)
mtable_lookup_write(row 2.eid, row 2.store_addr, row 2.store_value)
                      ⇐ (row 2.eid= 2 ∧ row 2.store_addr=amount ∧ row 2.store_value= 10 ∧ ...)
mtable_lookup_read(row 3.eid, row 3.load_addr, row 3.load_value)
                      ⇐ ( 2
mtable_lookup_read(row 4.eid, row 4.load_addr, row 4.load_value)
                      ⇐ ( 1
mtable_lookup_write(row 6.eid, row 6.store_addr, row 6.store_value)
                      ⇐ (row 6.eid= 6 ∧ row 6.store_addr=balance ∧ row 6.store_value= 90 ∧ ...)
The structure of the formal verification should correspond to the abstractions used in the software being verified, so that the proof can follow the same logic as the code. For zkWasm, this means that we need to verify the memory table circuit and the "alloc read/write cell" functions as a module, with an interface like mutable memory. Given such an interface, verification of each instruction circuit can be done in a manner similar to a regular interpreter, with the additional ZK complexity being encapsulated in the memory subsystem module.
In the verification, we specifically implemented the idea that "the memory table can actually be regarded as a mutable data structure". That is, we wrote a function `memory_at type`, which completely scans the memory table and builds the corresponding address data mapping. (Here, the value range of the variable `type` is three different types of Wasm memory data: heap, data stack, and global variables.) Then, we proved that the memory constraints generated by the alloc function are equivalent to the data changes made to the corresponding address data mapping using the set and get functions. We can prove that:
For each eid, if the following constraints hold
memory_table_lookup_read_cell eid type offset value
but
get (memory_at eid type) offset = Some value
And if the following constraints hold
memory_table_lookup_write_cell eid type offset value
but
memory_at (eid+ 1) type = set (memory_at eid type) offset value
After this, verification of each instruction can be based on get and set operations on the address-data mapping, similar to a non-ZK bytecode interpreter.
zkWasm's memory write counting mechanism
However, the above simplified description does not reveal the full contents of the memory and jump tables. Under the framework of zkVM, these tables may be manipulated by an attacker, who can easily manipulate the memory load instruction to return an arbitrary value by inserting a row of data.
Taking the withdrawal program as an example, an attacker has the opportunity to inject false data into the account balance by forging a memory write operation of $110 before the withdrawal operation. This process can be achieved by adding a row of data to the memory table and modifying the values ​​of existing cells in the memory table and the execution table. This will result in a "free" withdrawal operation, because the account balance will still remain at $100 after the operation.
To ensure that the memory table (and jump table) only contain valid entries generated by memory writes (and calls and returns) that are actually executed, zkWasm uses a special counting mechanism to monitor the number of entries. Specifically, the memory table has a dedicated column that keeps track of the total number of memory write entries. At the same time, the execution table also contains a counter that counts the number of memory write operations expected for each instruction. An equality constraint is set to ensure that the two counts are consistent. The logic of this approach is very intuitive: every time a memory write operation is performed, it is counted once and there should be a corresponding record in the memory table. Therefore, an attacker cannot insert any additional entries into the memory table.
The logical statement above is a bit vague, and in the process of mechanical proof, we need to make it more precise. First, we need to correct the statement of the memory write lemma mentioned above. We define a function `mops_at eid type` that counts the number of memory table entries with a given `eid` and `type` (most instructions will create 0 or 1 entries at an eid). The full statement of the theorem has an additional precondition stating that there are no spurious memory table entries:
If the following constraints hold
 (memory_table_lookup_write_cell eid type offset value)
And the following new constraints hold
 (mops_at eid type) = 1 
but
 (memory_at(eid+ 1) type) = set (memory_at eid type) offset value
This requires our verification to be more precise than in the previous case. It is not enough to simply conclude from the equality constraint that the total number of memory table entries is equal to the total number of memory writes in the execution. In order to prove the correctness of the instructions, we need to know that each instruction corresponds to the correct number of memory table entries. For example, we need to rule out the possibility that an attacker could omit the memory table entry for an instruction in the execution sequence and create a malicious new memory table entry for another unrelated instruction.
To prove this, we take a top-down approach to constraining the number of memtable entries for a given instruction, which consists of three steps. First, we estimate the number of entries that should be created for the instructions in the execution sequence, based on the instruction type. We call the expected number of writes from the i-th step to the end of execution `instructions_mops i`, and the corresponding number of memtable entries from the i-th instruction to the end of execution `cum_mops (eid i)`. By analyzing the lookup constraints for each instruction, we can prove that it creates no fewer entries than expected, and thus that each segment [i...numRows] that is traced creates no fewer entries than expected:
Lemma cum_mops_bound': forall n i,
  0 ≤ i ->
  i + Z.of_nat n = etable_numRow ->
  MTable.cum_mops (etable_values eid_cell i) (max_eid+ 1) ≥ instructions_mops' i n.
Second, if one can show that the table has no more entries than expected, then it has exactly the right number of entries, and this is obvious.
Lemma cum_mops_equal' : forall n i,
    0 ≤ i ->
    i + Z.of_nat n = etable_numRow ->
    MTable.cum_mops (etable_values eid_cell i) (max_eid+ 1) ≤ instructions_mops' i n ->
    MTable.cum_mops (etable_values eid_cell i) (max_eid+ 1)  = instructions_mops' i n.
Now for step 3. Our correctness theorem states that for any n, cum_mops and instructions_mops are always consistent from row n to the end of the table:
Lemma cum_mops_equal : forall n,
    0 <= Z.of_nat n < etable_numRow ->
  MTable.cum_mops (etable_values eid_cell (Z.of_nat n)) (max_eid+ 1) = instructions_mops (Z.of_nat n)
Verification is done by inductively summing up on n. The first row in the table is the equality constraint for zkWasm, which states that the total number of entries in the memory table is correct, i.e., cum_mops 0 = instructions_mops 0. For the following rows, the inductive hypothesis tells us:
cum_mops n = instructions_mops n
And we hope to prove
cum_mops (n+ 1) = instructions_mops (n+ 1)
Note here
cum_mops n = mop_at n + cum_mops (n+ 1)
and
instructions_mops n = instruction_mops n + instructions_mops (n+ 1)
Therefore, we can get
mops_at n + cum_mops (n+ 1) = instruction_mops n + instructions_mops (n+ 1)
Earlier, we have shown that each instruction will create no less than the expected number of entries, e.g.
mops_at n ≥ instruction_mops n.
So it can be concluded
cum_mops (n+ 1) ≤ instructions_mops (n+ 1)
Here we need to apply the second lemma mentioned above.
(A similar lemma can be applied to jump tables to prove that every call instruction generates exactly one jump table entry, making the proof technique generally applicable. However, further work is needed to prove the correctness of return instructions. The return eid is different from the call instruction that created the call frame, so we need an additional invariant that states that the eid value is unidirectionally increasing in the execution sequence.)
Such detailed description of the proof is typical of formal verification and is the reason why verifying a particular piece of code often takes longer than writing it. But is it worth it? In this case, it is, because during the proof we did discover a critical bug in the jump table counting mechanism. This bug has been described in detail in a previous post - in summary, the old version of the code counted both call and return instructions, and an attacker could make room for a fake jump table entry by adding an extra return instruction to the execution sequence. Although the incorrect counting mechanism satisfies the intuition of counting every call and return instruction, problems become apparent when we try to refine this intuition into a more precise theorem statement.
Modularize the proof process
From the above discussion, we can see that there is a circular dependency between the proof about each instruction circuit and the proof about the count column of the execution table. To prove the correctness of the instruction circuit, we need to reason about the memory writes in it; that is, we need to know the number of memory table entries at a specific EID, and we need to prove that the memory write operation count in the execution table is correct; which in turn requires proving that each instruction performs at least the minimum number of memory write operations.
In addition, there is another factor to consider. The zkWasm project is quite large, so the verification work needs to be modularized so that multiple verification engineers can divide the work. Therefore, when deconstructing the proof of the counting mechanism, special attention needs to be paid to its complexity. For example, for the LocalGet instruction, there are two theorems as follows:
Theorem opcode_mops_correct_local_get : forall i,
  0 <= i ->
  etable_values eid_cell i > 0 ->
  opcode_mops_correct LocalGet i.
Theorem LocalGetOp_correct : forall i st y xs,
  0 <= i ->
  etable_values enabled_cell i = 1 ->
  mops_at_correct i ->
  etable_values (ops_cell LocalGet) i = 1 ->
  state_rel i st ->
  wasm_stack st = xs ->
  (etable_values offset_cell i) > 1 ->
  nth_error xs (Z.to_nat (etable_values offset_cell i - 1)) = Some y ->
  state_rel (i+ 1) (update_stack (incr_iid st) (y :: xs)).
First theorem statement
opcode_mops_correct LocalGet i
After the definition is expanded, it means that the instruction creates at least one memory table entry at line i (the number 1 is specified in the LocalGet opcode specification of zkWasm ).
The second theorem is the complete correctness theorem for this instruction, which refers to
mops_at_correct i 
As a hypothesis, this means that the instruction creates exactly one memory table entry.
Verification engineers can prove these two theorems independently and then combine them with the proof about the execution table to prove the correctness of the entire system. It is worth noting that all proofs for individual instructions can be done at the level of read/write constraints without knowing the specific implementation of the memory table. Therefore, the project is divided into three parts that can be processed independently.
Summarize
Verifying the circuits of a zkVM line by line is not fundamentally different from verifying other ZK applications, as they all require similar reasoning about arithmetic constraints. At a high level, verification of the zkVM requires many of the same techniques used in formal verification of programming language interpreters and compilers. The main difference here is the dynamically sized state of the VM. However, the impact of these differences can be minimized by carefully constructing the verification structure to match the level of abstraction used in the implementation, so that each instruction can be independently verified modularly based on a get-set interface, just like a regular interpreter.

Explore More From Creator

Latest News

Explore More From Creator

Latest News

Trending Articles