This report analyzes the technical details, root cause, and possible attack methods of a core DoS vulnerability in the TON virtual machine, and presents the fix proposed by the TonBit team.

Recently, the virtual machine of the TON network received a major security upgrade. TonBit, a security team under BitsLab, discovered and helped fix a core vulnerability that could exhaust the resources of the TON virtual machine. The vulnerability lies in the recursive mechanism the virtual machine uses when processing nested Continuations. A malicious contract can abuse it to crash the virtual machine and destabilize the network.

If exploited, the vulnerability could crash every validator node without the attacker spending a single TON, directly threatening the availability of the network. TonBit located the vulnerability quickly and proposed replacing recursion with iteration by adjusting the internal control flow mechanism of the virtual machine, making the ecosystem safer for TON users. In its latest update announcement, the official TON team thanked TonBit for its contribution to ecosystem security.

The security report below analyzes the cause, technical details, and fix for this vulnerability. It explains how deeply nested Continuations can be used to build a recursive chain that triggers a resource exhaustion attack, and how a malicious contract exhausts the host's stack space by extending the call stack. It also describes how the TonBit team helped resolve the problem by removing the recursive chain and switching to a cooperative iteration mechanism. The fix significantly improves the stability of the TON network and offers a useful reference for low-level security work across the blockchain industry.

Case Study: DoS Vulnerability in TON VM and Related Mitigations

Introduction

This report describes a DoS (Denial of Service) vulnerability in the TON virtual machine and the mitigation to address it. The vulnerability is caused by the way the virtual machine handles nested Continuations during contract execution. The vulnerability allows a malicious contract to create Continuations and nest them deeply in a specific way, thereby triggering deep recursion during evaluation, exhausting the host's stack space and causing the virtual machine to stop running. To mitigate this issue, the virtual machine has modified its handling of Continuations and control flow. Now, instead of making sequential tail calls through the Continuation chain, the virtual machine actively iterates the chain. This approach ensures that only constant host stack space is used, preventing stack overflows.

Overview

According to the official documentation, TON VM is a stack-based virtual machine that uses Continuation-Passing Style (CPS) as the control flow mechanism for both its internal processes and smart contracts. The control flow registers are accessible to contracts, which gives contracts considerable flexibility.

Continuations in TVM can be divided into three categories in theory:

  • OrdCont (i.e. vmc_std), which contains the TON ASM fragment that needs to be executed, is a first-class object in TVM. Contracts can explicitly create and pass them at runtime to implement arbitrary control flows.

  • Extraordinary continuations (referred to below as non-ordinary Continuations), which usually contain OrdConts as components, are created by explicit iteration primitives and special implicit operations to handle the corresponding control flow mechanism.

  • ArgContExt, which wraps another Continuation in order to hold control data.
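The three categories can be pictured as a small data model. The following is a minimal sketch, not the TVM implementation: the field names (code, cond, body, after, chkcond, inner, control_data) and the components helper are assumptions introduced for illustration, and WhileCont stands in for the whole family of non-ordinary Continuations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class Continuation:
    """Hypothetical base type standing in for a TVM continuation."""


@dataclass
class OrdCont(Continuation):
    code: List[str]  # instruction fragment to execute (toy representation)


@dataclass
class WhileCont(Continuation):
    # One example of a non-ordinary continuation built from components.
    cond: Continuation
    body: Continuation
    after: Optional[Continuation] = None
    chkcond: bool = True


@dataclass
class ArgContExt(Continuation):
    inner: Continuation  # the wrapped continuation
    control_data: Dict[str, object] = field(default_factory=dict)


def components(cont):
    """List the continuations directly held by `cont`."""
    if isinstance(cont, WhileCont):
        return [c for c in (cont.cond, cont.body, cont.after) if c is not None]
    if isinstance(cont, ArgContExt):
        return [cont.inner]
    return []  # an OrdCont holds code, not other continuations


loop = WhileCont(cond=OrdCont(["DUP"]), body=OrdCont(["DEC"]))
wrapped = ArgContExt(inner=loop, control_data={"c0": None})
```

Note that nothing in this model forces a component to be an OrdCont, which is the property the rest of the report turns on.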

During contract execution, the virtual machine enters its main loop, decodes one instruction word of the contract fragment at a time, and dispatches the operation to the appropriate handler. An ordinary handler returns immediately after executing its operation.
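As a rough illustration of this decode-and-dispatch loop, the sketch below runs a toy instruction set. The opcode names and the (opcode, argument) encoding are assumptions made for the example and do not reflect TVM's real instruction encoding.

```python
def run(code):
    """Decode one instruction at a time and dispatch it to its handler."""
    stack = []
    # Toy handlers: each one performs its operation and returns immediately.
    handlers = {
        "PUSH": lambda s, arg: s.append(arg),
        "ADD": lambda s, arg: s.append(s.pop() + s.pop()),
        "DUP": lambda s, arg: s.append(s[-1]),
    }
    pc = 0
    while pc < len(code):
        op, arg = code[pc]  # "decode" the next instruction word
        pc += 1
        handlers[op](stack, arg)  # dispatch to the matching handler
    return stack


result = run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("DUP", None)])
```

Here result is [5, 5]. Every handler in this toy loop is an ordinary one: it never transfers control, so the host stack depth stays flat.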

In contrast, an iteration instruction creates a non-ordinary Continuation that uses the provided Continuations as components, and jumps to it in the appropriate context. When jumped to, the non-ordinary Continuation runs its own logic and jumps on to one of its components depending on a condition. Figure 1 illustrates this process for the WHILE instruction (with possible jumps omitted).

Figure 1: Non-ordinary Continuation logic
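The jump logic of a loop continuation can be sketched as a single decision step. This is a simplified model under stated assumptions: components are plain strings, the while_jump function and its return convention (next target plus the continuation to install in c0) are hypothetical, and only the chkcond flag is taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhileCont:
    cond: str
    body: str
    after: str
    chkcond: bool


def while_jump(k, stack):
    """Return (next_target, new_c0) for one jump into a WhileCont."""
    if k.chkcond:
        # The condition fragment just finished; its result is on the stack.
        if not stack.pop():
            return k.after, None  # loop terminated
        # Run the body, then come back to evaluate the condition again.
        return k.body, WhileCont(k.cond, k.body, k.after, False)
    # The body just finished: evaluate the condition, then check it.
    return k.cond, WhileCont(k.cond, k.body, k.after, True)


k = WhileCont("cond", "body", "after", True)
```

Each call picks one component to jump to and records, through c0, which half of the loop should run when that component returns.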

Root Cause

In the vulnerable version of the VM, these jumps are implemented as consecutive dynamic tail calls, which require the host stack to keep a stack frame for each jump (as shown in Figure 2).

Take WhileCont as an example; the other parts are omitted for brevity.

Figure 2: Triple-jump recursion that enables deep nesting

Ideally, this would not be a problem. Components are usually OrdConts, whose jump simply saves the current context and instructs the VM to execute the fragment it holds ahead of the remaining contract fragments, without introducing further recursion. However, non-ordinary Continuations are designed so that their components can be accessed through the cc (c0) register in TVM (i.e., the set_c0 branch above). A contract can therefore abuse this feature to perform deep recursion (described later). Eliminating the recursion directly in the jump process of non-ordinary Continuations is clearer and easier than changing the implementation of this regular feature.

By repeatedly taking a non-ordinary Continuation it has obtained and using it as a component of the next non-ordinary Continuation, a contract can build a deeply nested Continuation through iteration. When evaluated, such a deeply nested Continuation may exhaust the available stack space of the host, causing the operating system to deliver a SIGSEGV signal and terminate the virtual machine process.

Figure 3 provides a proof of concept (PoC) of the nesting process.

Figure 3: Nested process

In each iteration, the body extends a WhileCont{chkcond=true}. Executing the cc generated and saved in the previous iteration then produces a call stack in which every nesting level adds its own sequence of jump frames.

The stack space used therefore grows linearly with the nesting level (i.e., the number of iterations), which means the stack can be exhausted.
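The linear relationship can be reproduced outside TVM. The sketch below is an analogy rather than the PoC from Figure 3: nest and eval_recursive are hypothetical helpers, a nested tuple stands in for a nested Continuation, and Python's recursion limit (which raises RecursionError) stands in for the host stack limit that produces SIGSEGV in a native process.

```python
import sys


def nest(levels):
    """Build a chain nested `levels` deep, one wrapper per iteration."""
    k = None
    for _ in range(levels):
        k = ("wrap", k)
    return k


def eval_recursive(k):
    """Resolve the chain with one host call frame per nesting level."""
    if k is None:
        return 0
    return 1 + eval_recursive(k[1])


def overflows(levels):
    """Report whether evaluating `levels` of nesting exhausts the stack."""
    try:
        eval_recursive(nest(levels))
        return False
    except RecursionError:
        return True


shallow_ok = not overflows(10)
deep_fails = overflows(sys.getrecursionlimit() * 2)
```

Building the chain costs only a loop, while evaluating it costs one frame per level, so a sufficiently long construction loop always outgrows a fixed stack budget.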

Exploitability in Real Environments

On a real blockchain, gas limits make it considerably harder to construct a malicious contract. Because the nesting process has linear complexity (the TVM design effectively prevents cheaper construction via self-references), developing a practical malicious contract is not easy. Specifically, one level of nesting generates a call sequence that consumes three host stack frames (320 bytes) in the debug binary and two frames (256 bytes) in the release binary, where the latter two calls are inlined into one. For validators running on modern POSIX operating systems, the default stack size is 8 MiB, which is enough to support more than 30,000 levels of nesting in the release binary. It is still possible to construct a contract that exhausts the stack space, but doing so is much harder than the example in the previous section suggests.
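The nesting budget follows from simple division. The short calculation below uses only the figures quoted above and assumes that the byte counts are totals per nesting level and that the whole 8 MiB is available to the recursion, which slightly overstates the real limit.

```python
STACK_BYTES = 8 * 1024 * 1024   # default 8 MiB stack
RELEASE_BYTES_PER_LEVEL = 256   # two frames per level in the release binary
DEBUG_BYTES_PER_LEVEL = 320     # three frames per level in the debug binary

# Upper bound on nesting levels before the stack is exhausted.
release_levels = STACK_BYTES // RELEASE_BYTES_PER_LEVEL
debug_levels = STACK_BYTES // DEBUG_BYTES_PER_LEVEL
```

This gives 32,768 levels for the release binary and 26,214 for the debug binary, consistent with the "more than 30,000 levels" estimate for release builds.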

Mitigation

The patch modifies how jumps behave for nested Continuations. The signature of the Continuation jump has changed.

Take UntilCont as an example; the other parts are omitted for brevity.

Previously, a Continuation called VmState::jump to jump to the next Continuation, which meant recursively performing a triple jump on each Continuation and waiting for the return value to propagate back. Now a Continuation jump only resolves the next level of Continuation and then returns control to the VM.

The VM iterates over the chain cooperatively, resolving one level of Continuation at a time until it encounters a NullRef, which indicates that the chain is fully resolved (as implemented in OrdCont or ExcQuitCont). During this iteration, at most one Continuation jump frame is on the host stack at any time, so stack usage remains constant.
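This cooperative scheme is essentially a trampoline. The sketch below illustrates the idea under stated assumptions: the Wrap and Ord classes, the vm_jump driver, and the state dictionary are hypothetical, and None plays the role of the NullRef that ends the chain. It is not the patched TVM code.

```python
class Wrap:
    """Hypothetical non-ordinary continuation that defers to an inner one."""

    def __init__(self, inner):
        self.inner = inner

    def jump(self, state):
        state["jumps"] += 1
        return self.inner  # hand the next level back to the VM, do not call it


class Ord:
    """Hypothetical ordinary continuation: nothing further to resolve."""

    def jump(self, state):
        state["jumps"] += 1
        return None  # plays the role of NullRef: the chain is resolved


def vm_jump(cont, state):
    """Resolve a continuation chain iteratively, one level per loop turn."""
    while cont is not None:
        cont = cont.jump(state)  # only this single jump frame is live
    return state


def nest(levels):
    k = Ord()
    for _ in range(levels):
        k = Wrap(k)
    return k


state = vm_jump(nest(100_000), {"jumps": 0})
```

A chain of 100,000 levels, far beyond Python's default recursion limit, resolves without deep recursion because each jump returns before the next one starts.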

Conclusion

For services that require high availability, the use of recursion can be a potential attack vector. Forcing recursion to terminate can be challenging when user-defined logic is involved. This DoS vulnerability demonstrates an extreme case of accidental abuse of normal functionality under resource-constrained conditions (or other constraints). Similar issues can occur if recursion depends on user input, which is common in control flow primitives of virtual machines.

By changing the virtual machine's recursive jump mechanism to iterative processing, TonBit proposed a fix for this core vulnerability, which could otherwise have paralyzed the network, and helped provide a more robust security guarantee for the TON ecosystem. The incident reflects TonBit's depth of experience in low-level blockchain security and its role as the official Security Assurance Provider (SAP) of TON.

As a security partner of the TON ecosystem, TonBit works to protect the stability of the blockchain network and the security of user assets. From vulnerability discovery to solution design, TonBit has contributed to the long-term development of the TON network through its technical capabilities and understanding of blockchain development. The TonBit team also continues to work on network security architecture, user data protection, and the security of blockchain applications. TonBit will continue to advance security technology and to support the healthy development of the TON ecosystem and the wider blockchain industry. Its discovery of this vulnerability and its help in fixing it have been recognized by the official TON team, strengthening TonBit's position in blockchain security and demonstrating its commitment to the development of a decentralized ecosystem.

TonBit official website: https://www.tonbit.xyz/

TonBit Official Twitter: https://x.com/tonbit_

Telegram: https://t.me/BitsLabHQ

LinkedIn: https://www.linkedin.com/company/tonbit-team/

Blog: https://www.tonbit.xyz/#blogs

For audit inquiries, contact @starchou on Telegram.