Author: Vitalik
Compiled by Nan Zhi, Odaily Planet Daily
One important attribute of a good blockchain user experience is fast transaction confirmation. Ethereum today has improved a lot compared to five years ago. Thanks to EIP-1559 and the stable block times that followed the transition to PoS (The Merge), transactions sent by users on L1 are usually confirmed within 5-20 seconds, which is roughly comparable to paying with a credit card. However, there is value in improving the user experience further, and some applications require latencies of hundreds of milliseconds or less. This article explores some practical options Ethereum has for improving transaction confirmation time.
Overview of existing ideas and technologies
Single slot finality
Currently, Ethereum's Gasper consensus uses a slot-and-epoch architecture. In each 12-second slot, a subset of validators votes on the head of the chain, and over 32 slots (6.4 minutes) every validator gets a chance to vote once. These votes are then reinterpreted as messages in a consensus algorithm similar to PBFT, which after two epochs (12.8 minutes) gives a very strong economic guarantee called finality.
Over the past few years, we have become increasingly dissatisfied with the current approach, for two main reasons. First, it is complex, and there are many interaction bugs between the slot-by-slot voting mechanism and the epoch-by-epoch finality mechanism. Second, 12.8 minutes is too long, and no one wants to wait that long.
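The timing figures above follow from simple arithmetic. The sketch below only restates the constants quoted in the text; the names are illustrative and are not taken from any client implementation.

```python
SECONDS_PER_SLOT = 12    # length of one slot, as stated in the text
SLOTS_PER_EPOCH = 32     # slots needed for every validator to vote once
EPOCHS_TO_FINALITY = 2   # finality is reached after two epochs


def epoch_minutes() -> float:
    """Length of one epoch in minutes."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH / 60


def finality_minutes() -> float:
    """Time to finality in minutes under the figures quoted above."""
    return EPOCHS_TO_FINALITY * epoch_minutes()


print(epoch_minutes())     # 6.4
print(finality_minutes())  # 12.8
```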
Single Slot Finality (SSF) replaces this architecture with a mechanism similar to Tendermint consensus, where block N is finalized before block N+1 is produced. The main difference from Tendermint is that we retain the "inactivity leak" mechanism, which allows the chain to continue running and recover when more than 1/3 of validators are offline.
(Note: The inactivity leak is a PoS mechanism that penalizes validators who stay inactive for a long time. Once they are marked as inactive, their staked ETH is gradually reduced.
Tendermint is an efficient and secure Byzantine fault-tolerant consensus algorithm that allows rapid transaction confirmation and keeps the blockchain operating normally even when some nodes are malicious or offline.)
The main challenge with single-slot finality is that it means each Ethereum staker needs to publish two messages every 12 seconds, which is a significant load on the chain. There are some clever ideas to mitigate this problem, including the recent Orbit SSF proposal. While this significantly speeds up "finality" to improve user experience, it does not change the fact that users need to wait 5-20 seconds.
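To see why two messages per staker per slot is a heavy load, it helps to compare it with today's design, where each validator votes once per 32-slot epoch. The sketch below is a naive count that ignores signature aggregation and any Orbit-style committee reduction; the validator count is a hypothetical round number chosen for illustration, not a figure from the article.

```python
def gasper_messages_per_slot(validators: int, slots_per_epoch: int = 32) -> float:
    """Today: each validator votes once per epoch, spread across its slots."""
    return validators / slots_per_epoch


def ssf_messages_per_slot(validators: int, messages_each: int = 2) -> int:
    """Naive SSF: every validator publishes two messages in every slot."""
    return validators * messages_each


n = 1_000_000  # hypothetical validator count, for illustration only
print(gasper_messages_per_slot(n))                              # 31250.0
print(ssf_messages_per_slot(n))                                 # 2000000
print(ssf_messages_per_slot(n) / gasper_messages_per_slot(n))   # 64.0
```

Under these assumptions the per-slot message count grows by a factor of 2 × 32 = 64 regardless of the validator count, which is the gap that proposals such as Orbit SSF try to narrow.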
(Note: Finality is not the same as the transaction being packaged into a block and confirmed. If a transaction is confirmed but finality is not achieved, a fork or rollback may occur.)
Rollup Pre-Confirmation
Over the past few years, Ethereum has been following a rollup-centric roadmap, designing the Ethereum base layer (L1) to support data availability and other features, which can then be used by L2 protocols such as rollups, validiums, and plasmas to provide users with the same level of security as Ethereum at a larger scale.
This creates a separation of concerns within the Ethereum ecosystem: Ethereum L1 focuses on censorship resistance, reliability, stability, and maintaining and improving a certain base layer core functionality, while L2 focuses on reaching users more directly through different cultures and technologies. But if you go down this path, an inevitable problem arises: L2 wants to provide users with faster confirmations than 5-20 seconds.
So far, at least in theory, it has been the responsibility of each L2 to create its own "decentralized sequencer" network. A small group of validators might sign blocks every few hundred milliseconds and put their stake behind those blocks. Eventually, the headers of these L2 blocks are published to L1.
But the L2 validator set can "cheat": they can sign block B1, then sign a conflicting block B2 and get it onto the chain before B1. If they do this, they will be detected and lose their stake. In practice, we have seen centralized versions of this, but rollups have been slow to develop decentralized sequencer networks. One can argue that requiring every L2 to run decentralized sequencing is unfair: we are asking rollups to do almost the same work as creating a brand new L1. For this reason, Justin Drake has been promoting a way to give all L2s (and L1) access to a shared, Ethereum-wide preconfirmation mechanism: based preconfirmations.
Based preconfirmations
The based preconfirmation approach assumes that Ethereum proposers will become highly sophisticated actors because of MEV. It takes advantage of this sophistication by incentivizing these proposers to accept the responsibility of providing preconfirmation services.
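Detecting the "cheating" described above amounts to finding a signer who signed two different blocks for the same position in the chain. The following is a minimal sketch under stated assumptions: the `SignedHeader` type, its fields, and the signer names are hypothetical, and signature verification is assumed to have happened before this check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedHeader:
    signer: str      # hypothetical identifier of the L2 validator
    height: int      # L2 block height the header claims
    block_hash: str  # hash of the block the signer committed to


def find_equivocators(headers: list[SignedHeader]) -> set[str]:
    """Return signers who signed two different blocks at the same height."""
    seen: dict[tuple[str, int], str] = {}
    offenders: set[str] = set()
    for h in headers:
        key = (h.signer, h.height)
        if key in seen and seen[key] != h.block_hash:
            offenders.add(h.signer)  # conflicting signature: slashable
        seen.setdefault(key, h.block_hash)
    return offenders


headers = [
    SignedHeader("alice", 7, "B1"),
    SignedHeader("bob", 7, "B1"),
    SignedHeader("alice", 7, "B2"),  # conflicts with alice's earlier signature
]
print(find_equivocators(headers))  # {'alice'}
```

A real protocol would submit the two conflicting signed headers as evidence so the offender's stake can be slashed; the sketch only shows the detection step.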
The basic idea of the approach is to create a standardized protocol where users can offer an additional fee to secure an immediate guarantee that a transaction will be included in the next block, as well as a statement about the consequences of executing that transaction. If the proposer breaks any of the promises made to any user, they can be slashed.
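The promise-and-slash idea can be sketched as a check that compares what a proposer promised with what their block actually contains. This is an illustration only: the `Preconfirmation` and `Block` types and their fields are hypothetical, not part of any published standard, and the case where the proposer misses the slot entirely is not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preconfirmation:
    proposer: str         # who made the promise
    slot: int             # slot whose block must include the transaction
    tx_hash: str          # transaction the promise covers
    promised_result: str  # claimed outcome of executing the transaction


@dataclass
class Block:
    proposer: str
    slot: int
    results: dict  # tx_hash -> actual execution result


def violation(p: Preconfirmation, block: Block) -> Optional[str]:
    """Return why the promise was broken, or None if it was kept."""
    if block.slot != p.slot or block.proposer != p.proposer:
        return None  # this block is not the one the promise refers to
    if p.tx_hash not in block.results:
        return "transaction not included"
    if block.results[p.tx_hash] != p.promised_result:
        return "execution result differs from promise"
    return None


promise = Preconfirmation("proposer-1", 100, "0xabc", "swap ok")
print(violation(promise, Block("proposer-1", 100, {"0xabc": "swap ok"})))  # None
print(violation(promise, Block("proposer-1", 100, {})))  # transaction not included
```

Any non-`None` result would be grounds for slashing the proposer under the scheme described above.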
As described, based preconfirmations provide guarantees for L1 transactions. If rollups are "based", then all L2 blocks are L1 transactions, so the same mechanism can be used to provide preconfirmations for any L2.
(Note: Ethereum proposers can use the fee mechanism to group a series of transactions into a bundle and pack it into a block, guaranteeing both execution and ordering. For example, the well-known sandwich attack buys before a target transaction and sells after it. Vitalik's proposal here is conceptually similar: through it, proposers can lock in transaction results in advance and speed up execution.)
What are we actually looking at?
Let's say we achieve single-slot finality. We use techniques similar to Orbit to reduce the number of validators signing in each slot, but not by too much, so that we can also make progress on our key goal of reducing the 32 ETH staking minimum. As a result, slot times might increase to 16 seconds, and we then use rollup preconfirmations or based preconfirmations to give users faster confirmations. What do we end up with? An epoch-and-slot architecture.
There is a deep philosophical reason why the epoch-and-slot architecture seems so hard to avoid: it takes less time to reach rough agreement on something than to reach agreement on maximal “economic finality” about that thing.
One simple reason is the number of nodes. While the old linear decentralization/finality time/overhead tradeoff looks milder now thanks to hyper-optimized BLS aggregation and upcoming ZK-STARKs, the following reasons cannot be ignored:
“Approximate consensus” only requires a small number of nodes, while economic finality requires a majority of nodes.
Once the number of nodes exceeds a certain size, it will take more time to collect signatures.
In Ethereum today, the 12-second slot is divided into three sub-slots: (i) block publishing and distribution, (ii) attestation, and (iii) attestation aggregation. If the number of attesters were greatly reduced, we could drop to two sub-slots and use 8-second slot times. Another, and in practice larger, factor is the "quality" of the nodes. If we can also rely on a specialized subset of nodes to reach approximate agreement (and still use the full validator set for finality), we can plausibly get slot times down to about 2 seconds.
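The 8-second figure follows if each sub-slot takes an equal share of today's 12-second slot. That equal split is an assumption made here for illustration; the roughly 2-second figure depends on node quality and cannot be derived from this arithmetic.

```python
# Assumption: the three sub-slots split today's 12-second slot evenly.
SUB_SLOT_SECONDS = 12 / 3  # 4 seconds per sub-slot


def slot_seconds(sub_slots: int) -> float:
    """Slot length if every sub-slot keeps today's duration."""
    return sub_slots * SUB_SLOT_SECONDS


print(slot_seconds(3))  # 12.0 (publish, attest, aggregate)
print(slot_seconds(2))  # 8.0  (one sub-slot removed)
```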
So in my opinion, the epoch-and-slot architecture is clearly correct, but not all epoch-and-slot architectures are created equal, and there is value in exploring the design space more fully. A direction worth further investigation would be one that is not as tightly coupled as Gasper, but has a stronger separation of concerns between the two mechanisms.
What should L2 do?
In my opinion, there are currently three reasonable strategies for L2:
Be "based", both technically and spiritually. That is, optimize for passing through the technical properties of the Ethereum base layer and its values (highly decentralized, censorship-resistant, etc.). In their simplest form, you can think of these rollups as "branded shards," but they can also have much greater ambitions, conducting extensive experiments on new virtual machine designs and other technical improvements.
Become a “server with blockchain scaffolding” and make the best of it. If you start with a server, then add (i) STARK validity proofs to ensure that the server follows the rules, (ii) guaranteed rights for users to exit or force transactions, and (iii) freedom of collective choice, either through coordinated mass exit or through the ability to vote to change the sequencer, then you have gained most of the benefits of being on-chain while retaining most of the efficiency of a server.
(Note: Scaffolding refers to a tool or method that automatically generates the basic structure and code framework of a project so that developers can start coding quickly.)
The compromise: a fast chain with a hundred nodes, with Ethereum providing additional interoperability and security. This is the current de facto roadmap for many L2 projects.
For some applications (e.g. ENS, keystores, some payment protocols), a 12-second block time is sufficient. For applications where it is not, the only solution is an epoch-and-slot architecture. In all three cases, the "epoch" is Ethereum's SSF, but the slot is different in each of the three cases above:
1. An Ethereum-native epoch-and-slot architecture
2. Server preconfirmations
3. Committee preconfirmations
A key question is how good we can make category 1. In particular, if it gets really good, then category 3 becomes less relevant. Category 2 will always exist, because no "based" solution works for L2s with off-chain data, such as plasmas and validiums. But if an Ethereum-native epoch-and-slot architecture can get down to 1-second slot times, then the space for category 3 becomes much smaller.
Today, we are far from final answers to these questions. One key question, how sophisticated block proposers will become, remains an area of considerable uncertainty. Designs like Orbit SSF are very new, so the design space of epoch-and-slot architectures that use something like Orbit SSF as the epoch is still worth exploring fully. The more options we have, the better we can serve users of both L1 and L2, and the more we can simplify the work of L2 developers.