Author: Vitalik, founder of Ethereum; Translation: 0xjs@Golden Finance

An important feature of a good blockchain user experience is fast transaction confirmation times. Today, Ethereum has made great progress compared to five years ago. Thanks to EIP-1559 and stable block times since the Merge, transactions sent by users on L1 are reliably confirmed within 5-20 seconds. This is roughly comparable to the experience of paying with a credit card. However, there is value in improving the user experience further, and some applications outright require latencies of hundreds of milliseconds or even less. This article goes through some of the practical options that Ethereum has.

Overview of existing ideas and technologies

Single-slot finality

Today, Ethereum's Gasper consensus uses a slot-and-epoch architecture. Every 12-second slot, a subset of validators publishes a vote on the head of the chain, and over the course of 32 slots (6.4 minutes) every validator gets a chance to vote once. These votes are then reinterpreted as messages in a consensus algorithm loosely similar to PBFT, which after two epochs (12.8 minutes) provides a very strong economic guarantee called finality.
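The timing figures above follow directly from the slot and epoch parameters. A minimal sketch of the arithmetic, using only the numbers quoted in the text (the constant and function names are illustrative, not taken from any client or specification):

```python
# Parameters quoted in the text above; names are illustrative only.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
EPOCHS_TO_FINALITY = 2


def epoch_seconds() -> int:
    """Length of one epoch in seconds."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH


def finality_minutes() -> float:
    """Time to finality in minutes, assuming finality after two full epochs."""
    return EPOCHS_TO_FINALITY * epoch_seconds() / 60


if __name__ == "__main__":
    print(epoch_seconds() / 60)  # epoch length in minutes
    print(finality_minutes())    # time to finality in minutes
```

One epoch is 384 seconds (6.4 minutes), so two epochs give the 12.8-minute finality figure.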

Over the past few years, we have become increasingly dissatisfied with the current approach. The main reasons are: (i) it is complex, with many interaction errors between the per-slot voting mechanism and the per-epoch finality mechanism; and (ii) 12.8 minutes is too long and no one wants to wait that long.

Single-slot finality replaces this architecture with a mechanism more similar to Tendermint consensus, where block N is finalized before block N+1 is produced. The main difference from Tendermint is that we retain the "inactivity leak" mechanism, which allows the chain to keep running and recover if more than 1/3 of validators go offline.

[Figure: diagram of the proposed single-slot finality design]

The main challenge with SSF is that it seems to naively imply that every Ethereum staker needs to publish two messages every 12 seconds, which is a significant burden on the blockchain. There are some clever ideas to mitigate this, including the recent Orbit SSF proposal. But even so, while this significantly improves the user experience by speeding up "finality", it doesn't change the fact that users need to wait 5-20 seconds.
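To see why the naive version is a burden, compare per-slot message counts. Under Gasper each validator votes once per 32-slot epoch, whereas naive SSF would have every validator publish two messages per slot. The sketch below works through that comparison; the validator count is a hypothetical round number chosen only for illustration, and the function names are not from any specification.

```python
SLOTS_PER_EPOCH = 32  # from the Gasper description earlier in the article


def gasper_messages_per_slot(validators: int) -> float:
    """Each validator votes once per epoch, so the load is spread over 32 slots."""
    return validators / SLOTS_PER_EPOCH


def naive_ssf_messages_per_slot(validators: int, messages_per_validator: int = 2) -> int:
    """Naive SSF: every validator publishes two messages in every slot."""
    return validators * messages_per_validator


if __name__ == "__main__":
    n = 1_000_000  # hypothetical validator count, for illustration only
    ratio = naive_ssf_messages_per_slot(n) / gasper_messages_per_slot(n)
    print(ratio)  # the ratio does not depend on n
```

Whatever the validator count, the naive design implies 64 times as many messages per slot (2 × 32), which is the gap that proposals like Orbit SSF try to close.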

Rollup pre-confirmations

Over the past few years, Ethereum has been following a rollup-centric roadmap, designing the Ethereum base layer (“L1”) around supporting data availability and other features that can then be used by layer 2 protocols like rollups (also validiums and plasmas), which can provide users with the same level of security as Ethereum, but at a much larger scale.

This creates a separation of concerns in the Ethereum ecosystem: Ethereum L1 can focus on censorship resistance, reliability, stability, and maintaining and improving certain core infrastructure features, while L2s can focus on reaching users more directly, through different cultural and technical trade-offs. But if you go down this path, an inevitable problem arises: L2s want to serve users who want confirmations much faster than 5-20 seconds.

So far, at least in rhetoric, L2s have been responsible for creating their own "decentralized sequencing" networks. A small group of validators signs blocks, perhaps once every few hundred milliseconds, and puts its "stake" behind those blocks. Eventually, the headers of these L2 blocks are published to L1.

The L2 validator set can cheat: they can sign block B1 first, then sign a conflicting block B2 and commit it to the chain before B1. But if they do this, they will be caught and lose their stake. In practice, we have seen centralized versions of this, but rollups have been slow to develop decentralized sequencing networks. You could argue that requiring every L2 to run decentralized sequencing is a raw deal: we are asking rollups to do essentially the same work as creating a brand new L1. For this reason and others, Justin Drake has been promoting a way to give all L2s (as well as L1) access to a shared Ethereum-wide pre-confirmation mechanism: based pre-confirmations.
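The cheating scenario described here (one signer endorsing two conflicting blocks) is detectable because the two signed headers together are self-contained evidence. A minimal sketch of that detection, assuming a simplified `SignedHeader` record and skipping signature verification entirely; the type and function names are hypothetical and do not come from any rollup's codebase:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedHeader:
    """Hypothetical record of an L2 block header signed by one validator."""
    height: int
    block_hash: str
    signer: str


def find_equivocators(headers: list[SignedHeader]) -> set[str]:
    """Return signers who signed two different blocks at the same height.

    Assumes every header's signature has already been verified elsewhere.
    """
    first_seen: dict[tuple[str, int], str] = {}
    offenders: set[str] = set()
    for header in headers:
        key = (header.signer, header.height)
        # Remember the first hash this signer endorsed at this height.
        previous_hash = first_seen.setdefault(key, header.block_hash)
        if previous_hash != header.block_hash:
            offenders.add(header.signer)
    return offenders


if __name__ == "__main__":
    observed = [
        SignedHeader(10, "B1", "alice"),
        SignedHeader(10, "B2", "alice"),  # conflicts with alice's B1
        SignedHeader(10, "B1", "bob"),
    ]
    print(find_equivocators(observed))
```

In a real system the pair of conflicting signatures would be submitted on-chain as proof, and the offender's stake would be forfeited.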

Based pre-confirmations

The based pre-confirmation approach assumes that Ethereum proposers will become highly sophisticated actors for MEV-related reasons. It takes advantage of this sophistication by incentivizing these proposers to accept the responsibility of providing pre-confirmations as a service.

The basic idea is to create a standardized protocol by which users can offer additional fees in exchange for immediate guarantees that the transaction will be included in the next block, and possibly a statement about the results of executing the transaction. If the proposer breaks any promise made to any user, they are penalized.
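As a rough illustration of what such a promise could look like as data, the sketch below models a pre-confirmation as a record and checks whether the proposer kept it once the target block is known. Everything here is an assumption made for illustration (the field names, the fee unit, and the check itself); no standardized protocol is being quoted, and the optional statement about execution results is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preconfirmation:
    """Hypothetical promise from a proposer to include a transaction."""
    proposer: str
    tx_hash: str
    target_slot: int
    fee_gwei: int  # extra fee the user offered for the guarantee


def promise_broken(promise: Preconfirmation, slot: int, included_tx_hashes: set[str]) -> bool:
    """True if the block at the promised slot does not contain the transaction."""
    if slot != promise.target_slot:
        return False  # this block says nothing about the promise
    return promise.tx_hash not in included_tx_hashes


if __name__ == "__main__":
    promise = Preconfirmation("proposer-1", "0xabc", target_slot=100, fee_gwei=5)
    print(promise_broken(promise, 100, {"0xdef"}))  # transaction missing from the block
    print(promise_broken(promise, 100, {"0xabc"}))  # transaction included
```

A broken promise, together with the proposer's signature on it, would be the evidence used to penalize the proposer.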

As mentioned above, the Based pre-confirmation mechanism provides guarantees for L1 transactions. If the rollup is "based", then all L2 blocks are L1 transactions, so the same mechanism can be used to provide pre-confirmation for any L2.

What are we actually seeing here?

Suppose we achieve single-slot finality. We use techniques like Orbit to reduce the number of validators signing each slot, but not too much, so that we can also make progress on the key goal of reducing the 32 ETH minimum stake. As a result, slot times may gradually increase to 16 seconds. We then use Rollup pre-confirmations or Based pre-confirmations to provide users with faster guarantees. What do we have now? An epoch-and-slot architecture.

The “they’re the same picture” meme has been overused at this point, so I’ll just put an old diagram I drew a few years ago to describe Gasper’s slot-and-epoch architecture next to a diagram of L2 pre-confirmations, and hopefully that drives the point home.

There is a deep philosophical reason why it seems hard to avoid using an epoch-and-slot architecture: it takes inherently less time to reach approximate consensus on something than it does to reach maximally enforced “economic finality” consensus on something.

One simple reason is the number of nodes. While the old linear decentralization/finality/overhead tradeoff looks much milder now thanks to hyper-optimized BLS aggregation and, in the near future, ZK-STARKs, the following points are still fundamentally true:

1. “Approximate consensus” only requires a minority of nodes, while economic finality requires a significant portion of all nodes.

2. Once the number of nodes exceeds a certain size, you will need to spend more time collecting signatures.

In Ethereum today, the 12 second slot is divided into three sub-slots for (i) block publication and distribution, (ii) attestation, and (iii) attestation aggregation. With a much smaller number of attesters, we can reduce this to two sub-slots and get the slot time down to 8 seconds. Another factor that is actually more important is the "quality" of the nodes. If we can also rely on a specialized subset of nodes to reach approximate agreement (and still use the full validator set for finality), we can get this down to about 2 seconds.
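The 8-second figure follows if each sub-slot keeps its current length. A small sketch of that arithmetic, assuming the three sub-slots are equal (4 seconds each); the function name is illustrative:

```python
CURRENT_SLOT_SECONDS = 12
CURRENT_SUB_SLOTS = 3  # publication, attestation, aggregation

# Assumption: the three sub-slots are of equal length.
SUB_SLOT_SECONDS = CURRENT_SLOT_SECONDS / CURRENT_SUB_SLOTS


def slot_seconds(sub_slots: int, seconds_per_sub_slot: float = SUB_SLOT_SECONDS) -> float:
    """Slot time as the sum of equal-length sub-slots."""
    return sub_slots * seconds_per_sub_slot


if __name__ == "__main__":
    print(slot_seconds(3))  # today's three-phase slot
    print(slot_seconds(2))  # with the aggregation phase removed
```

The further drop to about 2 seconds is not captured by this formula: it depends on shortening the sub-slots themselves, which is where node quality comes in.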

Therefore, I argue that (i) the slot-and-epoch architecture is clearly correct, but (ii) not all slot-and-epoch architectures are equal, and it would be valuable to explore the design space more fully. In particular, it would be worthwhile to explore options that are not as tightly intertwined as Gasper, but instead have a stronger separation of concerns between the two mechanisms.

What should L2 do?

In my opinion, there are currently three reasonable strategies that L2s can adopt:

1. Be “based”, both technically and spiritually. That is, optimize for being a channel for the technical properties of Ethereum’s base layer and its values (highly decentralized, censorship-resistant, etc.). In their simplest form, you can think of these rollups as “branded shards,” but they can also be far more ambitious than that, and experiment heavily with new virtual machine designs and other technical improvements.

2. Be proud to be “a server with blockchain scaffolding” and make the most of it. If you start with a server, and then add (i) STARK validity proofs to ensure that the server follows the rules, (ii) guaranteed rights for users to exit or force transactions, and potentially (iii) freedom of collective choice, whether through coordinated mass exits or the ability to vote to change the sequencer, then you have gained many of the benefits of being on-chain while retaining much of the efficiency of a server.

3. Compromise: A fast chain with 100 nodes, with Ethereum providing additional interoperability and security. This is the de facto current roadmap for many L2 projects.

For some applications (e.g. ENS, keystores, some payments), 12 second block times are enough. For those applications where they are not enough, the only solution is a slot-and-epoch architecture. In all three cases, the "epoch" is Ethereum's SSF (perhaps we can redefine the acronym to mean something other than "single slot", for example, it could be "Secure Speedy Finality"). But in the three cases above, the "slot" is different:

1. Ethereum’s native slot-and-epoch architecture

2. Server pre-confirmation

3. Committee Pre-confirmation

A key question is: how good can we make something in category (1)? In particular, if it gets really good, then category (3) no longer makes as much sense. Category (2) will always exist, if only because anything “based” doesn’t work for off-chain-data L2s such as Plasma and validium. But if Ethereum’s native slot-and-epoch architecture can get down to 1-second “slot” (i.e. pre-confirmation) times, then the space for category (3) becomes much smaller.

Today, we are far from having final answers to these questions. One key question, how sophisticated block proposers will become, remains an area of considerable uncertainty. Designs like Orbit SSF are very new, which suggests that the design space of slot-and-epoch architectures in which something like Orbit SSF serves as the epoch is still underexplored. The more options we have, the better we can do for users on both L1 and L2, and the easier we can make life for L2 developers.