By: Vitalik Buterin

Compiled by Peng Sun, Foresight News

Two and a half years ago, I wrote in “Endgame” that the different paths forward for blockchains look very similar, at least technically. In both cases, there are a very large number of transactions on-chain, and processing them requires (1) a large amount of computation and (2) a large amount of data bandwidth. Even with strong software engineering and Verkle trees, ordinary Ethereum nodes (such as the 2 TB reth archive node running on my computer right now) are not powerful enough to directly verify that much data and computation. Instead, in both “L1 sharding” and Rollup-centric designs, ZK-SNARKs are used to verify computation and DAS (data availability sampling) is used to verify data availability. Whether it is L1 sharding or Rollups, the DAS is the same and the ZK-SNARK technology is the same; the only difference is that in one case it is smart contract code and in the other it is an enshrined feature of the protocol. In a very real technical sense, Ethereum is doing sharding, and Rollups are shards.

This naturally leads to the question: what is the difference between the two? One difference is the consequence of code vulnerabilities: in a Rollup, tokens can be stolen; in a sharded chain, consensus can break down. But I expect that as protocols become more robust and formal verification techniques improve, the impact of code vulnerabilities will keep shrinking. So, what other differences between these two approaches are likely to persist in the long term?

Diversity of execution environments

One idea we briefly discussed in Ethereum in 2019 was execution environments. Essentially, Ethereum would have different “zones” that could have different rules for how accounts work (including completely different approaches like UTXOs), how the virtual machine works, and other features. This would allow a diversity of approaches in various parts of the stack, which would be difficult to achieve if Ethereum tried to do everything by itself.

In the end, we abandoned some of the more ambitious plans and kept only the EVM. However, Ethereum L2s (including rollups, validiums, and Plasmas) can be said to have ultimately taken on the role of execution environments. Currently, we generally focus on EVM-equivalent L2s, but this overlooks the diversity brought by many other approaches:

  • Arbitrum Stylus, which adds a second, WASM-based virtual machine alongside the EVM;

  • Fuel, which uses a UTXO-based architecture similar to Bitcoin (but more fully featured);

  • Aztec, which introduces a new language and programming paradigm designed around privacy-preserving smart contracts based on ZK-SNARKs.

UTXO-based architecture, source: Fuel documentation
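
To make the contrast between the account model and a UTXO model concrete, here is a minimal sketch in Python. It is only an illustration of the general idea, not Fuel's or Bitcoin's actual design: the names `account_transfer`, `Utxo`, and `utxo_transfer` are hypothetical, and signatures, fees, and scripts are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


def account_transfer(balances: Dict[str, int], sender: str, recipient: str, amount: int) -> None:
    """Account model: a transfer mutates two entries in a shared balance table."""
    if amount <= 0 or balances.get(sender, 0) < amount:
        raise ValueError("invalid amount or insufficient balance")
    balances[sender] -= amount
    balances[recipient] = balances.get(recipient, 0) + amount


@dataclass(frozen=True)
class Utxo:
    """Hypothetical unspent output: a coin of a fixed amount owned by one party."""
    owner: str
    amount: int


def utxo_transfer(utxos: List[Utxo], sender: str, recipient: str, amount: int) -> List[Utxo]:
    """UTXO model: consume whole coins, then create a payment coin and a change coin."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    selected: List[Utxo] = []
    total = 0
    for coin in utxos:
        # Pick the sender's coins in order until they cover the amount.
        if coin.owner == sender and total < amount:
            selected.append(coin)
            total += coin.amount
    if total < amount:
        raise ValueError("insufficient funds")
    remaining = list(utxos)  # copy, so the input set is left untouched
    for coin in selected:
        remaining.remove(coin)  # spent coins disappear from the set
    remaining.append(Utxo(recipient, amount))
    if total > amount:
        remaining.append(Utxo(sender, total - amount))  # change back to the sender
    return remaining
```

In the account version, state is one table that every transfer edits in place. In the UTXO version, state is a set of independent coins, and a transfer replaces some coins with new ones, which is one reason such designs can be easier to process in parallel.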

We could try to turn the EVM into a super virtual machine that covers every possible paradigm, but that would make each paradigm far less efficient to implement. It is better to let these platforms specialize in what they are good at.

Security tradeoff: scalability and transaction speed

Ethereum L1 provides very strong security guarantees. If certain data is included in a finalized block on L1, the entire consensus (including social consensus in extreme cases) works to ensure that this data cannot be modified, that any execution triggered by this data cannot be reverted, and that this data remains accessible. To provide these guarantees, Ethereum L1 is willing to accept high costs. At the time of writing, transaction fees are relatively low: L2s charge less than 1 cent per transaction, and even on L1 a basic ETH transfer costs less than $1. If the technology improves quickly enough that available block space keeps up with demand, these fees may remain low in the future, but they may not. And for many non-financial applications (such as social media or games), even $0.01 per transaction is too high.

But social media and gaming don’t require the same security model as L1. It doesn’t matter if someone can pay a million dollars to undo their loss in a game of chess, or make one of your tweets look like it was posted three days after it actually was. Therefore, these applications shouldn’t pay the same security costs. L2 schemes achieve this by supporting a range of data availability methods from rollups to plasma to validiums.

Different L2 types are suitable for different use cases.

Another tradeoff arises around the problem of transferring assets from L2 to L2. I expect that in the next 5-10 years, all Rollups will be ZK Rollups, and super-efficient proof systems like Binius and Circle STARKs with lookups, coupled with a proof aggregation layer, will make it possible for L2s to provide finalized state roots in every slot. But for now, we have to live with a complex mixture of Optimistic Rollups and ZK Rollups with different proof time windows. If we had implemented execution sharding in 2021, the security model for keeping the shards honest would have been Optimistic Rollups, not ZK, so L1 would have had to manage systemically complex fraud proof logic on-chain and impose withdrawal periods of up to a week for moving assets between shards. But as with code vulnerabilities, I think this problem will ultimately be temporary.

Transaction speed is the third and more permanent aspect of the security tradeoff. Ethereum produces a block every 12 seconds, and it cannot go much faster without becoming too centralized. Many L2s, however, are exploring ways to reduce block times to a few hundred milliseconds. 12 seconds isn’t too bad: on average, users wait about 6-7 seconds after submitting a transaction for it to be included in a block (not exactly 6 seconds, because the next block might not include it). This is comparable to how long I have to wait when I pay with a credit card. However, many applications need faster speeds, and L2s can provide them.
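
The "6-7 seconds" figure can be reproduced with a small back-of-the-envelope model. The sketch below is only an illustration under simplifying assumptions that are not in the original text: a transaction arrives at a uniformly random moment within a 12-second slot, and each subsequent block includes it independently with probability `p_include` (a hypothetical parameter). Missed slots and fee dynamics are ignored.

```python
SLOT_SECONDS = 12.0


def expected_wait(p_include: float, slot_seconds: float = SLOT_SECONDS) -> float:
    """Expected seconds from submission to inclusion under the toy model above."""
    if not 0.0 < p_include <= 1.0:
        raise ValueError("p_include must be in (0, 1]")
    # A uniformly random arrival waits half a slot, on average, for the next block.
    time_to_next_block = slot_seconds / 2
    # Expected number of blocks that skip the transaction (geometric distribution).
    expected_skips = (1 - p_include) / p_include
    return time_to_next_block + expected_skips * slot_seconds


for p in (1.0, 0.95, 0.9):
    print(f"p_include={p:.2f}: {expected_wait(p):.2f} s")
```

With guaranteed inclusion in the next block the average is exactly 6 seconds; once there is a modest chance of being skipped, each skip costs a full extra slot, which pushes the average into the 6-7 second range.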

To go faster still, L2s can use a preconfirmation mechanism: the L2’s own validators digitally sign a promise to include the transaction by a certain time, and if the transaction is not included, they are penalized. A mechanism called StakeSure generalizes this idea further.

L2 Pre-confirmation
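
The preconfirmation idea described above (sign a promise, get penalized if it is broken) can be sketched as follows. This is a toy model under loudly stated assumptions, not any L2's actual protocol: `Sequencer`, `Preconfirmation`, and `settle` are hypothetical names, the bond and penalty values are made up, and an HMAC stands in for a real public-key signature (a real system would use something like ECDSA or BLS so that anyone can verify the promise).

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Preconfirmation:
    """A signed promise to include tx_hash at or before deadline_block."""
    tx_hash: str
    deadline_block: int
    signature: str


class Sequencer:
    """Hypothetical L2 validator that posts a bond and issues preconfirmations."""

    def __init__(self, secret_key: bytes, bond: int) -> None:
        self.secret_key = secret_key
        self.bond = bond

    def _sign(self, tx_hash: str, deadline_block: int) -> str:
        # HMAC is a stand-in for a digital signature (assumption for this sketch).
        message = f"{tx_hash}:{deadline_block}".encode()
        return hmac.new(self.secret_key, message, hashlib.sha256).hexdigest()

    def preconfirm(self, tx_hash: str, deadline_block: int) -> Preconfirmation:
        return Preconfirmation(tx_hash, deadline_block, self._sign(tx_hash, deadline_block))

    def verify(self, promise: Preconfirmation) -> bool:
        expected = self._sign(promise.tx_hash, promise.deadline_block)
        return hmac.compare_digest(expected, promise.signature)


def settle(sequencer: Sequencer, promise: Preconfirmation,
           included_at: Dict[str, int], penalty: int) -> int:
    """Return the amount slashed from the bond for this promise (0 if it was kept)."""
    if not sequencer.verify(promise):
        return 0  # not a genuine promise, so there is nothing to enforce
    block = included_at.get(promise.tx_hash)
    if block is not None and block <= promise.deadline_block:
        return 0  # promise kept
    slashed = min(penalty, sequencer.bond)
    sequencer.bond -= slashed
    return slashed


seq = Sequencer(b"demo-key", bond=100)
kept = seq.preconfirm("0xaaa", deadline_block=10)
broken = seq.preconfirm("0xbbb", deadline_block=10)
included_at = {"0xaaa": 9}  # only the first transaction made it on-chain
print(settle(seq, kept, included_at, penalty=30))
print(settle(seq, broken, included_at, penalty=30))
print(seq.bond)
```

The user can treat the signed promise as a fast, economically backed confirmation, while the slower on-chain inclusion provides the final one.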

Now, we could try to implement all of this at L1. L1 could include a system of "fast pre-confirmations" and "slow final confirmations". It could include different shards with different levels of security. However, this would increase the complexity of the protocol. In addition, doing all the work at L1 would risk overloading consensus, because many larger scale or higher throughput approaches have higher centralization risks or require stronger forms of "governance" that would have ripple effects to other parts of the protocol if done at L1. Ethereum can largely avoid these risks by providing a compromise through L2.

The benefits of Layer2 to organizations and culture

Imagine a country that is split in two, with one half becoming capitalist and the other half becoming highly state-dominated (unlike what happens in reality, assume that in this thought experiment this is not the result of any traumatic war, but rather that a border simply appears one day and that's it). In the capitalist part, the restaurants are run by various combinations of decentralized ownership, chains, and franchises. In the state-dominated part, they are all branches of the government, just like police stations. On the first day, not much will change. People will largely follow existing habits, and what works and what doesn't will depend on technical realities such as labor skills and infrastructure. After a year, however, you will see huge changes, because different incentives and control structures lead to large changes in behavior, which affect who comes and who goes, what gets built, what gets maintained, and what gets abandoned.

Industrial organization theory talks a lot about these distinctions: not only between a government-run economy and a capitalist economy, but also between an economy dominated by large franchises and one where each supermarket is run by an independent entrepreneur. I think the distinction between an L1-centric ecosystem and an L2-centric ecosystem is similar.

There is something wrong with the "core developers manage everything" architecture

I believe the main advantages of Ethereum being an L2-centric ecosystem are as follows:

Since Ethereum is an L2-centric ecosystem, you have the freedom to independently build a sub-ecosystem with its own unique features while also being part of the larger Ethereum.

If you are just building an Ethereum client, then you are part of the larger Ethereum, and although you have some room for innovation, it is far less than what an L2 has. If you are building a completely independent chain, your creative space is very large, but you lose the benefits of shared security and shared network effects. An L2 is a good balance point.

An L2 not only provides a technical opportunity to try out new execution environments and security tradeoffs that enable scalability, flexibility, and speed; it also creates incentives for developers to build and maintain it, and for a community to form around it and support it.

The fact that each L2 is isolated also means that deploying new approaches is permissionless: there is no need to convince all the core developers that your new approach is "safe" for the rest of the chain. If your L2 fails, it's on you. Anyone can come up with a weird idea (e.g., Intmax's Plasma approach), and even if the Ethereum core developers ignore it completely, they can keep building and eventually deploy it. This is not the case with L1 features and precompiles, and even in Ethereum, the success or failure of L1 development often depends on politics to a greater extent than we would like. Regardless of what can be built in theory, the different incentives created by L1-centric and L2-centric ecosystems will ultimately have a significant impact on what is actually built, at what level of quality, and in what order.

What challenges does Ethereum’s L2-centric ecosystem face?

There is something wrong with the L1 + L2 architecture. Image source: Reddit

This L2-centric approach faces a key challenge that L1-centric ecosystems face to a much lesser degree: coordination. In other words, while Ethereum has many L2s, the challenge is to make it still feel like “Ethereum” and have the network effects of Ethereum, rather than being N independent chains. Today, the situation is unsatisfactory in many ways:

  • Moving tokens between L2s usually requires a centralized cross-chain bridge, which is very complicated for ordinary users. If you have tokens on Optimism, you can’t simply paste someone else’s Arbitrum address into your wallet to send them funds.

  • Cross-chain support for smart contract wallets is weak, for both personal wallets and organizational wallets (including DAOs). If you change the key on one L2, you also need to change the key on every other L2.

  • Decentralized validation infrastructure is generally lacking. Ethereum is finally starting to get decent light clients, such as Helios. But this is of little use if all activity happens on L2s that each require their own centralized RPC. In principle, it's not hard to build a light client for an L2 once you have Ethereum block headers; but in practice, too little attention has been paid to this.

The community is working hard to improve all three of these areas. For cross-chain token swaps, the ERC-7683 standard is a new solution that, unlike existing "centralized cross-chain bridges," does not have any fixed centralized nodes, tokens, or governance. For cross-chain accounts, most wallets take the approach of using cross-chain replayable messages to update keys in the short term and keystore rollups in the long term. Light clients for L2 are beginning to emerge, such as Beerus for Starknet. In addition, recent improvements to the user experience through next-generation wallets have solved more basic problems, such as allowing users to access DApps without having to manually switch networks.

Rabby provides a comprehensive view of multi-chain asset balances, which was not possible with previous wallets!

But it is important to recognize that an L2-centric ecosystem is, to some extent, swimming against the current when it tries to coordinate. A single L2 has no natural economic incentive to build coordination infrastructure: small L2s will not do so because they would capture only a small share of the benefits; large L2s will not do so because they can gain just as much or more from strengthening their own local network effects. If each L2 only thinks about itself, and no one thinks about how it fits into the broader Ethereum system, then we will fail, just like the urbanist utopia in the picture a few paragraphs above.

It’s hard to say there’s a perfect solution to this problem. I can only say that the ecosystem needs to more fully recognize that cross-L2 infrastructure is a type of Ethereum infrastructure, just like L1 clients, development tools, and programming languages, and therefore should be valued and funded as such. We have the Protocol Guild; maybe we need a Basic Infrastructure Guild.

Summary

In various public discussions, “L2” and “sharding” are often seen as two opposing strategies for blockchain scaling. However, when you look at the underlying technology, you find a puzzle: the actual underlying scaling methods are exactly the same. There is always some form of data sharding, fraud provers or ZK-SNARK provers, and a solution for cross-{Rollup, shard} communication. The main difference is: who is responsible for building and updating these components, and how much autonomy do they have?

An L2-centric ecosystem is sharding in the true technical sense, but it is sharding in which you can build your own shard with your own rules. This is extremely powerful, and allows for unlimited creativity and a lot of autonomous innovation. But it also presents some key challenges, especially around coordination. For an L2-centric ecosystem like Ethereum to succeed, it must understand these challenges and address them head-on, so that it captures as many of the benefits of an L1-centric ecosystem as possible and gets as close as it can to the best of both worlds.