Author: Ethereum co-founder Vitalik Buterin; Translated by: Deng Tong, Golden Finance
Note: This article, "The Purge", is the fifth part of the series "Possible futures of the Ethereum protocol" recently published by Ethereum co-founder Vitalik Buterin. The fourth part can be found in "Vitalik: The future of Ethereum The Verge", the third part in "Vitalik: The key goals of Ethereum The Scourge phase", the second part in "Vitalik: How should the Ethereum protocol develop during The Surge phase", and the first part in "What else can be improved in Ethereum PoS". The following is the full text of the fifth part:
Special thanks to Justin Drake, Tim Beiko, Matt Garnett, Piper Merriam, Marius van der Wijden, and Tomasz Stanczak for their feedback and comments.
One of the challenges with Ethereum is that, by default, any blockchain protocol will grow in bloat and complexity over time. This happens in two ways:
Historical data: Any transaction and any account created at any historical moment needs to be permanently stored by all clients and downloaded by any new client that fully synchronizes with the network. This causes client load and synchronization time to increase over time, even if the capacity of the chain remains the same.
Protocol features: It is much easier to add new features than to remove old ones, causing code complexity to increase over time.
For Ethereum to be sustainable in the long term, we need to exert strong counter-pressure to both of these trends, reducing complexity and bloat over time. But at the same time, we need to preserve one of the key properties of blockchains: durability. You can put an NFT, a love letter in transaction call data, or a smart contract containing a million dollars on chain, go into a cave for ten years, and come out to find it is still there waiting for you to read and interact with it. In order for dapps to feel comfortable fully decentralizing and removing upgrade keys, they need to be confident that their dependencies will not be upgraded in a way that breaks them - especially L1 itself.
The Purge, 2023 Roadmap.
If we put our minds to it, it is absolutely possible to strike a balance between these two needs, minimizing or reversing expansion, complexity, and decay while maintaining continuity. Organisms can do this: while most organisms age over time, a fortunate few do not. Even social systems can have extremely long lifespans. Ethereum has already succeeded in some cases: proof of work has disappeared, the SELFDESTRUCT opcode has largely disappeared, and beacon chain nodes already store old data for up to six months. Finding this path for Ethereum more generally, and moving toward a long-term stable end result, is the ultimate challenge for Ethereum's long-term scalability, technical sustainability, and even security.
The Purge: Primary Objectives
Reduce client storage requirements by reducing or eliminating the need for each node to permanently store all history, and perhaps eventually even state
Reduce protocol complexity by eliminating unnecessary functionality
History expiry
What problem does it solve?
As of this writing, a fully synced Ethereum node requires about 1.1 TB of disk space for the execution client, plus a few hundred GB for the consensus client. The vast majority of this is historical data: data about historical blocks, transactions, and receipts, much of which is many years old. This means that even if the gas limit didn’t increase at all, the size of the node would increase by hundreds of GB per year.
What is it and how does it work?
A key simplifying property of the history storage problem is that, since each block points to the previous block via hash links (and other structures), consensus on the present is sufficient to reach consensus on the history. As long as the network agrees on the latest block, any historical block, transaction, or state (account balance, nonce, code, storage) can be provided by any single participant along with a Merkle proof, and that proof allows anyone else to verify its correctness. While consensus is an N/2-of-N trust model, history is a 1-of-N trust model.
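To make the 1-of-N property concrete, here is a minimal sketch of how a client holding only a trusted head hash can authenticate history served by a single untrusted peer. The names are assumed, and sha256 stands in for Ethereum's real header hashing (keccak over an RLP-encoded header) to keep the sketch self-contained:

```python
import hashlib

def block_hash(header: bytes) -> bytes:
    # Stand-in for Ethereum's real header hash; sha256 keeps the sketch runnable.
    return hashlib.sha256(header).digest()

def verify_history(trusted_head_hash, headers_newest_to_oldest, parent_hash_of):
    """Walk hash links backwards from a trusted head.

    The headers are supplied by ANY single peer (the 1-of-N model);
    `parent_hash_of(header)` extracts the parent-hash field. No trust in the
    peer is needed: each link either matches, or the whole chain is rejected.
    """
    expected = trusted_head_hash
    for header in headers_newest_to_oldest:
        if block_hash(header) != expected:
            return False          # the peer lied somewhere: reject everything
        expected = parent_hash_of(header)
    return True                   # every supplied header is now authenticated
```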
This opens up a lot of options for how we store history. A natural choice is a network where each node stores only a small fraction of the data. This is how torrent networks have worked for decades: while the network stores and distributes millions of files in total, each participant stores and distributes only a few of them. Perhaps counterintuitively, this approach doesn’t even necessarily reduce the robustness of the data. If, by making it cheaper to run nodes, we can achieve a network with 100,000 nodes where each node stores a random 10% of the history, then each piece of data will be replicated 10,000 times — exactly the same replication factor as a 10,000 node network where each node stores everything.
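The replication arithmetic in that example is worth spelling out:

```python
# Expected copies of each piece of history under random uniform assignment:
nodes, fraction_stored = 100_000, 0.10
print(nodes * fraction_stored)  # 10000.0 -- same as 10,000 nodes storing everything
```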
Today, Ethereum has begun to move away from a model where all nodes store all history permanently. Consensus blocks (i.e. the part related to proof-of-stake consensus) are only stored for about 6 months. Blobs are only stored for about 18 days. EIP-4444 aims to introduce a one-year storage period for historical blocks and receipts. The long-term goal is to have a coordinated period (probably about 18 days) during which each node is responsible for storing everything, and then have a peer-to-peer network of Ethereum nodes that store old data in a distributed manner.
Erasure coding can be used to improve robustness while keeping the replication factor the same. In fact, blobs are already erasure coded to support data availability sampling. The simplest solution might be to reuse this erasure coding and put the execution and consensus block data into the blob as well.
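For intuition, here is a toy Reed-Solomon-style erasure code over a small prime field. Blobs apply the same polynomial-evaluation idea, but over the ~255-bit BLS12-381 scalar field and with polynomial commitments; everything here is simplified for illustration:

```python
P = 65537  # a small prime field; real blobs use a ~255-bit field

def encode(data, n):
    """Treat `data` (k field elements) as coefficients of a degree-(k-1)
    polynomial and publish n > k evaluations. Any k of them recover the data."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(points):
    """Lagrange interpolation: recover the k coefficients from any k points."""
    k = len(points)
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(points):
        basis = [1]   # coefficients of prod_{m != j} (x - x_m), built factor by factor
        denom = 1
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            basis = [((basis[i - 1] if i > 0 else 0)
                      - xm * (basis[i] if i < len(basis) else 0)) % P
                     for i in range(len(basis) + 1)]
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse (Python 3.8+)
        for i in range(k):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

data = [7, 11, 13]                 # k = 3 elements of history
shares = encode(data, 6)           # 6 shares: 2x redundancy
assert decode(shares[3:]) == data  # any 3 shares suffice, even if 3 are lost
```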
What research is available?
EIP-4444: https://eips.ethereum.org/EIPS/eip-4444
Torrents and EIP-4444: https://ethresear.ch/t/torrents-and-eip-4444/19788
Portal Network: https://ethereum.org/en/developers/docs/networking-layer/portal-network/
Portal Network and EIP-4444: https://github.com/ethereum/portal-network-specs/issues/308
Distributed storage and retrieval of SSZ objects for Portal: https://ethresear.ch/t/distributed-storage-and-cryptographically-secured-retrieval-of-ssz-objects-for-portal-network/19575
How to Raise the Gas Limit (Paradigm): https://www.paradigm.xyz/2024/05/how-to-raise-the-gas-limit-2
What’s left to do, and what are the trade-offs?
The main work remaining involves building and integrating a concrete distributed solution for storing history — at least execution history, but eventually consensus and blobs as well. The simplest solutions are (i) simply bringing in existing torrent libraries, and (ii) an Ethereum-native solution called the Portal network. Once either of these is brought in, we can enable EIP-4444. EIP-4444 itself does not require a hard fork, but it does require a new network protocol version. Therefore, it is valuable to enable it for all clients at the same time, because otherwise there is a risk of clients breaking when they connect to other nodes expecting to download the full history but not actually getting it.
The main trade-off involves how hard we try to make the “previous” history available. The simplest solution is to stop storing past data tomorrow and rely on existing archive nodes and various centralized providers for replication. This is easy, but it weakens Ethereum’s position as a permanent record of data. The harder but safer approach is to first build and integrate a torrent network to store history in a distributed way. There are two dimensions to “how hard we try” here:
How hard do we have to work to ensure that the maximum number of nodes actually stores all the data?
How deeply do we integrate history storage into the protocol?
For (1), the most rigorous approach would involve proof of custody: effectively requiring each proof-of-stake validator to store a certain percentage of the history, and periodically cryptographically checking that they are doing so. A more moderate approach would be to set a voluntary standard for the percentage of history that each client stores.
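A minimal sketch of the proof-of-custody flavor of (1), with assumed names and sha256 in place of the real custody function: the point is that only a party actually holding its assigned chunk can answer a challenge chosen after the assignment.

```python
import hashlib, os

def custody_response(seed: bytes, chunk: bytes) -> bytes:
    # Binding the answer to a fresh random seed prevents precomputing
    # responses and then discarding the data.
    return hashlib.sha256(seed + chunk).digest()

seed = os.urandom(32)                          # challenge, chosen after assignment
assigned_chunk = b"...historical block bytes..."
answer = custody_response(seed, assigned_chunk)
# A verifier holding (or able to fetch) the chunk recomputes and compares:
assert answer == custody_response(seed, assigned_chunk)
```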
For (2), the basic implementation requires no more work than is already being done today: Portal already stores ERA files containing the entire Ethereum history. A more thorough implementation would involve actually connecting this to the syncing process, so that if someone wants to sync a full-history-storing node or an archive node, they can do so by syncing directly from the Portal network, even if no other archive nodes are online.
How does it interact with the rest of the roadmap?
If we want to make it extremely simple to run or spin up a node, then reducing history storage requirements is arguably more important than statelessness: of the 1.1 TB required for a node, ~300 GB is state, and the remaining ~800 GB is history. The vision of an Ethereum node running on a smartwatch and taking only minutes to set up is only possible if both statelessness and EIP-4444 are achieved.
Limiting historical storage also makes it more feasible for newer Ethereum node implementations to only support the latest version of the protocol, which makes them simpler. For example, since the empty storage slots created during the 2016 DoS attack have all been deleted, many lines of code can be safely deleted. Now that the switch to proof-of-stake is history, clients can safely delete all proof-of-work related code.
State expiry
What problem does it solve?
Even if we eliminated the need for clients to store history, client storage requirements would continue to grow, by about 50 GB per year, because the state keeps growing: account balances and nonces, contract code, and contract storage. A user can pay a one-time fee and thereby impose a storage burden on present and future Ethereum clients forever.
State is harder to "expire" than history, because the EVM is fundamentally designed around the assumption that once a state object is created, it exists forever and can be read by any transaction at any time. If we introduce statelessness, some argue that this problem may not be so bad: only a specialized class of block builders would need to actually store state, and everything else (even inclusion list generation!) can be done statelessly. However, there is an argument that we don't want to rely too heavily on statelessness, and eventually we may want state to expire to keep Ethereum decentralized.
What is it and how does it work?
Today, when you create a new state object (which can happen in one of three ways: (i) sending ETH to a new account, (ii) creating a new account with code, (iii) setting a previously untouched storage slot), that state object stays in the state forever. Instead, what we want is for objects to automatically expire over time. The key challenge is to do this in a way that achieves three goals:
Efficiency: No significant additional computation is required to run the expiration process.
User-friendliness: If someone goes into a cave and comes back five years later, they shouldn’t lose access to their ETH, ERC20, NFT, CDP positions…
Developer-friendliness: Developers should not have to switch to a completely foreign mental model. In addition, applications that are ossified today and no longer updated should continue to work reasonably well.
Without satisfying these goals, the problem is easy to solve. For example, you could have each state object also store a counter recording its expiry date (which could be extended by burning ETH, and which could happen automatically any time the object is read or written), and have a process that loops through the state removing expired objects. However, this introduces extra computation (and even extra storage requirements), and it certainly does not satisfy the user-friendliness requirement. It is also hard for developers to reason about corner cases involving stored values sometimes resetting to zero. If you make the expiry timer contract-wide, that makes the developer's job technically easier, but it makes the economics harder: developers would have to think about how to "pass on" the ongoing storage costs to their users.
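A sketch of that naive counter scheme (all names assumed; this is not a concrete proposal) makes both objections visible: the sweep loop is the extra computation, and its deletions are exactly the "values sometimes reset to zero" corner case:

```python
RENT_PERIOD = 365 * 86400  # seconds of life bought per touch (illustrative)

class StateObject:
    def __init__(self, value, now):
        self.value = value
        self.expires_at = now + RENT_PERIOD

def touch(obj: StateObject, now: int):
    # Reading or writing burns a little ETH (elided here) and extends the timer.
    obj.expires_at = max(obj.expires_at, now + RENT_PERIOD)
    return obj.value

def sweep(state: dict, now: int):
    # The loop the protocol would have to run over ALL state -- the extra work,
    # and the source of surprise deletions, that the text objects to.
    for key in [k for k, o in state.items() if o.expires_at <= now]:
        del state[key]
```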
These are problems that the Ethereum core development community has worked on for years, including proposals such as "blockchain rent" and "regenesis". Ultimately, we combined the best parts of the proposals and converged on two categories of "least bad known solutions":
Partial state expiry solutions.
Address-period-based state expiry proposals.
Partial state expiry
The partial state expiry proposals all follow the same principle. We split the state into chunks. Everyone permanently stores a "top-level map" recording which chunks are empty or non-empty. The data within each chunk is only stored if it has been recently accessed. There is a "resurrection" mechanism where, if a chunk is no longer stored, anyone can restore that data by providing a proof of what the data was.
The main differences between these proposals are: (i) how do we define "recently", and (ii) how do we define "chunk"? One specific proposal is EIP-7736, which builds on the "stem-and-leaf" design introduced for Verkle trees (although it is compatible with any form of stateless tree, e.g. binary trees). In this design, headers, code, and storage slots that are adjacent to each other are stored under the same "stem". The data stored under a stem can be at most 256 * 31 = 7,936 bytes. In many cases, an account's entire header and code, along with many of its key storage slots, will be stored under the same stem. If data under a given stem is neither read nor written for 6 months, the data is no longer stored, and only a 32-byte commitment to the data (a "stub") is retained. Future transactions that access that data will need to "resurrect" it, providing the data along with a proof that it checks out against the stub.
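A minimal sketch of that stem-level flow, with sha256 standing in for the real Verkle commitment and all names assumed (see EIP-7736 for the actual construction):

```python
import hashlib

EPOCHS_TO_EXPIRY = 1  # "not read or written for ~6 months"

class Stem:
    def __init__(self, leaves: bytes, epoch: int):
        assert len(leaves) <= 256 * 31               # max 7,936 bytes per stem
        self.leaves = leaves                         # full data, while retained
        self.stub = hashlib.sha256(leaves).digest()  # 32-byte commitment
        self.last_touched = epoch

def end_of_epoch(stem: Stem, epoch: int):
    if epoch - stem.last_touched > EPOCHS_TO_EXPIRY:
        stem.leaves = None           # drop the data; only the 32-byte stub remains

def resurrect(stem: Stem, provided: bytes, epoch: int):
    # Anyone can revive the stem by supplying data matching the commitment.
    assert hashlib.sha256(provided).digest() == stem.stub, "bad resurrection proof"
    stem.leaves = provided
    stem.last_touched = epoch
```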
There are other ways to implement a similar idea. For example, if account-level granularity is insufficient, we could make a scheme where each 1/2^32 fraction of the tree is governed by a similar stem-and-leaf mechanism.
This is trickier because of incentives: an attacker could force clients to store large amounts of state permanently by putting a large amount of data into a single subtree and sending a single transaction every year to "renew the tree". If you make the renewal cost proportional to the tree size (or the renewal duration inversely proportional to it), then someone could grief another user by putting a large amount of data into the same subtree as them. One could try to limit both problems by making the granularity dynamic based on subtree size: for example, each consecutive group of 2^16 = 65,536 state objects could be treated as a "group". However, these ideas are more complex; the stem-based approach is simple, and it aligns incentives, because typically all the data under a stem is related to the same application or user.
Address-period-based state expiry proposals
What if we want to avoid any permanent state growth at all, even 32-byte stubs? This is a hard problem because of resurrection conflicts: what if a state object is deleted, a later EVM execution puts another state object in the exact same position, and then someone who cares about the original state object comes back and tries to restore it? With partial state expiry, the "stub" prevents new data from being created in its place. With full state expiry, we cannot afford to store even the stub.
The address-period-based design is the best known way to solve this problem. Instead of storing the entire state in one state tree, we have a growing list of state trees, and any state that is read or written gets saved into the most recent tree. A new empty state tree is added once per period (think: 1 year). Older state trees are frozen. Full nodes only need to store the two most recent trees. If a state object has not been touched for two periods and thus falls into an expired tree, it can still be read or written to, but the transaction must provide a Merkle proof for it - once this is done, a copy is saved into the latest tree again.
A key idea that makes this all user- and developer-friendly is the concept of address periods. An address period is a number that is part of an address. A key rule is that an address with address period N can only be read from or written to during or after period N (i.e. when the state tree list reaches length N). If you are saving a new state object (e.g. a new contract or a new ERC20 balance), and you make sure to put the state object into a contract whose address period is N or N-1, then you can save it right away, without a proof that nothing was there before. Any additions or edits to state in older address periods, on the other hand, require a proof.
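A sketch of that rule, under an assumed trie-list structure (plain dicts stand in for Merkle trees, and proof verification is elided):

```python
def write(tries: list, current_period: int, address, addr_period: int,
          value, expiry_proof=None):
    # Rule 1: a period-N address is unusable before the list reaches length N.
    assert addr_period <= current_period, "address not yet active"
    if addr_period >= current_period - 1:
        # Fresh-period address: nothing can have been there before, so no
        # proof is needed. The write lands in the newest tree.
        tries[-1][address] = value
    else:
        # Old-period address: the caller must prove the object's prior state
        # in the expired tree (Merkle proof verification elided); a copy of
        # the object then lands in the newest tree again.
        assert expiry_proof is not None, "old-period access requires a proof"
        tries[-1][address] = value
```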
This design retains most of Ethereum's current properties, requires very little extra computation, allows applications to be written almost exactly as they are today (ERC20 would need some rewriting, to ensure that balances of addresses with address period N are stored in a child contract that itself has address period N), and solves the "user goes into a cave for five years" problem. However, it has one big problem: addresses need to be expanded beyond 20 bytes to make room for the address period.
Address space extension
One proposal is to introduce a new 32-byte address format that includes a version number, an address period number, and an extended hash value.
0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF
The red is the version number. The four orange zeros here are blank space, which could accommodate a shard number in the future. The green is the address period number. The blue is the 26-byte hash.
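Parsing the example address under the field widths suggested by the diagram (these widths are inferred, not final):

```python
def parse_extended_address(addr: bytes):
    # Assumed layout: 1-byte version | 4 reserved bytes (future shard number)
    # | 1-byte address period | 26-byte hash. Widths inferred from the example.
    assert len(addr) == 32
    version = addr[0]
    reserved = addr[1:5]
    period = addr[5]
    hash_part = addr[6:]            # 26 bytes
    return version, reserved, period, hash_part

v, _, period, h = parse_extended_address(bytes.fromhex(
    "01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF"))
assert v == 1 and period == 1 and len(h) == 26
```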
The key challenge here is backward compatibility. Existing contracts are designed around 20-byte addresses, and often use tight byte-packing techniques that explicitly assume addresses are exactly 20 bytes long. One idea to address this is a translation map, where legacy contracts interacting with new-style addresses would see a 20-byte hash of the new-style address. However, making this secure requires considerable effort.
Address space contraction
Another approach does the opposite: we immediately ban some subrange of addresses of size 2^128 (e.g. all addresses starting with 0xffffffff), and then use that range to introduce addresses with an address period and a 14-byte hash.
0xffffffff000169125d5dFcb7B8C2659029395bdF
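And the corresponding parse for the contracted format (field widths again inferred from the example, not final):

```python
def parse_contracted_address(addr: bytes):
    # Assumed layout within 20 bytes: 4-byte 0xffffffff flag | 2-byte address
    # period | 14-byte hash.
    assert len(addr) == 20 and addr[:4] == b"\xff\xff\xff\xff"
    period = int.from_bytes(addr[4:6], "big")
    return period, addr[6:]

period, h = parse_contracted_address(bytes.fromhex(
    "ffffffff000169125d5dFcb7B8C2659029395bdF"))
assert period == 1 and len(h) == 14
```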
The key sacrifice this approach makes is that it introduces security risks for counterfactual addresses: addresses that hold assets or permissions but whose code has not yet been published on chain. The risk involves someone creating an address that claims to hold one piece of (as-yet-unpublished) code, when another valid piece of code hashes to the same address. Computing such a collision requires 2^80 hashes today; address space contraction would reduce this to a very accessible 2^56 hashes.
The key risk area, counterfactual addresses of wallets that are not held by a single owner, is a relatively rare situation today, but will likely become more common as we move into a multi-L2 world. The only solution is to simply accept this risk, but identify all common use cases where it could go wrong, and come up with effective workarounds.
What research is available?
Early Proposals
Blockchain rent: https://github.com/ethereum/EIPs/issues/35
Regenesis: https://ethresear.ch/t/regenesis-resetting-ethereum-to-reduce-the-burden-of-large-blockchain-and-state/7582
Ethereum state size management theory: https://hackmd.io/@vbuterin/state_size_management
Several possible paths for stateless and state expiration: https://hackmd.io/@vbuterin/state_expiry_paths
Partial state expiry proposals
EIP-7736: https://eips.ethereum.org/EIPS/eip-7736
Address Space Extension Documentation
Original proposal: https://ethereum-magicians.org/t/increasing-address-size-from-20-to-32-bytes/5485
Ipsilon Notes: https://notes.ethereum.org/@ipsilon/address-space-extension-exploration
Blog post comments: https://medium.com/@chaisomsri96/statelessness-series-part2-ase-address-space-extension-60626544b8e6
What would happen if we lost collision resistance: https://ethresear.ch/t/what-would-break-if-we-lose-address-collision-resistance/11356
What's left to do, and what are the trade-offs?
I think there are four possible paths forward:
We do statelessness, and never introduce state expiry. The state grows (albeit slowly: we may not see it exceed 8 TB for decades), but it only needs to be held by a relatively specialized class of users: not even PoS validators would need state.
One feature that requires access to part of the state is inclusion list generation, but we can implement this in a decentralized way: each user is responsible for maintaining the part of the state tree that contains their own accounts. When they broadcast a transaction, they also broadcast proofs of the state objects accessed during the verification step (this works for both EOA and ERC-4337 accounts). Stateless validators can then combine these proofs into a proof for the entire inclusion list (a sketch follows this list).
We do partial state expiry, and accept a much lower, but still non-zero, rate of permanent state size growth. This outcome is arguably similar to the history expiry proposals involving peer-to-peer networks: a permanent storage growth rate is accepted, but it is much lower than today's, and each client only has to store a low but fixed percentage of the data.
We do state expiry, with address space extension. This will involve a multi-year process of making sure the address format conversion approach works and is safe, including for existing applications.
We do state expiry, with address space contraction. This will involve a multi-year process of making sure all security risks involving address collisions, including cross-chain situations, are handled.
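As a sketch of the proof-carrying flow in path 1 above (toy Merkle branches and an assumed transaction structure; real proofs would be against Ethereum's actual state tree):

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def verify_branch(root: bytes, leaf: bytes, index: int, branch: list) -> bool:
    # Standard binary Merkle branch check: fold siblings up to the root.
    node = H(leaf)
    for sibling in branch:
        node = H(node + sibling) if index % 2 == 0 else H(sibling + node)
        index //= 2
    return node == root

def build_inclusion_list(state_root: bytes, mempool):
    # Each tx carries (leaf, index, branch) for every state object it reads,
    # so a completely stateless validator can validate and include it.
    included = []
    for tx in mempool:
        if all(verify_branch(state_root, leaf, i, br)
               for leaf, i, br in tx["proofs"]):
            included.append(tx)
    return included
```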
An important point is that the hard problems around address space extension and contraction will eventually have to be addressed whether or not state expiry schemes that depend on address format changes are ever implemented. Today, it takes about 2^80 hashes to produce an address collision, and this computational load is already feasible for extremely well-resourced actors: a GPU can do about 2^27 hashes per second, so running for a year it can compute about 2^52, meaning all of the world's roughly 2^30 GPUs could compute a collision in about 1/4 of a year, and FPGAs and ASICs could accelerate this further. In the future, such attacks will be open to more and more people. Therefore, the actual cost of implementing full state expiry may not be as high as it seems, because we have to solve this very challenging address problem anyway.
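The arithmetic, spelled out:

```python
hashes_per_gpu_per_sec = 2**27
seconds_per_year = 2**25                 # ~3.4e7; powers of two for convenience
per_gpu_year = hashes_per_gpu_per_sec * seconds_per_year   # ~2**52
world_gpus = 2**30
print(2**80 / (per_gpu_year * world_gpus))  # 0.25 -- about a quarter of a year
# After address space contraction the target drops to 2**56 hashes:
print(2**56 / per_gpu_year)  # 16.0 GPU-years: under a week on a 1,000-GPU farm
```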
How does it interact with the rest of the roadmap?
Doing state expiration will likely make transitioning from one state tree format to another easier, since no conversion process is required: you can simply start making a new tree using the new format, and then do a hard fork later to convert the old tree. So while state expiration is complex, it does help simplify other aspects of the roadmap.
Feature cleanup
What problem does it solve?
One of the key prerequisites for security, accessibility, and trusted neutrality is simplicity. If a protocol is beautiful and simple, it is less likely to have bugs. It increases the chances that new developers will be able to come on board and work with any part of it. It is more likely to be fair and easier to defend against special interests. Unfortunately, protocols, like any social system, by default become more complex over time. If we don't want Ethereum to fall into a black hole of ever-increasing complexity, we need to do one of two things: (i) stop making changes and ossify the protocol, or (ii) be able to actually remove features and reduce complexity. A middle path, where fewer changes are made to the protocol while at least a little complexity is removed over time, is also possible. This section will discuss how to reduce or eliminate complexity.
What is it and how does it work?
There is no big single fix that will reduce protocol complexity; the nature of the problem is that there are many small fixes.
A mostly-completed example that can serve as a blueprint for how to handle other issues is the removal of the SELFDESTRUCT opcode. SELFDESTRUCT was the only opcode that could modify an unlimited number of storage slots within a single block, requiring clients to implement significantly more complexity to avoid DoS attacks. The opcode's original purpose was to enable voluntary state cleanup, allowing the state size to decrease over time. In practice, few ended up using it. In the Dencun hard fork, the opcode was weakened to only allow self-destructing accounts created within the same transaction. This resolves the DoS issue and allows for significant simplification of client code. In the future, it probably makes sense to eventually remove the opcode entirely.
Some key examples of protocol simplification opportunities that have been identified so far include the following. First, some examples outside of the EVM; these are relatively non-invasive and therefore easier to reach consensus on and implement in a shorter time.
RLP → SSZ conversion: Originally, Ethereum objects were serialized using an encoding called RLP. Today, the beacon chain uses SSZ, which is significantly better in many ways, including supporting not only serialization but also hashing. Eventually, we hope to get rid of RLP completely and move all data types to SSZ structures, which in turn will make upgrades much easier. The EIPs currently proposed for this include [1] [2] [3]. (A toy comparison of the two encodings follows this list.)
Removing legacy transaction types: There are too many transaction types today, and many of them could potentially be removed. A more modest alternative to complete removal is an account abstraction feature, whereby Smart Accounts could contain code to handle and validate legacy transactions if they so wish.
LOG reform: Logs create bloom filters and other logic that add complexity to the protocol but are too slow for clients to actually use. We can remove these features and instead invest our efforts into alternatives, such as out-of-protocol decentralized log reading tools using modern techniques like SNARKs.
Eventually remove the beacon chain sync committee mechanism: The sync committee mechanism was originally introduced to enable light client verification of Ethereum. However, it adds real complexity to the protocol. Eventually, we will be able to verify the Ethereum consensus layer directly using SNARKs, which will eliminate the need for a dedicated light client verification protocol. Even before then, we could create a more "native" light client protocol that involves verifying signatures from a random subset of Ethereum consensus validators.
Data format harmonization: Today, execution state is stored in Merkle Patricia trees, consensus state is stored in SSZ trees, and blobs are committed in the form of KZG commitments. In the future, it makes sense to create a single unified format for block data and state. These formats will cover all important requirements: (i) simple proofs for stateless clients, (ii) serialization and erasure coding of data, (iii) standardized data structures.
Removing the Beacon Chain Committee: This mechanism was originally introduced to support a specific version of execution sharding. Instead, we ended up sharding via L2 and blobs. As such, committees are unnecessary, and work is underway to remove them.
Remove mixed endianness: The EVM is big-endian, the consensus layer little-endian. It may make sense to reconcile the two and make everything one or the other (probably big-endian, because the EVM is harder to change).
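To make the RLP → SSZ point above concrete, here is a toy comparison for a two-field struct {nonce: uint64, balance: uint64}. These are simplified stand-ins rather than the production codecs (real RLP handles nested lists and longer lengths; real SSZ adds 4-byte offsets for variable-size fields), though the hash-tree-root shown does match SSZ's actual rule for a container of two uint64s:

```python
import hashlib

def rlp_encode_pair(a: bytes, b: bytes) -> bytes:
    # RLP: each item is length-prefixed, then the list itself is length-prefixed,
    # so the layout is variable and fields cannot be located without parsing.
    def item(x: bytes) -> bytes:
        return x if len(x) == 1 and x[0] < 0x80 else bytes([0x80 + len(x)]) + x
    payload = item(a) + item(b)
    return bytes([0xC0 + len(payload)]) + payload

def ssz_encode_pair(nonce: int, balance: int) -> bytes:
    # SSZ: fixed-size fields at fixed offsets, little-endian -- trivially
    # seekable, and each field maps directly to a Merkle chunk for hashing.
    return nonce.to_bytes(8, "little") + balance.to_bytes(8, "little")

def ssz_hash_tree_root(nonce: int, balance: int) -> bytes:
    chunk = lambda n: n.to_bytes(8, "little") + b"\x00" * 24  # pad to 32 bytes
    return hashlib.sha256(chunk(nonce) + chunk(balance)).digest()

print(rlp_encode_pair((7).to_bytes(1, "big"), (10**18).to_bytes(8, "big")).hex())
print(ssz_encode_pair(7, 10**18).hex())      # always exactly 16 bytes
print(ssz_hash_tree_root(7, 10**18).hex())   # hashing comes with the format
```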
Now, some examples inside the EVM:
Simplify the gas mechanism: Current gas rules are not well optimized for putting clear bounds on the amount of resources required to validate a block. Key examples include (i) storage read/write costs, which are meant to bound the number of reads/writes in a block but are currently quite arbitrary, and (ii) memory expansion rules, under which it is currently difficult to estimate the EVM's maximum memory consumption (the current quadratic cost formula is shown after this list). Suggested fixes include the stateless gas cost changes, which reconcile all storage-related costs into a simple formula, and the memory pricing proposal.
Removing precompiles: Many of the precompiles Ethereum has today are both needlessly complex and relatively unused, and account for a large share of consensus-failure near-misses while not actually being used by any applications. Two ways to deal with this are to (i) remove the precompile outright, and (ii) replace it with (inevitably more expensive) EVM code that implements the same logic. This draft EIP proposes to do this first for the identity precompile; later, RIPEMD160, MODEXP, and BLAKE may be candidates for removal.
Remove gas observability: Make it so that an EVM execution can no longer see how much gas it has left. This will break some applications (most notably sponsored transactions), but will make it easier to upgrade in the future (e.g. to more advanced multi-dimensional gas versions). The EOF spec already makes gas unobservable, but to help with protocol simplicity, EOF needs to be mandatory.
Improve static analysis: Today's EVM code is difficult to statically analyze, especially because jumps can be dynamic. This also makes it harder to build optimized EVM implementations (ones that pre-compile EVM code into other languages). We can fix this by removing dynamic jumps (or making them more expensive, e.g. gas cost linear in the total number of JUMPDESTs in the contract). EOF does this, although getting the protocol simplification benefits from it requires making EOF mandatory. (A sketch of today's JUMPDEST analysis follows this list.)
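On (ii) in the gas item above: the current memory expansion cost (per the Yellow Paper) is quadratic, which is why worst-case memory consumption is bounded only indirectly, through cost:

```python
def memory_cost(size_bytes: int) -> int:
    # C_mem(w) = 3*w + w**2 // 512, where w is the memory size in 32-byte words.
    words = (size_bytes + 31) // 32
    return 3 * words + words ** 2 // 512

print(memory_cost(1024))         # 32 words    -> 98 gas
print(memory_cost(1024 * 1024))  # 32768 words -> 2,195,456 gas
```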
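And on the static-analysis item: this is the valid-JUMPDEST scan that every EVM implementation must perform today, and that EOF's static jumps render unnecessary. A 0x5b byte only counts as a JUMPDEST if it is not inside a PUSH instruction's immediate data:

```python
JUMPDEST, PUSH1, PUSH32 = 0x5B, 0x60, 0x7F

def valid_jumpdests(code: bytes) -> set:
    targets, pc = set(), 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            targets.add(pc)
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1   # skip the PUSH's immediate bytes
        pc += 1
    return targets

# PUSH1 0x5b; JUMPDEST -- the first 0x5b is immediate data, only offset 2 counts:
assert valid_jumpdests(bytes([0x60, 0x5B, 0x5B])) == {2}
```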
What research is available?
Next steps for the Purge: https://notes.ethereum.org/I_AIhySJTTCYau_adoy2TA
SELFDESTRUCT: https://hackmd.io/@vbuterin/selfdestruct
SSZ-ification EIPs: [1] [2] [3]
Stateless gas cost changes: https://eips.ethereum.org/EIPS/eip-4762
Linear memory pricing: https://notes.ethereum.org/ljPtSqBgR2KNssu0YuRwXw
Precompile removal: https://notes.ethereum.org/IWtX22YMQde1K_fZ9psxIg
Bloom filter removal: https://eips.ethereum.org/EIPS/eip-7668
A method for off-chain secure log retrieval using incremental verifiable computation (read: recursive STARKs): https://notes.ethereum.org/XZuqy8ZnT3KeG1PkZpeFXw
What's left to do, and what are the trade-offs?
The main trade-offs in making this kind of feature simplification are (i) how much and how fast we simplify vs. (ii) backward compatibility. The value of Ethereum as a chain is that it is a platform where you can deploy an application and be confident that it will still work years later. At the same time, it is possible to take this ideal too far and, in the words of William Jennings Bryan, “nail Ethereum to the cross of backward compatibility.” If only two applications in all of Ethereum use a feature, one of which has had no users for years and the other of which has almost no use at all, and together they derive $57 of value from it, then we should remove that feature, paying the victim $57 out of pocket if necessary.
The broader societal problem is to create a standardized pipeline for making non-urgent, backwards-compatibility breaking changes. One way to address this is to examine and extend existing precedents, such as the SELFDESTRUCT process. That pipeline would look something like this:
Step 1: Start a discussion about removing feature X.
Step 2: Perform an analysis to determine how disruptive removing X would be to the application, and based on the results, choose to (i) abandon the idea, (ii) proceed as planned, or (iii) determine a modified “least disruptive” way to remove X and proceed.
Step 3: Make a formal EIP to deprecate X. Make sure popular high-level infrastructure (e.g. programming languages, wallets) respect this and stop using the feature.
Step 4: Finally, actually delete X.
There should be a multi-year process between steps 1 and 4, with clear articulation of which projects are in which step. At this point, there is a trade-off between being aggressive and fast in the feature removal process versus being more conservative and putting more resources into other areas of protocol development, but we are far from the Pareto frontier.
EOF
The main set of changes proposed for the EVM is the EVM Object Format (EOF). EOF introduces a number of changes, such as banning gas observability and code observability (i.e. no CODECOPY), and allowing only static jumps. The goal is to allow more upgrades to the EVM, with stronger properties, while preserving backward compatibility (as the pre-EOF EVM would still exist).
The benefit of this is that it creates a natural path for adding new EVM features and encourages migration to a stricter EVM with stronger guarantees. The downside is that it significantly increases the complexity of the protocol, unless we can find a way to eventually deprecate and remove the old EVM. A major question is: what role does EOF play in EVM simplification proposals, especially if the goal is to reduce overall EVM complexity?
How does it interact with the rest of the roadmap?
Many of the "Improvement" proposals in the rest of the roadmap are also opportunities to simplify older features. To repeat some of the examples above:
Switching to single-slot finality gives us the opportunity to remove committees, reformulate economics, and make other proof-of-stake related simplifications.
Fully implementing account abstraction would allow us to remove much of the existing transaction processing logic by moving it into a piece of "default account EVM code" that all EOAs could be replaced with.
If we move the Ethereum state to binary hash tries, this could be harmonized with a new version of SSZ, so that all Ethereum data structures can be hashed in the same way.
A more radical approach: converting large parts of the protocol into contract code
A more radical Ethereum simplification strategy would be to keep the protocol's functionality as it is, but move large parts of it from protocol logic into contract code.
The most extreme version would be to have Ethereum L1 "technically" be just the beacon chain, and introduce a minimal VM (e.g. RISC-V, Cairo, or something even more minimal specialized for proof systems) that allows anyone else to create their own rollup. The EVM would then become the first of these rollups. Ironically, this is exactly the same outcome as the execution environment proposals of 2019-20, although SNARKs make it much more feasible to actually implement.
A more modest approach would be to keep the relationship between the beacon chain and the current Ethereum execution environment as-is, but do an in-place swap of the EVM. We could choose RISC-V, Cairo, or some other VM as the new "official Ethereum VM", and then mandatorily convert all EVM contracts into new-VM code that interprets the logic of the original code (either by compiling or by interpreting it). In theory, this could even be done with the "target VM" being a version of EOF.