EIP-4444 can solve Ethereum's history growth problem and leave room for gas limit increases.

Related reading: "Paradigm: Challenges and solutions to Ethereum state growth"

Written by: Storm Slivkoff, Georgios Konstantopoulos

Compiled by: Luffy, Foresight News

History growth is currently the biggest bottleneck to Ethereum scaling. Surprisingly, history growth has become a bigger problem than state growth. Within a few years, history data will exceed the storage capacity of many Ethereum nodes.

The good news is:

  • History growth is a much easier problem to solve than state growth.

  • A solution is already under active development.

  • Solving history growth will also alleviate the state growth problem.

In this post, we continue our study of Ethereum scaling from Part 1, shifting our focus from state growth to history growth. Using a refined dataset, our goals are to 1) understand Ethereum's scaling bottlenecks technically, and 2) help inform the discussion about the optimal setting of Ethereum's gas limit.

What is historical growth?

History is the collection of all blocks and transactions executed by Ethereum throughout its life cycle. It is all data from the genesis block to the current block. History growth is the accumulation of new blocks and new transactions over time.

Figure 1 shows the relationship between history growth and various protocol metrics and Ethereum node hardware constraints. History growth is limited by a different set of hardware constraints than state growth. History growth puts pressure on network IO because new blocks and transactions must be transmitted throughout the network. History growth also puts pressure on node storage space because every Ethereum node stores a complete copy of the history. If history growth is fast enough to exceed these hardware constraints, the node will no longer be able to reach a stable consensus with its peers. For an overview of state growth and other scaling bottlenecks, see Part 1 of this series.

Figure 1: Ethereum scaling bottlenecks

Until recently, most of each node's network throughput was used to transmit history (new blocks and transactions). This changed with the introduction of blobs in the Dencun hard fork. Blobs now account for a large portion of node network activity. However, blobs are not considered part of history because 1) nodes store them for only about 2 weeks before discarding them, and 2) they are not needed to replay Ethereum from genesis. Because of (1), blobs do not significantly increase the storage burden of each Ethereum node. We discuss blobs later in this article.

In this article, we will focus on history growth and discuss the relationship between history and state. Since state growth and history growth have some overlapping hardware constraints, they are related problems and solving one can help solve the other.

How fast has history been growing?

Figure 2 shows the history growth rate since Ethereum's genesis. Each vertical line represents one month of growth, and the y-axis shows how many gigabytes of history were added that month. Transactions are categorized by their destination address, and their size is measured in RLP-encoded bytes. Contracts that cannot be easily identified are classified as “unknown.” The “other” category includes a range of subcategories such as infrastructure and games.

Figure 2: Ethereum’s historical growth rate over time
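The methodology behind Figure 2 (bucket each transaction by its destination address, count its size in RLP-encoded bytes) can be sketched as follows. This is a minimal illustration under stated assumptions: the encoder only handles integers, byte strings and nested lists; the `labels` mapping from addresses to categories is hypothetical, since the authors' labeling data is not described here; and real transactions have more fields (and typed-transaction envelopes) than the toy inputs. Normalizing the per-category totals gives relative shares like those in Figure 3.

```python
from collections import defaultdict


def _int_to_bytes(n: int) -> bytes:
    """Big-endian bytes with no leading zeros (0 encodes as empty bytes)."""
    if n < 0:
        raise ValueError("RLP cannot encode negative integers")
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def _length_prefix(length: int, offset: int) -> bytes:
    """Prefix for a payload; offset is 0x80 for byte strings, 0xC0 for lists."""
    if length <= 55:
        return bytes([offset + length])
    encoded_len = _int_to_bytes(length)
    return bytes([offset + 55 + len(encoded_len)]) + encoded_len


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for ints, bytes and (nested) lists."""
    if isinstance(item, int):
        item = _int_to_bytes(item)
    if isinstance(item, (bytes, bytearray)):
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)  # a single small byte is its own encoding
        return _length_prefix(len(item), 0x80) + bytes(item)
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"unsupported type: {type(item)!r}")


def history_bytes_by_category(txs, labels):
    """Sum RLP sizes per category; txs are (to_address, fields) pairs.

    `labels` (hypothetical) maps addresses to categories; unlabelled
    addresses fall into "unknown", as in the figure.
    """
    totals = defaultdict(int)
    for to_address, fields in txs:
        totals[labels.get(to_address, "unknown")] += len(rlp_encode(fields))
    return dict(totals)


def normalize(totals):
    """Convert byte totals into shares of the month's history."""
    grand_total = sum(totals.values())
    return {category: size / grand_total for category, size in totals.items()}


totals = history_bytes_by_category(
    [("0xdex", [b"dog"]), ("0xabc", [b"cat", b"dog"])], {"0xdex": "dex"}
)
print(totals, normalize(totals))
```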

A few key takeaways from the above chart:

  • History is growing 6-8 times faster than state: History recently peaked at 36.0 GiB/month and is currently at 19.3 GiB/month. State has peaked at about 6.0 GiB/month and is currently at 2.5 GiB/month. A comparison of history and state in terms of growth and cumulative size is shown later in this article.

  • Prior to Dencun, the history growth rate had been accelerating: while state growth had been roughly linear for many years (see Part 1), history growth was superlinear. A growth rate that rises linearly over time produces quadratic growth in total size, so a growth rate that rises superlinearly produces faster-than-quadratic growth in total size. This acceleration stopped abruptly after Dencun, the first time Ethereum experienced a significant drop in its history growth rate.

  • Most recent history growth comes from rollups: each L2 publishes a copy of its transactions back to mainnet. This generates a large amount of history and made rollups the largest contributor to history growth over the past year. However, Dencun allows L2s to publish their transaction data in blobs instead of calldata, so rollups no longer generate the majority of Ethereum history. We cover rollups in more detail later in this article.
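The link between growth rate and total size described in the second bullet can be checked numerically. In this toy sketch the monthly rates are unitless numbers, not Ethereum measurements, and the exponent 1.5 is an arbitrary stand-in for "superlinear". Doubling the time span multiplies total size by about 4 when the rate rises linearly (quadratic total), and by more than 4 when it rises superlinearly.

```python
from itertools import accumulate

MONTHS = 120

# Toy monthly growth rates (unitless, not Ethereum data).
linear_rates = [m for m in range(1, MONTHS + 1)]             # rate rises linearly
superlinear_rates = [m ** 1.5 for m in range(1, MONTHS + 1)]  # rate rises faster than linearly

# Total size after each month is the running sum of the monthly rates.
linear_total = list(accumulate(linear_rates))
superlinear_total = list(accumulate(superlinear_rates))

# Compare total size after MONTHS with total size after MONTHS / 2.
half = MONTHS // 2 - 1
linear_ratio = linear_total[-1] / linear_total[half]
superlinear_ratio = superlinear_total[-1] / superlinear_total[half]
print(f"linearly rising rate:      size x{linear_ratio:.2f} when time doubles")
print(f"superlinearly rising rate: size x{superlinear_ratio:.2f} when time doubles")
```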

Who is the biggest contributor to Ethereum's historical growth?

The amount of history generated by different contract categories reveals how Ethereum usage patterns have evolved over time. Figure 3 shows the relative contribution of each contract category. It uses the same data as Figure 2, normalized to show each category's share.

Figure 3: Contribution of different contract types to historical growth

The data reveals four distinct periods of Ethereum usage patterns:

  • Early (purple): Ethereum’s first few years saw little on-chain activity. Most of these early contracts are difficult to identify now and are marked as “unknown” in the chart.

  • ERC-20 Era (green): The ERC-20 standard was finalized in late 2015 but did not see significant adoption until 2017 and 2018. ERC-20 contracts became the largest source of history growth in 2019.

  • DEX/DeFi Era (brown): DEX and DeFi contracts appeared on-chain as early as 2016 and began to gain traction in 2017. But they did not become the largest category in terms of historical growth until the DeFi Summer of 2020. DeFi and DEX contracts accounted for more than 50% of historical growth in 2021 and parts of 2022.

  • Rollup Era (gray): L2 rollups began executing more transactions than mainnet in early 2023. In the months before Dencun, they generated about 2/3 of Ethereum history.

Each era represents a more complex usage pattern for Ethereum than the one before it. Complexity can be seen as a form of Ethereum scaling over time, which cannot be measured by simple metrics like transactions per second.

In the most recent data month (April 2024), Rollups no longer generate the majority of history. It is unclear whether future history will come from DEXs and DeFi, or if some new usage pattern will emerge.

What about blobs?

The Dencun hard fork introduced blobs, which significantly changed history growth dynamics by allowing rollups to publish data using cheap blobs instead of calldata. Figure 4 zooms in on the history growth rate before and after the Dencun upgrade. The chart is similar to Figure 2, except that each vertical line represents a day instead of a month.

Figure 4: Dencun’s impact on historical growth

We can draw several key conclusions from this chart:

  • Since Dencun, the historical growth of rollups has dropped by about 2/3: most rollups have converted from call data to blobs, which has greatly reduced the amount of history they generate. However, as of April 2024, there are still some rollups that have not yet converted from call data to blobs.

  • Total historical growth has dropped by about 1/3 since Dencun: Dencun only reduced historical growth for rollups. Historical growth for other contract categories increased slightly. Even after Dencun, historical growth is still 8x the growth of state (see next section for details).

While blobs have reduced the historical growth rate, they are still a new feature of Ethereum and it is unclear what level the historical growth rate would stabilize at with blobs in place.

How much history growth is acceptable?

Increasing the gas limit will increase the historical growth rate. Therefore, proposals to increase the gas limit (such as Pump the Gas) must consider the relationship between historical growth and the hardware bottleneck of each node.

To determine an acceptable history growth rate, we first need to understand how long current node hardware can sustain the status quo in terms of networking and storage. Networking hardware can probably sustain the status quo indefinitely, as the history growth rate is unlikely to return to its pre-Dencun peak unless the gas limit is increased. However, the storage burden of history will keep increasing over time. Under the current storage strategy, every node's disk will inevitably fill up with history.

Figure 5 shows the storage burden of Ethereum nodes over time and forecasts its growth over the next 3 years. The forecast extrapolates the April 2024 growth rate, which may increase or decrease as usage patterns or the gas limit change.

Figure 5: Size of history, state, and full node storage burden

We can draw several key conclusions from this figure:

  • History takes up about 3 times as much storage space as state. This difference grows over time, as history grows about 8 times as fast as state.

  • 1.8 TiB is the critical threshold at which many nodes will be forced to upgrade their drives. 2 TB is a common drive size, and it provides only about 1.8 TiB of usable space. Note that a TB (10^12 bytes) is a different unit from a TiB (1024^4 bytes). For many node operators, the "real" threshold is even lower, because since the Merge, validators must run a consensus client alongside the execution client.

  • The critical threshold will be reached in 2-3 years. Any increase in the gas limit will bring this date forward accordingly. Reaching this threshold will impose a non-trivial maintenance burden on node operators and require them to buy additional hardware (e.g. $300 NVMe drives).
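The threshold arithmetic above can be reproduced with a simple linear projection. This is a sketch under loud assumptions: it uses the figures quoted in this article (about 1.2 TiB for a full node, April 2024 rates of 19.3 GiB/month of history and 2.5 GiB/month of state), it ignores consensus-client data and filesystem overhead (which make the real date earlier), and the `gas_multiplier` parameter assumes, as a simplification not taken from the article, that both growth rates scale linearly with the gas limit.

```python
GIB = 1024**3
TIB = 1024**4
TB = 10**12  # drive vendors label capacity in decimal terabytes


def tb_to_tib(tb: float) -> float:
    """Convert a drive's decimal TB label into binary TiB."""
    return tb * TB / TIB


def months_until_full(current_gib, history_gib_per_month, state_gib_per_month,
                      capacity_gib, gas_multiplier=1.0):
    """Months until storage reaches capacity, assuming constant growth rates.

    Assumption: both rates scale linearly with the gas limit, so
    gas_multiplier=2.0 models a doubled gas limit.
    """
    monthly = (history_gib_per_month + state_gib_per_month) * gas_multiplier
    return (capacity_gib - current_gib) / monthly


capacity_gib = 2 * TB / GIB   # a "2 TB" drive expressed in GiB
current_gib = 1.2 * 1024      # ~1.2 TiB full node
for multiplier in (1.0, 1.5, 2.0):
    months = months_until_full(current_gib, 19.3, 2.5, capacity_gib, multiplier)
    print(f"gas limit x{multiplier}: ~{months / 12:.1f} years until the drive is full")
```

At the April 2024 rates this gives roughly 2.4 years, consistent with the 2-3 year estimate above, and doubling the growth rate halves the remaining time.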

Unlike state data, history data is append-only and is accessed much less frequently. In principle, history data can therefore be stored separately from state data on cheaper storage media. Some clients, such as Geth, already support this.

In addition to storage capacity, network IO is another major limitation to historical growth. Unlike storage capacity, network IO limitations will not cause problems for nodes in the short term, but these limitations will become important with future increases in gas limits.

To understand how much historical growth the network capacity of a typical Ethereum node can support, one must know the relationship between historical growth and various network health metrics, such as reorg rate, slot misses, finality misses, attestation misses, sync committee misses, and block commit latency. Analysis of these metrics is beyond the scope of this article, but more information can be found in previous surveys of consensus layer health. Additionally, the Ethereum Foundation’s Xatu project has been building public datasets to expedite such analysis.

How to solve the historical growth problem?

History growth is a much easier problem to solve than state growth. It can be almost completely solved by the proposed EIP-4444, which changes each node from storing the entire Ethereum history to storing only the most recent year of history. Once EIP-4444 is implemented, history storage will no longer be a bottleneck to Ethereum scaling and will no longer constrain gas limit increases in the long run. EIP-4444 is necessary for the long-term sustainability of the network; otherwise history will keep accumulating quickly and node operators will have to upgrade their hardware regularly.

Figure 6 shows the impact of EIP-4444 on the storage burden of each node over the next 3 years. It is the same as Figure 5, with an added lighter line showing the storage burden after EIP-4444 is implemented.

Figure 6: The impact of EIP-4444 on Ethereum node storage burden

Some key conclusions can be drawn from this figure:

  • EIP-4444 will cut the current storage burden in half. The storage burden will drop from 1.2 TiB to 633 GiB.

  • EIP-4444 will stabilize the history storage burden. Assuming a constant history growth rate, history data will be discarded at the rate it is generated.

  • After EIP-4444, it will take many years for node storage burden to reach today's levels. This is because state growth will be the only factor increasing storage burden, and state growth is slower than historical growth.
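The steady-state behavior in these bullets can be illustrated with a rolling-window model. This is a simplified sketch, not the method behind Figure 6: it assumes constant monthly rates (the April 2024 figures quoted earlier), splits the ~1.2 TiB full node into state and history using the rough "history is about 3x state" ratio, and assumes the one-year window starts filled at today's rate. Because the real past year included faster pre-Dencun growth, the modeled level (about 541 GiB) sits below the measured 633 GiB; the point is the shape: with expiry, only state growth moves the total. The `retention_months` parameter also shows the effect of shortening the window.

```python
from collections import deque


def project_storage(months, state_gib, history_gib, state_rate, history_rate,
                    retention_months=None):
    """Per-month storage burden (GiB) under constant growth rates.

    retention_months=None keeps all history (today's behavior); a number
    keeps only that many recent months (a simplified EIP-4444 model).
    Assumption: the window starts filled at today's rate, so history_gib
    is ignored when retention_months is set.
    """
    window = None
    if retention_months is not None:
        window = deque([history_rate] * retention_months, maxlen=retention_months)
    sizes = []
    for _ in range(months):
        state_gib += state_rate
        if window is None:
            history_gib += history_rate
            kept = history_gib
        else:
            window.append(history_rate)  # newest month in, oldest month expires
            kept = sum(window)
        sizes.append(state_gib + kept)
    return sizes


STATE_GIB, HISTORY_GIB = 307.2, 921.6  # rough split of ~1.2 TiB (assumption)
keep_all = project_storage(36, STATE_GIB, HISTORY_GIB, 2.5, 19.3)
one_year = project_storage(36, STATE_GIB, HISTORY_GIB, 2.5, 19.3, retention_months=12)
print(f"keep all history: {keep_all[0]:.0f} -> {keep_all[-1]:.0f} GiB over 3 years")
print(f"one-year expiry:  {one_year[0]:.0f} -> {one_year[-1]:.0f} GiB over 3 years")
```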

Even after EIP-4444, history growth will still impose some storage burden, because each node will store one year of history. However, this burden remains manageable even if Ethereum reaches global scale. Once history preservation methods are proven reliable, EIP-4444's one-year expiry window may be shortened to a few months, weeks, or even less.

How to preserve Ethereum history?

EIP-4444 raises a question: if Ethereum nodes no longer keep history themselves, how should it be kept? History plays a central role in Ethereum verification, accounting, and analytics, so preserving it is critical. Fortunately, history preservation is an easy problem because it relies on a 1-of-n honesty assumption: only one of n data providers needs to be honest. This is in stark contrast to the state consensus problem, which requires 1/3 to 2/3 of participants to be honest. Node operators can verify the authenticity of a history dataset by 1) replaying all transactions since the genesis block and 2) checking that these transactions reproduce the state root at the current chain tip.
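The 1-of-n argument can be made concrete with a toy model: a node downloads history from several providers, replays each candidate from genesis, and accepts the first one whose result matches a state root it already trusts (in Ethereum, the root at the current chain tip). This is a sketch under heavy simplifications: "transactions" are plain balance transfers, and `state_root` is a SHA-256 hash of the serialized state standing in for Ethereum's Merkle-Patricia trie root. Real replay executes EVM transactions.

```python
import hashlib
import json


def state_root(state):
    """Toy state commitment: SHA-256 of the canonically serialized state.

    Stand-in for Ethereum's Merkle-Patricia trie root.
    """
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def replay(genesis, blocks):
    """Re-execute every block from genesis; blocks are lists of (from, to, amount)."""
    state = dict(genesis)
    for block in blocks:
        for sender, recipient, amount in block:
            balance = state.get(sender, 0)
            if amount < 0 or balance < amount:
                raise ValueError(f"invalid transfer from {sender}")
            state[sender] = balance - amount
            state[recipient] = state.get(recipient, 0) + amount
    return state


def fetch_verified_history(providers, genesis, trusted_root):
    """Return the first provider whose history replays to trusted_root.

    One honest provider is enough: invalid or altered history is detected
    and skipped.
    """
    for name, blocks in providers:
        try:
            if state_root(replay(genesis, blocks)) == trusted_root:
                return name, blocks
        except ValueError:
            continue  # history that cannot even be replayed is rejected
    raise RuntimeError("no provider served history matching the trusted state root")


genesis = {"alice": 100}
honest = [[("alice", "bob", 30)], [("bob", "carol", 10)]]
tampered = [[("alice", "bob", 30)], [("bob", "mallory", 10)]]
trusted_root = state_root(replay(genesis, honest))  # in practice: from the chain tip

name, _ = fetch_verified_history(
    [("provider-a", tampered), ("provider-b", honest)], genesis, trusted_root
)
print(f"accepted history from {name}")
```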

There are many ways to preserve history:

  • Torrents/P2P: Torrents are the simplest and most reliable method. Ethereum nodes can periodically package ranges of history and share them as public torrent files. For example, a node might create a new history torrent file every 100,000 blocks. Clients such as Erigon already do this in a somewhat non-standardized way. To standardize the process, all node clients would need to use the same data format, the same parameters, and the same P2P network. Nodes could then choose whether to participate in this network based on their storage and bandwidth capacity. Torrents have the advantage of using a highly "Lindy" (long-proven) open standard that is already supported by a large number of data tools.

  • Portal Network: Portal Network is a new network designed specifically for hosting Ethereum data. It is a Torrent-like approach while also providing some additional features to make data verification easier. The advantage of Portal Network is that these additional layers of verification provide utility for light clients to efficiently verify and query shared data sets.

  • Cloud hosting: Cloud storage services such as AWS S3 or Cloudflare R2 provide a cheap, high-performance option for preserving history. However, this approach carries more legal and business risk, because there is no guarantee that these services will always be willing and able to host crypto-related data.
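The Torrents/P2P option depends on every client cutting history into identical files. Below is a minimal sketch of a deterministic chunking rule, assuming the 100,000-block interval from the example above; the file-name format is hypothetical and is not Erigon's or any other client's actual layout. The design point: if every client applies the same rule to the same canonical blocks with the same encoding, independently produced files are byte-identical and can, in principle, be shared as the same torrent.

```python
CHUNK_SIZE = 100_000  # blocks per history file (the example interval above)


def chunk_range(block_number, chunk_size=CHUNK_SIZE):
    """Return the inclusive (start, end) block range of the file holding block_number."""
    start = (block_number // chunk_size) * chunk_size
    return start, start + chunk_size - 1


def chunk_name(block_number, chunk_size=CHUNK_SIZE):
    """Hypothetical file name; a real standard would fix this convention."""
    start, end = chunk_range(block_number, chunk_size)
    return f"mainnet-history-{start:09d}-{end:09d}.bin"


def completed_chunks(head_block, chunk_size=CHUNK_SIZE):
    """Ranges whose blocks all exist at or below head_block, ready to package and seed."""
    count = (head_block + 1) // chunk_size
    return [(i * chunk_size, (i + 1) * chunk_size - 1) for i in range(count)]


for start, end in completed_chunks(250_000):
    print(chunk_name(start), f"covers blocks {start}-{end}")
```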

The remaining implementation challenges are more social than technical. The Ethereum community needs to coordinate specific implementation details so that they can be integrated directly into every node client. In particular, performing a full sync from the genesis block (rather than a snapshot sync) will require retrieving history from a history provider rather than an Ethereum node. These changes do not technically require a hard fork, so they can be implemented earlier than Ethereum's next hard fork, Pectra.

All of these history preservation methods can also be used by L2s to preserve the blob data they publish to mainnet. Compared to history preservation, blob preservation is 1) more difficult because the total amount of data is much larger; 2) less important because blobs are not necessary for replaying mainnet history. However, blob preservation is still necessary for each L2 to replay its own history. Therefore, some form of blob preservation is important to the entire Ethereum ecosystem. In addition, if L2s develop a strong blob storage infrastructure, they may also be able to easily store L1 history data.

It is helpful to directly compare the datasets stored by various node configurations before and after EIP-4444. Figure 7 shows the storage burden of different Ethereum node types. State data is accounts and contracts, history data is blocks and transactions, and archive data is an optional set of data indexes. The byte counts in this table are based on a recent Reth snapshot, but the numbers for other node clients should be roughly comparable.

Figure 7: Storage burden of different Ethereum node types

In other words:

  • Archive nodes store state data and history data as well as archive data. Archive nodes are useful when someone wants to easily query historical chain state.

  • Full nodes only store historical and state data. Most nodes today are full nodes. The storage burden of a full node is about half that of an archive node.

  • Full nodes after EIP-4444 only store state data and historical data for the last year. This reduces the node's storage burden from 1.2 TiB to 633 GiB and brings the storage space for historical data to a steady-state value.

  • Stateless nodes, also known as “light nodes”, do not store any of these datasets and can immediately verify the tip of the chain. This type of node will become possible once Verkle trees or another state commitment scheme are added to Ethereum.

Finally, there are a few additional EIPs that limit the historical growth rate rather than just accommodating the current growth rate. This helps stay within the network IO constraints in the short term and within the storage constraints in the long term. While EIP-4444 is still necessary for the long-term sustainability of the network, these other EIPs will help Ethereum scale more efficiently in the future:

  • EIP-7623: Reprice call data so that transactions carrying a lot of call data become more expensive. Making these usage patterns more expensive will push some of them from call data to blobs, reducing the history growth rate.

  • EIP-4488: Impose a limit on the total amount of call data that can be included in each block. This will impose a stricter limit on how fast the history can grow.
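How these two EIPs bound history growth can be sketched in a few lines. The constants below are assumed from the EIP texts and changed during drafting, so check them against the current specs before relying on them. EIP-7623 charges the larger of the usual cost and a per-token calldata floor, so data-heavy transactions pay more while execution-heavy ones are unaffected; contract creation is ignored here. EIP-4488 caps calldata per block (it also lowers calldata gas cost, which is not modeled), which puts a hard ceiling on calldata-driven history growth; the ceiling below assumes a block in every 12-second slot.

```python
# Parameters assumed from the EIP texts; verify against the current specs.
TX_BASE_COST = 21_000
STANDARD_TOKEN_COST = 4                  # gas per calldata token (16 per nonzero byte)
TOTAL_COST_FLOOR_PER_TOKEN = 10          # EIP-7623 floor price per token
BASE_MAX_CALLDATA_PER_BLOCK = 1_048_576  # EIP-4488 per-block cap (bytes)
CALLDATA_PER_TX_STIPEND = 300            # EIP-4488 extra bytes allowed per tx


def calldata_tokens(data: bytes) -> int:
    """Zero bytes count as 1 token, nonzero bytes as 4 tokens."""
    zero = data.count(0)
    return zero + 4 * (len(data) - zero)


def gas_today(data: bytes, execution_gas: int) -> int:
    return TX_BASE_COST + STANDARD_TOKEN_COST * calldata_tokens(data) + execution_gas


def gas_eip7623(data: bytes, execution_gas: int) -> int:
    """Larger of the usual cost and the calldata floor (contract creation ignored)."""
    tokens = calldata_tokens(data)
    return TX_BASE_COST + max(
        STANDARD_TOKEN_COST * tokens + execution_gas,
        TOTAL_COST_FLOOR_PER_TOKEN * tokens,
    )


def max_block_calldata_eip4488(tx_count: int) -> int:
    return BASE_MAX_CALLDATA_PER_BLOCK + CALLDATA_PER_TX_STIPEND * tx_count


def monthly_calldata_ceiling_gib(slot_seconds=12, days=30, tx_count=0):
    """Upper bound on calldata added to history per month under EIP-4488."""
    blocks = days * 24 * 3600 // slot_seconds
    return blocks * max_block_calldata_eip4488(tx_count) / 1024**3


data_heavy = b"\x01" * 100_000  # e.g. data posted via calldata
compute_heavy = b"\x01" * 100   # small calldata, lots of execution
print(gas_today(data_heavy, 0), gas_eip7623(data_heavy, 0))
print(gas_today(compute_heavy, 200_000), gas_eip7623(compute_heavy, 200_000))
print(f"EIP-4488 ceiling: {monthly_calldata_ceiling_gib():.1f} GiB/month")
```

Under these assumed parameters, the data-heavy transaction hits the floor and costs about 2.5 times more, while the compute-heavy one pays exactly what it pays today.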

These EIPs are easier to implement than EIP-4444, so they may serve as a short-term stopgap measure before EIP-4444 goes into production.

Conclusion

The purpose of this article is to use data to understand 1) how historical growth works and 2) how to solve this problem. Much of the data in this article is difficult to obtain through traditional means, so we hope that making this data public can provide some new insights into the historical growth problem.

History growth has not received enough attention as a bottleneck to Ethereum scaling. Even without a gas limit increase, Ethereum's current practice of preserving all history will force many nodes to upgrade their hardware within a few years. Fortunately, this is not a difficult problem to solve: EIP-4444 already offers a clear solution. We believe its implementation should be accelerated to leave room for future gas limit increases.