Author: toly, co-founder of Solana

Compiled by: Felix, PANews

 

With roughly 1 million new accounts added to Solana every day, total state has grown past 500 million accounts, and the snapshot size is around 70GB. These numbers are perfectly manageable as hardware improves, but the goal of the SVM runtime is to provide the cheapest possible access to hardware, and to achieve that, state and memory must be managed within current hardware limitations.

PCI Bandwidth

As of 2024, the latest PCI generations deliver 0.5 Tb/s to 1 Tb/s of throughput, or 64 GB/s to 128 GB/s. That sounds like a lot, but if a tx reads and writes 128MB of state, 128 GB/s of PCI bandwidth limits the chain to around 1000 TPS. In reality, most txs access memory that was recently loaded and is already cached in RAM. The ideal design would allow loading 1000 txs that each bring in 128MB of new state, plus 10k or more txs that read and write state already in the cache.
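
A quick back-of-the-envelope check of that bound, using the figures quoted above:

    fn main() {
        // 128 GB/s of PCI bandwidth, 128 MB of new state read/written per tx.
        let pci_bytes_per_sec: u64 = 128 * 1024 * 1024 * 1024;
        let tx_state_bytes: u64 = 128 * 1024 * 1024;
        // If every tx had to pull its state across PCI, throughput caps out near 1000 TPS.
        println!("PCI-bound TPS ~= {}", pci_bytes_per_sec / tx_state_bytes);
    }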

Account Index

Creating a new account requires proving that the account does not currently exist. This is usually done automatically on each validator, since each validator keeps a full index of all currently valid accounts. Even if the account data is not stored locally and only the hash of the data is kept, 500 million accounts at 32 bytes of key + 32 bytes of data hash, or 64 bytes per item, comes to 32 GB. That is already large enough to matter for the choice between RAM and disk.
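
The 32 GB figure follows directly from those numbers:

    fn main() {
        // 500 million accounts, each a 32-byte key plus a 32-byte data hash.
        let accounts: u64 = 500_000_000;
        let bytes_per_entry: u64 = 32 + 32;
        println!("minimum index size ~= {} GB", accounts * bytes_per_entry / 1_000_000_000);
    }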

Snapshot size

At some snapshot size, if part of the network suffers a hardware failure, the time needed to cold-start a fresh system becomes long enough to stretch the worst-case restart time. Where that threshold sits changes as bandwidth and hardware improve, and Solana is not close to it, but at any point in time the limit exists.

Overview

Memory and disk have different performance characteristics and limits. If the SVM does not distinguish between them, transactions and limits must be priced for the worst case, which limits performance. At a minimum, all account keys must be available during transaction execution, and the total number of accounts affects RAM and disk PCI bandwidth utilization. Snapshots cannot be allowed to grow arbitrarily large. The ideal solution is one that:

  • Packs more txs that don't require PCI resources into each block

  • Manages the total index size and snapshot size

Chilly. Avocado. LSR. Bad names are often a sign of good software design. The engineers at Anza and Firedancer came up with the following.

Chilly

The account runtime cache is managed deterministically by all instances. At a high level it is an LRU cache of recently accessed state, but during block construction and scheduling the implementation can check an account's status cheaply, without locking or iterating an LRU structure. The cache is implemented with a very simple counter mechanism (a sketch follows the list below).

  • The total loaded bytes are tracked as Bank::loaded_bytes:u64

  • When an account is used, it is tagged with the current running total as Account::load_counter:u64

  • When loading an account, if Bank::loaded_bytes - Account::load_counter > CACHE_SIZE, the account is considered cold, and its size is counted against the per-block LOAD_LIMIT

  • New accounts have a load_counter of 0, so all new accounts are cold accounts

  • The leader's scheduler uses LOAD_LIMIT as a watermark, similar to the write lock CU limit.
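
A minimal sketch of this counter mechanism, keeping the names used above; CACHE_SIZE's value, the struct layouts, and the helper functions are illustrative assumptions, not the Anza/Firedancer implementation:

    // Illustrative constant; the real value is implementation-defined.
    const CACHE_SIZE: u64 = 32 * 1024 * 1024 * 1024; // bytes of recently loaded state treated as hot

    struct Bank {
        loaded_bytes: u64, // running total of bytes loaded, ever-increasing
    }

    struct Account {
        load_counter: u64, // Bank::loaded_bytes when this account was last used (0 if new)
        size: u64,
    }

    /// An account is cold when more than CACHE_SIZE bytes have been loaded since it was last used.
    fn is_cold(bank: &Bank, account: &Account) -> bool {
        bank.loaded_bytes - account.load_counter > CACHE_SIZE
    }

    /// On load: cold accounts charge their size against the per-block LOAD_LIMIT budget,
    /// then the account is re-tagged with the current running total.
    fn load(bank: &mut Bank, account: &mut Account, block_cold_bytes: &mut u64) {
        if is_cold(bank, account) {
            *block_cold_bytes += account.size; // compared against LOAD_LIMIT by the scheduler
        }
        bank.loaded_bytes += account.size;
        account.load_counter = bank.loaded_bytes;
    }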

The beauty of this design is that it fits naturally into the current scheduler. Users only have to worry about their priority fee. The scheduler's job is the knapsack problem of packing txs subject to LOAD_LIMIT and the account write-lock limits. The highest-priority txs get to load cold state first and consume LOAD_LIMIT; once the limit is reached, txs that only touch already cached state can still fit into the block. Validators can therefore maximize cache locality across the txs they pack.
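
A hedged sketch of how the leader's scheduler might apply this watermark; LOAD_LIMIT's value, the Tx placeholder, and the omission of CU and write-lock limits are simplifications, not the actual scheduler:

    const LOAD_LIMIT: u64 = 128 * 1024 * 1024; // hypothetical per-block budget for cold bytes

    struct Tx; // stand-in for a scheduled transaction

    /// Walk txs in priority order. Txs that need cold loads consume the budget;
    /// once it is exhausted, txs touching only cached state still fit into the block.
    fn pack_block(txs_by_priority: Vec<(u64 /* cold bytes */, Tx)>) -> Vec<Tx> {
        let mut cold_budget = LOAD_LIMIT;
        let mut block = Vec::new();
        for (cold_bytes, tx) in txs_by_priority {
            if cold_bytes <= cold_budget {
                cold_budget -= cold_bytes;
                block.push(tx);
            }
            // txs whose cold bytes no longer fit are skipped for this block
        }
        block
    }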

Avocado

Avocado consists of two parts: state compression and index compression. First, account data is replaced with its hash; then the account index is migrated into a binary trie / Patricia trie. New accounts must provide a proof that they are not already in the trie.

State compression

The general design is as follows (a sketch follows the list):

  • During allocation, each account bonds X lamports per byte.

  • If X < the current economic floor price for keeping an account in memory, the account is compressed

  • Compression is a multi-step process that runs over an epoch.

  • Account data is replaced with hash(data)

  • Account keys are still in state

  • Transactions referencing compressed accounts fail

  • Decompression requires uploading the data, similar to how the loader works

  • The cost of decompression should be the same as the cost of allocating a new account
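
A minimal sketch of the compress/decompress cycle described in this list, using the sha2 crate for illustration; the enum and function names are assumptions:

    use sha2::{Digest, Sha256};

    enum AccountData {
        Hot(Vec<u8>),         // full data resident in state
        Compressed([u8; 32]), // only hash(data) is retained; the key stays in state
    }

    /// Compression: replace the account's data with hash(data).
    fn compress(data: &[u8]) -> AccountData {
        AccountData::Compressed(Sha256::digest(data).into())
    }

    /// Decompression: the user re-uploads the data (as with the loader); it is accepted
    /// only if it matches the stored hash.
    fn decompress(stored: &AccountData, uploaded: Vec<u8>) -> Option<AccountData> {
        if let AccountData::Compressed(expected) = stored {
            let actual: [u8; 32] = Sha256::digest(&uploaded).into();
            if &actual == expected {
                return Some(AccountData::Hot(uploaded));
            }
        }
        None // wrong hash, or the account was never compressed
    }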

An estimated 75% of accounts have not been accessed in more than 6 months and will likely never be accessed. Compressing them can save 50% of the snapshot size.

Index Compression

This is a harder problem to solve. With state compression alone, validators still have to hold every valid account key in the system, because creating a new account requires checking against this database. Storing the database is expensive for validators, while creating new accounts is cheap for users, since a freshly generated private key is guaranteed not to collide with any existing account.

Binary Trie mining

  • The Binary Trie is tracked as part of the snapshot

  • A validator who wants to earn additional SOL can create a transaction that removes compressed account kv pairs from the state and adds them to the Binary Trie

  • During decompression, the user removes the kv pair from the trie, reversing the process (this may need to happen atomically with decompression so it composes cleanly with a background service that compresses accounts)

  • For validators, the size of the Trie root is constant no matter how many kv pairs it contains

  • Using zkp, each tx can compress about 30 accounts

  • Assuming only one such transaction per block, it would take about 80 days to compress 500 million accounts (see the arithmetic sketch after this list)
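
A back-of-the-envelope check of the roughly 80-day figure, assuming approximately 400 ms slots:

    fn main() {
        let accounts: u64 = 500_000_000;
        let accounts_per_tx: u64 = 30;      // ~30 accounts compressed per ZK-proof tx
        let blocks_per_day: u64 = 216_000;  // 86_400 s/day / 0.4 s per slot
        let days = accounts / (accounts_per_tx * blocks_per_day); // one such tx per block
        println!("~{} days to compress the backlog", days);
    }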

The key point is that validators who do this work are rewarded, but not every validator has to do it. If every validator had to, every validator would also have to maintain the full contents of the Binary Trie, which would put the entire state back into the snapshot. Instead, a validator that does maintain the full state can submit a transaction that compresses N accounts from the index into the Trie.

New Account Proof

To create a new account, a user must prove that the account does not exist in the Trie. Validators that maintain the entire state can generate proofs that an account is not in the Trie. This places a burden on users, who must always connect to large state providers to generate these proofs.

Alternatively, a user can prove that their account was created with a recent PoH hash. The simplest way to support this is:

  • Generate a new PKI

  • The account address is hash(recent PoH hash, PKI::public_key)

Because accounts only enter the trie after state compression, which takes a full epoch, no account already in the trie can have an address derived from a recent PoH hash.

Another approach that could be supported: the PKI creation itself can carry a proof that the private key was derived from hash(the user's hidden secret, a recent PoH hash).
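
A minimal sketch of the address derivation described above, using the sha2 crate for illustration; the function name is hypothetical:

    use sha2::{Digest, Sha256};

    /// Derive a new account address from a recent PoH hash and a freshly generated
    /// public key, so the address provably could not have existed before that hash.
    fn new_account_address(recent_poh_hash: &[u8; 32], public_key: &[u8; 32]) -> [u8; 32] {
        let mut hasher = Sha256::new();
        hasher.update(recent_poh_hash);
        hasher.update(public_key);
        hasher.finalize().into()
    }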

LSR

Lightweight Simple Rent, also known as Less Stupid Rent. How do you price the allocation of new accounts, and how do you ensure that old, abandoned accounts eventually get compressed, reducing the overall load on the system and the price for new users?

Rent needs to be brought back. Rent means that accounts taking up space in the current state should pay X dollars per byte per day, just like storage on AWS.

Rent Rate bonding curve

RentRate = K*(state_size)^N

The rate is a function of the current state size: when the state is small, the rate should be negligible; as it approaches the snapshot limit, the rate should become very high.
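
As a sketch of the curve above, with K and N left as parameters to be tuned:

    /// RentRate = K * state_size^N: negligible while the state is small,
    /// steep as the state approaches the snapshot limit.
    fn rent_rate(state_size_bytes: f64, k: f64, n: f64) -> f64 {
        k * state_size_bytes.powf(n)
    }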

Allocation Minimum Bonding Price

Accounts must exist for at least one epoch. An allocation needs to bring the account into the hot state, and hot accounts should stay resident for the lifetime of the cache.

New Account bond = Epoch Slots * RentRate * Account::size

New accounts must have at least this many lamports in their balance before they can be created.
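
As a sketch of the bond calculation above (the rounding is an assumption):

    /// New Account bond = Epoch Slots * RentRate * Account::size: the minimum balance
    /// needed to pay rent for one full epoch at the current rate.
    fn new_account_bond(epoch_slots: u64, rent_rate: f64, account_size_bytes: u64) -> u64 {
        (epoch_slots as f64 * rent_rate * account_size_bytes as f64).ceil() as u64
    }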

Hot Account Burn

lru_turnover_rate = the average time each account spends in the LRU cache, capped at 1 epoch. This value can be a constant, or computed off-chain and reported to the SVM as a stake-weighted median constant.

Compression

When (current slot - account::creation_slot) * RentRate * account::size > account::lamports, compress the account and burn all remaining lamports.
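
A sketch of that check; the names are illustrative:

    /// Compression check from the formula above: once accumulated rent exceeds the
    /// account's balance, the account is compressed and the remaining lamports burned.
    fn should_compress(current_slot: u64, creation_slot: u64, rent_rate: f64,
                       account_size_bytes: u64, lamports: u64) -> bool {
        (current_slot - creation_slot) as f64 * rent_rate * account_size_bytes as f64
            > lamports as f64
    }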

The above solution should keep state cheap: over time, unused accounts run their lamports down to 0 and get compressed, so data overhead and even index overhead shrink, reducing the size of the current state. A smaller state, in turn, lowers allocation costs, since the RentRate curve makes those costs grow super-quadratically with state size.