TL;DR

1/ The essence of modularization is to break the "impossible triangle" and achieve capacity expansion without increasing the burden on node hardware.

2/ Celestia is a data availability (DA) layer. Like an Optimistic Rollup, it treats block data as valid by default, relying on fraud proofs, erasure coding, and data availability sampling to verify data, and it lets light nodes participate in verification.

3/ Celestia has formed an initial ecosystem; well-known projects in it include Fuel, Cevmos, and others.

4/ It will be crucial for Celestia to seize its window of opportunity, build economies of scale before Polygon Avail and Danksharding arrive, and attract a large amount of liquidity, especially from native Rollups.

Typically, Layer1 is divided into four layers:

1) Consensus layer

2) Settlement layer

3) Data layer

4) Execution layer

The consensus layer is indispensable. Modularization means separating out (more precisely, decoupling) one or two of settlement, data, and execution and pairing them with consensus to form a new layer of network protocol, breaking the "impossible triangle" and achieving capacity expansion without increasing the burden on node hardware or causing centralization.

For example, Ethereum Rollups decouple the execution layer: a centralized sequencer sorts transactions, packages and compresses them in large batches, and submits them to the Ethereum mainnet, where mainnet full nodes verify the transaction data.

Celestia is a data availability (DA) project built on the Cosmos stack. It provides a data layer and a consensus layer for other Layer1s and Layer2s to build modular blockchains, with a B2B business model: it charges other chains for the service.

To fully understand Celestia and data availability, we must first start with the “impossible triangle” and the problem of data availability.

Why is data availability important? From the “impossible triangle” to the data availability problem

The impossible triangle, also known as the blockchain trilemma, refers to the fact that decentralization, scalability, and security cannot all be achieved at once. It was first articulated by the Ethereum team.

Usually, when a transaction is submitted to the chain, it first enters the mempool, is "selected" by a miner, packaged into a block, and appended to the blockchain.

The block containing the transaction is then broadcast to all nodes in the network. Other full nodes download the new block, perform heavy computation, and verify every transaction in it to ensure each one is authentic and valid.

Complex calculations and redundancy are the foundation of Ethereum's security, but they also bring problems.

1) Data availability

There are generally two types of nodes:

Full node - downloads and verifies all blocks and transaction data.

Light node - a non-fully-validating node that is easy to deploy and verifies only block headers (data summaries).

First, the network must ensure that when a new block is produced, all of the data in it has been published so that other nodes can verify it. If the block producer does not publish all the data in the block, other nodes cannot detect whether the block hides malicious transactions.

In other words, nodes need to obtain all transaction data within a certain period and confirm that no transaction has been confirmed without being verifiable. This is data availability in the usual sense.

If a block producer conceals some transaction data, other full nodes will refuse to follow the block once they check it, but light nodes, which download only block headers, cannot verify it and will keep following the forked block, undermining security.

Although the chain will usually slash the offending node's stake, this also imposes losses on users who have delegated to that node.

And when the benefit of concealing data exceeds the penalty, nodes have an incentive to conceal it. In that case, the ones who actually suffer are the delegators and the chain's other users.

On the other hand, if full-node deployment becomes increasingly centralized, nodes may collude, endangering the security of the entire chain.

This is why it is important that data is available.

Data availability is drawing more and more attention, partly because of Ethereum's merge to PoS and partly because of the growth of Rollups. Today, Rollups run centralized sequencers.

When users transact on a Rollup, the sequencer sorts, packages, and compresses the transactions and publishes them to the Ethereum mainnet, where full nodes verify the data through fraud proofs (Optimistic) or validity proofs (ZK).

As long as all the data in the blocks submitted by the sequencer is authentic and available, the Ethereum mainnet can track, verify, and reconstruct the Rollup state from it, ensuring data authenticity and the safety of user funds.

2) State explosion and centralization

State explosion means that Ethereum full nodes accumulate ever more historical and state data, so the storage required to run a full node keeps growing, the barrier to operating one rises, and the network's nodes centralize.

Therefore, a method is needed that lets nodes avoid downloading all the data when synchronizing and verifying blocks, downloading only some redundant fragments of each block instead.

So far, we have established that data availability is important. The next question is how to avoid the "tragedy of the commons": everyone knows data availability matters, but there still needs to be a tangible incentive for everyone to use a separate data availability layer.

Just like everyone knows that protecting the environment is important, but when I see trash on the roadside, why should I pick it up? Why not others? What benefits can I get from picking up the trash?

This is where Celestia comes in.

What is Celestia?

Celestia provides a pluggable data availability layer and consensus for other Layer1s and Layer2s, built on Cosmos's Tendermint consensus and the Cosmos SDK.

Celestia is a Layer1 protocol compatible with EVM chains and Cosmos application chains, and in the future it will support all types of Rollups. These chains can use Celestia directly as their data availability layer: block data is stored, retrieved, and verified through Celestia, then returned to the chain's own protocol for settlement.

Celestia also supports native Rollups, so a Layer2 can be built directly on it, but it does not support smart contracts, so dApps cannot be built on it directly.

How Celestia works

A Rollup connects to Celestia by running a Celestia node.

Celestia receives the Rollup's transactions and orders them through Tendermint consensus. Beyond that, Celestia does not execute transactions or question their validity; it only packages, orders, and broadcasts them.

Yes, in other words, blocks that conceal transaction data can also be published on Celestia. So how does Celestia catch them?

Verification is done through erasure coding and Data Availability Sampling (DAS).

Specifically, the original data is laid out as a square of size K (if the actual data is smaller than K, padding is added to make up the difference), erasure coding is applied, the data is divided into many small chunks, and the square is extended to a matrix of 2K rows and 2K columns.

It can be understood simply as follows: a square with side K and area K*K becomes, after erasure coding, a square with side 2K and area 2K*2K.

If the original data is 1 MB, erasure coding splits it into pieces and extends it to 4 MB, of which 3 MB is redundant (parity) data. Only a K*K-sized portion of the data is needed to recover or view the entire 2K*2K data.

The underlying mathematics is quite involved, but the result is that if a malicious block producer conceals even 1% of the transaction data, it amounts to concealing more than 50% of the chunks.

Therefore, for concealment to be effective, the data matrix must change qualitatively, which light nodes can easily detect. This makes concealing data extremely unlikely to succeed.
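
To make the construction concrete, here is a minimal, runnable sketch of the data-square layout. It uses plain XOR parity as a stand-in for the Reed-Solomon coding Celestia actually uses, and all names and sizes (`K`, `SHARE_SIZE`, `build_square`, `extend_square`) are illustrative, not Celestia's real parameters:

```python
import secrets

K = 4            # side of the original data square (illustrative)
SHARE_SIZE = 16  # bytes per chunk (illustrative)

def xor_parity(chunks):
    """XOR equal-length byte strings (a stand-in for real RS parity)."""
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

def build_square(block_data: bytes):
    """Pad the block to exactly K*K chunks (assumes the block fits)."""
    padded = block_data.ljust(K * K * SHARE_SIZE, b"\x00")
    chunks = [padded[i:i + SHARE_SIZE]
              for i in range(0, K * K * SHARE_SIZE, SHARE_SIZE)]
    return [chunks[r * K:(r + 1) * K] for r in range(K)]

def extend_square(square):
    """Grow K*K into 2K*2K by adding parity chunks for every row and
    column. Celestia's 2D Reed-Solomon code can rebuild the original
    from a sufficient subset of the 2K*2K chunks; plain XOR parity
    only illustrates the shape of the construction."""
    ext = [[None] * (2 * K) for _ in range(2 * K)]
    for r in range(K):
        for c in range(K):
            ext[r][c] = square[r][c]          # Q1: original data
        row_parity = xor_parity(square[r])
        for c in range(K, 2 * K):
            ext[r][c] = row_parity            # Q2: row extensions
    for c in range(2 * K):
        col_parity = xor_parity([ext[r][c] for r in range(K)])
        for r in range(K, 2 * K):
            ext[r][c] = col_parity            # Q3/Q4: column extensions
    return ext

square = build_square(secrets.token_bytes(100))   # a partially full block
extended = extend_square(square)
print(len(extended), "rows x", len(extended[0]), "cols")  # 8 rows x 8 cols
```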

Full nodes can verify data through fraud proofs, as on other Layer1s. The key role of erasure coding is to mobilize light nodes to participate in data verification.

Full nodes send block headers to light nodes, which perform data availability sampling. If no data is concealed, the light node accepts the block. If data is missing, the light node reports it to other full nodes, which then initiate fraud proofs.

In summary,

1/ Celestia erasure-codes the original data and cuts it into many small chunks. (If there is unused space in the block, it is filled with padding, so a block with gaps can only be one whose producer is concealing data.)

2/ The original K*K data is extended to 2K*2K. Because the K*K data has already been divided into small chunks, the 2K*2K data has the same form: many small chunks.

3/ This brings three benefits:

1) Because the data is cut into small chunks, light nodes can also participate in verification. (If the pieces were large, light nodes would be shut out by their hardware limits.)

2) Only K*K worth of data needs to be sampled to restore the entire 2K*2K data. Light nodes take turns sampling until the combined samples reach K*K, and can then decide whether to accept the current block by checking against the full data.

3) If a malicious block producer conceals even 1% of the transaction data, it ends up concealing more than 50% of the chunks.

4/ Full nodes can directly verify block data through fraud proofs, similar to other Layer1s such as Ethereum.

5/ Light nodes verify through data availability sampling: multiple light nodes sample at random until the sampled data area reaches K*K. This is Celestia's innovation.

6/ Light-node sampling is sublinear: a node only needs to download the square root of the amount of data to be sampled. That is, if there are 10,000 small chunks to sample, only 100 of them need to be downloaded and checked, because 100 squared is 10,000. (See the sketch after this list.)

7/ If a light node finds that block data has been concealed, it can report this to other full nodes, and the cheating node's stake can be slashed through a fraud proof.
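
A small simulation makes the sampling argument tangible. The closed-form line assumes samples are drawn with replacement, while the simulation draws without replacement; all constants are illustrative assumptions, not protocol parameters:

```python
import random

TOTAL_CHUNKS = 10_000   # extended square of 100 x 100 chunks
SAMPLES = 100           # sqrt(10_000): what one node downloads

def detection_probability(withheld_fraction: float, samples: int) -> float:
    """P(at least one sample hits a withheld chunk), sampling with
    replacement: each draw misses with probability (1 - p)."""
    return 1 - (1 - withheld_fraction) ** samples

def simulate(withheld_fraction: float, samples: int, trials: int = 2_000) -> float:
    """Empirical check, sampling chunk indices without replacement."""
    withheld = int(TOTAL_CHUNKS * withheld_fraction)
    hits = sum(
        any(i < withheld for i in random.sample(range(TOTAL_CHUNKS), samples))
        for _ in range(trials)
    )
    return hits / trials

# If hiding 1% of transactions forces hiding >50% of chunks (the
# article's claim), one node sampling 100 chunks catches it almost surely:
print(detection_probability(0.5, SAMPLES))  # 1 - 2**-100, i.e. ~1.0
print(simulate(0.5, SAMPLES))               # ~1.0 empirically
```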

Celestia Scaling

Erasure coding and data availability sampling let Celestia scale further and run more efficiently than the existing data availability of other Layer1s.

1/ Fraud proofs are used and block data is treated as available by default, so the network operates efficiently under normal circumstances.

2/ The more light nodes there are, the higher the network efficiency.

Because the original data size is K*K, a single light node would need K*K samples on its own; conversely, with K*K light nodes, each needs only one sample (see the sketch below).

3/ Sublinear sampling allows Celestia to use large blocks.
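
A back-of-envelope sketch of that claim, under the article's model that the network as a whole must gather K*K samples (all numbers illustrative):

```python
import math

K = 100                 # original square side (illustrative)
needed = K * K          # total samples the network must gather

for nodes in (1, 100, 10_000):
    per_node = math.ceil(needed / nodes)
    print(f"{nodes:>6} light nodes -> {per_node:>6} samples each")
# 1 node must do all 10,000 samples; 10,000 nodes need only 1 each.
```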

In addition, thanks to erasure coding, even in a large-scale failure of Celestia's full nodes, light nodes can use the chunked data to reconstruct transactions and keep the data accessible.

Quantum Gravity Bridge

The Quantum Gravity Bridge is a relay bridge between Celestia and Ethereum Layer2s, built on Ethereum. A Layer2 can publish transaction data to Celestia through the Quantum Gravity Bridge, use its data availability service, and verify the data on Celestia through a smart contract.

Celestium

Celestium is an Ethereum Layer2 that uses Celestia as the data availability layer and Ethereum as the settlement and consensus layer.

Currently in development.

Why Celestia?

Remember the "tragedy of the commons" mentioned earlier? In other words: why would a Rollup use Celestia as its data layer?

1/ Low cost of using Celestia

The current cost of Ethereum Rollup consists of two parts:

1) The Rollup's own Gas cost: the fee charged for user interactions, sequencer ordering, and state transitions.

2) The Gas the Rollup spends submitting blocks to Ethereum.

After the Rollup sequencer packages and compresses a batch, it posts the data to Ethereum. Currently the data is stored as calldata, at a cost of 16 Gas per (non-zero) byte.

Ethereum and the Rollup each charge Gas fees that vary with congestion. Before batching user interactions, the sequencer estimates the Ethereum Gas fee as best it can and passes it on to users.

In other words, Gas on a Rollup is cheap because many user interactions are packaged together, so the mainnet Gas is effectively split among everyone.
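
A rough sketch of that amortization; every constant below (batch size, bytes per transaction, prices) is an assumption chosen for illustration, not a live figure:

```python
# Illustrative amortization of L1 data costs across a rollup batch.
CALLDATA_GAS_PER_BYTE = 16   # current cost per non-zero calldata byte
BYTES_PER_TX = 120           # assumed size of one compressed rollup tx
BATCH_SIZE = 1_000           # transactions per posted batch
GAS_PRICE_GWEI = 30          # assumed mainnet gas price
ETH_USD = 1_800              # assumed ETH price

batch_gas = BATCH_SIZE * BYTES_PER_TX * CALLDATA_GAS_PER_BYTE
batch_cost_usd = batch_gas * GAS_PRICE_GWEI * 1e-9 * ETH_USD
print(f"batch gas:      {batch_gas:,}")                       # 1,920,000
print(f"cost per batch: ${batch_cost_usd:,.2f}")              # ~$103.68
print(f"cost per user:  ${batch_cost_usd / BATCH_SIZE:.4f}")  # ~$0.1037
```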

When the market is quiet, there are fewer interactions on Ethereum, and each user's share of Gas shrinks; the Rollup only adds a small margin on top of normal Gas. Once Gas on Ethereum soars, Gas on the Rollup rises with it.

Therefore, Rollup is essentially competing for block space with dApps and other Rollups on the Ethereum mainnet.

On the other hand, heavy activity on the Rollup itself also drives Gas up, as with the recent Arbitrum Odyssey.

In general, Rollup's current cost model is linear: costs rise and fall with demand for Ethereum blockspace.

Celestia's cost is sublinear, and it should eventually converge to a value far below Ethereum's current cost.

After the EIP-4844 upgrade is deployed, Rollup data storage will move from calldata to blobs and costs will fall, but it will still be more expensive than Celestia.

2/ Self-Sovereignty

Sovereign Rollups ("autonomous Rollups") essentially have the power to fork on their own. Celestia's native Rollups are sovereign chains whose governance and fork upgrades are not restricted by Celestia.

Why is forking important?

Usually, blockchains are upgraded through hard forks, which can weaken security. The reason is that if someone wants to change or upgrade the chain's code, the other participants must agree and apply the change.

To upgrade the entire chain, the entire consensus layer must fork, just as Ethereum's PoS merge had to use the difficulty bomb to force nodes to migrate from PoW to PoS. All nodes must take part in the upgrade to avoid losing security.

Celestia can offer Rollups this forking capability because all forks share the same data availability layer.

In addition, sovereign Rollups bring more flexibility. Ethereum Rollups are limited by the mainnet's ability to process fraud proofs or validity proofs.

A sovereign Rollup does not depend on a specific virtual machine such as the EVM, so it has more options, for example adopting the Solana VM. Using different VMs, however, limits interoperability.

On the other hand, there may not be much demand yet for Rollups to become sovereign:

A. They are constrained by centrally issued assets. For example, USDC and USDT do not officially support newly forked chains.

B. They are constrained by dApp migration. For example, dApps such as Uniswap may stay on the original chain, and users, unwilling to give up their habits, may not migrate to the new fork.

3/ Trust minimized bridges and shared security

Celestia's official writing roughly divides cross-chain bridges into two categories:

A. Trusted bridges require trust in a third party, such as relay-chain nodes. Their reliability rests on the third party's consensus, i.e. the assumption that most of its nodes are honest.

B. Trust-minimized bridges, similar to the relationship between Ethereum and its Rollups, rely on fraud proofs (Optimistic) or validity proofs (ZK) to verify the validity of Rollup transaction data.

Celestia proposes the concept of clusters: groups of chains that communicate with each other, bridge through trust-minimized bridges, and can each verify the state of the others.

Typically, clusters run into two limiting factors:

A. All chains in a cluster need to understand each other's execution environments. This is hard: for example, ZK Rollups would need to understand each other's proof systems, but zk-SNARKs and zk-STARKs are different systems, so ZK Rollups remain relatively independent.

B. To keep state verification trust-minimized across the cluster, each chain must verify the availability of every other chain's block data in a trust-minimized way.

Using Celestia as the data availability layer, all chains in the cluster can check whether each other’s blocks are included in the Celestia chain.
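
Such an inclusion check is typically a Merkle proof against a committed root. A generic sketch follows (an ordinary SHA-256 Merkle tree, not Celestia's actual namespaced Merkle tree; all helper names are illustrative):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    """Fold the leaf hash up the path; accept iff we reach the root.
    `proof` is a list of (sibling_hash, side) pairs."""
    node = h(leaf)
    for sibling, side in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

# A tiny 4-leaf tree: prove that leaf b"\x02" is committed under `root`.
leaves = [h(bytes([i])) for i in range(4)]
l01, l23 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(l01 + l23)
proof = [(leaves[3], "right"), (l01, "left")]
print(verify_inclusion(bytes([2]), proof, root))  # True
```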

Somewhat awkwardly, though, under Celestia's cluster concept, Optimistic Rollups and ZK Rollups fall into two separate clusters.

That is, Optimistic Rollups such as Optimism and Arbitrum belong to the same cluster, but they and zkSync do not.

And because ZK Rollup designs differ, zkSync and StarkNet do not even share a cluster. So Celestia still cannot solve the relative isolation of Rollups and their lack of atomic-level interoperability.

Optimint (Optimistic Tendermint)

Optimint is an alternative to Tendermint consensus that lets developers build Cosmos-based Rollups while using Celestia as the consensus and data availability layer.

The goal is to allow Cosmos-based Rollups to form clusters.

Celestia's current ecosystem projects

Fuel

Fuel is a modular execution layer built on Celestia, and an Optimistic Rollup Layer2 for Ethereum.

Fuel built FuelVM, a custom virtual machine designed for smart contracts that can process transactions in parallel and uses UTXO-based accounts.

Cevmos

Cevmos is a Rollup stack developed jointly around the Cosmos EVM application chain Evmos and Celestia.

Cevmos is built with Optimint. Since Evmos itself is a Rollup in this design, Cevmos is effectively a Rollup inside a Rollup (a recursive Rollup).

Existing Rollup contracts and applications on Ethereum can be redeployed on Cevmos, using it as the settlement layer and Celestia as the data layer.

Each Rollup built this way will have a two-way trust-minimized bridge to the Cevmos Rollup, forming a cluster.

dYmension

dYmension is an independent Rollup chain built on Cosmos. The dYmension Hub provides settlement, along with the RDK development kit and IRC inter-chain communication, to ease the development of Rollup-focused applications (RollApps).

Eclipse

Eclipse is a sovereign Rollup based on Cosmos, using the Solana VM as its settlement and execution layer and Celestia as its data layer.

Project Progress

The testnet is live. An incentivized testnet will be released in the first quarter of 2023, and faucet test tokens can already be claimed in the official Discord. The mainnet is expected in the second quarter of 2023.

Financing

In March 2021, Celestia closed a US$1.5 million seed round with participation from Binance Labs, Interchain Foundation, Maven 11, KR1, and others.

In December 2021, it raised US$2.73 million; the investors were not disclosed.

In October 2022, it raised US$55 million from Bain Capital, Polychain Capital, Placeholder, Galaxy, Delphi Digital, Blockchain Capital, Spartan Group, FTX Ventures, Jump Crypto, and others.

Team

CEO Mustafa Al-Bassam, PhD in blockchain scaling from UCL, co-founder of Chainspace (acquired by Facebook)

CTO Ismail Khoffi, former senior engineer at Tendermint and the Interchain Foundation

CRO John Adler, creator of Optimistic Rollups, former ConsenSys scalability researcher

COO Nick White, co-founder of Harmony, holding bachelor's and master's degrees from Stanford University

Advisory Team:

Zaki Manian — IBC co-creator and early Cosmos contributor

Ethan Buchman — Co-founder of Tendermint and Co-founder of Cosmos

Morgan Beller — General Partner at NFX, Co-founder of Diem (formerly Libra)

James Prestwich — Founder of Summa (acquired by Celo)

George Danezis - Professor of Security and Privacy Engineering at University College London

Token Economic Model

According to the information released so far, Celestia's native token will be used as Gas, the protocol's revenue will come from Rollup transaction fees, and the token will include a burn mechanism similar to EIP-1559.

Currently, Celestia's primary market valuation is US$1 billion.

Competitors

Polygon Avail

Avail is a data availability solution launched by Polygon. The idea is the same as Celestia's; the difference is that Celestia uses erasure coding plus fraud proofs, while Avail uses erasure coding plus KZG polynomial commitments.

Celestia extends K*K data into a 2K*2K square; Avail extends row by row, turning an n-row, m-column matrix into 2n rows and computing a KZG polynomial commitment for each row.

Light nodes use data availability sampling (DAS) to cryptographically verify the KZG commitments and proofs without downloading the original data.
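
For intuition, here is a sketch of the standard KZG scheme such row commitments build on (textbook form; Avail's exact encoding and parameters may differ):

```latex
% One row of the extended matrix, interpreted as a polynomial p(X) of
% degree < m, is committed to with a single group element:
C = [\,p(\tau)\,]_1
% To open the commitment at a point z with claimed value y = p(z),
% the prover sends the quotient witness
\pi = \left[\frac{p(\tau) - y}{\tau - z}\right]_1
% and the verifier checks a single pairing equation:
e\big(C - [y]_1,\ [1]_2\big) = e\big(\pi,\ [\tau]_2 - [z]_2\big)
```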

By comparison, Avail is harder to implement but, once fully implemented, relatively more robust. For now, both projects are still in development, and the competition is hard to call.

Ethereum Danksharding

Danksharding is the dedicated data availability layer that Ethereum officially plans to launch. Like Avail, Danksharding uses erasure coding plus KZG polynomial commitments, and the data format will be blobs instead of today's calldata.

Two proposals serve as transitional steps before Danksharding is deployed.

EIP-4488 directly reduces calldata Gas from 16 to 3 per byte, while stipulating an upper limit of 1.4 MB of calldata per block.

EIP-4844 introduces blobs (blob-carrying transactions; blob: binary large object) in place of calldata. A blob is a new transaction type that carries extra data space and costs much less than calldata.

Blobs are stored on the Ethereum beacon chain and are compatible with later sharding. Data is verified via the KZG commitment's hash; a Rollup does not need to access the data itself, only to verify the KZG commitment.
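
A rough per-megabyte comparison of the calldata pricing mentioned above (Gas price and ETH price are assumptions; blob pricing under EIP-4844 uses its own fee market and is not modeled here):

```python
# Per-MB data cost under different calldata pricing; numbers are
# assumptions for illustration only.
MB = 1_000_000
GAS_PRICE_GWEI = 30
ETH_USD = 1_800

def data_cost_usd(gas_per_byte: float, n_bytes: int = MB) -> float:
    gas = n_bytes * gas_per_byte
    return gas * GAS_PRICE_GWEI * 1e-9 * ETH_USD

print(f"calldata today (16 gas/B): ${data_cost_usd(16):,.0f} per MB")  # ~$864
print(f"EIP-4488       (3 gas/B):  ${data_cost_usd(3):,.0f} per MB")   # ~$162
```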

KZG commitments are binding and cannot be changed once computed. So, in essence, Avail and Danksharding verify data with cryptographic KZG polynomial commitments, while Celestia relies on the economics-based fraud-proof method.

In theory, KZG polynomial commitments are more secure than fraud proofs and require less bandwidth and computation for sampling. Ethereum is also considering quantum-resistant verification methods for the future, such as zk-STARKs.

Risks

1) Centralization

Although erasure coding lets light nodes participate in data verification, Celestia's data storage still requires running full storage nodes.

These require 8 GB of RAM, a 4-core CPU, at least 250 GB of free storage, uplink bandwidth above 100 Mbps, and downlink bandwidth above 1 Gbps. The configuration requirements are high and effectively demand a cloud server.

2) Competition from Ethereum Danksharding

3) The “dirty ledger” problem

This issue was raised by a Stanford research team. Celestia uses fraud proofs and treats block data as available by default so that the network runs efficiently under normal circumstances. It is therefore a "dirty" ledger: blocks with problematic data are still accepted by Celestia and simply wait to be challenged by fraud proofs.

Suppose a challenger wants to prove that transaction Tc is a double spend and submits as evidence that the money was already spent in transaction Tb. But what if there is a transaction Ta that proves Tb is itself invalid?

If Tb is invalid, then Tc may not be a double spend at all.

In some cases, a "dirty ledger" cannot determine the true status of a transaction unless every transaction in Celestia's history is replayed back to the genesis block.
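
A toy model of that recursion (all structures hypothetical): judging one challenge can require judging the transactions before it, and so on backward through history:

```python
# Toy model of the "dirty ledger" recursion: a transaction's validity
# can depend on the validity of earlier transactions, so judging one
# fraud proof may require replaying history.
from dataclasses import dataclass, field

@dataclass
class Tx:
    name: str
    invalidates: list["Tx"] = field(default_factory=list)  # txs this one voids

def is_valid(tx: Tx, ledger: list) -> bool:
    """A tx is invalid if any *valid* earlier tx invalidates it,
    which forces recursion through everything that came before."""
    earlier = ledger[:ledger.index(tx)]
    return not any(tx in prior.invalidates and is_valid(prior, ledger)
                   for prior in earlier)

ta, tb, tc = Tx("Ta"), Tx("Tb"), Tx("Tc")
ta.invalidates.append(tb)     # Ta proves Tb invalid
tb.invalidates.append(tc)     # Tb (if valid) proves Tc a double spend
ledger = [ta, tb, tc]

print(is_valid(tb, ledger))   # False: Ta voids Tb
print(is_valid(tc, ledger))   # True: the challenge (Tb) was itself invalid
```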

This means both the challenger and the challenged must be full storage nodes. The issue was discussed on Celestia's official YouTube channel, and the team is working on mitigations, such as introducing a weak subjectivity assumption.

A weak subjectivity assumption is a precondition for settling such disputes. An analogy: how do you buy a good grapefruit? Pure subjectivity is choosing by feel; pure objectivity is judging the grapefruit's water content from the ratio of its weight to its volume.

Weak subjectivity is holding grapefruits of similar size in both hands, comparing their weight, and, after comparing several, choosing the heaviest.

Back to Celestia's "dirty ledger" problem: challengers and challenged parties could be required to retain data for three weeks, but that is also a burden on nodes.

The "dirty ledger" problem is really the root problem of fraud proofs, whose security rests on an economic model. On the other hand, fraud proofs are easier to deploy than KZG polynomial commitments, so in theory Celestia's development can progress faster than Polygon Avail's and Ethereum Danksharding's.

Therefore, it will be crucial for Celestia to seize its window of opportunity, achieve economies of scale before Polygon Avail and Danksharding arrive, and attract a large amount of liquidity, especially native Rollup liquidity.