As the data economy matures and people become broadly and deeply involved in it, everyone will inevitably participate in data storage activities of one kind or another. With the advent of the Web3 era, most technology fields will gradually upgrade or transform over the next few years, and decentralized storage, as a key piece of Web3 infrastructure, will reach more and more application scenarios. The data storage networks behind familiar services such as social data, short video, live streaming, and smart cars may all adopt a decentralized storage model in the future.

Data is the core asset of the Web3 era, and user ownership of data is Web3's defining feature. Letting users securely own their data, and the assets that data represents, will dispel ordinary users' concerns about asset security and help bring the next billion users into Web3. An independent data availability layer will be an indispensable part of that.

From decentralized storage to the data availability layer

In the past, data was stored in the cloud through traditional centralized methods, typically held entirely on centralized servers. Amazon Web Services (AWS), the originator of cloud storage, remains the world's largest cloud storage provider. Over time, users' demand for personal information security and reliable data storage has kept growing. Especially after data leaks at several large data operators, the drawbacks of centralized storage began to surface, and traditional storage methods could no longer meet the market's needs. Moreover, as the Web3 era advances and blockchain applications develop, data has become more diverse and its scale keeps growing. Personal network data is now more comprehensive and more valuable, making data security and data privacy more important and raising the requirements for data storage.

Decentralized data storage emerged in response, and it is one of the earliest and most popular infrastructures in the Web3 field; the earliest solution, Filecoin, was launched in 2017. Decentralization and centralization differ fundamentally. AWS builds and maintains its own data centers composed of many servers, and users who need storage pay AWS directly. Decentralized storage follows the sharing-economy model, aggregating massive numbers of edge storage devices to provide the service. The data actually resides on storage supplied by Provider nodes, so the project operator cannot control it. The most essential difference between decentralized storage and AWS is whether users control their own data: in a system without centralized control, data is far less exposed to a single point of control, and its security is correspondingly higher.

Decentralized storage is a storage business model in which files or file sets are sharded and distributed across storage space. It matters because it addresses the pain points of Web2 centralized cloud storage, better fits the needs of the big-data era, can store unstructured edge data at lower cost and higher efficiency, and empowers a range of emerging technologies. In this sense, decentralized storage is a cornerstone of Web3's development.
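The sharding idea above can be sketched in a few lines: a file is split into fixed-size chunks, and each chunk is keyed by its content hash so that independent provider nodes can each hold a piece without any single node holding the whole file. This is a minimal illustration only, not any particular project's on-disk format:

```python
import hashlib

def shard_file(data: bytes, shard_size: int = 4) -> dict:
    """Split data into fixed-size shards, keyed by content hash.
    Each shard could then be placed on a different provider node;
    a real system would also record the shard order separately."""
    shards = {}
    for i in range(0, len(data), shard_size):
        chunk = data[i:i + shard_size]
        shards[hashlib.sha256(chunk).hexdigest()] = chunk
    return shards

shards = shard_file(b"hello decentralized storage")
# Joining the shards back in insertion order reconstructs the file.
assert b"".join(shards.values()) == b"hello decentralized storage"
```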

There are two common types of decentralized storage project at present. The first mines with storage for the purpose of block production; the problem with this model is that on-chain storage and download slow down actual use, and downloading a single photo can take hours. The second relies on one or several nodes acting as centralized verifiers: storage and download proceed only after their verification, so if a centralized node is attacked or damaged, stored data can be lost.

Compared with the first type, MEMO's storage-tiering mechanism solves the download-speed problem well, bringing storage and download latency down to the level of seconds. Compared with the second, MEMO introduces the Keeper role to randomly select verification nodes, avoiding centralization while preserving security. MEMO has also created its original RAFI technology, which multiplies repair capability and greatly improves the security, reliability, and availability of storage.

Data availability (DA) essentially concerns light nodes: nodes that do not participate in consensus, do not store all data, and do not track the full network state in real time. Such nodes need an efficient way to confirm that data is available and correct, because the core of blockchain is that data cannot be changed and must be consistent across the whole network. For performance, consensus nodes tend toward centralization, so other nodes must obtain consensus-confirmed data through the DA layer. An independent data availability layer effectively eliminates single points of failure and maximizes data security.

In addition, Layer2 scaling solutions such as zkRollup also need a data availability layer. Layer2 acts as the execution layer with Layer1 as the consensus layer. Besides posting the resulting state of batched transactions to Layer1, it must also guarantee the availability of the original transaction data, so that even if no prover is willing to generate proofs, the Layer2 network's state can still be reconstructed and user assets are never locked in Layer2. Storing the raw data directly on Layer1, however, conflicts with Layer1's role as the consensus layer in a modular blockchain design. It is therefore more reasonable, and in the long run inevitable, to store the data in a dedicated data availability layer and record only the Merkle root computed over that data on the consensus layer.
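The "record only the Merkle root" step can be made concrete with a small sketch: however much transaction data the DA layer holds, the consensus layer stores only a constant-size commitment to it. This is illustrative only; real rollups fix their own leaf encoding and hash function:

```python
import hashlib

def merkle_root(leaves: list) -> bytes:
    """Pairwise-hash a list of data chunks up to a single 32-byte root.
    A rollup posts only this root to the consensus layer, while the
    full transaction data lives on the data availability layer."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx1", b"tx2", b"tx3", b"tx4"])
assert len(root) == 32  # constant size regardless of how much data it commits to
```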

Figure 1 shows the general Layer2 independent data availability layer model designed by Fox Tech.

Figure 1: General Layer2 Independent Data Availability Layer Model

Independent Data Availability Layer Analysis: Celestia

An independent data availability layer is a public chain, which is better than an availability committee composed of a group of fallible parties. If enough committee members' private keys are stolen (as happened to both Ronin Bridge and Harmony Horizon Bridge), the off-chain data can be made unavailable and users can be held to ransom: only by paying enough can they withdraw their funds from Layer2.

Since an off-chain data availability committee is not secure enough, what if a blockchain is introduced as the trust entity to guarantee the availability of off-chain data?

What Celestia does is make the data availability layer more decentralized: it provides an independent DA public chain with its own validator nodes, block producers, and consensus mechanism to raise the security level.

Layer2 publishes its transaction data to the Celestia chain; Celestia's validators sign the Merkle root of the DA attestation and send it to the DA Bridge Contract on the Ethereum mainnet for verification and storage. The Merkle root of the DA attestation thus serves as proof of the availability of all the data, and the bridge contract on Ethereum only needs to verify and store this single root, which greatly reduces cost.
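Conceptually, verification against such a stored root works like a standard Merkle inclusion check: holding only the 32-byte root, a verifier can confirm that any given data chunk belongs to the committed set. The sketch below is a generic textbook version, not the DA Bridge Contract's actual encoding:

```python
import hashlib

def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    """Walk a Merkle proof from leaf to root. `proof` is a list of
    (sibling_hash, sibling_is_left) pairs; only the root is trusted."""
    node = hashlib.sha256(leaf).digest()
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = hashlib.sha256(pair).digest()
    return node == root

# Two-leaf example: root = H(H(a) + H(b)).
ha, hb = (hashlib.sha256(x).digest() for x in (b"a", b"b"))
root = hashlib.sha256(ha + hb).digest()
assert verify_inclusion(b"a", [(hb, False)], root)      # valid proof
assert not verify_inclusion(b"c", [(hb, False)], root)  # wrong leaf fails
```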

Celestia's fraud proof is optimistic: as long as no one in the network misbehaves, efficiency is very high. When nothing is wrong, no fraud proof is produced, and a light node needs to do nothing beyond receiving the data and recovering it according to the erasure encoding, so the optimistic scheme remains very efficient in the common case.
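The efficiency of this light-node approach rests on random sampling: a node checks only a few randomly chosen chunks, yet the probability of being fooled by withheld data shrinks exponentially with the number of samples. The toy model below captures only that probabilistic core; real light nodes sample over two-dimensionally erasure-coded data:

```python
import random

def sample_available(chunks: list, k: int, rng=random) -> bool:
    """A light node requests k distinct random chunks; if any request
    fails (a withheld chunk, modeled here as None), the data is ruled
    unavailable. If a fraction f of chunks is withheld, the chance of
    all k samples missing it is (1 - f) ** k, which falls off fast."""
    for i in rng.sample(range(len(chunks)), k):
        if chunks[i] is None:
            return False
    return True

fully_available = [b"chunk"] * 16
assert sample_available(fully_available, 4)    # nothing withheld: always passes
one_withheld = [b"chunk"] * 15 + [None]
assert not sample_available(one_withheld, 16)  # sampling everything catches it
```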

Independent Data Availability Layer Analysis: MEMO

MEMO is a new-generation high-capacity, high-availability, enterprise-grade storage network built by aggregating global edge storage devices. The team was founded in September 2017 and focuses on the field of decentralized storage. MEMO is a highly secure, highly reliable, large-scale decentralized data storage protocol based on blockchain peer-to-peer technology. Unlike one-to-many centralized storage, MEMO supports many-to-many storage operations without a central data custodian. MEMO's main chain primarily stores the smart contracts that constrain all nodes; key operations such as uploading data, matching storage nodes, keeping the system running, and enforcing the penalty mechanism are all governed by smart contracts.

Technically, the existing decentralized storage systems, represented by Filecoin, Arweave, and Storj, all let computer users connect and rent out their unused hard-disk space in exchange for fees or tokens. Though all decentralized, each has its own characteristics. MEMO's difference is its use of erasure codes and data-repair technology to improve storage, making data more secure and storage and download more efficient, because building a purer and more practical decentralized storage system is MEMO's ultimate goal.
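The erasure-code-plus-repair idea can be shown with the simplest possible code, a single XOR parity shard: any one lost data shard can be rebuilt from the survivors. (MEMO and its peers use stronger Reed-Solomon-style codes that tolerate multiple losses, but the repair principle is the same.)

```python
from functools import reduce

def xor_parity(shards: list) -> bytes:
    """XOR equal-length shards together byte-by-byte into one parity shard."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))

def repair(surviving: list, parity: bytes) -> bytes:
    """Rebuild the single missing shard: XORing the survivors with the
    parity cancels everything except the lost shard."""
    return xor_parity(surviving + [parity])

data = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(data)
# Lose shard 1, then recover it from the other shards plus the parity.
assert repair([data[0], data[2]], parity) == b"bbbb"
```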

MEMO enhances the usability of storage while optimizing the Provider incentive mechanism. Besides the User and Provider roles, it introduces the Keeper role to prevent malicious attacks on nodes, and the system maintains economic balance through the mutual constraints of these roles. It can support high-capacity, high-availability enterprise-grade commercial storage and provide secure, reliable cloud storage services for NFT, GameFi, DeFi, SocialFi, and more. Compatible with Web2, it is a close fusion of blockchain and cloud storage.