The flaw of IPFS

When it comes to decentralized data storage, the InterPlanetary File System (IPFS) is a project that is impossible to ignore.

As one of the best-known decentralized storage projects, IPFS organizes data as a Merkle DAG (Directed Acyclic Graph), a generalization of the Merkle tree. This data structure is what allows IPFS to implement content addressing and to download files in fragments.

Specifically, IPFS derives a hash from each file's content, much like a fingerprint. A file is split into chunks, each stored as a node with its own hash, and a root node links to those chunk nodes. If the content of any chunk changes, that chunk's hash changes, and because the root node contains its children's hashes, the root hash changes as well.
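To make the chunk-and-root relationship concrete, here is a minimal two-level sketch in Python. It assumes plain SHA-256 over raw bytes and a 4-byte chunk size for readability; real IPFS encodes nodes (for example with UnixFS on DAG-PB), uses much larger chunks (256 KiB by default), and identifies nodes by CIDs, so these hashes are not IPFS hashes. `build_merkle_dag` is a hypothetical helper.

```python
import hashlib

CHUNK_SIZE = 4  # tiny chunk size so the example stays readable (an assumption)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_merkle_dag(content: bytes, chunk_size: int = CHUNK_SIZE) -> tuple[str, list[str]]:
    """Split content into chunks, hash each chunk (the leaves), then hash the
    concatenated leaf hashes to obtain the root hash."""
    chunks = [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]
    leaf_hashes = [sha256_hex(c) for c in chunks]
    root_hash = sha256_hex("".join(leaf_hashes).encode())
    return root_hash, leaf_hashes


root_a, leaves_a = build_merkle_dag(b"hello world!")
root_b, leaves_b = build_merkle_dag(b"hello World!")  # one byte changed, in the second chunk

changed = [i for i, (x, y) in enumerate(zip(leaves_a, leaves_b)) if x != y]
print("changed chunks:", changed)       # changed chunks: [1]
print("root changed:", root_a != root_b)  # root changed: True
```

Only the leaf for the edited chunk and the root change; the untouched chunks keep their hashes, which is what lets unchanged blocks be reused.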

In this way, IPFS stores and locates files by their content (content addressing) rather than by their location (location addressing). To find a file, you don’t need to know where it is, only what it contains: IPFS computes a unique hash for each file, and to retrieve it, a user simply asks the network who holds that hash. Because identical content always produces the same hash, files with the same content are stored only once. This deduplication saves storage and improves network performance.
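The deduplication argument can be illustrated with a toy content-addressed store, assuming SHA-256 hex digests as keys. `ContentStore` is a hypothetical class, not part of any IPFS library.

```python
import hashlib
from typing import Optional


class ContentStore:
    """A toy content-addressed store: each blob's key is the hash of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same key, so it is stored only once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> Optional[bytes]:
        # Retrieval needs only the hash, not a location.
        return self._blobs.get(key)

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
k1 = store.put(b"the same report")
k2 = store.put(b"the same report")  # uploaded again, e.g. by another user
print(k1 == k2, len(store))  # True 1
```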

Content addressing is a major advantage of IPFS, but every coin has two sides, and it also brings a drawback. Once a file is stored in IPFS, it cannot be modified in place: changing its content changes its hash, so the changed file can no longer be found through the original hash. This is a widely criticized pain point, since IPFS is poorly suited to files that need to be updated from time to time.
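A short sketch of the update problem, with a plain SHA-256 digest standing in for an IPFS CID (an assumption for illustration): "editing" content really stores a new blob under a new key, while the old key keeps resolving to the old bytes. The `cid` helper is hypothetical.

```python
import hashlib


def cid(data: bytes) -> str:
    # Stand-in for an IPFS CID: a plain SHA-256 hex digest of the content.
    return hashlib.sha256(data).hexdigest()


store: dict = {}

v1 = b'{"name": "Alice", "bio": "hello"}'
old_key = cid(v1)
store[old_key] = v1

# "Editing" the file really means adding new content under a new hash.
v2 = b'{"name": "Alice", "bio": "hello, Web3"}'
new_key = cid(v2)
store[new_key] = v2

print(old_key == new_key)    # False: the address moved
print(store[old_key] == v1)  # True: the old hash still resolves to the old content
```

Anyone holding `old_key` keeps seeing the stale version until they learn the new hash through some other channel, which is exactly the gap a mutable data layer has to fill.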

Although IPFS performs well for static files, it lacks the computation and state-management capabilities needed for more advanced, database-like features such as mutability, version control, access control, and programmable logic, which developers need to build fully featured decentralized applications. There is therefore a pressing need for an efficient, decentralized way to store dynamic data. Ceramic addresses this with a NoSQL-like database in which developers can store structured, mutable content.

Built for mutable content

Ceramic’s storage design is based upon IPFS and extends it with a decentralized dynamic storage layer.

On Ceramic, every piece of information is represented as an append-only log of commits called a “Stream”, shown as the sequence of gray squares in the figure below. A Stream is similar in concept to a Git commit history: the initial state (the Genesis Commit) and each subsequent change (a Commit) are stored in IPLD (InterPlanetary Linked Data, the IPFS layer dedicated to data structures), and together these records form the Stream. Because a Stream records “changes” rather than “snapshots” of the resulting state, its latest state is obtained by processing all the commits in the Stream in order.
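The commit-log idea can be sketched as a hash-linked list in Python. This is a simplification under stated assumptions: SHA-256 over canonical JSON stands in for IPLD CIDs, commits are unsigned, and each change is a shallow dictionary merge, whereas real Ceramic commits are signed and their format depends on the stream type (for example, JSON-patch operations for TileDocument streams). `Stream`, `append`, `state`, and `verify` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


def digest(obj: Any) -> str:
    """Hash a JSON-serializable object deterministically (stand-in for an IPLD CID)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class Stream:
    """Toy append-only log: a genesis commit, then commits that each link to the previous one."""
    commits: list = field(default_factory=list)

    def append(self, change: dict) -> str:
        prev = digest(self.commits[-1]) if self.commits else None  # genesis has no parent
        commit = {"prev": prev, "change": dict(change)}
        self.commits.append(commit)
        return digest(commit)

    def state(self) -> dict:
        # The current state is not stored anywhere; it is computed by replaying every commit.
        result: dict = {}
        for commit in self.commits:
            result.update(commit["change"])
        return result

    def verify(self) -> bool:
        # Each commit must point at the hash of the commit before it.
        return all(
            c["prev"] == (digest(self.commits[i - 1]) if i else None)
            for i, c in enumerate(self.commits)
        )


s = Stream()
s.append({"name": "Alice"})  # genesis commit
s.append({"bio": "builder"})
s.append({"bio": "Web3 builder"})
print(s.state())   # {'name': 'Alice', 'bio': 'Web3 builder'}
print(s.verify())  # True
```

Because each commit embeds the hash of its predecessor, editing an earlier commit breaks every later link, which `verify` detects.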

For example, Ceramic records changes like this: initially, Alice and Bob each have $10; on the second day, Alice transfers $5 to Bob; on the third day, Bob transfers $3 to Alice. This closely resembles a blockchain ledger, which records transactions rather than each user’s balance, so every intermediate step must be computed to obtain the final balances.

By comparison, the traditional IPFS record pattern is: in file a, Alice and Bob each have $10; in file b, Alice has $5 and Bob has $15; in file c, Alice has $8 and Bob has $12. Here each record is a snapshot of the resulting state, and a new snapshot (a new file with a new hash) must be generated whenever something changes.
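The arithmetic of the two patterns can be checked with a small Python sketch. The event and snapshot encodings are assumptions chosen for illustration; replaying the Ceramic-style log reproduces, step by step, exactly the three snapshots from files a, b, and c.

```python
# Event log (Ceramic-style): only the starting balances and the transfers are recorded.
events = [
    ("init", {"Alice": 10, "Bob": 10}),
    ("transfer", ("Alice", "Bob", 5)),  # day 2
    ("transfer", ("Bob", "Alice", 3)),  # day 3
]


def replay(events):
    """Fold every event in order and return the balances after each step."""
    balances: dict = {}
    history = []
    for kind, payload in events:
        if kind == "init":
            balances = dict(payload)
        elif kind == "transfer":
            sender, receiver, amount = payload
            balances[sender] -= amount
            balances[receiver] += amount
        history.append(dict(balances))
    return history


# Snapshot pattern (files a, b, c): each file stores the full resulting state.
snapshots = [
    {"Alice": 10, "Bob": 10},  # file a
    {"Alice": 5, "Bob": 15},   # file b
    {"Alice": 8, "Bob": 12},   # file c
]

print(replay(events) == snapshots)  # True
print(replay(events)[-1])           # {'Alice': 8, 'Bob': 12}
```

The trade-off is visible here: the log grows by one small event per change but must be replayed to read the current state, while snapshots can be read directly but every change produces a whole new file and, under content addressing, a new hash.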

This design gives each log a unique, globally consistent Stream ID that does not change when its content changes. Every write requires the user’s authorization, and the whole process resembles blockchain bookkeeping, except that what is written is not transaction data but other mutable content, such as user account information.
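The two properties above, a stable ID and authorized writes, can be sketched together. In this hypothetical `AuthorizedStream`, the ID is derived from the genesis content only, loosely mirroring how Ceramic derives a StreamID from the genesis commit, and an HMAC with a secret key stands in for the controller’s DID signature. Real Ceramic uses public-key signatures, so anyone can verify a write without knowing a secret; HMAC is used here only because it is in Python’s standard library.

```python
import hashlib
import hmac
import json


def digest(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def sign(key: bytes, change: dict) -> str:
    """Stand-in for a DID signature: an HMAC-SHA256 tag over the canonical JSON of a change."""
    payload = json.dumps(change, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class AuthorizedStream:
    """Toy stream whose ID is fixed by its genesis content and whose updates must be signed."""

    def __init__(self, controller_key: bytes, genesis: dict) -> None:
        self._controller_key = controller_key
        # The ID depends only on the genesis, so later updates never rename the stream.
        self.stream_id = digest(genesis)
        self.state = dict(genesis)

    def update(self, change: dict, signature: str) -> bool:
        # Reject any update that was not authorized with the controller's key.
        if not hmac.compare_digest(signature, sign(self._controller_key, change)):
            return False
        self.state.update(change)
        return True


alice_key = b"alice-secret"  # hypothetical controller key
profile = AuthorizedStream(alice_key, {"name": "Alice"})
id_before = profile.stream_id

ok = profile.update({"bio": "Web3 builder"}, sign(alice_key, {"bio": "Web3 builder"}))
forged = profile.update({"bio": "hacked"}, sign(b"mallory-secret", {"bio": "hacked"}))

print(ok, forged)                      # True False
print(profile.stream_id == id_before)  # True
print(profile.state)                   # {'name': 'Alice', 'bio': 'Web3 builder'}
```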

Data composability

Ceramic achieves cross-application data composability primarily through a novel abstraction called data models.

Data models typically represent a single, logical application feature such as a user profile, a social graph, or a blog. For instance, you can imagine that every decentralized Twitter implementation would run on a few shared data models: one for each user’s tweets, one for their social graph, one for their DMs, etc. By adopting the same underlying data models, applications are able to natively interoperate on the same data.
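A minimal sketch of the interoperability claim, assuming a single shared key-value store indexed by model ID and owner DID. `DataModel`, `PROFILE_MODEL`, `model:profile-v1`, and the two app functions are hypothetical, and Ceramic’s actual data model tooling differs; the point is only that two apps agreeing on one model ID can read each other’s data without any app-specific integration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataModel:
    """A shared schema that several applications agree to use (hypothetical structure)."""
    model_id: str
    required_fields: frozenset

    def validate(self, record: dict) -> bool:
        # A record conforms if it has every field the model requires.
        return self.required_fields.issubset(record)


# One network-wide store keyed by (model ID, owner DID); not a real Ceramic API.
network: dict = {}

PROFILE_MODEL = DataModel("model:profile-v1", frozenset({"name"}))  # hypothetical model ID


def app_a_save_profile(did: str, profile: dict) -> None:
    """'App A' writes a user's profile using the shared model."""
    if not PROFILE_MODEL.validate(profile):
        raise ValueError("record does not match the data model")
    network[(PROFILE_MODEL.model_id, did)] = profile


def app_b_load_profile(did: str) -> Optional[dict]:
    """'App B' shares no code with App A but reads the same data via the same model ID."""
    return network.get((PROFILE_MODEL.model_id, did))


app_a_save_profile("did:example:alice", {"name": "Alice", "bio": "builder"})
print(app_b_load_profile("did:example:alice"))  # {'name': 'Alice', 'bio': 'builder'}
```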

In a way, you can compare Ceramic’s data model standards to token standards for asset ledgers. On Ethereum, for example, the introduction of the ERC20 fungible token and ERC721 non-fungible token standards has given rise to entire ecosystems of tokens and financial applications that natively interoperate. Ceramic brings the same concept to data.

Ceramic takes a community-driven approach to creating these data models, allowing any developer to easily define, share, and reuse their models with other developers in the ecosystem. As more data models are created by the community, you will see a continuous expansion in the quantity and variety of applications that are built with composable data.

Composability of this kind also improves the developer experience. Building an application on Ceramic looks like browsing a marketplace of data models, plugging them into your app, and automatically gaining access to all data on the network that is stored in those models. With Ceramic, developers don’t need to bootstrap their applications with their own siloed users and data, which should significantly accelerate compounding innovation across developers.

Scalability

Ceramic achieves scalability through a sharded execution environment. All streams on Ceramic maintain their state independently and network nodes execute stream transactions in parallel. This approach, unlike most blockchains, allows Ceramic to operate with the scalability required for decentralized versions of social applications like Twitter or Facebook.

Unlike traditional blockchain systems where scalability is limited to a single global virtual execution environment and the state of a single ledger is shared between all nodes, each Ceramic node acts as an individual execution environment for performing computations and validating transactions on streams — there is no global ledger. This “built-in” execution sharding enables the Ceramic Network to scale horizontally to parallelize the processing of an increasing number of simultaneous stream transactions as the number of nodes on the network increases. Such a design is needed to handle the scale of the world’s data, which is orders of magnitude greater than the throughput needed on a financial blockchain. Another benefit of this design is that a Ceramic node can perform stream transactions in an offline-first environment and then later sync updates with the rest of the network when it comes back online.
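The sharding argument rests on streams being independent: a stream’s state depends only on its own commits. The Python sketch below makes that explicit by computing each stream’s state in a thread pool, with the pool standing in for separate nodes. The stream IDs and the shallow-merge commit format are assumptions, and in CPython threads illustrate independence rather than true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Each stream is an independent list of commits; there is no shared global ledger.
streams = {
    "stream-a": [{"count": 1}, {"count": 2}],
    "stream-b": [{"title": "draft"}, {"title": "final"}],
    "stream-c": [{"likes": 0}, {"likes": 5}, {"likes": 7}],
}


def apply_commits(commits):
    """Compute one stream's state from its own commits only."""
    state = {}
    for change in commits:
        state.update(change)
    return state


def process_all(streams, workers=3):
    # Streams never read each other's state, so they can be executed concurrently
    # without locks or a global ordering; the pool stands in for separate nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(apply_commits, streams.values())  # results keep input order
        return dict(zip(streams.keys(), results))


print(process_all(streams))
# {'stream-a': {'count': 2}, 'stream-b': {'title': 'final'}, 'stream-c': {'likes': 7}}
```

Adding workers (or, in the network analogy, nodes) adds capacity without changing any stream’s result, which is the horizontal-scaling property described above.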

DID Solution

Ceramic also offers a flexible and robust identity solution called IDX, the first fully functional decentralized identity (DID) solution.

IDX is a cross-chain identity protocol for open applications with decentralized identity and interoperable user data. It lets users build a unified digital identity that includes all of their data, while enabling developers to break down silos and freely share user data between applications. As shown in the figure below, it provides a decentralized index that associates structured data with a decentralized identifier (DID): each kind of data is described by a definition and stored in a record.
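The index, definition, and record split can be pictured as two lookups: from a DID’s index to a record ID, then from the record ID to the data. The sketch below assumes in-memory dicts; the definition IDs, record IDs, and helper names are hypothetical and do not reflect IDX’s actual API, and the `records` dict could be backed by any datastore.

```python
from dataclasses import dataclass, field
from typing import Optional

# Definitions describe kinds of data; the IDs and descriptions here are hypothetical.
definitions = {
    "def:basic-profile": "Basic profile: name, bio, avatar",
    "def:crypto-accounts": "Blockchain addresses linked to the DID",
}

# Records hold the actual data, keyed by record ID. This could be any datastore.
records: dict = {}


@dataclass
class IdentityIndex:
    """Toy IDX-style index for one DID: maps definition IDs to record IDs."""
    did: str
    entries: dict = field(default_factory=dict)


def set_record(index: IdentityIndex, definition_id: str, record_id: str, data: dict) -> None:
    if definition_id not in definitions:
        raise KeyError(f"unknown definition: {definition_id}")
    records[record_id] = data
    index.entries[definition_id] = record_id


def get_record(index: IdentityIndex, definition_id: str) -> Optional[dict]:
    # Resolve DID index -> record ID -> data, returning None if nothing is linked yet.
    record_id = index.entries.get(definition_id)
    return records.get(record_id) if record_id is not None else None


alice = IdentityIndex("did:example:alice")
set_record(alice, "def:basic-profile", "record:alice-profile", {"name": "Alice"})
print(get_record(alice, "def:basic-profile"))    # {'name': 'Alice'}
print(get_record(alice, "def:crypto-accounts"))  # None
```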

In addition, IDX can be used with any kind of datastore, such as Ceramic, Textile, OrbitDB, IPFS, Sia, Arweave, blockchain registries, or even centralized databases, and it supports authentication from any kind of Web3 wallet.

IDX is great for associating user profiles, portable social graphs, reputation scores, verifiable claims, user-generated content, application data, settings, domain names, blockchain addresses, and social Web2 accounts to a user in a decentralized way.

Conclusion

In summary, Ceramic has significantly advanced the construction of Web3 and unlocked new features for Web3 developers. Whichever public blockchain developers are building on (Ethereum, BSC, Polygon, Avalanche, etc.), they can also use Ceramic for data-centric functions to make their applications better. Furthermore, through its flexible DID-based account system, Ceramic interfaces naturally with the account and key systems of today’s major blockchains, which is highly convenient for users.

It is encouraging to see that many DID and Web3 social projects are already being developed on Ceramic. Noteworthy examples include CyberConnect, a social graph middleware platform; Orbis, a Web3 Twitter platform; and The Convo Space, an instant messaging platform. We look forward to the new possibilities that Ceramic’s data network infrastructure can bring to the Web3 application layer.

Disclaimer: This research is for information purposes only. It does not constitute investment advice or a recommendation to buy or sell any investment and should not be used in the evaluation of the merits of making any investment decision.

🐩 @chestersigned

📅 8 May 2022
