What is a single point of failure (SPOF)?
A single point of failure is a potential risk introduced by a flaw in the design, implementation, or configuration of a circuit or system. In other words, a SPOF is a single component whose failure can cause the entire system to stop operating.
What is a single point of failure in a data storage system?
A single point of failure in a data storage system is any element, component, or part whose failure can bring down the entire system. Typical cases include:
If a storage device has only one power supply, that power supply is a single point of failure. When it fails, the entire device shuts down and the data becomes inaccessible.
Likewise, if there is only one storage head unit (storage controller), its failure takes the entire data storage system offline.
If the data storage system does not use RAID or erasure coding, each drive becomes a single point of failure for the data it holds.
When a drive fails, the data on that drive can no longer be accessed, which also causes an outage.
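The cases above share one pattern: a component becomes a single point of failure when losing one instance leaves fewer instances than the system needs. The following is a minimal sketch of that check. The `Component` type, the inventory, and all counts are hypothetical values chosen for illustration, not a description of any real device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    count: int          # independent instances installed
    required: int = 1   # instances the system needs to keep running


def single_points_of_failure(components):
    """Return the names of components where losing one instance stops the system."""
    return [c.name for c in components if c.count - 1 < c.required]


# Hypothetical storage array (illustrative numbers only).
array = [
    Component("power_supply", count=1),
    Component("controller", count=1),
    Component("drive", count=8, required=8),  # no RAID: every drive is needed
    Component("network_port", count=2),
]

spofs = single_points_of_failure(array)
print(spofs)
```

In this inventory only the duplicated network port survives the loss of one instance; the power supply, the controller, and the unprotected drives are all flagged.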
Why does a single point of failure exist in a cloud storage system?
From the above, single points of failure in a data storage system appear to arise mostly in hardware. Do they still exist in cloud storage or distributed storage, and how serious is their impact?
Centralized cloud storage providers are often exposed to the failure of a single data center. Cloud storage services, like cloud hosting services, are concentrated in one or a few data centers, and users must choose one of those data centers when they use the service. A power or network failure in the data center that holds the data disrupts normal service.
So how can the single point failures of centralized cloud service providers be addressed? The answer is "redundancy". Key servers should be deployed as clusters, network connections should run over multiple links, storage should be mirrored or protected with RAID, and the data center as a whole should be made redundant through disaster recovery and active-active deployment.
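The effect of redundancy can be estimated with basic availability arithmetic. The sketch below assumes that components fail independently, which is a simplification (a shared power feed or a software bug can take out several at once), and the 0.99 availability figure is an illustrative assumption rather than a measured value.

```python
from math import prod


def series_availability(availabilities):
    """All components are required, so availabilities multiply."""
    return prod(availabilities)


def redundant_availability(a, n):
    """n interchangeable instances; the group is down only if all n are down."""
    return 1 - (1 - a) ** n


# Power supply, controller, and storage, each assumed to be 99% available.
single = series_availability([0.99, 0.99, 0.99])

# The same three roles, each duplicated.
duplicated = series_availability([redundant_availability(0.99, 2)] * 3)

print(f"single instances: {single:.6f}")
print(f"duplicated:       {duplicated:.6f}")
```

Under these assumptions the chain of single instances is available about 97.03% of the time, while duplicating each role raises that to about 99.97%.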
The leading centralized cloud service providers dominate the cloud storage market, and the technical "walls" and commercial barriers between them make it difficult to replicate users' data across clouds. Because each provider's data centers are independent, data generally cannot be snapshotted or replicated between different clouds. Under the centralized cloud storage business model, therefore, if the cloud a user relies on fails, another cloud cannot take over in time. The risk of a single point of failure is still handled and controlled centrally, so users can only rely on their chosen cloud to avoid failures and have no more reliable alternative.
How decentralized cloud storage solves the single point of failure problem
Thanks to its inherently distributed architecture, decentralized cloud storage largely avoids centralized single points of failure. In current distributed storage systems such as Filecoin, Arweave, and Storj, users with idle storage resources can join the storage network and earn incentives by renting out storage space. Each project has its own characteristics, but beyond the natural advantages of distribution, none has demonstrated further innovation against single point failures. For example, with peer-to-peer storage orders, the network must actively strike deals with multiple storage providers to obtain multiple copies and so guard against a single point of failure.
CESS is a secure, efficient, open-source, and scalable decentralized cloud storage network. Both its network and its storage are distributed, so its structure naturally avoids single point failures. What sets CESS apart from other decentralized storage projects is a new storage proof mechanism: the multi-copy recoverable storage proof mechanism (PoDR²). We analyze how this storage proof helps with single point failures and disaster recovery from two aspects:
- Multiple copies
PoDR² is a zero-trust data backup and recovery proof algorithm. Stored data is encrypted, sliced, and then sent to several randomly chosen miner nodes. Under the PoDR² mechanism, three copies are generated by default, and users can also customize the number of copies produced. A homomorphic signature mechanism ensures that storage miners actually store the number of copies set by the CESS system or specified by the user. Traditional centralized cloud storage also supports multiple backups, but those backups ultimately remain under centralized storage and control, so additional copies do not greatly improve security.
- Recoverable
As mentioned above, "redundancy" is the way to address single point failures, and what lies behind it is replication and recovery. Under CESS's PoDR² mechanism, once the multiple copies of the data have been produced, redundant coding is applied so that if any two blocks of a copy are damaged, they can be restored from the redundancy. The CESS system then generates verification parameters for each data segment, which are used in the subsequent replication proof, time-space proof, and PoDR² storage proof. In this mechanism, the CESS chain randomly distributes the data segments of each replica to different storage miners, so that even if a storage miner suffers data deletion, data loss, or an attack, PoDR² can pull the data from other storage miners for retrieval and recovery, protecting the user's stored data to the greatest extent possible.
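To make the two aspects above concrete, the following sketch places three copies of every data segment on distinct, randomly chosen miners and then checks which miner failures the placement survives. It models replication only: the redundant coding inside each copy and the PoDR² proofs themselves are not reproduced here, and every name (`place_replicas`, `is_recoverable`, `tolerates`, the `miner-N` labels) is a hypothetical one chosen for illustration, not part of CESS.

```python
import random
from itertools import combinations


def place_replicas(num_segments, num_replicas, miners, rng):
    """For each segment, pick distinct miners to hold its copies."""
    if num_replicas > len(miners):
        raise ValueError("need at least as many miners as replicas")
    # placement[i] lists the miners holding the copies of segment i.
    return [rng.sample(miners, num_replicas) for _ in range(num_segments)]


def is_recoverable(placement, failed):
    """True if every segment still has a copy on a miner that has not failed."""
    return all(any(m not in failed for m in holders) for holders in placement)


def tolerates(placement, miners, k):
    """True if the data survives every possible set of k simultaneous failures."""
    return all(
        is_recoverable(placement, set(lost)) for lost in combinations(miners, k)
    )


miners = [f"miner-{i}" for i in range(10)]
placement = place_replicas(
    num_segments=16, num_replicas=3, miners=miners, rng=random.Random(7)
)

print(tolerates(placement, miners, 2))
print(is_recoverable(placement, set(placement[0])))
```

Because the three copies of a segment always sit on three different miners, any two miners can fail without losing data. Losing all three holders of the same segment does make that segment unrecoverable in this replication-only model, which is the gap that additional redundancy is meant to narrow.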
It is worth mentioning that under the PoDR² mechanism, the CESS system periodically audits the data held by storage miners, that is, it checks and proves that the data stored on each storage node still exists, is valid, and has not been modified, to ensure the authenticity and availability of the data.
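The idea of a periodic check can be illustrated with a much simpler stand-in than PoDR²: a hash-based challenge-response spot check. Before handing the data over, the owner precomputes the expected answers to a few random challenges; later, a node can answer a challenge correctly only if it still holds the unmodified data. This is an assumption-laden sketch for intuition, not the CESS protocol, which relies on homomorphic signatures; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def respond(nonce: bytes, data: bytes) -> str:
    """Answer a challenge by hashing the nonce together with the stored data."""
    return hashlib.sha256(nonce + data).hexdigest()


def prepare_challenges(data: bytes, count: int):
    """Precompute (nonce, expected answer) pairs while the owner still has the data."""
    challenges = []
    for _ in range(count):
        nonce = secrets.token_bytes(16)
        challenges.append((nonce, respond(nonce, data)))
    return challenges


def audit(challenge, stored_data: bytes) -> bool:
    """Check the node's answer against the precomputed one in constant time."""
    nonce, expected = challenge
    return hmac.compare_digest(respond(nonce, stored_data), expected)


original = b"user data segment"
challenges = prepare_challenges(original, count=2)

print(audit(challenges[0], original))               # intact copy
print(audit(challenges[1], original + b"tamper"))   # modified copy
```

Each precomputed challenge should be used only once, since a node that has seen a nonce could cache its answer and discard the data. Avoiding that limitation is one reason real storage proofs use more elaborate constructions.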
Eliminating single points of failure requires a system to anticipate risks in advance, avoid them through its mechanisms, and provide a data disaster recovery solution. From the perspective of data availability, CESS's multi-copy recoverable storage proof mechanism maximizes data availability. From a security perspective, CESS slices the data, adds redundancy, and then distributes it to storage miners, achieving global data redundancy and recoverability. CESS directly addresses the single point of failure faced by decentralized cloud storage systems, provides the industry with a multi-copy recoverable storage proof mechanism (PoDR²) based on data possession, and achieves encoding and decoding efficiency far exceeding similar projects. Users can store data securely and access it flexibly and efficiently.