About ZKML: ZKML (Zero-Knowledge Machine Learning) combines zero-knowledge proofs with machine learning algorithms to address the problem of privacy protection in machine learning.

About distributed computing power: Distributed computing power refers to breaking a computing task into many smaller tasks and assigning them to multiple computers or processors so that the work can be done efficiently in parallel.

The current state of AI and Web3: out-of-control swarms and increasing entropy

In Out of Control: The New Biology of Machines, Social Systems, and the Economic World, Kevin Kelly describes a phenomenon: a bee colony makes its election decisions through a group dance under distributed management, and the whole swarm follows the largest group of dancers, which becomes the master of the event. This is also what Maurice Maeterlinck called "the spirit of the hive": each bee can make its own decision and guide other bees to confirm it, and the final decision truly belongs to the group.

The increase of entropy and disorder follows the laws of thermodynamics; the textbook illustration in physics is to release a certain number of molecules into an empty box and measure their final distribution. Applied to people, crowds shaped by algorithms display group-level regularities despite individual differences in thinking. Confined to an "empty box" by factors such as the era they live in, they will eventually converge on a consensus decision.

Of course, group rules are not necessarily correct, but they do represent consensus. Opinion leaders who can single-handedly drag consensus in their direction are absolute super-individuals; in most cases, however, consensus does not require everyone's unconditional agreement, only broad recognition by the group.

We are not debating here whether AI will lead humanity astray. There have already been plenty of such discussions, whether about the flood of junk generated by AI applications polluting the authenticity of online data, or about errors in group decision-making pushing certain events into more dangerous territory.

The current state of AI is inherently monopolistic. Training and deploying large models requires enormous amounts of computing resources and data, yet only a handful of companies and institutions have both. Those billions of data points are treated as treasure by each monopolist: open-source sharing is out of the question, and even mutual access is impossible.

The result is an enormous waste of data. Every large-scale AI project has to collect user data all over again, and in the end the winner takes all, whether by merging or selling to grow individual giants, or by following the land-grab logic of the traditional Internet.

Many people say that AI and Web3 are two different things with no connection at all. The first half of the sentence is right: they are two different tracks. But the second half is problematic: it is entirely natural to use distributed technology to limit the monopoly of artificial intelligence, and to use artificial intelligence to promote the formation of decentralized consensus mechanisms.

Bottom-up deduction: letting AI form a truly distributed group consensus mechanism

The core of artificial intelligence is still human beings; machines and models are merely conjectures about and imitations of human thinking. It is actually hard to abstract "the group," because what we see every day are concrete individuals. But a model learns and adjusts from massive amounts of data and ultimately simulates the form of the group. I will not judge what results such a model will produce, since incidents of collective wrongdoing have happened more than once or twice. But the model does represent the emergence of this kind of consensus mechanism.

Take a specific DAO, for example. If governance is implemented purely as a mechanism, it inevitably hurts efficiency, because forming group consensus is troublesome, to say nothing of the voting, tallying, and other operations that follow. If the DAO's governance is instead embodied in an AI model whose data collection comes from the speech of everyone in the DAO, the decisions it outputs will actually be closer to the group consensus.

The group consensus of a single model can be trained according to the scheme above, but such a model remains an island for these individuals. If a collective intelligence system is built to form a group AI, in which each AI model works with the others to solve complex problems, the consensus layer is greatly empowered.

Smaller collectives can either build an ecosystem independently or form coordinated collectives with others, meeting the demand for ultra-large computing power or data exchange more efficiently and at lower cost. But the problem reappears: the current relationship between the various model databases is one of complete distrust and wariness of everyone else. This is exactly where blockchain's natural property lies: through trustlessness, truly distributed AI machines can interact with each other safely and efficiently.

A global intelligent brain can make formerly independent, single-function AI models cooperate with one another and internally execute complex intelligent algorithmic processes, forming a distributed group consensus network that keeps growing. This is also the greatest significance of AI empowering Web3.

Privacy and data monopoly? The combination of ZK and machine learning

Whether guarding against AI doing evil, or out of concern for privacy and fear of data monopoly, humans must take targeted precautions. The core problem is that we do not know how a conclusion is reached, and the operators of these models have no intention of answering that question either. The combination into the global intelligent brain described above must solve this problem, otherwise no data party will be willing to share its core with others.

ZKML (Zero-Knowledge Machine Learning) is a technology that applies zero-knowledge proofs to machine learning. A zero-knowledge proof (ZKP) allows a prover to convince a verifier that a statement about some data is true without revealing the data itself.

Let's take a theoretical example: a standard 9×9 Sudoku puzzle. To complete it, you must fill the grid with the digits 1 to 9 so that each digit appears exactly once in every row, every column, and every 3×3 subgrid. How can the person who set the puzzle prove to a challenger that it has a solution without revealing the answer?

All the setter needs to do is cover the cells filled in with the answer, have the challenger randomly pick a few rows or columns, shuffle the covered numbers within each chosen set, and then reveal them so the challenger can verify that each set contains exactly the digits 1 to 9. This is a simple zero-knowledge proof.
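To make this commit-challenge-reveal idea concrete, here is a minimal Python sketch under stated assumptions: cell values are hidden behind salted hash commitments, the digits are relabeled with a fresh secret permutation each round so a revealed row leaks nothing about the real solution, and the helper names (commit, prover_round, open_cells, verify) are purely illustrative rather than any standard library API. A real protocol would also check consistency with the public clues and repeat many independent rounds to make cheating unlikely.

```python
import hashlib
import os
import random

def commit(value: int):
    """Hide one cell value behind a salted SHA-256 commitment."""
    nonce = os.urandom(16)
    return hashlib.sha256(nonce + bytes([value])).hexdigest(), nonce

def prover_round(solution):
    """One round: relabel the digits with a secret random permutation, then commit to every cell."""
    digits = list(range(1, 10))
    random.shuffle(digits)
    relabel = {d: digits[d - 1] for d in range(1, 10)}  # secret digit relabeling
    hidden = [[relabel[v] for v in row] for row in solution]
    board = [[commit(v) for v in row] for row in hidden]
    commitments = [[c for c, _ in row] for row in board]
    openings = [[(hidden[r][c], board[r][c][1]) for c in range(9)] for r in range(9)]
    return commitments, openings

def open_cells(openings, cells):
    """Prover opens only the nine challenged cells (one row, column, or 3x3 box)."""
    return [(r, c, *openings[r][c]) for r, c in cells]

def verify(commitments, revealed):
    """Verifier: each opening must match its commitment, and together they must be 1..9 exactly once."""
    for r, c, value, nonce in revealed:
        if hashlib.sha256(nonce + bytes([value])).hexdigest() != commitments[r][c]:
            return False
    return sorted(v for _, _, v, _ in revealed) == list(range(1, 10))

# Example usage: the challenger asks to see row 4 of a solved board `solution`.
# commitments, openings = prover_round(solution)
# assert verify(commitments, open_cells(openings, [(4, c) for c in range(9)]))
```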

Zero-knowledge proof technology has three properties: completeness, soundness, and zero-knowledge, meaning a conclusion can be proved without revealing any other details. It can also offer succinctness: in constructions that build on techniques such as homomorphic encryption, verifying a proof is far less costly than generating it.

Machine learning uses algorithms and models to enable computer systems to learn and improve from data. By learning from experience in an automated way, a system can perform tasks such as prediction, classification, clustering, and optimization based on data and models.

The core of machine learning is building models that can learn from data and automatically make predictions and decisions. Building such models usually requires three key elements: datasets, algorithms, and model evaluation. The dataset is the foundation: it contains the data samples used to train and test the model. The algorithm is the core of the model: it defines how the model learns from data and makes predictions. Model evaluation measures the model's performance and accuracy and determines whether it needs to be optimized and improved.
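As a concrete reference for those three elements, here is a minimal sketch using scikit-learn (assumed to be installed); the dataset and model are arbitrary example choices, not a recommendation:

```python
# Minimal sketch of the three elements: dataset, algorithm, model evaluation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Dataset: samples for training and testing the model.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Algorithm: defines how the model learns from data and makes predictions.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 3. Model evaluation: measure performance to decide whether the model needs improvement.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```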

In traditional machine learning, datasets usually have to be collected in one centralized place for training, which means data owners must hand their data to a third party and accept the risk of data or privacy leakage. With ZKML, data owners can share datasets with others without exposing the data itself, and this is achieved through zero-knowledge proofs.

The effect of zero-knowledge proofs in empowering machine learning should be foreseeable: they address the long-standing problems of the privacy black box and data monopoly. Can a project complete proof and verification without leaking the user's input data or the specific details of the model? Can each collective share its data or models without leaking private data? The technology is still early, of course, and plenty of problems will surface in practice, but that does not stop us from imagining, and many teams are already building.

Will this let small databases free-ride on large ones? Once you consider governance, you are back to Web3 thinking. The essence of Crypto lies in governance: whatever is heavily used or shared should receive its due incentives, and mechanisms from the original PoW and PoS to the newer PoR (Proof of Reputation) all exist to guarantee that those incentives work.

Distributed computing power: an innovative narrative that interweaves lies and reality

Decentralized computing power networks have long been a popular scenario in the crypto space. After all, the computing power required by large AI models is staggering, and centralized computing networks not only waste resources but also create de facto monopolies; if the final competition comes down to nothing more than the number of GPUs, that would be far too boring.

A decentralized computing network essentially integrates computing resources scattered across different locations and devices. The advantages people usually cite are: providing distributed computing power, solving privacy issues, enhancing the credibility and reliability of AI models, supporting rapid deployment and operation across application scenarios, and providing decentralized data storage and management. With decentralized computing power, anyone can run AI models and test them against real on-chain datasets from users around the world, enjoying more flexible, efficient, and low-cost computing services.

At the same time, decentralized computing power can address privacy by building a strong framework to protect the security and privacy of user data, and its transparent, verifiable computing process strengthens the credibility and reliability of AI models while providing the flexible, scalable resources needed for rapid deployment and operation in various application scenarios.

Consider model training as a fully centralized computing process. The steps are usually: data preparation, data partitioning, data transmission between devices, parallel training, gradient aggregation, parameter updates, synchronization, and then repeating the loop. Even when a centralized data center runs a cluster of high-performance devices linked by high-speed networks to share the computing tasks, communication is already expensive; once the nodes are geographically decentralized, that communication cost becomes one of the biggest limitations of a decentralized computing network.
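To make these steps concrete, here is a small single-process NumPy sketch that simulates one data-parallel training loop: the dataset is partitioned across hypothetical workers, each worker computes a local gradient, the gradients are aggregated, and the synchronized parameters are updated before the next iteration. In a real cluster each shard lives on a separate device, and shipping the gradients between devices is exactly where the communication cost discussed above appears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data preparation: a toy linear-regression dataset.
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

# Data partitioning: split the samples across simulated workers.
num_workers = 4
shards = list(zip(np.array_split(X, num_workers), np.array_split(y, num_workers)))

def local_gradient(w, X_shard, y_shard):
    """Parallel training step: each worker computes a gradient on its own shard."""
    error = X_shard @ w - y_shard
    return X_shard.T @ error / len(y_shard)

w = np.zeros(5)
lr = 0.1
for step in range(200):
    # In a real cluster each gradient lives on a different device;
    # sending them over the network is where the communication cost appears.
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]

    # Gradient aggregation: average the workers' gradients.
    avg_grad = np.mean(grads, axis=0)

    # Parameter update + synchronization: every worker continues from the same weights.
    w -= lr * avg_grad

print("recovered weights:", np.round(w, 2))
```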

So although decentralized computing power networks have many advantages and much potential, the road ahead remains tortuous given current communication costs and the practical difficulty of operating them. In practice, realizing such a network means overcoming many hard technical problems: how to guarantee the reliability and security of nodes, how to effectively manage and schedule decentralized computing resources, and how to achieve efficient data transmission and communication, among others.

End: Expectations for Idealists

Returning to present business reality, the narrative of deep AI and Web3 integration looks beautiful, but capital and users have told us through their actions that this is destined to be an extremely difficult journey of innovation. Unless a project can be as strong as OpenAI and secure a powerful backer, bottomless R&D costs and an unclear business model will crush it.

Both AI and Web3 are in the early stages of development, much like the Internet bubble at the end of the last century, which only ushered in its real golden age nearly a decade later. John McCarthy once imagined building an artificial intelligence with human-level intelligence over a single summer, yet it took nearly seventy years before we truly took the key step toward artificial intelligence.

The same is true of Web3 + AI. We have established that the direction is right; the rest is up to time.

As the tide of the times gradually recedes, the people and things that remain standing will be the cornerstones of our journey from science fiction to reality.