The arrival of generative AI tools such as ChatGPT and Midjourney has opened up new possibilities in design and art, software development, publishing, and even finance. Generative AI promises to push past the limits of human creativity, greatly improve productivity, and raise the overall level of innovation.

Bringing tools such as ChatGPT and Midjourney to their current level took years of research and training on large amounts of data. ChatGPT, for example, was reportedly trained on about 570GB of data drawn from web pages, books, and other sources. Some of this data may come from users who are completely unaware that their personal data is being used to train AI software. Although most of the data collected may be harmless to the users themselves, some sensitive or private data is likely to be mixed in and fed to the model without the user's consent.

Given the privacy concerns raised by such systems, awareness of data privacy and security issues is growing. Some have called for a balance between leveraging the benefits of artificial intelligence and protecting individual privacy rights. Fortunately, there is a promising technology that can help bridge this gap: the zero-knowledge proof (ZKP).

What is zkML?

A zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that a certain proposition is true without revealing any information beyond the fact that the proposition is true. Since 2022, zero-knowledge (ZK) technology has developed steadily and seen significant growth in the blockchain field, where ZK projects have made notable progress on scalability and privacy protection.
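To make the prover and verifier roles concrete, here is a minimal sketch of the Schnorr identification protocol, a classic interactive proof of knowledge of a discrete logarithm. It is offered as an illustration only and is not taken from any zkML project. The group parameters `P`, `Q`, and `G` are toy values chosen for readability (an assumption; they are far too small to be secure), and the function names are hypothetical.

```python
import secrets

# Toy group parameters (assumption: illustrative only, not secure).
P = 23  # prime modulus
Q = 11  # prime order of the subgroup generated by G
G = 2   # generator of that subgroup: pow(2, 11, 23) == 1


def keygen():
    """Prover picks a secret x and publishes y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1  # secret witness in [1, Q-1]
    y = pow(G, x, P)                  # public statement: "I know x with G^x = y"
    return x, y


def commit():
    """Prover picks a random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def challenge():
    """Verifier replies with a random challenge c."""
    return secrets.randbelow(Q)


def respond(x, r, c):
    """Prover answers with s = r + c*x mod Q; the random r masks x."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier accepts if G^s == t * y^c mod P, without ever seeing x."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_protocol():
    """Run one honest round of the three-message protocol."""
    x, y = keygen()
    r, t = commit()
    c = challenge()
    s = respond(x, r, c)
    return verify(y, t, c, s)
```

Calling `run_protocol()` returns `True` for an honest prover. The verifier only ever sees `y`, `t`, `c`, and `s`, and the fresh random nonce `r` is what keeps the response from leaking `x`. The SNARKs and STARKs discussed later are non-interactive and prove far more general statements, but the roles are the same.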

Machine learning is a branch of artificial intelligence that focuses on developing systems that can learn from past data, identify patterns, and make logical decisions, reducing the need for significant human involvement. It is a data analysis technique that automatically creates analytical models by leveraging various types of digital information, such as numerical data, text content, user interactions, and visual data.

In supervised machine learning, once a model has been trained, we provide input to the model with its fixed parameters, and the model produces output that can be used by other systems. Both the input data and the model parameters may need to remain confidential. Input data may contain sensitive personal financial or biometric information, while model parameters may themselves be sensitive, as with the parameters of a biometric authentication model.

The fusion of zero-knowledge technology and artificial intelligence has given rise to zero-knowledge machine learning (zkML), a powerful, privacy-respecting technology with the potential to change the way we work.

In a recent paper titled The Cost of Intelligence, the Modulus Labs team benchmarked several existing zero-knowledge proof systems against a collection of models of varying sizes. Currently, the main use of ZK in on-chain machine learning is to verify that computations were carried out correctly. With time and further development, especially of Succinct Non-Interactive Arguments of Knowledge (SNARKs), ZKPs are expected to reach a point where they can also protect users' privacy from overly curious validators by preventing the disclosure of inputs.

zkML essentially integrates ZK technology into AI software to address its shortcomings in areas such as privacy protection and data authenticity verification.

Use cases for zkML

Although zkML is still an emerging technology, it has attracted widespread attention and has many compelling application scenarios. Some of the notable zkML applications include:

  • Computational integrity (validity ML)

    Validity proofs such as SNARKs and STARKs can verify the correctness of computations, and this extends to machine learning tasks: a proof can confirm that a model inference was performed correctly, or that a specific input leads to a specific model output. Because it is possible to prove that an output is the result of a particular model and input, machine learning models can be deployed off-chain on specialized hardware while their ZKPs are easily verified on-chain. For example, Giza is helping Yearn, a decentralized finance (DeFi) yield aggregator protocol, prove on-chain that complex yield strategies using machine learning were executed correctly.

  • Fraud Detection

    By leveraging smart contract data, anomaly detection models can be trained and then adopted by a DAO (decentralized autonomous organization) as indicators for automated security procedures. This proactive, preventive approach makes it possible to act automatically, for example by pausing a contract when potentially malicious activity is identified.

  • Transparency in ML as a Service (MLaaS)

    When multiple companies offer machine learning models through their APIs, users cannot easily tell whether a provider is actually serving the model it claims, because the API is opaque. Supplying a validity proof alongside the machine learning model API gives users transparency, allowing them to verify which model they are using.

  • Filtering in Web3 Social Media

    The decentralized nature of Web3 social applications is expected to lead to an increase in spam and malicious content. The ideal approach for social media platforms is to leverage an open source machine learning model that is collectively agreed upon by the community. Additionally, the platform can provide proof of the model's inferences when choosing to filter posts. Daniel Kang's analysis of the Twitter algorithm using zkML further explores this topic.

  • Privacy Protection

    The healthcare industry prioritizes the privacy and confidentiality of patient data. By leveraging zkML, medical researchers and institutions are able to develop models using encrypted patient data, ensuring the protection of individual records. This enables collaborative analysis without sharing sensitive information, thereby promoting advances in disease diagnosis, treatment effectiveness, and public health research.
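The validity ML and ML as a Service items above both rest on the same statement: "this output came from the committed model applied to this input." The sketch below shows only that statement, using a SHA-256 hash as a stand-in commitment to the parameters of a tiny integer linear model. It is not a zero-knowledge proof: the check here re-executes the model with the parameters in the clear, which is exactly the step a SNARK or STARK would replace with a short proof. All function names and the model itself are hypothetical.

```python
import hashlib
import json


def commit_to_model(weights, bias):
    """Return a SHA-256 digest that pins down the model parameters."""
    payload = json.dumps({"weights": weights, "bias": bias}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def infer(weights, bias, x):
    """Toy integer linear model: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def check_claim(commitment, weights, bias, x, claimed_output):
    """Naive check by re-execution (assumption: parameters are revealed).

    A validity proof would let a verifier accept the same claim
    without seeing the weights and without re-running the model.
    """
    same_model = commit_to_model(weights, bias) == commitment
    same_output = infer(weights, bias, x) == claimed_output
    return same_model and same_output


# Provider publishes the commitment once, then answers queries.
weights, bias = [2, -1, 3], 4
published = commit_to_model(weights, bias)
x = [1, 2, 3]
y = infer(weights, bias, x)
accepted = check_claim(published, weights, bias, x, y)
```

A provider that silently swaps in different parameters can no longer match the published commitment. In practice a commitment would also include a random salt so that the parameters cannot be guessed from the digest.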

Overview of zkML projects

Many applications of zkML are still experimental and often appear as new projects at hackathons. zkML opens up new avenues for designing smart contracts, and several projects are actively exploring its applications.

Image credit: @bastian_wetzel

  • Modulus Labs: Uses zkML for real-world applications and research. They showcase zkML through projects such as RockyBot (an on-chain trading bot) and Leela vs. the World (a chess game in which human players collectively compete against a verified on-chain version of the Leela chess engine).

  • Giza: A StarkWare-powered protocol for deploying AI models on-chain in a completely trustless manner.

  • Worldcoin: A proof-of-personhood protocol that leverages zkML. Worldcoin uses custom hardware to process detailed iris scans, which feed into its Semaphore implementation. These iris scans enable important functionality such as proof of membership and voting.

Conclusion

Just as ChatGPT and Midjourney went through countless iterations to reach where they are today, zkML is still being improved and optimized. Several technical and practical challenges remain:

  • Quantization with minimal precision loss

  • Managing circuit sizes, especially in multi-layer networks

  • Efficient proofs of matrix multiplication

  • Dealing with adversarial attacks
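The quantization challenge listed above arises because proof systems typically operate on integers in a finite field, while trained models use floating-point weights. As a rough illustration, the sketch below converts floats to fixed-point integers with a power-of-two scale and measures the rounding error. The scheme, the default of 8 fractional bits, and the function names are assumptions made for the example, not the method of any particular zkML framework.

```python
def quantize(values, scale_bits=8):
    """Map floats to integers by scaling with 2**scale_bits and rounding."""
    scale = 1 << scale_bits
    return [round(v * scale) for v in values]


def dequantize(quantized, scale_bits=8):
    """Map fixed-point integers back to floats."""
    scale = 1 << scale_bits
    return [q / scale for q in quantized]


def quantized_dot(qa, qb, scale_bits=8):
    """Dot product on integers only, rescaled once at the end.

    Each product of two scaled integers carries a factor of scale**2,
    so the integer accumulator is divided by 2**(2 * scale_bits).
    """
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc / (1 << (2 * scale_bits))


def max_rounding_error(values, scale_bits=8):
    """Largest absolute error introduced by a quantize/dequantize round trip."""
    restored = dequantize(quantize(values, scale_bits), scale_bits)
    return max(abs(v - r) for v, r in zip(values, restored))


weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, 4.0]
result = quantized_dot(quantize(weights), quantize(inputs))
error = max_rounding_error([0.1, 0.7, -0.33])
```

With this scheme the round-trip error per value is at most half a quantization step, that is 1 / 2**(scale_bits + 1). More fractional bits reduce the error but make the integers, and therefore the circuits that process them, larger, which is the trade-off behind "minimal precision loss."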

Progress in zkML is accelerating, and the field is expected to reach levels comparable to the broader field of machine learning in the near future, especially as hardware acceleration techniques continue to advance.

Incorporating ZKPs into AI systems can provide a higher level of security and privacy protection for both users and the organizations that rely on these systems. We therefore look forward to further product innovation in the zkML space, where the combination of ZKPs and blockchain technology creates a secure environment for AI/ML operations in the permissionless world of Web3.