On November 27, Zhao Changpeng posted on X that tasks such as AI data labeling are well suited to blockchain: they can draw on low-cost labor worldwide, and instant cryptocurrency payments remove geographical restrictions.
Data labeling is the manual or automatic annotation of raw data (text, images, audio, and so on) with structured information. The labeled data is then used to train machine learning or artificial intelligence models. For example, tagging text with a sentiment category (positive, negative, or neutral) is a form of data labeling. Blockchain is particularly suited to labeling scenarios that require high transparency, credibility, and distributed collaboration. Beyond improving the efficiency and quality of data labeling, it creates new possibilities for global collaboration and data trading.
Which projects in this field stand out at present, and what are the field's development prospects?
The role of blockchain in AI data labeling
Blockchain is a decentralized distributed ledger technology characterized by transparency, immutability, and traceability. These characteristics address the following problems with traditional data labeling methods:
Data authenticity and tamper-proofing: Each labeling record is written to the blockchain and cannot be altered arbitrarily, which ensures the credibility of the labels.
Task allocation transparency: Blockchain can record how tasks are distributed, executed, and reviewed, preventing unfair allocation or tampering with results.
Incentive mechanism: With blockchain smart contracts, data labelers can automatically receive cryptocurrency or other rewards on completing tasks.
Data traceability: The source of each label, along with the annotator and reviewer responsible for it, can be traced.
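To make the tamper-proofing and traceability points concrete, here is a minimal Python sketch of a hash-chained log of labeling records. It is an in-memory stand-in for a blockchain, not the design of any project discussed in this article; the class name LabelLedger and the record fields (item_id, label, annotator, reviewer) are assumptions chosen for illustration.

```python
import hashlib
import json


def record_hash(record, prev_hash):
    """Hash a labeling record together with the hash of the previous entry."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LabelLedger:
    """Append-only, hash-chained log of labeling records (hypothetical, in-memory)."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self.entries = []

    def append(self, item_id, label, annotator, reviewer):
        # Each entry stores who labeled and who reviewed, so the label is traceable.
        record = {
            "item_id": item_id,
            "label": label,
            "annotator": annotator,
            "reviewer": reviewer,
        }
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = record_hash(record, prev_hash)
        self.entries.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self):
        """Recompute every hash; any edited record breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if record_hash(entry["record"], prev_hash) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


ledger = LabelLedger()
ledger.append("img-001", "cat", annotator="alice", reviewer="bob")
ledger.append("img-002", "dog", annotator="carol", reviewer="bob")
intact_before = ledger.verify()

ledger.entries[0]["record"]["label"] = "dog"  # simulate someone altering a label
intact_after = ledger.verify()
```

Because each entry's hash covers the previous entry's hash, changing any earlier label invalidates the chain from that point on. On a real blockchain, the same property is enforced by the network's consensus rather than by a single verify call.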
Application Scenarios
Distributed labeling: Blockchain is used to distribute data labeling tasks to labelers around the world, making data processing more efficient.
Quality review: Labels submitted by multiple annotators for the same item are compared and reviewed on-chain to ensure labeling accuracy.
Labeled data transactions: Labeled data can be traded on the blockchain, and buyers and sellers do not need to worry about the integrity or authenticity of the data.
Privacy protection: Labeled data is encrypted and stored using blockchain to keep private data secure.
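The quality review scenario above, in which several people label the same item and the results are compared, can be sketched as a simple majority vote. This is one common way to compare labels, not the method of any particular project; the function name consensus_label and the 60% agreement threshold are assumptions for illustration.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.6):
    """Return (label, agreement) from a mapping of annotator -> label.

    The label is None when the most common label does not reach the
    agreement threshold, signalling that the item needs further review.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes.values()).most_common(1)[0]
    agreement = count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


# Three annotators label the sentiment of the same text.
accepted, accepted_ratio = consensus_label(
    {"alice": "positive", "bob": "positive", "carol": "neutral"}
)

# Two annotators disagree, so no label is accepted.
disputed, disputed_ratio = consensus_label(
    {"alice": "positive", "bob": "negative"}
)
```

In the first case two of three annotators agree, so "positive" is accepted; in the second the split vote falls below the threshold and the item would be sent back for review.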
Related Projects
OORT DataHub: Provides decentralized, blockchain-based data annotation services and uses its Proof of Honesty algorithm for quality control. The platform distributes tasks, audits data quality, and pays rewards through smart contracts, attracting annotators worldwide while ensuring the transparency and privacy of annotated data.
The economic model of the project token is as follows:
Community rewards: Users can earn $OORT tokens by participating in data annotation and analysis. They may also receive unique NFTs tied to their contributions, which provide additional benefits such as a higher annual percentage yield (APY) on rewards, equipment discounts, and DAO voting rights.
Task staking: Participants must stake at least 210 $OORT tokens to demonstrate their commitment to a task; the stake is returned and rewards are issued once the task is completed.
Sales revenue sharing: Some NFT holders can also receive dividends from future data sales revenue, further increasing long-term returns.
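The task staking rule described above can be sketched as a small escrow. This is an illustrative Python model only, not OORT's actual smart contract: the 210-token minimum comes from the text, while the TaskEscrow class, the reward amount, and the balances are hypothetical. What happens to a stake when a task is not completed is not stated in the source, so it is not modeled here.

```python
from dataclasses import dataclass, field

MIN_STAKE = 210  # minimum stake in $OORT tokens, as stated in the text


@dataclass
class TaskEscrow:
    """Hypothetical escrow: holds stakes and pays stake + reward on completion."""

    reward: int  # assumed reward per completed task, for illustration only
    balances: dict = field(default_factory=dict)
    stakes: dict = field(default_factory=dict)

    def stake(self, annotator, amount):
        if amount < MIN_STAKE:
            raise ValueError(f"stake must be at least {MIN_STAKE} tokens")
        if annotator in self.stakes:
            raise ValueError("annotator already has an active stake")
        if self.balances.get(annotator, 0) < amount:
            raise ValueError("insufficient balance")
        # Move the tokens from the annotator's balance into escrow.
        self.balances[annotator] -= amount
        self.stakes[annotator] = amount

    def complete(self, annotator):
        # Release the stake and add the reward; KeyError if nothing was staked.
        amount = self.stakes.pop(annotator)
        payout = amount + self.reward
        self.balances[annotator] = self.balances.get(annotator, 0) + payout
        return payout


escrow = TaskEscrow(reward=15, balances={"alice": 300})
escrow.stake("alice", 210)
balance_while_staked = escrow.balances["alice"]
payout = escrow.complete("alice")
final_balance = escrow.balances["alice"]
```

With these made-up numbers, the annotator locks 210 of 300 tokens, then receives the stake back plus the 15-token reward on completion.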
PublicAI: An on-chain AI ecosystem project on Solana that aims to connect data demanders with annotators worldwide. It rewards participants through a crypto token incentive mechanism and uses blockchain technology to record the details of the annotation process, ensuring data security and privacy.
The economic model of the project token is as follows:
Community rewards: 10% of Public tokens will be used as airdrop rewards for early user interactions. There are three ways to earn the airdrop: become an AI Builder and collect high-quality Internet content; become an AI Validator and verify the collected content; or become an AI Developer and train AI agents using verified data sets.
Token distribution: The project completed a $2 million seed round in January 2024. Investors include IOBC Capital, Foresight Ventures, Solana Foundation, Everstate Capital, and a number of well-known academics and professors in the field of artificial intelligence. The details of the PublicAI token distribution have not yet been disclosed.
Challenges
At present, several major factors constrain the development of this field: first, AI data labeling requires substantial computing and storage resources; second, project performance is limited by the scalability of blockchain; third, technical standards and regulation are still immature.
Among them, the second point is perhaps the biggest challenge at present: AI data labeling and model training usually require substantial computing resources, while the computing power of nodes in a blockchain network is limited. How to integrate and use distributed computing resources effectively, so as to meet the computing needs of AI data labeling projects while preserving blockchain's decentralized character, is a problem that urgently needs to be solved. Greenfield, a decentralized storage network in the Binance ecosystem, is reportedly providing storage support for this sector, and we look forward to more storage and computing resources being put to use in this field.