In the era of artificial intelligence, the importance of data is self-evident. As the foundation for AI large models, the quality of the training data source determines the capabilities of AI and the user experience of the product. Internet technology giants with large, multi-dimensional business data have a scenario advantage, and through years of data accumulation during the operation of internet platforms and existing user usage scenarios, they can generate a large amount of private data, giving them a clear advantage in model training optimization.

After the early AI products went live, they relied on feedback from their user base and product interactions to fine-tune their models, creating a data flywheel effect that continuously optimizes iterations. This will become a moat for AI products in the future. However, start-ups in the AI sector struggle with insufficient quantity and quality of data sources to train their models, and these data barriers and the formation of data silos will hinder the development of artificial intelligence.

VANA: Breaking Data Silos, Users Share Data Value

In the early days of the internet, various companies that disrupted traditional industry operating models emerged. However, as the industry developed, leading internet technology companies began to monopolize the market, controlling traffic entry points and possessing vast amounts of user data. These leading platforms can use this user data for algorithm recommendations and credit loans to generate commercial value. Reddit has earned $200 million by selling user-generated content as AI training data, but the users who generated the data did not share in the value and results of that data. The emergence of VANA will break data silos, allowing users to own data and share in the data benefits.

VANA is an open and decentralized data sovereignty protocol, serving as an EVM-compatible L1, allowing users to own their data, contribute personal data, and share the profits generated from artificial intelligence.

VANA addresses the issue of sourcing training data for AI models

It is well known that the training data for AI models from internet technology companies mainly comes from web scraping, paid purchases, and the accumulation of their own business. The advantage of web-scraped data is its easy accessibility, but the data quality is low and cleaning is difficult; paid data is severely homogenized, and due to commercial competition, there are few truly valuable business data sources, making it hard to provide a differentiated advantage for AI models; data accumulated from business scenarios has high value, but this method is not friendly to start-ups and small enterprises.

VANA's data comes from user contributions within the ecosystem. Users participating in the VANA ecosystem contribute social media data from X, LinkedIn, and IoT data to DataDAOs, and this data will be securely stored off-chain. After verification, cleaning, and annotation, the data is applied to the development of AI models. Participating users can gain governance rights over the DataDAO after contributing data, deciding on data usage rights and sharing in the value generated from the data.

Advantages of the VANA model

  • Adopting a decentralized governance model allows users to have ownership of their data and autonomously decide how the data is used;

  • Users can convert data through VANA into tradable data assets, which can be used for decentralized artificial intelligence applications;

  • Data privacy and security are ensured through the use of zero-knowledge proofs (ZKP) and trusted execution environments (TEE).

VANA Network Composition

The main participants in VANA include data contributors, validators, stakers, data consumers, and DLP (Data Liquidity Pool Creator), which is the DataDAO.

1. Data Contributors

Participating users can choose to contribute their data to DataDAOs established within the VANA network. The submitted data is stored off-chain, while proof of contribution is stored on-chain. Taking the ChatGPT DataDAO as an example, users request OpenAI to export ChatGPT data via email, and after receiving a reply, they upload the data and download links through gptdatadao.org.

2. DataDAO

Staking at least $100 worth of VANA can create a registered DataDAO. After registration, the DataDAO will appear on DataHub for data contributors to choose from. To promote the continuous development of DataDAO, VANA will provide rewards to the top 16 DataDAOs ranked by staked VANA. The staking reward for the first three years will be 15% of the total token supply, with a reward cycle every 21 days, and a 7-day unlock period for staking. The amount of VANA rewards is determined by the amount staked, staking duration, and the number of rewards obtained by the DataDAO. DataDAOs need to stake at least 10,000 VANA to have a chance to receive rewards. 50% of the rewards are fixed for the staker, and the remaining rewards are determined by the DataDAO on whether they will be used.

Currently, 17 DataDAOs have been registered and created, including Volara, which focuses on Twitter/X data, R/DataDAO for Reddit, and DLP Labs for LinkedIn resume data. There are already 140,000 Reddit users who have joined R/DataDAO, and the first AI model owned by a user has been trained.

3. Validators

Validators are responsible for the security, integrity, and functionality of the VANA Layer 1 blockchain, ensuring that data transactions are correctly verified, recorded, and added to the blockchain, including L1 Validators and Satya Validators.

L1 Validators are responsible for the security and consensus of VANA. A minimum of 35,000 VANA must be staked to become L1 Validators, with an initial 64 L1 Validators, expanding to 128 later. Each block earns 5 VANA, and downtime incurs a 10% penalty, with rewards decreasing by 10% annually.

Satya Validators provide a trusted execution environment (TEE) to verify the data contributed by users, ensuring the security and privacy of the data verification process. This allows users to earn VANA rewards.

4. Data Consumers

AI model developers, as Data Consumers, select and purchase access to data sets suitable for AI model development, using VANA's infrastructure for AI training and data analysis, collaborating with DataDAOs to optimize AI models.

Taking the ChatGPT DataDAO as an example, the user uploads download links and data files that are transmitted encrypted to Satya Validators. After decryption, Satya Validators perform verification to ensure the authenticity of the data uploaded by the user and that it has not been tampered with.

Application scenarios and economic model of the VANA token

1. Validators stake VANA to ensure network security and verify data to earn VANA rewards;

2. VANA serves as the GAS for executing contracts and interacting with DataDAOs within the network;

3. Users stake VANA in DataDAO to earn VANA staking rewards;

Data Consumers default to using VANA when accessing data;

5. VANA holders participate in governance and vote on proposals, with VANA serving as the main trading pair for tokens issued by DataDAOs.

The total supply of VANA is capped at 120 million, and the token distribution is shown in the figure below.

  • Community

Mainly includes high-quality data contribution rewards for DataDAOs, airdrops for early users, and developers. TGE supplies 20.3% of VANA with no lock-up period.

  • Ecosystem

Mainly includes tokens issued by DataDAOs, block rewards, and partners, with TGE supplying 4.8% of VANA and no lock-up.

  • Investors

Vana has currently secured a total of $25 million in funding, which includes $5 million in strategic round financing from Coinbase Ventures, $18 million in Series A financing from Paradigm, and $2 million in seed round financing from Polychain.

  • Core Contributors

In summary, the total circulation of VANA at TGE will be 30 million, including 4.8 million VANA from the Binance Launchpool.

Legal risks of the VANA model if it exists in China

The VANA decentralized AI model data project aims to solve the data issues in AI model training at a lower cost, enabling entrepreneurs of AI models to access high-quality training data. This breaks down the data silos created by large internet companies, making it possible for Tencent to obtain Alibaba user data to train AI models. It lowers the barrier for individuals and companies dedicated to AI model entrepreneurship, but this model may face risks related to data export in China.

The National Internet Information Office has clarified in the (Guidelines for Data Export Security Assessment Declaration (First Edition)) that data export behaviors include:

(1) Data processors will transfer and store data collected and generated in domestic operations abroad;

(2) Data processors collect and generate data stored domestically, and overseas institutions, organizations, or individuals can query, retrieve, download, or export this data;

(3) Other data export behaviors as specified by the National Internet Information Office.

(Exit and Entry Administration Law of the People's Republic of China) Article 89 clearly states that 'exit' refers to leaving the mainland of China for other countries or regions, traveling from the mainland to the Hong Kong Special Administrative Region, the Macao Special Administrative Region, or traveling from the mainland to Taiwan. This indicates that the determination of whether an exit has occurred is based on jurisdiction.

The creation of DataDAOs and user data contributions have no restrictions; Data Consumers do not need to undergo KYC and can access collected data by simply paying VANA. In this case, domestic users participating in various DataDAOs contributing social media and resume data may involve data export.

Definition of personal data: According to Article 76 of the (Cybersecurity Law of the People's Republic of China): Personal information refers to various information recorded electronically or otherwise that can identify a natural person’s identity alone or in conjunction with other information, including but not limited to a natural person's name, birth date, ID number, personal biometric information, address, phone number, etc.

The resumes and healthcare data collected by DataDAO may involve personal information such as names, birth dates, phone numbers, and even sensitive personal information. (Personal Information Protection Law of the People's Republic of China) There are restrictions on the use of this data and cross-border transfers.


#币安LaunchpoolVANA