Written by: Haotian
'Let users have sovereignty over their own data': this slogan once carried the grand vision of the entire web3 era, but because of challenges such as the cost of putting data on-chain and the privacy problems of a public ledger, it has never been truly put into practice. Recently, driven by the massive demand for data sources in the AGI large-model training market, @withvana, which is about to launch on Binance, has proposed a data ownership solution that combines DLPs (Data Liquidity Pools) with TEE. What are the highlights of this solution?
1) Data sovereignty and personal data dividends are very old topics. In the web2 era, personal data exploded in volume, but the result was platform monopolies and serious violations of data privacy. In the early web3 era, many projects tried to realize this vision through smart contract management + decentralized storage + on-chain rights confirmation, only to find that on-chain storage was expensive and that the transparency of on-chain data made privacy even harder to protect.
Because of these technical bottlenecks, the exploration of 'data ownership' through blockchain was shelved.
2) The AI era has changed this. Training AGI large models, multimodal training, and application scenarios such as inference and fine-tuning all require large amounts of non-public, high-quality data, especially in vertical machine-learning fields and specialized model training. This makes the private data held by individuals and institutions a key resource for AI development, and turns AI training into a massive 'demand side' for data.
This is the precondition for Vana's attempt to solve data sovereignty for users in the AI era. In the web2 environment, most individuals had little sensitivity to data ownership and privacy; in the AI era, which treats 'data' as an oil-like asset, the situation is completely different.
3) The Vana solution, which is about to launch its mainnet, mainly addresses two major issues: 'data double spending' and 'privacy protection'. The first arises because once data is publicly available on-chain, anyone can copy and store it, so the data loses its scarcity and with it its ability to capture value.
Vana establishes a data market through the DLP (Data Liquidity Pool), with a contribution-proof mechanism called Proof of Contribution supporting the operation of the system.
Data owners can stake the usage rights to their data in domain-specific data pools, such as a medical case pool or a financial transaction pool. After staking, they receive DataDAO and data tokens as proof of their rights. When an AI training customer on the demand side pays a fee to use a specific data pool, the fee is automatically distributed to token holders in proportion to their holdings. Data owners can also participate in DataDAO governance and take part in joint decisions on DLP operating rules, pricing strategies, and so on.
This data liquidity pool is similar to a common DeFi trading pool: smart contracts manage data validity verification, pool access permissions, token distribution, and other scheduling tasks. These functions are also the key to solving the 'data double spending' problem. Data is tokenized to confirm ownership, and the entire process is recorded and coordinated by smart contracts, which ensures that data usage is traceable and that benefit distribution is automated.
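The proportional fee distribution described above can be illustrated with a minimal Python sketch. It is not Vana's contract code: the function name `distribute_fee`, the integer fee units, and the rule for handing out rounding leftovers (largest remainder first) are all assumptions made for illustration.

```python
def distribute_fee(fee: int, balances: dict[str, int]) -> dict[str, int]:
    """Split `fee` (in smallest currency units) pro rata to token balances.

    Hypothetical helper, not part of any real Vana API. Shares are rounded
    down, then leftover units go to the holders with the largest fractional
    remainder so that the payouts always sum to exactly `fee`.
    """
    if fee < 0 or any(b < 0 for b in balances.values()):
        raise ValueError("fee and balances must be non-negative")
    total = sum(balances.values())
    if total == 0:
        raise ValueError("pool has no outstanding data tokens")

    # Integer floor of each holder's proportional share.
    payouts = {holder: fee * bal // total for holder, bal in balances.items()}

    # Units lost to rounding down; always fewer than the number of holders.
    leftover = fee - sum(payouts.values())

    # Largest fractional remainder first; ties broken by holder name.
    order = sorted(balances, key=lambda h: (-(fee * balances[h] % total), h))
    for holder in order[:leftover]:
        payouts[holder] += 1
    return payouts


# Example: a 1000-unit usage fee paid into a pool with three token holders.
print(distribute_fee(1000, {"alice": 50, "bob": 30, "carol": 20}))
# {'alice': 500, 'bob': 300, 'carol': 200}
```

Integer arithmetic is used here because on-chain token amounts are normally whole units, and a payout that does not sum exactly to the fee would leave funds stranded in the pool.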
Vana addresses data privacy through TEE (Trusted Execution Environment) secure enclaves. The technical characteristics of a TEE make it possible to grant 'usage rights' while keeping the data itself private, providing 'end-to-end' security across the whole process: from storage on a personal server, to data access via DLP pools, to use of the data in training.
For example, if a user authorizes part of their data to the DLP pool, that portion of data will remain in a TEE privacy environment, and customers accessing that data will be granted usage rights for training but will not be able to back up or steal that data.
Throughout the process, the TEE keeps a full record of usage and processes the data in an isolated environment, ensuring that data remains private while it is being used. This 'available but invisible' property of TEE is what addresses the privacy protection challenge. In addition to these two major features, Vana gives data owners complete control over their data, allowing users to withdraw or modify data usage authorization at any time.
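The 'available but invisible' idea, together with revocable authorization, can be sketched as a toy simulation in Python. This is only an illustration of the control flow: an ordinary Python class offers none of the hardware isolation of a real TEE, and every name here (`SimulatedEnclave`, `APPROVED_JOBS`, `AccessRevoked`) is hypothetical rather than taken from Vana.

```python
import statistics


class AccessRevoked(PermissionError):
    """Raised when a job targets data whose owner has withdrawn authorization."""


# Only pre-approved computations may touch the data (an assumption of this
# sketch); each one returns an aggregate, never the raw records.
APPROVED_JOBS = {
    "mean": statistics.fmean,
    "count": len,
}


class SimulatedEnclave:
    """Toy stand-in for a TEE: stores data privately, releases only results."""

    def __init__(self) -> None:
        self._records: dict[str, list[float]] = {}   # owner -> private data
        self._authorized: set[str] = set()           # owners who allow usage
        self.usage_log: list[tuple[str, str, str]] = []  # (consumer, owner, job)

    def deposit(self, owner: str, data: list[float]) -> None:
        """Owner places data in the enclave and authorizes its use."""
        self._records[owner] = list(data)
        self._authorized.add(owner)

    def revoke(self, owner: str) -> None:
        """Owner withdraws usage authorization at any time."""
        self._authorized.discard(owner)

    def run(self, consumer: str, owner: str, job_name: str) -> float:
        """Run an approved job on the owner's data and return only the result."""
        if owner not in self._authorized:
            raise AccessRevoked(f"{owner} has not authorized data usage")
        job = APPROVED_JOBS[job_name]  # KeyError for unapproved jobs
        self.usage_log.append((consumer, owner, job_name))
        return float(job(self._records[owner]))


enclave = SimulatedEnclave()
enclave.deposit("alice", [1.0, 2.0, 3.0])
print(enclave.run("ai-lab", "alice", "mean"))  # consumer sees 2.0, not the records
enclave.revoke("alice")                        # later runs raise AccessRevoked
```

Restricting consumers to a whitelist of named jobs is one way to model the idea that the data can be used for computation without being copied out; a real deployment would rely on hardware attestation rather than a Python dictionary.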
Furthermore, Vana adopts a clear layered technical architecture. The bottom layer lets users store data flexibly, through either lightweight self-custody or agent hosting. The middle layer uses DLP as the protocol layer, where smart contracts handle fine-grained scheduling and management, including core functions such as data flow, access control, and benefit distribution. The top layer connects to various AI application scenarios, providing standardized interfaces for large model training, data analysis, and other needs.
This layered design not only ensures data sovereignty but also realizes the scalability of use cases.
That covers the main points.
Finally, I would like to add one point: Vana provides a solution for data ownership in the AI era. It is the 'old narrative' of data rights, revived by AI scenarios, and it forms an important part of the broader AI narrative wave.
The moat that Vana aims to build lies in the fact that once its whole chain of data collection, usage, and rights confirmation is connected end to end, it could extend to broader scenarios and fields. Don't forget, the grand vision of data ownership may run through the whole of blockchain and web3.