Written by: Haotian

"Let users own their data" is a slogan that once carried the grand vision of the entire web3 era, but because of challenges such as the cost of putting data on-chain and the lack of privacy on a public ledger, it has never truly been put into practice. Recently, driven by the huge demand for data sources in the AGI large-model training market, @withvana, which is about to launch on Binance, has proposed a data ownership solution that combines DLP (Data Liquidity Pool) with TEE. What are its specific highlights?

1) Data sovereignty and personal data dividends are a very old topic. In the web2 era, the volume of personal data exploded, but the result was platform monopolies and serious violations of data privacy. Early in the web3 era, many projects tried to realize this vision by combining smart contract management, decentralized storage, and on-chain confirmation of rights, only to find that on-chain storage is expensive and that the transparency of on-chain data makes privacy even harder to protect.

It was precisely these technical bottlenecks that caused the exploration of 'data ownership' through blockchain to be shelved.

2) With the arrival of the AI era, a wide range of applications such as AGI large-model training, multi-modal training, and inference and fine-tuning require large amounts of high-quality non-public data, especially in vertical machine learning domains and specialized model training. This makes the private data held by individuals and institutions a key resource for AI development, and creates a massive 'demand side' for making that data usable for AI learning.

This is the premise behind Vana's effort to solve data sovereignty for users in the AI era. In web2 environments, most individuals were not very sensitive to data ownership or privacy; in the AI era, where 'data' is viewed as an oil-like asset, the situation is completely different.

3) Vana's upcoming mainnet solution primarily addresses two major issues: 'data double-spending' and 'privacy protection'. The first arises because once data is publicly available on-chain, it can be copied and stored arbitrarily, so it loses its scarcity and, with it, its ability to capture value.

Vana establishes a data market through DLPs (Data Liquidity Pools), with a dedicated contribution-verification mechanism, Proof of Contribution, supporting the system's operation.

Data owners can pledge the usage rights to their data to domain-specific data pools, such as a medical case pool or a financial transaction pool. After pledging, they receive DataDAO & data tokens as proof of their rights. The fees that AI training customers pay for a given data pool are automatically distributed to token holders in proportion to their holdings, and data owners can also participate in DataDAO governance, jointly deciding on DLP operating rules, pricing strategies, and so on.
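To make the proportional distribution concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `distribute_fee`, the holder names, and the balances are hypothetical, and the article does not describe how Vana's contracts actually compute payouts.

```python
from fractions import Fraction


def distribute_fee(fee, balances):
    """Split `fee` among token holders in proportion to their token balances.

    `balances` maps a holder name to an integer token balance.
    Hypothetical helper for illustration, not Vana's contract logic.
    """
    total = sum(balances.values())
    if total <= 0:
        raise ValueError("pool has no outstanding tokens")
    # Exact rational arithmetic avoids rounding drift between holders.
    return {
        holder: Fraction(fee) * Fraction(balance, total)
        for holder, balance in balances.items()
    }


# Assumed example: three holders of one pool's data tokens.
balances = {"alice": 600, "bob": 300, "carol": 100}
payouts = distribute_fee(50, balances)
```

With these assumed balances, a holder with 60% of the tokens receives 60% of the fee, and the payouts always sum to the fee paid in.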

This data liquidity pool resembles a common DeFi trading pool: smart contracts manage data validity verification, pool access permissions, token distribution, and other scheduling tasks. This is also the key to resolving the 'data double-spending' issue. Tokenizing data confirms its ownership, and because the whole process is recorded and coordinated by smart contracts, data usage is traceable and revenue distribution is automated.
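One simple way to picture a guard against 'data double-spending' is a registry that refuses a second contribution of identical data. The sketch below is an assumption for illustration: the class `DataPoolRegistry` is hypothetical, the article does not say how Proof of Contribution checks validity, and a plain SHA-256 fingerprint only catches byte-identical copies.

```python
import hashlib


class DataPoolRegistry:
    """Hypothetical in-memory stand-in for a pool's contribution ledger."""

    def __init__(self):
        # Maps a content fingerprint to the first owner who registered it.
        self._owner_by_fingerprint = {}

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # SHA-256 of the raw bytes; identical data yields an identical digest.
        return hashlib.sha256(data).hexdigest()

    def register(self, owner: str, data: bytes) -> bool:
        """Record a contribution; return False if the same data is already registered."""
        fp = self.fingerprint(data)
        if fp in self._owner_by_fingerprint:
            return False
        self._owner_by_fingerprint[fp] = owner
        return True


registry = DataPoolRegistry()
first = registry.register("alice", b"record-001")
duplicate = registry.register("bob", b"record-001")   # same bytes, rejected
distinct = registry.register("bob", b"record-002")    # new data, accepted
```

A real system would need more than exact hashing, since lightly edited copies produce different digests.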

Vana addresses data privacy through TEE (Trusted Execution Environment) secure enclaves. The technical properties of a TEE make it possible to exercise 'usage rights' while keeping the data private: from storage on a personal server, to access through the DLP, to use in training, the TEE provides 'end-to-end' security protection.

For example, if a user authorizes part of their data to a DLP, that data sits inside a TEE privacy environment. Customers accessing it are granted usage rights for training but cannot back up or steal the data.

Throughout the process, the TEE provides complete records and processing in an isolated environment, so data stays private while it is being used. This 'usable but invisible' property of TEE is what resolves the privacy protection dilemma. Beyond these two major features, Vana gives data owners full control over their data: users can revoke or modify data usage authorization at any time.
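The revocable authorization described above can be sketched as a small permission table. This is a simplified model under assumptions: the class `UsageAuthorizations`, its method names, and the example identifiers are hypothetical, and in the real design the check would be enforced by smart contracts and the TEE rather than by ordinary application code.

```python
class UsageAuthorizations:
    """Hypothetical table of who may train on which dataset."""

    def __init__(self):
        # (owner, dataset_id) -> set of consumers currently authorized
        self._grants = {}

    def grant(self, owner, dataset_id, consumer):
        self._grants.setdefault((owner, dataset_id), set()).add(consumer)

    def revoke(self, owner, dataset_id, consumer):
        # discard() is a no-op if the consumer was never authorized
        self._grants.get((owner, dataset_id), set()).discard(consumer)

    def may_train(self, owner, dataset_id, consumer):
        return consumer in self._grants.get((owner, dataset_id), set())


auth = UsageAuthorizations()
auth.grant("alice", "medical-cases", "model-lab")
allowed_before = auth.may_train("alice", "medical-cases", "model-lab")
auth.revoke("alice", "medical-cases", "model-lab")
allowed_after = auth.may_train("alice", "medical-cases", "model-lab")
```

The point of the sketch is that access is checked at the moment of use, so a revocation takes effect on the next training request.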

Furthermore, Vana adopts a clearly layered technical architecture. The bottom layer lets users store data flexibly, through either lightweight self-custody or delegated hosting. The middle layer uses DLP as the protocol layer, with smart contracts handling fine-grained scheduling and management, including core functions such as data circulation, permission control, and revenue distribution. The top layer connects to various AI application scenarios, providing standardized interfaces for large-model training, data analysis, and other needs.

This layered design ensures data sovereignty while achieving scalability in usage scenarios.

That's all.

Finally, one more perspective: Vana offers a solution for data ownership in the AI era. It is the 'old narrative' of confirming data rights, revived by AI use cases, and an important part of the broader AI narrative wave.

The moat Vana aims to build is this: once its full chain of data collection, usage, and rights confirmation is connected, it may extend to broader scenarios and fields. Don't forget, the grand vision of data ownership could run through the whole of blockchain and web3.