TL;DR

We have discussed how AI and Web3 can leverage each other's strengths and complement one another across vertical industries such as computing networks, agent platforms, and consumer applications. Focusing on the vertical of data resources, emerging Web3 projects provide new possibilities for data acquisition, sharing, and utilization.

  • Traditional data providers struggle to meet the demand for high-quality, real-time verifiable data from AI and other data-driven industries, particularly in areas of transparency, user control, and privacy protection.

  • Web3 solutions are committed to reshaping the data ecosystem. Technologies such as MPC, zero-knowledge proofs, and TLS Notary ensure the authenticity and privacy protection of data as it circulates among multiple sources, while distributed storage and edge computing provide greater flexibility and efficiency for real-time processing of data.

  • Among them, the emerging infrastructure of decentralized data networks has given rise to several representative projects, including OpenLayer (a modular authentic data layer), Grass (which taps users' idle bandwidth through a decentralized network of crawler nodes), and Vana (a Layer 1 network for user data sovereignty). Through different technological paths, they open new prospects for AI training and applications.

  • Through crowdsourced capacity, trustless abstraction layers, and token-based incentive mechanisms, decentralized data infrastructure can provide solutions that are more private, secure, efficient, and economical than those of Web2 hyperscale service providers, while also granting users control over their data and related resources, building a more open, secure, and interoperable digital ecosystem.

1. The Data Demand Wave

Data has become a key driver of innovation and decision-making across industries. UBS predicts that global data volume will grow more than tenfold from 2020 to 2030, reaching 660 ZB. By 2025, an estimated 463 EB (exabytes; 1 EB = 1 billion GB) of data will be generated globally each day. The data-as-a-service (DaaS) market is expanding rapidly: Grand View Research estimates the global DaaS market at $14.36 billion in 2023, with an expected compound annual growth rate of 28.1% through 2030, ultimately reaching $76.8 billion. These high-growth figures reflect the demand for high-quality, real-time, reliable data across many industry sectors.

AI model training relies on large amounts of input data for pattern recognition and parameter tuning; after training, datasets are also needed to test model performance and generalization ability. Moreover, AI agents, as a foreseeable emerging form of intelligent application, require real-time, reliable data sources to ensure accurate decision-making and task execution.

(Source: Leewayhertz)

The demand for business analytics is also becoming more diverse and widespread, as analytics becomes a core tool for driving enterprise innovation. For instance, social media platforms and market research companies need reliable user behavior data to formulate strategies and gain insight into trends, integrating diverse data from multiple social platforms to build more comprehensive user profiles.

The Web3 ecosystem also needs reliable real-world data on-chain to support new financial products. As more new assets are tokenized, flexible and reliable data interfaces are required to support innovative product development and risk management, allowing smart contracts to execute based on verifiable real-time data.

Beyond the above, there are use cases in scientific research, the Internet of Things (IoT), and more. These new use cases reveal a surge in demand across industries for diverse, authentic, real-time data, while traditional systems may struggle to cope with rapidly growing data volumes and constantly changing requirements.

2. Limitations and Issues of the Traditional Data Ecosystem

A typical data ecosystem includes data collection, storage, processing, analysis, and application. The centralized model is characterized by centralized data collection and storage, managed and maintained by core enterprise IT teams and subject to strict access controls.

For example, Google's data ecosystem spans multiple data sources, ranging from its search engine and Gmail to the Android operating system. Google collects user data through these platforms, stores it in globally distributed data centers, and then processes and analyzes it with algorithms to support the development and optimization of its products and services.

Similarly, in financial markets, the data and infrastructure provider LSEG (formerly Refinitiv) obtains real-time and historical data from global exchanges, banks, and other major financial institutions, while using its Reuters News network to gather market-related news. It applies proprietary algorithms and models to generate analytical data and risk assessments as additional products.

(Source: kdnuggets.com)

Traditional data architectures are effective in professional services, but the limitations of centralized models are becoming increasingly evident. Particularly in terms of coverage of emerging data sources, transparency, and user privacy protection, traditional data ecosystems are facing challenges. Here are a few aspects:

  • Insufficient data coverage: Traditional data providers face challenges in quickly capturing and analyzing emerging data sources like social media sentiment and IoT device data. Centralized systems struggle to efficiently acquire and integrate 'long tail' data from numerous small-scale or non-mainstream sources.

For example, the GameStop incident in 2021 revealed the limitations of traditional financial data providers in analyzing social media sentiment. Investor sentiment on platforms like Reddit quickly changed market trends, but data terminals like Bloomberg and Reuters failed to capture these dynamics in a timely manner, leading to delays in market predictions.

  • Limited data accessibility: data monopolies restrict access. Many traditional providers expose part of their data through API/cloud services, but high access costs and complex authorization processes still make data integration difficult.

On-chain developers find it challenging to quickly access reliable off-chain data, with high-quality data monopolized by a few giants, leading to high access costs.

  • Data transparency and credibility issues: Many centralized data providers lack transparency regarding their data collection and processing methods and lack effective mechanisms to verify the authenticity and integrity of large-scale data. Validating large-scale real-time data remains a complex issue, and the centralized nature also increases the risk of data tampering or manipulation.

  • Privacy protection and data ownership: Large technology companies have extensively commercialized user data. As the creators of private data, users find it difficult to receive the value they deserve from it. Users often cannot understand how their data is collected, processed, and used, nor can they decide the scope and manner of data usage. Excessive collection and usage have also led to severe privacy risks.

For example, the Facebook Cambridge Analytica incident exposed significant gaps in the transparency and privacy protection of traditional data providers.

  • Data silos: real-time data from different sources and in different formats is difficult to integrate quickly, limiting the possibility of comprehensive analysis. Much of this data remains locked inside organizations, restricting data sharing and innovation across industries and organizations; the silo effect hinders cross-domain data integration and analysis.

For example, in the consumer industry, brands need to integrate data from e-commerce platforms, physical stores, social media, and market research, but these datasets can be hard to combine because of inconsistent formats or platform isolation. Another example: ride-sharing companies like Uber and Lyft each collect large amounts of real-time data on traffic, passenger demand, and geographic location, yet competitive dynamics prevent them from aggregating and sharing it.

In addition, there are issues of cost efficiency, flexibility, and more. Traditional data vendors are actively addressing these challenges, but emerging Web3 technologies offer new ideas and possibilities for solving these problems.

3. Web3 Data Ecosystem

Since the release of decentralized storage solutions such as IPFS (InterPlanetary File System) in 2014, a series of projects has emerged in the industry dedicated to addressing the limitations of the traditional data ecosystem. Decentralized data solutions have formed a multi-layered, interconnected ecosystem covering all stages of the data lifecycle, including data generation, storage, exchange, processing and analysis, validation and security, and privacy and ownership.

  • Data storage: The rapid development of Filecoin and Arweave has demonstrated that decentralized storage (DCS) is driving a paradigm shift in the storage domain. DCS solutions reduce the risk of single points of failure through distributed architecture while attracting participants with more competitive cost-effectiveness. With a surge of large-scale application cases, DCS storage capacity has grown explosively (for example, the total storage capacity of the Filecoin network reached 22 exabytes by 2024).

  • Processing and analysis: Decentralized data computing platforms like Fluence have improved the real-time nature and efficiency of data processing through edge computing technology, particularly suitable for IoT and AI inference applications that require high real-time performance. Web3 projects utilize federated learning, differential privacy, trusted execution environments, and fully homomorphic encryption to provide flexible privacy protection and trade-offs at the computing layer.

  • Data market/exchange platforms: To facilitate the quantification and circulation of data value, Ocean Protocol creates efficient and open data exchange channels through tokenization and DEX mechanisms, for example, helping traditional manufacturing companies (Daimler, parent company of Mercedes-Benz) to co-develop data exchange markets for data sharing in supply chain management. On the other hand, Streamr has created a permissionless, subscription-based data stream network suitable for IoT and real-time analytics scenarios, demonstrating excellent potential in projects related to transportation and logistics (for instance, collaborating with Finland's smart city initiative).

As data exchange and utilization become increasingly frequent, the authenticity, credibility, and privacy protection of data have become critical issues that cannot be overlooked. This has prompted the Web3 ecosystem to extend innovation into the fields of data validation and privacy protection, giving rise to a series of groundbreaking solutions.

3.1 Innovations in Data Validation and Privacy Protection

Many Web3 technologies and native projects are dedicated to solving data authenticity and private data protection issues. Beyond zero-knowledge proofs (ZK) and multi-party computation (MPC), the development of Transport Layer Security Notary (TLS Notary) as an emerging verification method is particularly noteworthy.

Introduction to TLS Notary

Transport Layer Security (TLS) is a widely used encryption protocol for network communication, aiming to ensure the security, integrity, and confidentiality of data transmission between clients and servers. It is a common encryption standard in modern web communications and is used in multiple scenarios, including HTTPS, email, instant messaging, and more.

At its inception roughly a decade ago, the initial goal of TLS Notary was to introduce a third-party 'notary' alongside the client (prover) and server to verify the authenticity of TLS sessions.

Using key splitting, the master secret of the TLS session is divided into two parts held by the client and the notary respectively. This design allows the notary to participate in the verification process as a trusted third party without accessing the actual communication content. The notary mechanism aims to detect man-in-the-middle attacks, prevent fraudulent certificates, and ensure that communication data is not tampered with in transit, allowing a trusted third party to confirm the legitimacy of the communication while protecting its privacy.

Thus, TLS Notary provides secure data validation and effectively balances the demands for validation with privacy protection.
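To make the key-splitting idea concrete, here is a minimal, purely illustrative Python sketch (not the actual TLS Notary protocol, which uses MPC and never reconstructs the secret in one place): a session secret is split into two XOR shares so that neither the client nor the notary alone learns it, yet the combined secret can authenticate the session transcript.

```python
import hmac
import hashlib
import secrets

def split_secret(master_secret: bytes) -> tuple[bytes, bytes]:
    """Split a session secret into two XOR shares.

    Neither share alone reveals anything about the secret; both are needed
    to reconstruct it. (Illustrative only: real TLS Notary keeps the shares
    separate and computes over them with MPC.)
    """
    notary_share = secrets.token_bytes(len(master_secret))
    client_share = bytes(a ^ b for a, b in zip(master_secret, notary_share))
    return client_share, notary_share

def combine_shares(client_share: bytes, notary_share: bytes) -> bytes:
    """Recombine the two shares into the original secret."""
    return bytes(a ^ b for a, b in zip(client_share, notary_share))

def authenticate_transcript(secret: bytes, transcript: bytes) -> str:
    """Produce a MAC over the session transcript with the combined secret."""
    return hmac.new(secret, transcript, hashlib.sha256).hexdigest()

# Demo: the client and notary each hold one share of the stand-in secret.
master = secrets.token_bytes(32)
client_share, notary_share = split_secret(master)
assert combine_shares(client_share, notary_share) == master

transcript = b"GET /balance HTTP/1.1 ... 200 OK ..."
print("transcript MAC:", authenticate_transcript(master, transcript))
```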

In 2022, the TLS Notary project was rebuilt by the Ethereum Foundation's Privacy and Scaling Explorations (PSE) research lab. The new version of the protocol was rewritten from scratch in Rust and incorporates more advanced cryptographic protocols such as MPC. It allows users to prove to third parties the authenticity of data received from a server without disclosing the data's content. While retaining the core verification functionality of the original TLS Notary, it significantly strengthens privacy protection, making it better suited to current and future data privacy needs.

3.2 Variants and Extensions of TLS Notary

In recent years, TLS Notary technology has also been continuously evolving, leading to the development of multiple variants that further enhance privacy and verification capabilities.

  • zkTLS: A privacy-enhanced version of TLS Notary that combines ZKP technology, allowing users to generate encrypted proofs of web data without exposing any sensitive information. It is suitable for communication scenarios requiring extremely high privacy protection.

  • 3P-TLS (Three-Party TLS): Introduces a client, server, and auditor, allowing the auditor to verify the security of communication without disclosing the communication content. This protocol is very useful in scenarios that require transparency while also demanding privacy protection, such as compliance reviews or audits of financial transactions.

Web3 projects use these cryptographic technologies to enhance data validation and privacy protection, break data monopolies, and solve problems of data silos and trustworthy transmission, allowing users to prove things such as ownership of social media accounts, shopping records used for financial lending, bank credit records, occupational background, and educational credentials without disclosing private details.

  • Reclaim Protocol uses zkTLS technology to generate zero-knowledge proofs of HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.

  • zkPass combines 3P-TLS technology, allowing users to verify real-world private data without leakage, widely used in scenarios like KYC and credit services, and is compatible with HTTPS networks.

  • Opacity Network is based on zkTLS, allowing users to securely prove their activities across platforms (like Uber, Spotify, Netflix, etc.) without directly accessing the APIs of these platforms, enabling cross-platform activity proof.

(Projects working on TLS Oracles, Source: Bastian Wetzel)

Web3 data validation, as an important link in the data ecosystem chain, has vast application prospects, with its thriving ecosystem guiding a more open, dynamic, and user-centered digital economy. However, the development of authenticity verification technologies is merely the beginning of building a new generation of data infrastructure.

4. Decentralized Data Network

Some projects combine the aforementioned data validation technologies to explore more in-depth issues upstream in the data ecosystem, such as data provenance, distributed data collection, and trustworthy transmission. Below, we will focus on several representative projects: OpenLayer, Grass, and Vana, which demonstrate unique potential in building a new generation of data infrastructure.

4.1 OpenLayer

OpenLayer, one of the projects in the a16z Crypto Spring 2024 crypto startup accelerator, is the first modular authentic data layer, dedicated to providing an innovative modular solution for coordinating data collection, validation, and transformation to meet the needs of both Web2 and Web3 companies. OpenLayer has attracted support from well-known funds and angel investors, including Geometry Ventures and LongHash Ventures.

Traditional data layers face multiple challenges: a lack of trusted verification mechanisms, reliance on centralized architectures leading to limited accessibility, a lack of interoperability and liquidity between different systems, and no fair data value distribution mechanism.

A more tangible issue is that AI training data is becoming increasingly scarce today. On the public internet, many websites have begun to implement anti-crawling restrictions to prevent AI companies from scraping data on a large scale.

For private, proprietary data the situation is even more complex: many valuable data types are stored in privacy-protective ways because of their sensitivity, and effective incentive mechanisms are lacking. Under this status quo, users cannot safely earn direct revenue from providing private data and are therefore reluctant to share it.

To address these issues, OpenLayer has built a modular authentic data layer that combines data validation technologies and coordinates the data collection, validation, and transformation processes in a decentralized, economically incentivized manner, providing a more secure, efficient, and flexible data infrastructure for Web2 and Web3 companies.

4.1.1 Core Components of OpenLayer's Modular Design

OpenLayer provides a modular platform to simplify the processes of data collection, trusted verification, and transformation:

a) OpenNodes

OpenNodes is the core component responsible for decentralized data collection in the OpenLayer ecosystem, gathering data through user mobile applications, browser extensions, and other channels. Different operators/nodes can take on the tasks best suited to their hardware specifications to optimize returns.

OpenNodes supports three main types of data to meet the needs of different tasks:

  • Publicly available internet data (such as financial data, weather data, sports data, and social media streams)

  • User private data (such as Netflix viewing history and Amazon order records)

  • Self-reported data from secure sources (such as data signed by proprietary owners or verified by specific trusted hardware).

Developers can easily add new data types and specify new data sources, requirements, and retrieval methods, while users can choose to provide de-identified data in exchange for rewards. This design lets the system continuously expand to meet new data needs, and the diversity of data sources allows OpenLayer to provide comprehensive data support for various application scenarios while lowering the barriers to data provision.
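OpenLayer's actual developer interface is not documented in this article, so the snippet below is only a hypothetical illustration of how a data task covering the three data types above might be specified; every field name is invented.

```python
from dataclasses import dataclass, field

@dataclass
class DataTask:
    """Hypothetical task spec a developer might register with OpenNodes."""
    task_id: str
    data_type: str          # "public_web" | "user_private" | "self_reported"
    source: str             # URL, API, or device the node should pull from
    verification: str       # requested verification method (e.g. "tls_notary")
    reward_per_record: float
    fields: list[str] = field(default_factory=list)

tasks = [
    DataTask("weather-feed", "public_web", "https://example.com/api/weather",
             "tls_notary", 0.001, ["city", "temperature", "timestamp"]),
    DataTask("viewing-history", "user_private", "https://example.com/user/history",
             "tls_notary", 0.01, ["title", "watched_at"]),
    DataTask("sensor-report", "self_reported", "device://trusted-sensor",
             "tee_attestation", 0.005, ["reading", "signature"]),
]

for t in tasks:
    print(f"{t.task_id}: {t.data_type} via {t.verification}")
```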

b) OpenValidators

OpenValidators handle the subsequent data validation, allowing data consumers to confirm that the data provided by users matches its source exactly. The verification methods offered produce cryptographic proofs, and verification results can be confirmed after the fact. For the same type of proof, multiple providers offer services, so developers can choose the verification provider that best suits their needs.

In initial use cases, especially for public or private data from internet APIs, OpenLayer uses TLS Notary as a verification solution to export data from any web application and prove the authenticity of data without compromising privacy.

The system is not limited to TLS Notary: the modular design allows it to easily incorporate other verification methods to accommodate different types of data and verification needs, including but not limited to:

  1. Attested TLS connections: Utilize Trusted Execution Environments (TEE) to establish certified TLS connections, ensuring the integrity and authenticity of data during transmission.

  2. Secure Enclaves: Use hardware-level security isolation environments (like Intel SGX) to handle and verify sensitive data, providing a higher level of data protection.

  3. ZK Proof Generators: Integrate ZKP, allowing the verification of data properties or computation results without exposing the original data.
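As a rough illustration of this modular design (not OpenLayer's actual API), one can picture a dispatcher that routes each task to a pluggable verification backend chosen by the developer; the backend names below are placeholders.

```python
from typing import Callable, Dict

# Each backend takes raw data plus a proof blob and returns True if valid.
# These stubs stand in for wrappers around TLS Notary, attested TLS / TEE,
# secure enclaves, or a ZK verifier.
def verify_tls_notary(data: bytes, proof: bytes) -> bool:
    raise NotImplementedError("wrap a TLS Notary verifier here")

def verify_attested_tls(data: bytes, proof: bytes) -> bool:
    raise NotImplementedError("check a TEE attestation over the TLS session")

def verify_enclave(data: bytes, proof: bytes) -> bool:
    raise NotImplementedError("check an SGX-style enclave report")

def verify_zk_proof(data: bytes, proof: bytes) -> bool:
    raise NotImplementedError("verify a ZK proof about the data")

VERIFIERS: Dict[str, Callable[[bytes, bytes], bool]] = {
    "tls_notary": verify_tls_notary,
    "attested_tls": verify_attested_tls,
    "secure_enclave": verify_enclave,
    "zk_proof": verify_zk_proof,
}

def verify(method: str, data: bytes, proof: bytes) -> bool:
    """Dispatch to the verification backend selected for a task."""
    backend = VERIFIERS.get(method)
    if backend is None:
        raise ValueError(f"unsupported verification method: {method}")
    return backend(data, proof)
```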

c) OpenConnect

OpenConnect is the core module responsible for data transformation within the OpenLayer ecosystem. It processes data from various sources to ensure usability and interoperability between different systems, meeting the needs of different applications. For instance:

  • Convert data into on-chain oracle format for direct use by smart contracts.

  • Transform unstructured raw data into structured data for preprocessing for AI training and other purposes.

For data from users' private accounts, OpenConnect provides data de-identification functions to protect privacy and offers components to enhance security during the data sharing process, reducing data leaks and abuse. To meet the real-time data needs of applications like AI and blockchain, OpenConnect supports efficient real-time data transformation.
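The OpenConnect interfaces are likewise not public in this article, so the following schematic Python sketch only illustrates the two transformations described above, shaping a raw record into an oracle-style payload and into a structured training row, with a simple de-identification step; all field names are hypothetical.

```python
import hashlib
import json
import time

def deidentify(record: dict, pii_fields=("user_id", "email")) -> dict:
    """Replace direct identifiers with salted hashes (illustrative only)."""
    out = dict(record)
    for f in pii_fields:
        if f in out:
            out[f] = hashlib.sha256(f"salt|{out[f]}".encode()).hexdigest()[:16]
    return out

def to_oracle_payload(record: dict, feed_id: str) -> dict:
    """Shape a verified record into a minimal oracle-style payload."""
    return {
        "feed_id": feed_id,
        "value": record["price"],
        "timestamp": int(time.time()),
        "proof_ref": record.get("proof_ref"),   # pointer to the validity proof
    }

def to_training_row(record: dict) -> dict:
    """Flatten an unstructured record into a structured training example."""
    return {
        "text": record.get("review", "").strip(),
        "label": 1 if record.get("rating", 0) >= 4 else 0,
    }

raw = {"user_id": "u-123", "price": 64321.5, "review": "Great product!",
       "rating": 5, "proof_ref": "0xabc..."}
clean = deidentify(raw)
print(json.dumps(to_oracle_payload(clean, "BTC-USD"), indent=2))
print(to_training_row(clean))
```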

Currently, through integration with EigenLayer, OpenLayer AVS operators listen for data request tasks, carry out data collection and validation, and report the results back to the system. They stake or restake assets through EigenLayer to provide economic guarantees for their actions; if malicious behavior is confirmed, they risk having their staked assets slashed. As one of the earliest AVSs (Actively Validated Services) on the EigenLayer mainnet, OpenLayer has attracted over 50 operators and $4 billion in restaked assets.
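The EigenLayer and OpenLayer operator SDKs are not described in this article, so the following is only a conceptual Python model of the operator lifecycle sketched above (listen for a task, collect and validate data, report the result, with restaked assets at risk of slashing); every class and method name is invented.

```python
class Operator:
    """Conceptual AVS operator: stake backs honest behavior, and dishonest
    results risk slashing. All methods are placeholders."""

    def __init__(self, stake: float):
        self.stake = stake

    def fetch_task(self) -> dict | None:
        # Listen for a data request task from the network (placeholder).
        return {"task_id": "t-1", "source": "https://example.com/api",
                "method": "tls_notary"}

    def collect_and_validate(self, task: dict) -> dict:
        # Collect the data and produce a validity proof (placeholder).
        return {"task_id": task["task_id"], "data": b"...", "proof": b"..."}

    def report(self, result: dict) -> None:
        # Submit the result; a bad proof would trigger slashing of the stake.
        print(f"reporting {result['task_id']} backed by {self.stake} restaked ETH")

op = Operator(stake=32.0)
task = op.fetch_task()
if task:
    op.report(op.collect_and_validate(task))
```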

In summary, the decentralized data layer built by OpenLayer expands the range and diversity of available data without sacrificing usability and efficiency, while ensuring data authenticity and integrity through encryption technology and economic incentives. Its technology has a wide range of practical use cases for Web3 Dapps seeking to access off-chain information, AI models needing real inputs for training and inference, and companies wishing to segment and target users based on existing identities and reputations. Users can also monetize their private data.

4.2 Grass

Grass is the flagship project developed by Wynd Network, aimed at creating a decentralized web crawler and AI training data platform. At the end of 2023, Grass completed a $3.5 million seed round led by Polychain Capital and Tribe Capital. Shortly after, in September 2024, the project closed a Series A round led by HackVC, with participation from well-known investment institutions such as Polychain, Delphi, Lattice, and Brevan Howard.

As mentioned earlier, AI training needs access to new data, and one solution is to use multiple IPs to get around data access restrictions and feed data to AI. Grass starts from this point: it builds a distributed crawler node network dedicated to collecting and providing verifiable datasets for AI training, using users' idle bandwidth in a decentralized physical infrastructure (DePIN) manner. Nodes route web requests through users' internet connections, access public websites, and compile structured datasets, applying edge computing for preliminary data cleaning and formatting to improve data quality.

Grass adopts a Layer 2 Data Rollup architecture built on top of Solana to enhance processing efficiency. Validators receive, verify, and batch-process web transactions from nodes, generating ZK proofs to ensure data authenticity. Verified data is stored in the Grass Data Ledger (L2) and linked to the underlying L1 chain (Solana) for proof.

4.2.1 Main Components of Grass

a) Grass Nodes

Similar to OpenNodes, end-users install the Grass application or browser extension and run it, using idle bandwidth for web crawling operations. Nodes route web requests through users' internet connections, accessing public websites and compiling structured datasets, employing edge computing technology for preliminary data cleaning and formatting. Users earn GRASS tokens based on the bandwidth and data they contribute.

b) Routers

Connecting Grass nodes and validators, managing the node network and relaying bandwidth. Routers are incentivized to operate and receive rewards, with the reward ratio proportional to the total validated bandwidth relayed.

c) Validators

Receiving, verifying, and batch processing web transactions from routers, generating ZK proofs, using a unique key set to establish TLS connections, and selecting appropriate cipher suites for communication with target web servers. Grass currently uses centralized validators, with plans to transition to a validator committee in the future.

d) ZK Processor

Receiving the validity proofs of each node's session data from the validators, batching the validity proofs of all web requests, and submitting them to Layer 1 (Solana).

e) Grass Data Ledger (Grass L2)

Storing complete datasets and linking them to the corresponding L1 chain (Solana) for proof.

f) Edge Embedded Models

Responsible for converting unstructured web data into structured models usable for AI training.

(Source: Grass)
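Putting the components together, the sketch below is a purely conceptual Python model of the data flow described above (node to router to validator to ZK processor to the Grass Data Ledger); it does not reflect Grass's actual implementation, and all names are invented.

```python
from dataclasses import dataclass

@dataclass
class WebTransaction:
    node_id: str
    url: str
    payload: bytes          # scraped content after edge cleaning/formatting
    bandwidth_used: int     # bytes relayed, a basis for GRASS rewards

def node_scrape(node_id: str, url: str) -> WebTransaction:
    """Grass node: route the request over the user's idle bandwidth."""
    return WebTransaction(node_id, url, b"<html>...</html>", bandwidth_used=2048)

def router_relay(tx: WebTransaction) -> WebTransaction:
    """Router: relay traffic between nodes and validators, metering bandwidth."""
    return tx

def validator_verify(batch: list[WebTransaction]) -> dict:
    """Validator: verify the batch and hand it to the ZK processor."""
    return {"batch_size": len(batch), "zk_proof": b"proof-bytes"}

def zk_processor_submit(proof: dict) -> str:
    """ZK processor: batch validity proofs and settle them on Solana (L1)."""
    return "solana-tx-hash"

def data_ledger_store(batch: list[WebTransaction], l1_ref: str) -> None:
    """Grass Data Ledger (L2): store full datasets, linked to the L1 proof."""
    print(f"stored {len(batch)} records, anchored at {l1_ref}")

batch = [router_relay(node_scrape("node-1", "https://example.com/page"))]
proof = validator_verify(batch)
data_ledger_store(batch, zk_processor_submit(proof))
```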

Comparative Analysis of Grass and OpenLayer

Both OpenLayer and Grass leverage distributed networks to provide companies with access to open internet data and closed information requiring authentication. Incentive mechanisms promote data sharing and the production of high-quality data. Both are committed to creating a decentralized data layer to address issues of data access and validation, but they adopt slightly different technological paths and business models.

Differences in Technical Architecture

Grass uses a Layer 2 Data Rollup architecture on Solana and currently employs a centralized validation mechanism with a single validator. OpenLayer, as one of the first AVSs built on EigenLayer, uses economic incentives and slashing to realize a decentralized validation mechanism. It also adopts a modular design, emphasizing the scalability and flexibility of its data validation services.

Product Differentiation

Both offer similar to-C products, allowing users to monetize data by running nodes. On the to-B side, Grass offers an interesting data marketplace model and uses its L2 to verifiably store complete data, providing structured, high-quality, verifiable training sets for AI companies. OpenLayer currently has no dedicated data storage component but offers broader real-time data stream validation services (VaaS), providing data for AI while also serving scenarios that require rapid responses, such as acting as an oracle that feeds prices for RWA/DeFi/prediction market projects or supplying real-time social data.

Thus, Grass's target customers today are mainly AI companies and data scientists who need large-scale, structured training datasets, as well as research institutions and enterprises that require extensive web datasets. OpenLayer, for now, targets on-chain developers who need off-chain data sources, AI companies that require real-time, verifiable data streams, and Web2 companies pursuing innovative user-acquisition strategies, such as verifying a prospective user's usage history with competitors.

Future Potential Competition

However, considering industry development trends, it is indeed possible that the functionalities of the two projects may converge in the future. Grass may soon also provide real-time structured data. Meanwhile, OpenLayer, as a modular platform, may also expand into data set management and have its own data ledger, thus the competitive landscape of the two may gradually overlap.

Furthermore, both projects may add data labeling as a key link in the chain. Grass may move faster here, given its large node network (reportedly over 2.2 million active nodes). This advantage gives Grass the potential to provide reinforcement learning from human feedback (RLHF) services, using vast amounts of labeled data to optimize AI models.

OpenLayer, however, with its expertise in data validation and real-time processing and its focus on private data, may maintain an advantage in data quality and credibility. In addition, as one of the AVSs on EigenLayer, OpenLayer may develop decentralized validation mechanisms more deeply.

Although the two projects may compete in certain areas, their unique advantages and technological routes may also lead them to occupy different niche markets within the data ecosystem.

4.3 Vana

As a user-centric data pool network, Vana is also committed to providing high-quality data for AI and related applications. Compared to OpenLayer and Grass, Vana adopts a distinctly different technological path and business model. Vana completed a $5 million financing round in September 2024, led by Coinbase Ventures, after previously receiving $18 million in Series A financing led by Paradigm, with other notable investors including Polychain and Casey Caruso.

Originally launched in 2018 as a research project at MIT, Vana aims to become a Layer 1 blockchain specifically designed for users' private data. Its innovations in data ownership and value distribution enable users to profit from AI models trained on their data. The core of Vana lies in facilitating the circulation and monetization of private data through a trustless, private, and attributable Data Liquidity Pool and the innovative Proof of Contribution mechanism.

4.3.1 Data Liquidity Pool

Vana introduces the unique concept of Data Liquidity Pools (DLPs): as the core component of the Vana network, each DLP is an independent peer-to-peer network for aggregating a specific type of data asset. Users can upload their private data (such as shopping records, browsing habits, and social media activity) to specific DLPs and flexibly choose whether to authorize specific third parties to use it. Data is integrated and managed through these liquidity pools and undergoes de-identification to protect user privacy while still allowing it to participate in commercial applications, such as AI model training or market research.

Users who submit data to a DLP receive the corresponding DLP tokens (each DLP has its own token) as rewards. These tokens represent users' contributions to the data pool and also grant governance rights over the DLP and rights to future profit sharing. Users not only share data but also earn continuing revenue from subsequent uses of that data (with visualized usage tracking). Unlike traditional one-off data sales, Vana lets data participate continuously in the economic cycle.
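As a rough illustration of this flow (hypothetical, not Vana's actual contracts), the sketch below models a user submitting de-identified data to a DLP, authorizing specific uses, receiving DLP tokens, and then accruing revenue when the data is later consumed.

```python
from collections import defaultdict

class DataLiquidityPool:
    """Conceptual DLP: aggregates one data type, issues its own token,
    and streams usage revenue back to contributors (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.balances = defaultdict(float)   # DLP token balances
        self.records = []                    # (contributor, data, allowed_uses)
        self.revenue = defaultdict(float)

    def contribute(self, user: str, data: dict, allowed_uses: set[str], reward: float):
        # Tokens represent both contribution and governance weight.
        self.records.append((user, data, allowed_uses))
        self.balances[user] += reward

    def consume(self, use_case: str, fee: float):
        """A buyer (e.g. an AI trainer) pays to use authorized records;
        the fee is shared among contributors of eligible records."""
        eligible = [r for r in self.records if use_case in r[2]]
        for contributor, _, _ in eligible:
            self.revenue[contributor] += fee / max(len(eligible), 1)

dlp = DataLiquidityPool("shopping-history")
dlp.contribute("alice", {"orders": 42}, {"ai_training"}, reward=10.0)
dlp.consume("ai_training", fee=5.0)
print(dlp.balances["alice"], dlp.revenue["alice"])
```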

4.3.2 Proof of Contribution Mechanism

One of Vana's core innovations is the Proof of Contribution mechanism, which is key to ensuring data quality. It allows each DLP to define a contribution proof function tailored to its own characteristics, validating the authenticity and integrity of data and assessing its contribution to improving AI model performance. This ensures that users' data contributions are quantified and recorded so that users can be rewarded accordingly. Similar to 'Proof of Work' in cryptocurrency, Proof of Contribution allocates earnings based on the quality, quantity, and frequency of the data users contribute, executed automatically through smart contracts so that contributors receive rewards that match their contributions.
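Since rewards are described as a function of the quality, quantity, and frequency of contributed data, a toy scoring and allocation function along those lines might look like the following; the weights and normalization are invented and do not represent Vana's actual Proof of Contribution implementation.

```python
def proof_of_contribution_score(quality: float, quantity: int, frequency: float,
                                w_quality: float = 0.6, w_quantity: float = 0.3,
                                w_frequency: float = 0.1) -> float:
    """Toy PoC score: weighted blend of data quality (0-1), record count,
    and contribution frequency (submissions per epoch). Each DLP would
    define its own function; the weights here are invented."""
    quantity_norm = min(quantity / 1000, 1.0)    # cap so volume cannot dominate
    frequency_norm = min(frequency / 30, 1.0)
    return (w_quality * quality
            + w_quantity * quantity_norm
            + w_frequency * frequency_norm)

def allocate_rewards(pool_reward: float, scores: dict[str, float]) -> dict[str, float]:
    """Split an epoch's reward pro rata to contribution scores."""
    total = sum(scores.values()) or 1.0
    return {user: pool_reward * s / total for user, s in scores.items()}

scores = {"alice": proof_of_contribution_score(0.9, 500, 10),
          "bob": proof_of_contribution_score(0.6, 1200, 2)}
print(allocate_rewards(100.0, scores))
```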

Vana's Technical Architecture

1. Data Liquidity Layer

This is the core layer of Vana, responsible for contributing, verifying, and recording data into DLPs, bringing data on-chain as transferable digital assets. DLP creators deploy DLP smart contracts, setting the purposes of data contribution, verification methods, and contribution parameters. Data contributors and custodians submit data for verification, and the Proof of Contribution (PoC) module performs data validation and value assessment, granting governance rights and rewards according to those parameters.

2. Data Portability Layer

This is an open data platform for data contributors and developers and also the application layer of Vana. The Data Portability Layer provides a collaborative space in which contributors and developers build applications using the data liquidity accumulated within DLPs, and it supplies the infrastructure for distributed training of user-owned models and for AI DApp development.

3. Connectome

A decentralized ledger and a real-time data flow map running through the entire Vana ecosystem, it uses Proof of Stake consensus to record data transactions in real time. It ensures the effective transfer of DLP tokens and provides cross-DLP data access for applications. It is EVM-compatible, allowing interoperability with other networks, protocols, and DeFi applications.

Vana provides a somewhat different path, focusing on the liquidity and value enablement of user data. This decentralized data exchange model is not only suitable for AI training, data marketplaces, and other scenarios but also provides a new solution for cross-platform interoperability and authorization of user data in the Web3 ecosystem, ultimately creating an open internet ecosystem where users own and manage their data and the smart products created from that data.

5. Value Proposition of Decentralized Data Networks

Data scientist Clive Humby stated in 2006 that data is the new oil of the era. Over nearly two decades, we have witnessed the rapid development of 'refining' technologies. Big data analytics, machine learning, and other technologies have led to an unprecedented release of data value. According to IDC's forecast, by 2025, the global data sphere will grow to 163 ZB, with most coming from individual users. With the proliferation of emerging technologies like IoT, wearable devices, AI, and personalized services, a large amount of commercially needed data will also originate from individuals in the future.

Pain points of traditional solutions: Innovations unlocked by Web3

Web3 data solutions break through the limitations of traditional facilities through a network of distributed nodes, achieving broader and more efficient data collection while enhancing the real-time acquisition efficiency and verification credibility of specific data. In this process, Web3 technology ensures the authenticity and integrity of data while effectively protecting user privacy, thus achieving a fairer data utilization model. This decentralized data architecture promotes the democratization of data access.

Whether through the user-node models of OpenLayer and Grass or Vana's monetization of private user data, these approaches not only improve the efficiency of specific data collection but also let ordinary users share in the dividends of the data economy, creating a win-win model for users and developers and enabling users to truly control and benefit from their data and related resources.

Through token economics, Web3 data solutions redesign incentive models, creating a fairer data value distribution mechanism. They attract a large influx of users, hardware resources, and capital, optimizing the operation of the entire data network.

Compared with traditional data solutions, they also offer modularity and scalability: for instance, OpenLayer's modular design leaves room for future technological iteration and ecosystem expansion. These technical characteristics optimize data acquisition for AI model training, providing richer and more diverse datasets.

From data generation, storage, validation to exchange and analysis, Web3-driven solutions address many shortcomings of traditional facilities through unique technological advantages, while also giving users the ability to monetize personal data, triggering a fundamental shift in data economic models. With further technological development and the expansion of application scenarios, the decentralized data layer is expected, along with other Web3 data solutions, to become a key infrastructure for data-driven industries.