Grass's Positioning and Use Cases
Grass is a project deployed on Solana that combines AI, DePIN, and Solana technologies and positions itself as a data layer for AI. It is a decentralized web scraping platform that helps companies and non-profit organizations train artificial intelligence (AI) models by leveraging unused internet bandwidth. Scraping is carried out through a browser extension that uses each individual's unused bandwidth and rewards users with Grass Points. The project aims to redefine the internet's incentive structure so that users benefit directly from the internet and its value stays in their hands. Currently, more than 2 million users run nodes on the network, scraping large amounts of data for AI models.
Technical Architecture
Grass Sovereign Data Rollup is a network specially built by Grass on Solana, enabling the protocol to handle all transactions from data sources to processing, verification, and dataset construction. The network is built around validators (issuing data collection instructions), routers (managing web request distribution), and Grass nodes (used by users to contribute their unused network resources). The specific architecture is as follows:
Validator: Receives, verifies, and batches web transactions from routers. It then generates ZK proofs so that session data can be verified on-chain. These on-chain proofs can be referenced in datasets to verify data provenance and track data lineage throughout its lifecycle. The validator set will transition from an initially centralized framework with a single validator to a decentralized validator committee.
Router: Connects Grass nodes to validators. Routers keep the node network traceable and relay bandwidth. They are incentivized according to the total validated bandwidth they relay. Routers are responsible for reporting the following metrics to validators in the network: the size (in bytes) of each incoming and outgoing request; the latency between each node and the validator; and the network status of each connected node.
Grass Node: Utilizes users' unused bandwidth and relayed traffic so that the network can scrape public web data (not users' personal data). Running a node is free, and node operators are rewarded based on the data relayed through their nodes.
ZK Processor: Batch processes the validity proofs of all web request session data and submits the proofs to the L1 blockchain. This operation permanently records every scraping action performed on the network. It also lays the foundation for a comprehensive understanding of the sources of AI training data.
Grass Data Ledger: This is the link between the scraped data and the L1 settlement layer. The ledger is an immutable data structure that hosts complete datasets and links data to its respective chain proofs, serving as a repository to ensure data provenance.
Edge Embedding Models: This is the process of converting unstructured web data into structured models. It includes all the preprocessing steps needed to clean, normalize, and structure the collected raw data so that it is formatted to meet the requirements of AI models.
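The metrics that routers report to validators, as listed in the Router entry above, could be modelled as a simple record. The following is only an illustrative sketch: the RouterReport class, its field names, and the two status values are assumptions made for this example, not Grass's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class NodeStatus(Enum):
    # Hypothetical status values; the source only mentions "network status".
    ONLINE = "online"
    OFFLINE = "offline"


@dataclass(frozen=True)
class RouterReport:
    """One per-node report a router might send to a validator (assumed shape)."""
    node_id: str
    bytes_in: int        # size of the incoming request, in bytes
    bytes_out: int       # size of the outgoing request, in bytes
    latency_ms: float    # latency between the node and the validator
    status: NodeStatus   # network status of the connected node

    def total_bytes(self) -> int:
        """Total bytes relayed for this request."""
        return self.bytes_in + self.bytes_out


def relayed_bandwidth(reports):
    """Sum the bytes relayed by nodes that are currently online."""
    return sum(r.total_bytes() for r in reports if r.status is NodeStatus.ONLINE)


reports = [
    RouterReport("node-a", 1_200, 48_000, 35.0, NodeStatus.ONLINE),
    RouterReport("node-b", 900, 0, 0.0, NodeStatus.OFFLINE),
]
print(relayed_bandwidth(reports))
```

A validator could aggregate such reports to obtain the validated bandwidth figure on which router incentives are based.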
Technical Features
In the above architecture, the Grass network sits between the client and the web server. The client initiates web requests, which are sent through validators and ultimately routed through Grass nodes. Whichever website the client requests, that site's server responds to the web request, and its data is scraped and sent back along the same path. The data is then cleaned, processed, and prepared for training the next generation of AI models.
Understanding this process requires looking at two additional components: the Grass Data Ledger and the ZK processor.
The Grass Data Ledger is where all data is ultimately stored. It serves as a permanent ledger for each dataset scraped by Grass, and each dataset embeds metadata that records its lineage from the moment of origin. The metadata proof for each dataset is stored on Solana's settlement layer, and the settlement data itself is also made available through the ledger.
The purpose of the ZK processor is to help record the sources of the datasets scraped on the Grass network. The process is as follows: when a node on the network (i.e., a user with the Grass extension installed) sends a web request to a given website, the website returns an encrypted response containing all the data the node requested. This is the moment the dataset is born, the origin moment that needs to be recorded, and it is also the moment at which the metadata is recorded. The metadata contains many fields, such as the session key, the URL of the scraped website, the IP address of the target website, the transaction timestamp, and of course the data itself. With this information and a clear record of each dataset's provenance, AI models can be trained accurately and faithfully.
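The metadata fields listed above can be pictured as a small record created at the moment a response arrives. The sketch below is an assumption-laden illustration: the ScrapeMetadata class, the helper names, and the choice to store a SHA-256 digest of the payload rather than the payload itself are all hypothetical and are not taken from Grass's implementation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScrapeMetadata:
    """Provenance record for one web request (field names are assumed)."""
    session_key: str
    url: str
    target_ip: str
    timestamp: int     # Unix time in seconds
    data_sha256: str   # digest standing in for "the data itself"


def make_metadata(session_key, url, target_ip, timestamp, payload: bytes):
    """Build the record at the origin moment, when the response is received."""
    digest = hashlib.sha256(payload).hexdigest()
    return ScrapeMetadata(session_key, url, target_ip, timestamp, digest)


def fingerprint(meta: ScrapeMetadata) -> str:
    """Hash a canonical JSON encoding so the record can be referenced compactly."""
    canonical = json.dumps(asdict(meta), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


meta = make_metadata(
    "session-123", "https://example.com/page", "93.184.216.34",
    1_700_000_000, b"<html>public page</html>",
)
print(fingerprint(meta))
```

Because the fingerprint depends on every field, changing the payload, the URL, or the timestamp yields a different value, which is the property a provenance record needs.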
ZK processors ensure that the data requiring on-chain settlement is not visible to Solana validators. Moreover, the volume of web requests executed on Grass will in future exceed the throughput that the L1 can handle. Grass expects to scale soon to tens of millions of web requests per minute, and the metadata for each request will need to be settled on-chain. Unless ZK processors first generate proofs and batch them, these transactions cannot be submitted to the L1. A rollup is therefore the only feasible way to achieve the planned goals.
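The effect of batching on L1 load can be illustrated with a much simpler stand-in. The sketch below commits each batch of request records to a single Merkle root, so that many records produce one on-chain value. This is not a ZK proof and not Grass's actual construction; the function names and batch size are assumptions chosen only to show why batching reduces the number of L1 submissions.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Fold a list of byte strings into one 32-byte commitment."""
    if not leaves:
        raise ValueError("cannot commit to an empty batch")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def batch_commitments(records, batch_size):
    """Split records into batches and return one commitment per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        merkle_root(records[i:i + batch_size])
        for i in range(0, len(records), batch_size)
    ]


records = [f"request-{i}".encode() for i in range(10_000)]
roots = batch_commitments(records, batch_size=1_000)
print(len(records), "records ->", len(roots), "commitments")
```

In this toy setting, 10,000 request records are reduced to 10 values to settle. A real validity proof additionally attests that each record was processed correctly, which a bare Merkle root does not.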
In addition to recording each dataset's source website, the metadata also indicates which node the request was routed through. This means that every time a node scrapes the web, it can be rewarded for its contribution without revealing any personal information. Grass can therefore reward nodes proportionally: nodes that scrape more data, and more valuable data, receive greater incentives. Rewards rise significantly in the regions where demand is highest, which encourages more people in those regions to register and increases network capacity. The larger the network grows, the more Grass can scrape and the larger its repository of web data becomes. More data means Grass can supply more to the AI laboratories that need training data, which in turn incentivizes the network to keep growing.
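Proportional rewards of this kind can be sketched in a few lines. The example below is a hypothetical model, not Grass's published formula: it assumes a node's contribution is the number of bytes it scraped multiplied by a regional demand multiplier, and that a fixed reward pool is split in proportion to those weighted contributions.

```python
def weighted_contribution(bytes_scraped: int, demand_multiplier: float) -> float:
    """Weight raw volume by how valuable the region is (assumed model)."""
    return bytes_scraped * demand_multiplier


def distribute_rewards(pool: float, contributions: dict) -> dict:
    """Split the pool in proportion to each node's weighted contribution."""
    total = sum(contributions.values())
    if total <= 0:
        return {node: 0.0 for node in contributions}
    return {node: pool * amount / total for node, amount in contributions.items()}


contributions = {
    "node-a": weighted_contribution(600, 1.0),  # more volume, ordinary demand
    "node-b": weighted_contribution(200, 2.0),  # less volume, high-demand region
}
rewards = distribute_rewards(100.0, contributions)
print(rewards)
```

Here node-b relays a third of node-a's volume but still earns 40 of the 100 reward units, because its region carries a higher multiplier.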
Grass Node Operations and Security Mechanisms
Running a Grass node is free. A node serves as the network's gateway to the internet. Node operators (i.e., application users) are rewarded for the traffic relayed through their nodes, and they receive network traffic according to their reputation scores and geographical demand.
Grass nodes have two main purposes: to pass on traffic (i.e., web requests) that is initiated by clients and directed by validators; and to return encrypted web server responses to the specified routers.
The systems supported by nodes are shown in the figure above, and the process of running a node is also very simple: create an account, download the Grass desktop application, and connect to the network.
After connecting, a node automatically registers on the network. Operators are responsible for keeping their nodes online so that they can forward web requests to public web servers. Each request sent to a Grass node is an encrypted packet that tells the node only where the packet should be sent. Web requests are authenticated by the digital signatures of all relevant parties. These signatures establish the legitimacy of a request and determine whether it should be forwarded to the target web server (i.e., a public website). This process prevents data tampering and ensures that validators can accurately measure the reputation of each node.
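The rule that a request is forwarded only when every party's signature checks out can be sketched as follows. The source does not say which signature scheme Grass uses, so this example substitutes HMAC-SHA256 tags from the Python standard library for real public-key signatures. The party names, keys, and the should_forward function are all hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for a digital signature: an HMAC-SHA256 tag."""
    return hmac.new(key, message, hashlib.sha256).digest()


def should_forward(message: bytes, signatures: dict, keys: dict) -> bool:
    """Forward only if every expected party supplied a valid tag."""
    for party, key in keys.items():
        tag = signatures.get(party)
        if tag is None or not hmac.compare_digest(tag, sign(key, message)):
            return False
    return True


keys = {"client": b"client-secret", "validator": b"validator-secret"}
request = b"GET https://example.com/page"
signatures = {party: sign(key, request) for party, key in keys.items()}

print(should_forward(request, signatures, keys))
print(should_forward(b"GET https://example.com/other", signatures, keys))
```

The first call accepts the request because both tags match. The second rejects it because the message was altered after signing, which is how the check guards against tampering.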
Node reputation scoring mainly includes the following points:
Integrity: Assess whether the data is complete and whether the dataset contains all the necessary data points required for the expected use case.
Consistency: Check the consistency of data across different datasets or within the same dataset over time.
Timeliness: Measure whether the data is up to date when needed.
Availability: Assess the degree of data availability for each node.
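One simple way to combine the four criteria above into a single number is a weighted average. The sketch below is an assumption for illustration only: the source does not state how Grass aggregates these dimensions, so the 0-to-1 scale, the equal default weights, and the names NodeMetrics and reputation_score are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NodeMetrics:
    """Per-node scores on an assumed 0.0 to 1.0 scale."""
    integrity: float
    consistency: float
    timeliness: float
    availability: float


# Equal weights are an assumption; a real system might tune these.
EQUAL_WEIGHTS = {
    "integrity": 1.0,
    "consistency": 1.0,
    "timeliness": 1.0,
    "availability": 1.0,
}


def reputation_score(metrics: NodeMetrics, weights=EQUAL_WEIGHTS) -> float:
    """Weighted average of the four dimensions, normalised by total weight."""
    values = asdict(metrics)
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights[name] for name in values)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * values[name] for name in values) / total_weight


score = reputation_score(NodeMetrics(1.0, 0.8, 0.6, 0.4))
print(round(score, 3))
```

A validator could then route more traffic to nodes with higher scores, in line with the statement that nodes receive traffic according to reputation.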
In terms of security, the Grass network does not access users' devices (i.e., their computers) or observe any actions users perform on them. It only routes internet traffic through the user's IP address, independently of the user's own activity. This means that Grass has zero access to users' personal data, and 100% of the data it collects comes from the public web.
In addition, Grass uses bandwidth encryption to ensure that all users are protected while sharing their internet connection. Grass also works with AppEsteem, a leading cybersecurity compliance auditing company, which monitors Grass's products 24/7 for vulnerabilities, leaks, backdoors, and malware to ensure user safety. AppEsteem certification is highly regarded in the cybersecurity industry, and obtaining it means Grass's products are also whitelisted by top anti-malware applications, including Avast, Microsoft Defender, McAfee, and AVG.
Functions of Grass Token
Grass token holders can participate in the Grass network in several ways:
Transactions and Repurchases: After decentralization, Grass will be used to support network scraping transactions, dataset purchases, and LCR (live context retrieval) usage.
Staking and Rewards: Stake Grass to routers to facilitate network traffic and earn rewards for contributing to network security.
Network Governance: Participate in the development of the Grass network, including proposing and voting on network improvements, deciding which organizations to collaborate with, and determining the incentive mechanisms for all stakeholders.
According to statistics from the Dune website, the current annualized yield for staking Grass is around 45%, with approximately 33% of Grass tokens participating in staking, totaling more than 26 million.
Router Staking and Rewards
Routers serve as decentralized hubs, connecting all network nodes and managing incoming and outgoing web requests from validators. Router operations are incentivized, rewarding them in proportion to the amount staked for each router. All traffic relayed through the Router is encrypted and measured to ensure security and performance.
The current staking amount of each Router is shown in the figure above. Users can delegate their Grass to a Router to earn rewards, and each Router sets its own commission rate.
Currently, DBunker has approximately 1.43 million Grass staked, with a minimum staking period of 7 days and a commission of 10%. (Data source: https://www.grassfoundation.io/stake/delegations) Users just need to click STAKE to connect their wallet, stake Grass, and earn Router staking rewards.
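To see how a Router's commission affects a delegator's return, a rough calculation helps. The sketch below assumes that the commission is deducted from staking rewards rather than from the staked principal, and that rewards accrue at a simple, non-compounding annual rate. Neither assumption is confirmed by the source, and net_annual_reward is a hypothetical helper.

```python
def net_annual_reward(stake: float, gross_apy: float, commission: float) -> float:
    """Estimate yearly rewards kept by the delegator after router commission.

    Assumes simple (non-compounding) interest and that the commission is
    charged on rewards only.
    """
    if stake < 0 or gross_apy < 0:
        raise ValueError("stake and gross_apy must be non-negative")
    if not 0.0 <= commission <= 1.0:
        raise ValueError("commission must be between 0 and 1")
    gross_reward = stake * gross_apy
    return gross_reward * (1.0 - commission)


# Figures from the text: roughly 45% annualized yield, 10% commission.
reward = net_annual_reward(stake=1_000, gross_apy=0.45, commission=0.10)
print(round(reward, 2))
```

Under these assumptions, staking 1,000 Grass would earn about 405 Grass per year after commission, an effective yield of about 40.5%.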
Summary
Grass is committed to building a fair and open decentralized data layer aimed at addressing the ethical and data quality issues of current internet data extraction, opposing the phenomenon of data monopoly controlled by a few large companies. In terms of technical architecture and features, Grass constructs a data Rollup and introduces a metadata mechanism that records the sources of all datasets. The ZK proofs of these data are stored on the L1 settlement layer, while the metadata itself will ultimately be bound to its underlying datasets, as these datasets are stored on Grass's data ledger. Therefore, ZK proofs lay the foundation for increasing transparency and providing node providers with rewards proportional to their workload, which is also an important factor in incentivizing the expansion of the Grass network.
Grass focuses on the intersection of cryptocurrency and AI data. Unlike traditional participants in closed-source, centralized AI, it is the original decentralized source of AI data. As an important player in the web3 wave, Grass uses decentralized technology to build a fair and open data layer for AI companies and protocols, aiming to meet market demand, and its development prospects are promising.