Grass's positioning and use cases

Grass is a project deployed on Solana that combines AI, DePIN (decentralized physical infrastructure networks), and Solana technology, and positions itself as the data layer for AI. It is a decentralized web scraping platform that helps companies and non-profits gather training data for artificial intelligence (AI) models by drawing on unused internet bandwidth. Scraping runs through a browser extension that uses each participant's idle bandwidth, and participants are rewarded with Grass Points. Grass aims to redefine the internet's incentive structure by letting users benefit directly from the network, keeping the value of the internet in users' hands. The network currently has over 2 million user-operated nodes, which have scraped a vast amount of data for AI models.

Technical Architecture

The Grass Sovereign Data Rollup is a network built specifically for Grass on Solana. It lets the protocol handle every step, from sourcing data through processing and validation to building datasets. The network is built around validators (which issue data collection instructions), routers (which manage the distribution of web requests), and Grass nodes (through which users contribute their idle network resources). The components are as follows:

Validator: Receives, verifies, and batches web transactions from routers, then generates ZK proofs that attest to the session data on-chain. Datasets can reference these on-chain proofs to verify where their data came from and to track its lineage throughout its lifecycle. The validator set will transition from an initially centralized design with a single validator to a decentralized validator committee.

Router: Connects Grass nodes to validators. The router keeps track of the node network and relays bandwidth. Grass rewards router operation in proportion to the total validated bandwidth relayed through it. The router reports the following metrics to validators: the size (in bytes) of each incoming and outgoing request; the latency of each node and validator; and the network status of each connected node.

Grass Node: Utilizes users' unused bandwidth and relays traffic so that the network can scrape public web data (not users' personal data). Running nodes is free, and the people running the nodes (node operators) are rewarded based on the data relayed through them.

ZK Processor: Verifies the validity of session data for all web requests in batches and submits the proofs to the L1 blockchain. This permanently records every scraping action performed on the network and lays the foundation for fully tracing the sources of AI training data.

Grass Data Ledger: This is the link between the scraped data and the L1 settlement layer. The ledger is an immutable data structure that hosts the complete dataset and links the data to its corresponding on-chain proofs, serving as a repository for ensuring data provenance.

Edge Embedding Models: These convert unstructured web data into a structured form. The process covers all the preprocessing steps needed to ensure that the raw collected data is cleaned, normalized, and structured in a format suitable for AI models.
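The preprocessing named in the Edge Embedding Models entry (cleaning, normalizing, structuring) could look roughly like the sketch below. It is an illustration only, not Grass's actual pipeline: the function name `preprocess`, the output fields, and the choice to drop script and style content are all assumptions, and no embedding step is shown.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def preprocess(url: str, raw_html: str) -> dict:
    """Hypothetical helper: clean, normalize, and structure one scraped page."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Clean: keep text nodes only. Joining with a space can split a word
    # that spans inline tags, which is acceptable for a sketch.
    text = " ".join(parser.parts)
    # Normalize: unify Unicode forms and collapse runs of whitespace.
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text).strip()
    # Structure: return a flat record that a later stage could consume.
    return {"url": url, "text": text, "n_chars": len(text)}
```

A record like this is the kind of input a downstream embedding model would take; the real field set is not described in the source.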

Technical Features

In the architecture described above, the Grass network sits between clients and web servers. Clients issue web requests, which pass through validators and are ultimately routed through Grass nodes. Whichever website the client requests, that site's server responds to the web request, so its data can be scraped and sent back along the same path. The data is then cleaned, processed, and prepared for training the next generation of AI models.

Understanding this process requires a closer look at two further components: the Grass data ledger and the ZK processor.

The Grass data ledger is where all data is ultimately stored. It is a permanent ledger of every dataset scraped by Grass, embedded with metadata that records its original lineage from the moment of origin. The metadata proof for each dataset will be stored on Solana's settlement layer, and the settlement data itself is also provided through the ledger.

The purpose of the ZK processor is to help record the sources of the datasets scraped on the Grass network. The process is as follows. When a node on the network (i.e., a user with the Grass extension installed) sends a web request to a given website, the website returns an encrypted response containing all the data the node requested. This is the moment the dataset is born: the point of origin that must be recorded, and therefore the moment at which metadata is captured. The metadata includes fields such as the session key, the URL of the scraped website, the IP address of the target website, the transaction timestamp, and, of course, the data itself. With this information attached, datasets carry a clear provenance, so AI models can be trained on data that is accurate and faithful to its source.
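As a rough illustration of the metadata captured at that moment, the sketch below models one record with the fields the text lists. The class and function names are hypothetical, the timestamp format is assumed to be Unix seconds, and storing a SHA-256 hash of the payload rather than the payload itself is a simplification made here, not something the source specifies.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScrapeMetadata:
    session_key: str
    url: str
    target_ip: str
    timestamp: int        # assumed: Unix time in seconds
    payload_sha256: str   # assumed: hash standing in for "the data itself"


def record_scrape(session_key: str, url: str, target_ip: str,
                  timestamp: int, payload: bytes) -> ScrapeMetadata:
    """Build the metadata record at the moment the response arrives."""
    digest = hashlib.sha256(payload).hexdigest()
    return ScrapeMetadata(session_key, url, target_ip, timestamp, digest)


def fingerprint(meta: ScrapeMetadata) -> str:
    """Deterministic hash of the record, suitable for later batching."""
    canonical = json.dumps(asdict(meta), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Serializing with sorted keys keeps the fingerprint stable, so two parties that hold the same record derive the same value.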

The ZK processor ensures that data requiring on-chain settlement is not visible to Solana validators. It also addresses throughput: the volume of web requests that Grass expects to execute will exceed what the L1 can handle. Grass expects to scale soon to tens of millions of web requests per minute, and the metadata for each request will need to be settled on-chain. Unless the ZK processor first proves and batches these transactions, submitting them to the L1 would be impossible. A rollup is therefore the only feasible way to achieve the planned goals.
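To see why batching cuts the on-chain load, consider the sketch below, which compresses any number of per-request records into a single 32-byte value. Note the swapped-in technique: this is a plain Merkle root, which is a commitment and not a zero-knowledge proof, and the source does not say which proof system or batch format Grass uses. The function name is hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Commit to a batch of records with one 32-byte root.

    Stand-in for ZK batching: it shows only the many-to-one compression,
    not the privacy or validity guarantees of a real proof.
    """
    if not leaves:
        raise ValueError("cannot commit to an empty batch")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Duplicate the last hash so every node has a sibling
            # (a common simplification, fine for a sketch).
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Under this scheme, a batch of one million request records would need one on-chain write instead of one million, and changing any single record changes the root.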

In addition to recording the source website of each dataset, the metadata also indicates which node on the network the request was routed through. This means that whenever a node scrapes the web, it can be rewarded for its contribution without revealing its identity. It also lets Grass reward nodes proportionally: nodes that scrape more data, and more valuable data, receive greater incentives. This mechanism will significantly raise rewards in the world's highest-demand regions, which in turn encourages people in those regions to sign up and increase network capacity. The larger the network grows, the more Grass can scrape and the larger its repository of web data becomes. More data means Grass can supply more to the AI labs that need training data, which incentivizes the network to keep growing.
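Proportional rewards of this kind can be sketched as a pro-rata split of a reward pool. The source does not give Grass's actual formula, so everything here is an assumption: the function names, the idea of a fixed pool, and the `demand_weight` multiplier used to represent more valuable traffic.

```python
def contribution(bytes_relayed: int, demand_weight: float = 1.0) -> float:
    """Hypothetical contribution measure: relayed bytes times a demand weight."""
    return bytes_relayed * demand_weight


def split_rewards(pool: float, contributions: dict[str, float]) -> dict[str, float]:
    """Split `pool` among nodes in proportion to their contributions."""
    total = sum(contributions.values())
    if total <= 0:
        return {node: 0.0 for node in contributions}
    return {node: pool * c / total for node, c in contributions.items()}
```

For example, if node A relays 1,000 bytes at weight 1.0 and node B relays 1,500 bytes at weight 2.0, their contributions are 1,000 and 3,000, so B receives three quarters of the pool.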

Running Grass nodes and security mechanisms

A Grass node is free to run and acts as the network's gateway to the internet. Node operators (i.e., application users) earn rewards for the traffic relayed through their nodes, and the amount of traffic a node receives depends on its reputation score and on geographic demand.

Grass nodes have two main functions: relaying traffic (i.e., web requests) that clients initiate and validators direct, and returning encrypted web server responses to the designated router.

The systems that nodes support are shown in the image above. Running a node is straightforward: create an account, download the Grass desktop application, and connect to the network.

After connecting, a node registers on the network automatically. Operators are responsible for keeping their nodes online so that the nodes can forward web requests to public web servers. Each request sent to a Grass node is an encrypted data packet that gives the node only the routing information it needs, namely where the packet is headed. Web requests are authenticated with digital signatures from all relevant parties. These signatures verify that a request is legitimate and determine whether it should be forwarded to the target web server (i.e., a public website). This process prevents data tampering and ensures that validators can measure each node's reputation accurately.
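The forward-only-if-everyone-signed rule can be illustrated as follows. This is a deliberately simplified sketch: it uses HMAC tags from Python's standard library as a stand-in for real digital signatures (which would be asymmetric), and the set of parties, the key handling, and the function names are all assumptions, since the source does not name the signature scheme.

```python
import hashlib
import hmac


def sign(key: bytes, packet: bytes) -> bytes:
    """Stand-in for a digital signature: an HMAC-SHA256 tag over the packet."""
    return hmac.new(key, packet, hashlib.sha256).digest()


def should_forward(packet: bytes, tags: dict[str, bytes],
                   keys: dict[str, bytes]) -> bool:
    """Forward only if every expected party supplied a valid tag."""
    for party, key in keys.items():
        tag = tags.get(party)
        # compare_digest avoids leaking information through timing.
        if tag is None or not hmac.compare_digest(tag, sign(key, packet)):
            return False
    return True
```

A packet altered in transit, or one missing any party's tag, fails the check and is not forwarded.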

A node's reputation score is based mainly on the following criteria:

Integrity: Assesses whether the data is complete, that is, whether the dataset contains every data point the intended use case requires.

Consistency: Assesses whether data is consistent across different datasets, or within the same dataset over time.

Timeliness: Measures whether the data is up to date when it is needed.

Availability: Assesses how reliably data is available from each node.

In terms of security, the Grass network does not access users' devices (i.e., their computers) or monitor anything users do on them. It only routes internet traffic through users' IP addresses, which is entirely separate from the users' own activity. This means that Grass has no access to users' personal data, and the scraped data comes exclusively from the public web.

Additionally, Grass encrypts bandwidth traffic so that all users are protected while sharing their internet connections. Grass also works with AppEsteem, a leading cybersecurity compliance auditing company, which monitors Grass products around the clock for vulnerabilities, leaks, backdoors, and malware to ensure user safety. AppEsteem certification is highly regarded in the cybersecurity industry, and holding it means Grass products are whitelisted by top anti-malware applications, including Avast, Microsoft Defender, McAfee, and AVG.

Functions of the Grass token

Grass token holders can participate in the Grass network in several ways:

Transactions and Repurchases: After decentralization, Grass will be used to pay for web scraping transactions, dataset purchases, and LCR (live context retrieval) usage.

Staking and Rewards: Stake Grass to a router to facilitate network traffic and earn rewards for contributing to network security.

Network Governance: Participate in the development of the Grass network, including proposing and voting on network improvements, coordinating partnerships with organizations, and determining incentive mechanisms for all stakeholders.

According to statistics on Dune, the current annualized return on staking Grass is around 45%, with about 33% of Grass tokens staked and the staked amount exceeding 26 million.

Router Staking and Rewards

Routers act as decentralized hubs that connect all network nodes and manage incoming and outgoing web requests for validators. Router operation is incentivized, with rewards based on the amount of stake delegated to each router. All traffic relayed through routers is encrypted and measured to ensure security and performance.
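The measurements a router takes, listed under Technical Architecture as request sizes, latencies, and node status, might be represented as in the sketch below. The record layout, field names, and status values are hypothetical; the source names the metrics but not their format.

```python
from dataclasses import dataclass
from enum import Enum


class NodeStatus(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


@dataclass(frozen=True)
class RouterReport:
    """One measurement a router reports to validators (assumed layout)."""
    node_id: str
    request_bytes: int           # size of the outgoing request
    response_bytes: int          # size of the incoming response
    node_latency_ms: float
    validator_latency_ms: float
    status: NodeStatus


def relayed_bytes_by_node(reports: list[RouterReport]) -> dict[str, int]:
    """Total bytes relayed through each node across all reports."""
    totals: dict[str, int] = {}
    for r in reports:
        totals[r.node_id] = totals.get(r.node_id, 0) + r.request_bytes + r.response_bytes
    return totals
```

Per-node totals like these are the kind of input that bandwidth-based rewards would need.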

The current staked amounts for the various routers are shown in the image above. Users can delegate Grass to a router, which earns rewards on their behalf; each router charges its own commission.

For example, about 1.43 million Grass is currently staked on DBunker, which has a minimum staking period of 7 days and charges a 10% commission. (Data source: https://www.grassfoundation.io/stake/delegations) To earn router staking rewards, users only need to click STAKE, connect their wallet, and stake Grass.
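To make the numbers concrete, the sketch below estimates a delegator's reward from the figures quoted in this article: an annualized return of around 45% and a router commission of 10%. It assumes simple (non-compounding) interest, a constant rate, and that the 45% figure is quoted before commission. None of these assumptions is confirmed by the source, and the function name is hypothetical.

```python
def delegator_reward(stake: float, annual_rate: float,
                     commission: float, days: int) -> float:
    """Estimated reward after the router's commission (simple interest)."""
    gross = stake * annual_rate * days / 365
    return gross * (1 - commission)


# 1,000 Grass staked for a full year at 45%, with a 10% commission.
one_year = delegator_reward(1000, 0.45, 0.10, 365)

# The same stake held only for the 7-day minimum staking period.
one_week = delegator_reward(1000, 0.45, 0.10, 7)
```

Under these assumptions, a full year yields roughly 405 Grass on a 1,000 Grass stake: 450 gross, less the router's 45 commission.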

Summary

Grass is committed to building a fair and open decentralized data layer that addresses the ethical problems of today's internet data extraction, improves data quality, and counters the data monopoly held by a few large companies. In terms of architecture and features, Grass builds a data rollup that introduces a metadata mechanism for recording the source of every dataset. The ZK proofs of this metadata are stored on the L1 settlement layer, while the metadata itself is ultimately bound to its underlying datasets, which are stored on Grass's data ledger. ZK proofs therefore lay the groundwork for greater transparency and for rewarding node operators in proportion to their work, which is also a crucial factor in driving the expansion of the Grass network.

Grass focuses on the intersection of cryptocurrency and AI data. Unlike traditional players in closed-source, centralized AI, it positions itself as the original decentralized source of AI data. As an important participant in the web3 wave, Grass uses decentralized technology to build a fair and open data layer for AI companies and protocols, and market demand gives it promising prospects.