"YOUR DATA IS CURRENCY. COLLECTIVE DATA IS POWER." Reddit Data DAO (r/datadao) company scrolls this loud slogan on its official website.

r/datadao is a decentralized data organization to which users contribute their Reddit data. Users vote on how the data is used, for example by licensing it to AI companies for training large models, and receive rewards in return. The narrative of r/datadao helping users regain their data rights is undoubtedly exciting because it hits a pain point of today's data industry: the difficulty of balancing personal information rights against industry needs.

Data has become oil, but it seems to have little to do with users

The emergence of generative artificial intelligence applications in recent years has made the value of data more prominent. Big data, high computing power, and strong algorithms are called the three pillars of big models. At the 2023 World Artificial Intelligence Conference (WAIC), the "Top Ten Trends in Artificial Intelligence" report pointed out that "the quality of a model in the future will be determined by 20% of the algorithm and 80% of the data quality."

Garbage in, Garbage out. Data is therefore called the new oil.

A comprehensive view of all datasets featuring language models from GPT-1 to Gopher from 2018 to early 2022. Unweighted size in GB. Image credit: Alan D. Thompson

However, the strong demand for data from big models has created tensions with legitimate rights and interests such as personal privacy and data security. A large amount of personal information has been illegally obtained and traded on the black market, becoming a data source for telemarketing, fraud, and precision marketing.

In 2016, the European Union adopted the GDPR (General Data Protection Regulation), which took effect two years later. The GDPR gives individuals strong control over their data and establishes a series of mechanisms, including informed consent, the right to be forgotten, the right to data portability, and the right of access. However, some critics believe that its strict supervision and heavy penalties have hampered the development of the Internet. China has also established a system through laws such as the Cybersecurity Law (passed in 2016), the Civil Code (passed in 2020), the Data Security Law (passed in 2021), and the Personal Information Protection Law (passed in 2021). This system seeks to balance promoting the development and use of data against protecting the legitimate rights and interests of individuals and organizations, as well as national security and development interests.

Although rights over personal information are now recognized in law as personal rights, it is still difficult for individuals to get a share of data transactions. Reddit revealed in its IPO prospectus in February 2024 that it had entered into data licensing agreements with an aggregate contract value of US$203 million. However, the users who create the data do not get a penny from it. Lawyer Huang of Mankiw Law Firm believes that there are three main reasons:

First, a single individual's data has almost no value on its own; only “big data” is meaningful to data processors.

Second, the law requires individuals' informed consent at every link in the flow of data, and the resulting authorization chain is complex and unstable, which makes transactions difficult.

Third, under the ideal "anonymization" approach, in which personal information is processed so that it can no longer identify a specific natural person and cannot be restored, personal data loses its value; other technical solutions, such as privacy computing, are still at an exploratory stage.

This creates a situation where processors want to use personal information but cannot obtain full authorization from a large number of users; users want to benefit but have no channels to manage and trade personal information. This problem has long plagued policymakers, academia, and industry.

The landmark "Twenty Data Articles" policy document, issued in December 2022, proposed exploring a mechanism in which trustees represent individuals' interests and supervise how market entities collect, process, and use personal information.

At present, there are very few personal data products on domestic data exchanges. This year, the Shenzhen Data Exchange designed a personal health data trading product that implements the concept of the "Twenty Data Articles" to a certain extent. Its basic framework is to make decentralized personal data authorization more efficient through a unified authorization service platform while letting individuals share in the benefits.

In this power struggle between individuals and companies over data, why do data DAOs think they can help users regain their data rights?

Data DAOs: What and Why

A data DAO (Decentralized Autonomous Organization) is a blockchain-based organization that aims to manage and use data assets through collective governance mechanisms. It uses smart contracts and decentralized storage to manage data in a transparent, tamper-resistant, and secure way. The core idea of a data DAO is to transfer data ownership and management rights from traditional centralized platforms to the actual owners of the data: the users.

Currently, a data DAO project that has already taken shape is r/datadao, and Mr. Huang from Mankiw LLP bases the compliance analysis below on the project’s business model.

r/datadao's business model

Data storage

The underlying network of r/datadao is the Vana network, which is designed for the decentralized management and governance of data. Vana uses IPFS (InterPlanetary File System) as one of its decentralized storage solutions to support the secure storage and efficient processing of key data sets for projects such as r/datadao. When r/datadao users upload their Reddit activity data (such as posts and comments) to the platform, the data is stored in a decentralized manner through IPFS, and users hold the private keys used for data storage and transmission, which provides data security and access control.
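To make the storage idea more concrete, here is a minimal Python sketch of content addressing, the principle behind IPFS: data is stored under a key derived from a hash of its bytes, so any tampering changes the address. The `ContentAddressedStore` class and its in-memory dictionary are illustrative assumptions, not r/datadao or Vana code. Real IPFS uses multihash-based content identifiers (CIDs) rather than a bare SHA-256 hex string, and the key-based access control described above is omitted here.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store keyed by the SHA-256 digest of the content.

    Hypothetical illustration only; real IPFS addresses are CIDs.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on read: modified content no longer matches its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Because the address depends only on the bytes, uploading the same post twice yields the same address, and anyone holding the address can verify that what they retrieved is what was stored.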

Incentives

Users can earn the platform's native token, $RDAT, by contributing Reddit data to r/datadao. These tokens not only represent a user's data contribution but also allow the user to participate in the platform's governance decisions. $RDAT is allocated based on the user's Karma value on Reddit, which is a measure of the user's activity and contribution in the community.
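The article does not give the exact allocation formula. As a rough sketch, assuming a simple allocation proportional to Karma, a fixed token pool could be split as follows. The function name `allocate_by_karma`, the pool size, and the rule that negative Karma counts as zero are all assumptions made for illustration.

```python
def allocate_by_karma(karma: dict[str, int], pool: int) -> dict[str, int]:
    """Split `pool` tokens in proportion to each user's Karma.

    Assumed rule, not r/datadao's published formula. Negative Karma is
    treated as zero, and integer division rounds each share down.
    """
    eligible = {user: max(k, 0) for user, k in karma.items()}
    total = sum(eligible.values())
    if total == 0:
        return {user: 0 for user in karma}
    return {user: pool * k // total for user, k in eligible.items()}


shares = allocate_by_karma({"alice": 300, "bob": 100}, pool=1000)
# shares == {"alice": 750, "bob": 250}
```

Rounding down means a few tokens may remain unallocated; a real scheme would need an explicit rule for that remainder.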

Community Governance

r/datadao implements decentralized governance: all important decisions, such as data usage policies, partnerships, and platform upgrades, are made by a vote of users holding $RDAT. This is intended to keep the operation of the platform transparent and fair.
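How votes are counted is not specified in the article. One common DAO design, assumed here purely for illustration, weights each ballot by the voter's token balance. The names `tally` and `winner` are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def tally(balances: dict[str, int], ballots: dict[str, str]) -> dict[str, int]:
    """Sum token balances behind each option (assumed token-weighted voting)."""
    totals: dict[str, int] = defaultdict(int)
    for voter, option in ballots.items():
        # Voters without a recorded balance carry zero weight.
        totals[option] += balances.get(voter, 0)
    return dict(totals)


def winner(totals: dict[str, int]) -> Optional[str]:
    """Return the option with the most tokens behind it, or None if no votes."""
    return max(totals, key=totals.get) if totals else None


balances = {"alice": 750, "bob": 250, "carol": 400}
ballots = {"alice": "license_to_ai", "bob": "hold", "carol": "hold"}
result = tally(balances, ballots)
# result == {"license_to_ai": 750, "hold": 650}
```

In this example two of the three voters prefer to hold the data, yet the larger token holder prevails, which shows why the choice between one-person-one-vote and token weighting matters for fairness.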

Data usage and profit model

Community members can vote on how to use the collected data. Options may include licensing the data to AI companies for large model training, or sharing data with other companies and research institutions. In this way, r/datadao can generate revenue, and then distribute part of the revenue to data contributors in the form of tokens.

Data Privacy and Security

Although users submit personal data to r/datadao, the platform ensures the privacy and security of this data through encryption and decentralization technology. This means that the data will not be disclosed or misused without the user's explicit authorization.

Compared with the "Twenty Data Articles" and the Shenzhen Data Exchange's plan, a data DAO likewise acts as an entrusted manager of personal data. The difference is that users have more autonomy, and data DAOs are deeply tied to blockchain technology and tokens.

The significance of data DAOs is reflected in three aspects:

1. Aggregating a large amount of personal data improves individuals' negotiating position. The data held by a single user has limited value, and a single user is in a weak position and easily exploited in transactions. WPS, for example, once included a clause in its privacy policy allowing the platform to use user documents for AI training, which caused widespread controversy. A data DAO can increase the transaction value of personal data.

As r/datadao said on its official website: Reddit has been selling our data for $60 million a year and expects to earn $200 million a year from our data. If we unite, we can fight against Reddit and trade this data ourselves.

2. Promote compliant data circulation. Companies such as WPS find it difficult to use personal data lawfully. Under fierce competition, AI companies sometimes obtain data through illegal web crawling (bypassing the robots.txt protocol) or overbearing authorization terms. Companies that do so are very likely to face claims of unfair competition, intellectual property infringement, and privacy violations. Data DAOs like r/datadao can provide the market with more compliant data.

ChatGPT answers "What lawsuits are openai facing"

3. Break data monopolies and data walls. Internet companies build moats by hoarding data. For a long time, data could not flow between platforms, and users did not even own their own data. In recent years, despite stepped-up antitrust enforcement, progress has been limited to steps such as WeChat allowing Taobao links to open directly. The right to personal information portability provided for in the Personal Information Protection Law remains dormant because it is not practically operable. Data DAOs can provide a new outlet for the personal data held by Internet companies, activate the portability right, and return data to the people.

Compliant Operation of Data DAOs

In addition to the compliance issues common across the crypto industry, such as choosing a jurisdiction in which to operate, anti-money laundering, know-your-customer (KYC) identity verification, and multi-jurisdiction supervision, tokenized data DAOs also need to pay special attention to data compliance.

Informed consent

A data DAO needs written consent from individuals to collect and store their personal information. It should inform individuals truthfully, completely, and accurately, in a conspicuous manner and in clear, understandable language, of the purpose of processing, the processing methods, the types of information processed, the retention period, and the procedures for exercising their rights.

How the data is used is decided by majority vote, but members who voted against a proposal cannot be forced to have their personal information used in accordance with the result.
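In engineering terms, this means a passed vote selects a purpose, but the data released for that purpose must still be filtered by each contributor's own consent. The sketch below assumes a hypothetical `Contribution` record that stores the purposes each user has agreed to; the field names and purpose labels are illustrative, not part of any real r/datadao schema.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    """Hypothetical record of one user's contributed data and consents."""
    user: str
    payload: str
    consented_purposes: set[str] = field(default_factory=set)


def usable_for(contributions: list[Contribution], purpose: str) -> list[Contribution]:
    """Keep only contributions whose owner consented to this purpose.

    A majority vote approving `purpose` does not override individual consent.
    """
    return [c for c in contributions if purpose in c.consented_purposes]


data = [
    Contribution("alice", "post A", {"ai_training", "research"}),
    Contribution("bob", "post B", {"research"}),
    Contribution("carol", "post C"),
]
approved = usable_for(data, "ai_training")
# Only alice's contribution is released for AI training.
```

Keeping consent as per-record data, rather than inferring it from the vote, also makes it straightforward to honor a later withdrawal of consent.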

Sensitive information and information about minors

Sensitive personal information is personal information that, once leaked or used illegally, may easily cause infringement on the personal dignity of a natural person or endanger the safety of his or her person or property. It includes information such as biometrics, religious beliefs, specific identities, medical health, financial accounts, whereabouts, and the personal information of minors.

Personal information processors may only process sensitive personal information when there is a specific purpose and sufficient necessity, and strict protection measures are taken. When processing information of minors, the consent of the guardian should be obtained, and special personal information processing rules should be formulated.

Cross-border data transfer

Taking China as an example, personal information processors that handle personal information above a prescribed volume must store the personal information they collect and generate within China, and exporting such data requires passing a security assessment by the Cyberspace Administration of China.

Data Security

By formulating internal management systems and operating procedures and adopting appropriate technical security measures such as encryption and de-identification, a data DAO should prevent unauthorized access to personal information as well as its leakage, tampering, and loss.

Many other regulations may apply depending on the type of data, the usage scenario, and the regulatory jurisdiction. Data DAOs are advised to seek further advice from a lawyer.
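As one illustration of de-identification, usernames can be replaced with keyed hashes before data leaves the platform. The sketch below uses Python's standard `hmac` module; the function name `pseudonymize` and the key handling are assumptions. Note that this is pseudonymization, not the irreversible anonymization discussed earlier: whoever holds the key can recompute the mapping, so the output should still be treated as personal information.

```python
import hashlib
import hmac


def pseudonymize(username: str, secret_key: bytes) -> str:
    """Replace a username with a keyed SHA-256 digest (HMAC).

    Illustrative only: the same key and username always give the same
    pseudonym, so records stay linkable without exposing the name.
    """
    return hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key-kept-off-chain"  # hypothetical; store real keys securely
alias = pseudonymize("u/alice", key)
```

Using a keyed hash rather than a plain hash matters because usernames are public and easy to enumerate; without the secret key, an outsider cannot rebuild the mapping by hashing known usernames.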

Summary

The narrative of data DAOs helping users regain data rights is undoubtedly exciting, and decentralized arrangements do seem to help return data rights to users. However, the move toward tokenization complicates matters. Faced with strict supervision of both tokens and the data industry, will data DAOs be unable to obtain a legal "birth certificate"? In any case, this is a direction for data trading worth exploring.

On the other hand, domestic data exchanges, the TreeGraph Blockchain Research Institute, and others have proposed fully compliant plans to build personal data trading platforms based on blockchain technology. Data DAOs of this type offer relatively stronger policy certainty for large-scale application.