Technologist and CEO of Synesis One, Isaac Bang, warns against the “extremely dangerous” scenario of a few tech giants hoarding data and leading the artificial intelligence (AI) race. He argues for the “democratization” of AI power so that the eventual “winner” of the ongoing AI race does not become an industry behemoth.
According to Bang, part of the solution lies in prioritizing decentralized data crowdsourcing over reliance on large data-focused firms. As Bang explains in his written responses to Bitcoin.com News, decentralized data crowdsourcing lets companies avoid relying on in-house data scientists. Instead, they can “pitch work” to a general pool of digital workers or specialists for data analysis tasks.
This model, Bang believes, is ideal for companies seeking to scale but lacking in-house resources. Beyond the commercial advantage, decentralized data crowdsourcing also helps combat the data bias challenge faced by centralized tech giants.
Although governments have expressed public safety concerns about decentralized data management, Bang cautions against broad regulations that could stifle innovation. Instead, he urges regulators and lawmakers to study how “decentralized data sourcing can and is being utilized” before enacting policies.
Bang’s other responses address competition within the AI industry and the risks inherent in AI use. Below are the Synesis One CEO’s answers to the questions sent by Bitcoin.com News.
Isaac Bang (IB): AI is the key technology ushering in the fourth industrial revolution, and its impacts are far wider than we can currently imagine. A few dominant players hoarding the data and leading the AI race is extremely dangerous in many ways. Not only will AI technology enable businesses to become more productive and maximize their bottom line, but it will also enable governments to enhance their military capabilities, both physically and digitally. The “winner” of the AI race will be a dominant force, and it’s critical that we take action now to democratize the power of AI for the good of all.
IB: Traditionally, companies collect data from the users or customers of their products or services. To use the collected data for AI, companies employ data scientists and other specialists to clean and annotate it. These traditional methods of collecting and preparing data are efficient for large companies with many users and lots of money. For small and medium-sized companies, however, scaling their data needs will be tough.
Decentralized data crowdsourcing means sourcing raw data or data preprocessing through a large network of digital workers who are willing and able to provide that data or work. Without having their own users or in-house data scientists, companies or developers can post a bounty for data tasks, which a general pool of digital workers or specialists then performs. This lets companies scale without spending an immense amount of money and time on in-house hiring.
IB: Humans can reason logically. AI built on today’s machine learning uses statistical computation to recognize patterns, without any logical reasoning. As AI models improve, higher-quality and domain-specific data become more and more valuable. For example, a general-purpose LLM is not suitable for use in a medical setting. The LLM could be fine-tuned for a specific field of medicine, but doing so would require humans with expert knowledge in that field. This concept applies not only to general LLMs but to any other AI application with a more specific use case.
IB: It’s simple: the more diverse the pool of data providers and data annotators, the more diverse and representative the data will be. In a decentralized crowdsourcing network, the providers of the raw data and/or data annotators do not come from one platform, company, network, or group. This reduces the data bias that a centralized company could face.
IB: One of the most practical use cases is in the realm of natural language. Businesses today are global, which requires them to deliver the same quality of products and services in every language of the markets they serve. However, many of the best-performing LLMs today are primarily English-based. We’ve seen companies rely on crowdsourcing for different languages and dialects, not only for AI but also for needs such as localizing their products.
IB: As long as all data transactions are recorded onchain, the transparency should be enough to address any supervision and oversight concerns. If regulators are truly concerned about public safety and security, there should be more regulation of how centralized entities manage and use data. Rather than jumping to conclusions out of fear, lawmakers should first learn about the ways that decentralized data sourcing can and is being utilized. If there is malicious intent or use, they should step in then, instead of issuing umbrella regulations that hurt innovation.
CEO of Synesis One, Isaac Bang
IB: At the moment, we have not witnessed any misuse of the platform, and it’s difficult to see how misuse could pose risks at the national security level. At the data storage level, Synesis can work with both distributed storage solutions (e.g. IPFS, Arweave) and centralized solutions (e.g. AWS), so the choice is up to the client. At the data annotation level, all work goes through peer review, and the client can even optimize the peer review process itself to prevent malicious behavior.
IB: At Synesis, we aim to be the world’s largest digital worker network of specialists and domain experts who can help companies with any AI data need. We’re already seeing growing demand for expert-level knowledge for AI training (e.g. fine-tuning, RLHF, raw data) as AI is applied to more and more use cases. We want to enable companies of any size, in any domain, to scale their AI data needs by tapping into our platform and our network of digital experts around the world. This will not only help companies scale but also give people around the world new opportunities to earn money by providing their knowledge and skills online.
IB: Surprisingly, there are many pain points that the mainstream firms have not solved for their workers. One concerns payments, as cross-border payments are often expensive and slow. The other main pain point is a lack of transparency. This gives us a huge advantage, as our payout system requires no minimum balance, charges no fees, and is instant. We’ve onboarded many frustrated digital workers who previously used the big players in the web2 data labeling space. As we bring in more digital workers of all backgrounds and build out the network, our solutions will become increasingly attractive to potential clients.
IB: One of the biggest risks our users face is a mismatch between their knowledge or skills and what certain campaigns require. Some data campaigns are technical, and a user who does not perform well will not be rewarded well. Everything, including a user’s reputation, is based on the accuracy of the work they provide. Some tasks require technical skills or knowledge, or have steep learning curves, so any new user on the platform should expect to spend some time learning how to do certain campaigns and data tasks. We’re continuously updating and producing new educational and training materials for new and existing users to help them perform better. This benefits everyone, as long as users spend time reading and learning from the material.