Author: YBB Capital Researcher Zeke
One, Starting from the Fickleness of Attention
Over the past year, because the narrative at the application layer could not keep pace with the explosion of infrastructure, the crypto field has gradually become a game of capturing attention. From Silly Dragon to Goat, and from Pump.fun to Clanker, the fickleness of attention has made this competition increasingly involuted. What began as the most clichéd attention-for-money schemes quickly evolved into unified platform models for attention seekers and providers, until silicon-based beings became the new content suppliers. Among the myriad carriers of Meme coins, one has finally emerged on which retail investors and VCs can reach consensus: the AI Agent.
Attention is ultimately a zero-sum game, but speculation can indeed drive the wild growth of new things. In our article on UNI, we revisited the start of blockchain's last golden age: DeFi's rapid growth began with the liquidity-mining era kicked off by Compound Finance. The primitive on-chain game of that period was hopping in and out of mining pools offering APYs in the thousands or even tens of thousands of percent, and although it ended with pools collapsing one after another, the frenzied influx of yield farmers left blockchain with unprecedented liquidity. DeFi eventually moved beyond pure speculation to become a mature track, serving users' financial needs in payments, trading, arbitrage, staking, and more. AI Agents are now going through the same wild phase, and we are exploring how Crypto can better integrate AI and ultimately push the application layer to new heights.
Two, How Can Agents Be Autonomous?
In our previous article, we briefly introduced the origin of the AI Meme Truth Terminal and the outlook for AI Agents; this article focuses on the AI Agent itself.
Let's start with the definition. In the AI field, "Agent" is a relatively old term without a crisp definition; its emphasis is on autonomy: any AI that can perceive its environment and react to it can be called an Agent. In current usage, an AI Agent is closer to an intelligent agent, a system built around a large model to imitate human decision-making. In academia, this system is viewed as the most promising path toward AGI (Artificial General Intelligence).
In the early versions of GPT, we could clearly sense that large models were human-like, yet on many complex questions they could only give vague answers. The fundamental reason is that those models were based on probability rather than causality and lacked abilities humans have, such as using tools, memory, and planning. AI Agents fill these gaps. To summarize with a formula: AI Agent = LLM (large model) + Planning + Memory + Tools.
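The formula above can be made concrete with a minimal sketch. This is a hypothetical toy, not any real framework: `llm()` is a stub standing in for a real model call, and the calculator is the only tool.

```python
# Minimal sketch of: AI Agent = LLM + Planning + Memory + Tools.
# llm() is a hypothetical stub standing in for a real model API call.

def llm(prompt: str) -> str:
    """Stub model: uses a tool result if present, else requests one."""
    if "Observation" in prompt:
        return "ANSWER " + prompt.rsplit("Observation: ", 1)[1]
    if "calculate" in prompt.lower():
        return "USE_TOOL calculator 2+2"
    return "ANSWER I don't know"

def calculator(expr: str) -> str:
    # Tools supply capabilities the bare model lacks.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def run_agent(task: str) -> str:
    memory: list[str] = []             # Memory: observations feed the next prompt
    for _ in range(5):                 # Planning: a bounded reason/act loop
        prompt = f"Task: {task}\n" + "\n".join(memory)
        decision = llm(prompt)
        if decision.startswith("USE_TOOL"):
            _, name, arg = decision.split(maxsplit=2)
            memory.append(f"Observation: {TOOLS[name](arg)}")
        else:
            return decision.removeprefix("ANSWER ")
    return "gave up"

print(run_agent("Please calculate 2+2"))  # prints 4
```

The point of the sketch is the loop shape: the model alone only maps prompts to text, while the surrounding loop adds planning (bounded iteration), memory (accumulated observations), and tools (the calculator).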
Prompt-driven large models are more like a static person: they come to life only when we prompt them, while the goal of an agent is to be more like a real person. Currently, the agents in the crypto space are mainly fine-tunes of Meta's open-source Llama models (the 70B or 405B parameter versions), with memory and the ability to call tools via APIs, while in other respects they may still need human help or input (including interaction and collaboration with other agents). That is why the major agents in the space still exist mainly as KOLs on social networks. To make agents more human-like, we need to add planning and action capabilities, and within planning, the chain of thought is particularly crucial.
Three, Chain of Thought (CoT)
The concept of Chain of Thought (CoT) first appeared in Google's 2022 paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", which showed that a model's reasoning ability can be enhanced by having it generate a series of intermediate reasoning steps that help it understand and solve complex problems.
A typical CoT prompt contains three parts: a clear task instruction, a logical rationale explaining the principles that underpin the solution, and worked examples demonstrating it. This structure helps the model understand the task and approach the answer step by step through logical reasoning, improving both the efficiency and the accuracy of problem-solving. CoT is especially suited to tasks that require in-depth analysis and multi-step reasoning; for simple tasks it may bring little advantage, but for complex ones, solving step by step reduces error rates and markedly improves the quality of the result.
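The three-part structure can be illustrated with a small prompt builder. The wording of the instruction, rationale, and worked example below is illustrative, not taken from the paper:

```python
# Hypothetical illustration of the three parts of a CoT prompt:
# (1) task instruction, (2) rationale, (3) a worked step-by-step example.

def build_cot_prompt(question: str) -> str:
    instruction = "Solve the word problem. Show your reasoning step by step."
    rationale = ("Breaking the problem into intermediate steps reduces errors "
                 "on multi-step reasoning.")
    example = (
        "Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many now?\n"
        "A: He starts with 5 balls. 2 cans * 3 balls = 6 new balls. "
        "5 + 6 = 11. The answer is 11."
    )
    # Assemble instruction, rationale, and example, then append the new question.
    return f"{instruction}\n{rationale}\n\n{example}\n\nQ: {question}\nA:"

prompt = build_cot_prompt("A train travels 60 km/h for 2 hours. How far?")
print(prompt)
```

Fed to a model, the worked example nudges it to emit its own intermediate steps before the final answer, which is the behavior the paper measured.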
CoT plays a key role in building AI Agents. An agent must understand the information it receives and make sound decisions on that basis; CoT gives it an orderly way to process and analyze inputs and to turn the analysis into concrete action plans. This makes agent decision-making more reliable and efficient, and also more transparent: by breaking a task into small steps, CoT lets the agent weigh each decision point carefully, reducing errors caused by information overload, and lets users follow the reasoning behind each decision. In interactions with the environment, CoT also allows agents to absorb new information and adjust their behavioral strategies.
As an effective strategy, CoT not only enhances the reasoning ability of large language models but is also crucial to building smarter, more reliable AI Agents. With CoT, researchers and developers can create intelligent systems suited to complex environments that retain a high degree of autonomy. Its advantages show most clearly on complex tasks: decomposing them into a series of smaller steps improves accuracy, interpretability, and controllability, significantly reduces erroneous decisions arising from information overload or complexity, and makes the whole solution easier to trace and verify.
The core function of CoT is to combine planning, action, and observation, bridging the gap between reasoning and acting. This mode of thinking lets an AI Agent devise countermeasures for anticipated anomalies and accumulate new information while interacting with the external environment, verifying its prior predictions and supplying new grounds for reasoning. CoT acts like an engine of precision and stability, helping AI Agents stay effective in complex environments.
Four, Seemingly Correct Pseudo-Demands
Which parts of the AI technology stack should Crypto integrate with? In last year's article, I argued that decentralizing computing power and data is the key step in helping small businesses and individual developers cut costs. This year, Coinbase's detailed breakdown of Crypto x AI offers a finer classification:
(1) Computing Layer (referring to networks focused on providing GPU resources for AI developers);
(2) Data Layer (referring to networks that support decentralized access, orchestration, and verification of AI data pipelines);
(3) Middleware Layer (referring to platforms or networks that support the development, deployment, and hosting of AI models or agents);
(4) Application Layer (referring to user-facing products utilizing on-chain AI mechanisms, whether B2B or B2C).
Each of these four layers carries a grand vision, which can be summed up as resisting the Silicon Valley giants' dominance over the next era of the internet. As I asked last year: must we really accept the giants' exclusive control of computing power and data? Under their monopoly, closed-source large models are black boxes. Science is the most widely believed religion of humanity today, and every sentence a future large model answers will be taken as truth by a significant portion of people; but how is that truth to be verified? In the giants' vision, the permissions agents will ultimately hold are beyond imagination, for example payment rights over your wallet and the right to operate your terminal. How do we ensure there is no malicious intent?
Decentralization is the only answer, but sometimes we should soberly ask: how many people will actually pay for these grand visions? In the past, we could use tokens to paper over the errors of idealization and ignore the need for a commercial closed loop. Today the situation is far harsher: Crypto x AI must be designed around real-world conditions. For example, at the computing layer, how do we balance the supply side against performance loss and instability so as to compete with centralized clouds? How many real users will data-layer projects have, how do we verify the authenticity and validity of the data provided, and which customers actually need it? The same questions apply to the other two layers. In this era, we do not need so many seemingly correct pseudo-demands.
Five, Memes Have Emerged in SocialFi Form
As mentioned in the opening paragraph, memes have already broken out rapidly in a form that matches Web3's version of SocialFi. Friend.tech was the Dapp that fired the first shot of this round of social applications, but it unfortunately failed due to a hasty token design. Pump.fun then proved the viability of pure platformization: it issues no platform token and imposes almost no rules. By unifying attention seekers and providers, it lets users post memes, stream live, issue tokens, leave messages, and trade freely, with Pump.fun charging only a service fee. This is essentially the same attention-economy model as today's social media such as YouTube and Instagram, only with different charging targets; in terms of gameplay, Pump.fun is more Web3.
Base's Clanker is the culmination: benefiting from an integrated ecosystem managed hands-on by the team, Base has its own social Dapp as a complement, forming a complete internal closed loop. Agent memes are the 2.0 form of Meme coins; people always chase novelty, and Pump.fun currently sits at the forefront of the trend. From a trend perspective, it is only a matter of time before silicon-based beings replace the vulgar memes of carbon-based beings.
I have mentioned Base countless times, but the content differs each time. From a timeline perspective, Base has never been a pioneer but has always been a winner.
Six, What Else Can Agents Be?
From a pragmatic perspective, agents are unlikely to be decentralized for a long time to come. Given how agents are built in the traditional AI field, this is not a problem that decentralization and open-sourcing can simply solve: agents need access to various APIs to reach Web2 content, their operating costs are high, and the design of chains of thought and multi-agent collaboration usually still depends on a human as mediator. We will go through a long transition period until a suitable form of integration emerges, perhaps something like UNI. As in the previous article, I still believe agents will have a major impact on our industry, much as CEXs do in our industry: wrong in principle, yet very important.
The "AI Agent Overview" paper released last month by Stanford & Microsoft describes at length the applications of agents in medicine, intelligent machines, and virtual worlds; its appendix already includes many experimental cases of GPT-4V acting as an agent in the development of top-tier AAA games.
We need not demand too much speed from its integration with decentralization; I hope the first gaps agents fill, from the bottom up, are capability and speed. We have so many narrative ruins and blank metaverses waiting to be filled, and at the right stage we can consider how to make agents the next UNI.