Author: accelxr, 1KX; Translation: 0xjs@Golden Finance

The main purpose of current generative models is content creation and information filtering. However, recent research and discussion on AI agents (autonomous actors that use external tools to complete user-defined goals) suggests that AI could potentially unlock substantial benefits if it were provided with economic access similar to the Internet in the 1990s.

To do this, agents need agency over assets they control, since the traditional financial system is not set up for them.

This is where crypto comes into play: crypto provides a digitized payment and ownership layer with fast settlement that is particularly suitable for building AI agents.

In this article, I will introduce you to the concepts of agents and agent architectures, how examples from research demonstrate that agents have emerging properties beyond traditional LLMs, and projects that have built solutions or products around crypto-based agents.

What is an intelligent agent?

AI agents are LLM-driven entities that are able to plan and take actions to achieve goals over multiple iterations.

The agent architecture consists of a single or multiple agents working together to solve a problem.

Typically, each agent is given a personality and has access to a variety of tools that will help them get their work done independently or as part of a team.

The agent architecture differs from the way we typically interact with LLMs today:

Zero-shot prompting is how most people interact with these models: you input a prompt, and the LLM generates a response based on its pre-existing knowledge.

In the agent architecture, you initialize the goal, the LLM breaks it down into subtasks, and then it recursively prompts itself (or other models) to complete each subtask autonomously until the goal is reached.
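This loop can be sketched in a few lines. In the sketch below, `plan` and `execute` are hypothetical stubs standing in for real LLM calls; the point is the control flow, not the model interface:

```python
# Minimal sketch of the agent loop described above: the goal is broken into
# subtasks, and each subtask is executed until none remain.

def plan(goal: str) -> list[str]:
    """Stand-in for an LLM call that decomposes a goal into subtasks."""
    return [f"{goal}: step {i}" for i in range(1, 4)]

def execute(subtask: str) -> str:
    """Stand-in for an LLM (or tool) call that completes one subtask."""
    return f"done({subtask})"

def run_agent(goal: str) -> list[str]:
    results = []
    todo = plan(goal)            # initial decomposition
    while todo:                  # loop until the goal is reached
        subtask = todo.pop(0)
        results.append(execute(subtask))
    return results
```

In a real system, `execute` might itself call `plan` recursively on subtasks that are still too large, which is where the "recursively prompts itself" behavior comes from.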

Single-agent architecture and multi-agent architecture

Single-agent architecture: A language model performs all reasoning, planning, and tool execution on its own. There is no feedback mechanism from other agents, but humans can choose to provide feedback to the agent.

Multi-agent architectures: These architectures involve two or more agents, where each agent can use the same language model or a different set of language models. The agents can use the same tools or different tools. Each agent usually has its own role.

  • Vertical structure: One agent acts as a leader and the other agents report to it. This helps organize the group's output.

  • Horizontal structure: A large group discussion about a task, where each agent can see other messages and volunteer to complete tasks or call tools.
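A vertical structure can be sketched as a leader agent that splits the work, collects the workers' answers, and organizes the result. The `Agent` class below is an illustrative stub, not any particular framework's API:

```python
# Minimal sketch of a vertical multi-agent structure: a leader delegates
# subtasks to workers and aggregates their outputs into one result.

class Agent:
    def __init__(self, role: str):
        self.role = role

    def respond(self, task: str) -> str:
        """Stand-in for an LLM call; each agent could use a different model."""
        return f"[{self.role}] {task}"

def vertical_run(task: str, leader: Agent, workers: list[Agent]) -> str:
    subtasks = [f"{task} (part {i + 1})" for i in range(len(workers))]
    answers = [w.respond(st) for w, st in zip(workers, subtasks)]
    # the leader organizes the group's output into a single response
    return leader.respond(" | ".join(answers))
```

A horizontal structure would instead broadcast every message to all agents and let each one decide whether to respond or call a tool.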

Agent Architecture: Profiles

Agents are given profiles, or personas, that define their roles via prompts and shape the LLM's behavior and skills. The specifics depend largely on the application.

Many people already use this as a prompting technique: "You are a nutrition expert. Provide me with a meal plan...". Interestingly, assigning a role to an LLM can improve its output compared to the baseline.

Profiles can be created in the following ways:

  • Handcrafted: Profiles manually specified by a human creator; most flexible, but also time-consuming.

  • LLM-Generated: Use an LLM to generate the profile from a set of rules about its composition and attributes, plus (optionally) a few seed examples.

  • Dataset Alignment: Profiles are generated based on a real-world dataset of people.
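A handcrafted profile is often just a templated system prompt in the "You are a nutrition expert..." style above. A minimal sketch, with illustrative field names not tied to any framework:

```python
# Render a handcrafted agent profile as a system prompt.

def build_profile_prompt(role: str, skills: list[str], style: str) -> str:
    return (
        f"You are a {role}. "
        f"Your skills: {', '.join(skills)}. "
        f"Respond in a {style} tone."
    )

# An LLM-generated or dataset-aligned profile would fill the same fields
# automatically instead of having a human write them.
```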

Agent Architecture: Memory

The agent’s memory stores information sensed from the environment and uses this information to make new plans or actions. Memory enables the agent to self-evolve and act based on its experience.

  • Unified Memory: Similar to short-term memory, achieved through in-context learning, i.e. continual prompting. All relevant memories are passed to the agent with each prompt. Limited mainly by the size of the context window.

  • Hybrid: short-term + long-term memory. Short-term memory is a temporary buffer for the current state. Reflections or useful long-term information are stored permanently in a database. There are several ways to do this, but a common approach is a vector database: memories are encoded as embeddings and stored, and recall is performed via similarity search.

  • Formats: Natural language, databases (e.g. an LLM fine-tuned to generate SQL queries), structured lists, embeddings
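The vector-memory recall described above can be sketched without any external services. Here a toy bag-of-words embedding and cosine similarity stand in for a real embedding model and vector database; only the store/recall pattern is the point:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    """Memories are stored as (embedding, text); recall is similarity search."""
    def __init__(self):
        self.store: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.store.append((embed(text), text))

    def recall(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.store, key=lambda m: cosine(q, m[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```

A production agent would swap `embed` for a real embedding model and `store` for a vector database, but the recall loop is the same.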

Agent Architecture: Planning

Complex tasks are deconstructed into simpler subtasks to be solved individually.

Planning without feedback:

In this approach, the agent does not receive feedback after taking an action that influences future behavior. An example is Chain of Thought (CoT), where LLMs are encouraged to express their thought process when providing an answer.

  • Single-path reasoning (e.g. zero-shot CoT)

  • Multi-path reasoning (e.g. self-consistent CoT, where multiple CoT threads are spawned and the highest-frequency answer is used)

  • External planner (e.g. Planning Domain Definition Language)
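The self-consistency idea above reduces to a majority vote over sampled answers. In this sketch the sampled chains of thought are passed in as a list of final answers; a real system would generate them with repeated LLM calls at temperature > 0:

```python
from collections import Counter

def self_consistent_answer(sampled_answers: list[str]) -> str:
    """Return the most frequent final answer across sampled CoT threads."""
    votes = Counter(sampled_answers)
    return votes.most_common(1)[0][0]
```

Even if individual reasoning paths are noisy, the mode of the answers is often correct, which is why multi-path reasoning outperforms a single chain.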

Planning with feedback:

Iteratively refine subtasks based on external feedback

  • Environmental feedback (e.g. game task completion signal)

  • Human feedback (e.g. soliciting feedback from users)

  • Model feedback (e.g. soliciting feedback from another LLM - crowdsourcing)
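Planning with feedback is a retry loop: act, observe the feedback signal, and revise the plan. Below, `environment_check` and `revise` are hypothetical stubs (a game's task-completion signal and an LLM revision call, respectively):

```python
# Sketch of planning with environmental feedback: keep revising a plan
# until the environment reports success or a retry budget is exhausted.

def environment_check(plan: str) -> bool:
    """Stand-in for an environment signal; succeeds on the third revision."""
    return "v3" in plan

def revise(plan: str, attempt: int) -> str:
    """Stand-in for an LLM call that rewrites the plan using the feedback."""
    return f"{plan.split(' v')[0]} v{attempt}"

def plan_with_feedback(initial: str, max_tries: int = 5) -> str:
    plan, attempt = initial, 1
    while not environment_check(plan) and attempt < max_tries:
        attempt += 1
        plan = revise(plan, attempt)
    return plan
```

Human or model feedback fits the same loop; only the source of the success signal changes.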

Agent Architecture: Action

The action module converts the agent's decisions into concrete outcomes.

Action goals can take many forms, such as:

  • Task completed (such as crafting an iron pickaxe in Minecraft)

  • Communication (e.g. sharing information with another agent or a human)

  • Environment exploration (e.g. searching its own behavioral space and learning its own capabilities).

Actions are typically generated by recalling memories or following a plan, and the action space consists of the agent's internal knowledge, APIs, databases/knowledge bases, and external models.
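Concretely, an action space is often a dispatch table mapping an action name chosen by the LLM to a tool handler. The tool names and handlers below are illustrative stubs:

```python
# Sketch of an action space: the agent's decision (an action name plus an
# argument) is dispatched to internal knowledge, an API, etc.

def recall_knowledge(query: str) -> str:
    """Stand-in for answering from the model's internal knowledge."""
    return f"knowledge about {query}"

def call_api(endpoint: str) -> str:
    """Stand-in for an external API call."""
    return f"response from {endpoint}"

TOOLS = {
    "knowledge": recall_knowledge,
    "api": call_api,
}

def act(action: str, argument: str) -> str:
    handler = TOOLS.get(action)
    if handler is None:
        return f"unknown action: {action}"  # fed back to the agent as an error
    return handler(argument)
```

Returning the "unknown action" string rather than raising lets the agent see its mistake and choose a valid tool on the next iteration.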

Agent Architecture: Capability Acquisition

For an agent to correctly execute actions within the action space, it must have task-specific capabilities. There are two main ways to achieve this:

  • Through fine-tuning: Train the agent on a dataset of human-annotated, LLM-generated, or real-world example behaviors.

  • Without fine-tuning: The innate abilities of LLMs can be harnessed through more sophisticated prompt engineering and/or mechanism engineering (i.e., incorporating external feedback or accumulating experience across repeated trials).
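The no-fine-tuning path amounts to assembling a richer prompt: a few example behaviors plus lessons accumulated from earlier trials. The structure and field names below are illustrative, not from any specific framework:

```python
# Sketch of capability acquisition without fine-tuning: augment the prompt
# with few-shot examples and accumulated experience (mechanism engineering).

def build_prompt(task: str, examples: list[str], experience: list[str]) -> str:
    parts = ["Examples:"] + examples
    if experience:
        parts += ["Lessons from earlier trials:"] + experience
    parts.append(f"Task: {task}")
    return "\n".join(parts)
```

Each failed trial appends a lesson to `experience`, so later prompts encode what fine-tuning would otherwise have to bake into the weights.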

Examples of Agents in the Literature

Generative Agents: Interactive Simulacra of Human Behavior: Instantiates generative agents in a virtual sandbox environment, showing that a multi-agent system has emergent social behavior. Starting from a single user-specified prompt about an upcoming Valentine's Day party, the agents autonomously spread invitations, make new acquaintances, ask each other out, and coordinate to arrive at the party together at the right time over the next two days. You can try it yourself using the a16z AI Town implementation.

Describe, Explain, Plan and Select (DEPS): The first zero-shot multi-task agent that can complete over 70 Minecraft tasks.

Voyager: The first LLM-powered lifelong learning agent in Minecraft that continuously explores the world, acquires skills, and makes new discoveries without human intervention. Its skill execution code is continuously improved based on feedback from trial and error.

CALYPSO: An agent designed for the game "Dungeons & Dragons" that can assist the dungeon master in creating and telling stories. Its short-term memory is based on scene descriptions, monster information, and previous summaries.

Ghost in the Minecraft (GITM): A generally capable agent in Minecraft with a 67.5% success rate in obtaining diamonds and a 100% completion rate across all items in the game.

SayPlan: LLM-based large-scale task planning for robots using 3D scene graph representations, demonstrating long-horizon robot task planning from abstract, natural-language instructions.

HuggingGPT: Uses ChatGPT for task planning based on user prompts, selects models based on their Hugging Face descriptions, and executes all subtasks, achieving impressive results in language, vision, speech, and other challenging tasks.

MetaGPT: Takes input and outputs user stories/competitive analysis/requirements/data structures/APIs/documentation, etc. Internally, there are multiple agents that make up the various functions of a software company.

ChemCrow: An LLM chemistry agent designed for tasks such as organic synthesis, drug discovery, and materials design using 18 expert-designed tools. It autonomously planned and executed the synthesis of an insect repellent and three organocatalysts, and guided the discovery of a new chromophore.

BabyAGI: General infrastructure for creating, prioritizing, and executing tasks using OpenAI and vector databases such as Chroma or Weaviate.

AutoGPT: Another example of a generic infrastructure for launching LLM agents.

Agent Examples in Crypto

(Note: not all examples are LLM-based, and some are only loosely based on the agent concept)

FrenRug from Ritualnet: Based on the GPT-4 Turkish Carpet Salesman game (https://aiadventure.spiel.com/carpet). FrenRug is an agent that anyone can try to convince to buy their Friend.tech key. Each user message is passed to multiple LLMs run by different Infernet nodes. These nodes respond on-chain, and the LLMs vote on whether the agent should purchase the proposed key. When enough nodes respond, the votes are aggregated and a supervised classifier model decides on the action, passing a validity proof on-chain that allows the off-chain execution of the classifier to be verified.

Prediction market agents on Gnosis using Autonolas: These AI bots are essentially smart contract wrappers around AI services that anyone can call by paying a fee and asking a question. The service monitors the request, performs the task, and returns the answer on-chain. This infrastructure has been extended to prediction markets via Omen, where the basic idea is that agents actively monitor the news and bet on predictions derived from news analysis, ultimately producing aggregated predictions that are closer to the true odds. Agents search for markets on Omen, autonomously pay a "bot" for a prediction on the topic, and trade in that market.

ianDAO's GPT<>Safe Demo: A GPT uses Syndicate's Transaction Cloud API to autonomously manage USDC in its own Safe multi-signature wallet on Base. You can talk to it and suggest how best to deploy its capital, and it may allocate funds based on your suggestions.

Game Agents: There are multiple ideas here, but in short, AI agents in virtual environments can be both companions (like AI NPCs in Skyrim) and competitors (like a group of chubby penguins). Agents can automatically execute profit strategies, provide goods and services (e.g. shopkeepers, traveling merchants, generative quest givers), or be semi-playable characters as in Parallel Colony and AI Arena.

Safe Guardian Angels: Uses a group of AI agents to monitor wallets and defend against potential threats, protecting user funds and improving wallet security. Features include automatically revoking contract permissions and withdrawing funds when anomalies or hacks are detected.

Botto: While Botto is a loosely defined example of an on-chain agent, it demonstrates the concept of autonomous on-chain artists creating works that are voted on by token holders and auctioned on SuperRare. One can imagine various extensions that employ multimodal agent architectures.

Some noteworthy AI projects

(Note: not all projects are LLM-based, and some are only loosely based on the agent concept)

Wayfinder - A decentralized knowledge graph of protocols, contracts, contract standards, assets, functions, and routines/paths (i.e. a virtual roadmap of the blockchain ecosystem that pathfinder agents can navigate). Users are rewarded for identifying viable paths for agents to use. Additionally, you can mint shells (i.e. agents) that contain persona settings and skill activations, which can then be plugged into the pathfinder knowledge graph.

Ritualnet — As shown in the frenrug example above, Ritual infernet nodes can be used to set up a multi-agent architecture. Nodes listen for on-chain or off-chain requests and provide outputs with optional proofs.

Morpheus — A peer-to-peer network of personal general-purpose AIs that can execute smart contracts on behalf of users. This can be used for web3 wallet and tx intent management, data parsing via chatbot interfaces, recommendation models for dapps and contracts, and scaling agent operations via long-term memory that connects application and user data.

Dain Protocol — Exploring multiple use cases for deploying agents on Solana. Recently demonstrated a crypto trading bot that pulls on-chain and off-chain information to execute on the user's behalf (e.g. sell BODEN if Biden loses).

Naptha - An agent orchestration protocol with an on-chain task marketplace for contracting agents, operator nodes for orchestrating tasks, an LLM workflow orchestration engine that supports asynchronous messaging across different nodes, and a workflow proof system for verifying execution.

Myshell - AI character platform similar to character.ai where creators can monetize agent profiles and tools. Multimodal infrastructure with some interesting example agents including translation, education, companionship, coding, etc. Contains both simple no-code agent creation and a more advanced developer mode for assembling AI widgets.

AI Arena — A competitive PvP fighting game where players can buy, train, and battle AI-powered NFTs. Players train their agent NFTs through imitation learning, where the AI learns how to play the game across different maps and scenarios by learning the probabilities associated with the player's actions. After training, players can send their agents into ranked battles for token rewards. Not LLM-based, but still an interesting example of the possibilities of agent-based games.

Virtuals Protocol — A protocol for building and deploying multimodal agents to games and other online spaces. The three main archetypes of virtuals today include IP character mirrors, specific function agents, and personal avatars. Contributors contribute data and models to virtuals, and validators act as gatekeepers. There is an economic incentive mechanism to promote development and monetization.

Brianknows — Provides a user interface for interacting with agents that can perform transactions, research cryptocurrency-specific information, and deploy smart contracts in real time. Currently supports more than 10 actions across more than 100 integrations. A recent example is an agent staking ETH in Lido on a user's behalf via natural language.

Autonolas — Provides lightweight local and cloud-based agents, consensus-operated decentralized agents, and professional agent economies. Prominent examples include DeFi and prediction agents, AI-driven governance delegates, and agent-to-agent tool markets. Provides protocols for coordinating and incentivizing agent operations, as well as the OLAS stack, an open-source framework for developers to build collectively owned agents.

Creator.Bid - Provides users with social media persona agents that connect to X and Farcaster real-time APIs. Brands can launch knowledge-based agents to execute brand-aligned content on social platforms.

Polywrap — provides various agent-based products such as Indexer (social media agent by Farcaster), AutoTx (planning and trade execution agent built with Morpheus and flock.io), predictionprophet.ai (prediction agent with Gnosis and Autonolas), and fundpublicgoods.ai (agent for grant resource allocation).

Verification — Since economic flows will be directed by intelligent agents, output verification will be very important (more on this in a future post). Verification methods include those from Ora Protocol, zkML from teams like Modulus Labs, Giza, and EZKL, game-theoretic solutions, and hardware-based solutions like TEEs.

Some thoughts on on-chain agents

  • Ownable, tradable, token-gated agents that can perform a variety of functions, from companionship to financial applications

  • Agents that can identify, learn, and participate in the game economy on your behalf; or autonomous agents that can act as players in collaborative, competitive, or fully simulated environments.

  • Agents that can simulate real human behavior for profit opportunities

  • Smart wallets managed by multiple agents can act as autonomous asset managers

  • AI-managed DAO governance (e.g. token delegation, proposal creation or management, process improvement, etc.)

  • Use web3 storage or database as a composable vector embedding system for shared and persistent memory state

  • Locally running agents participate in the global consensus network and perform user-defined tasks

  • Knowledge graph of existing and new protocol interactions and APIs

  • Autonomous Guardian Network, Multi-Signature Security, Smart Contract Security and Functionality Enhancements

  • Truly autonomous investment DAOs (e.g., a collector’s DAO using art historian, investment analyst, data analyst, and degen agent roles)

  • Token economics and contract security simulation and testing

  • Generic intent management, especially in the context of crypto UX like bridging or DeFi

  • Art or experimental projects

Attracting the next billion users

As Jesse Walden, co-founder of Variant Fund, recently said, autonomous agents are an evolution, not a revolution, in how blockchains are used: we already have protocol taskers, sniper bots, MEV searchers, robotic toolkits, etc. Agents are just an extension of all that.

Many areas of crypto are structured in a way that is conducive to agent execution, such as fully on-chain gaming and DeFi. Assuming the cost of LLMs continues to decline relative to task performance, and that creating and deploying agents becomes more accessible, it is hard to imagine a world where AI agents do not dominate on-chain interactions and become the next billion users of crypto.

Reading material:

AI Agents That Can Bank Themselves Using Blockchains

The new AI agent economy will run on Smart Accounts

A Survey on Large Language Model based Autonomous Agents (I used this for identifying the taxonomy of agentic architectures above, highly recommend) 

ReAct: Synergizing Reasoning and Acting in Language Models

Generative agents: Interactive simulacra of human behavior

Reflexion: Language Agents with Verbal Reinforcement Learning

Toolformer: Language Models Can Teach Themselves to Use Tools

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents 

Voyager: An Open-Ended Embodied Agent with Large Language Models

LLM Agents Papers GitHub Repo
