AI Agents are arguably the most exciting direction in today's large-model development. They have been called "the next battleground for large models," "the ultimate killer product," and the start of "an agent-centric era that opens a new industrial revolution." On November 7, OpenAI's first developer conference (OpenAI DevDay) put AI Agents in the spotlight. OpenAI released GPTs, an early form of AI Agent, together with the corresponding authoring tool, GPT Builder. Users can create a custom GPT simply by chatting with GPT Builder and describing the functions they want. A custom GPT can be tailored to daily life, specific tasks, work, or home use. To support this, OpenAI also opened a number of new APIs (including vision, DALL·E 3 image generation, and voice) and launched the new Assistants API, so that developers can more easily build their own custom GPTs.

Bill Gates recently published an article stating that AI Agents will become widespread within five years and that every user will have a personal AI Agent. Users will no longer need different apps for different needs. They will only need to tell their agent what they want to do in everyday language.

Within a week after the release of GPTs, more than 17,500

So what exactly is an AI Agent? Why has the industry paid it so much attention, to the point that some scholars assert that "the development of the American Agent Store will continue to widen the gap between the large models of China and the United States"?

What is an AI Agent?

In computer science and artificial intelligence, an agent (sometimes translated as "intelligent body") is generally defined as a software or hardware entity that exhibits one or more intelligent characteristics in a given environment, such as autonomy, reactivity, sociality, proactiveness, deliberation, and cognition. [3]

OpenAI defines an AI Agent as a system that uses a large language model (LLM) as its brain, can autonomously understand, perceive, plan, remember, and use tools, and can carry out complex tasks automatically. [4] The basic framework of an AI Agent is shown in the figure below:

Basic framework of an LLM-driven agent

It has four main modules: memory, planning, action and tool use:

(1) Memory. The memory module stores information, including past interactions, learned knowledge, and temporary task state. An effective memory mechanism lets an agent draw on past experience and knowledge when it faces new or complex situations. For example, a chatbot with memory can remember user preferences or earlier conversation content and so provide a more personalized and coherent experience. Memory is divided into two kinds: a. Short-term memory: all in-context learning relies on short-term memory. b. Long-term memory: this lets the agent retain and recall a practically unlimited amount of information over long periods, usually through an external vector database with fast retrieval; an example is the large body of data and knowledge accumulated in a particular industry. Long-term memory lets the agent accumulate data over time, which makes it more useful, more specialized, and more personalized.

(2) Planning. The planning module works in two stages: planning before acting and reflection afterward. Planning involves prediction and decision-making about future actions. For example, when performing a complex task, the agent breaks a large goal into smaller, manageable sub-goals so that it can lay out an efficient series of steps toward the desired result. In the reflection stage, the agent reviews the plan and its execution, identifies mistakes and shortcomings, and learns from them. These lessons are added to long-term memory, helping the agent avoid the same mistakes and update its understanding of the world.

(3) Tool use. The tool-use module is the agent's ability to use external resources or tools to perform tasks. For example, the agent learns to call external APIs to obtain what is missing from the model weights, such as current information, code execution, and access to proprietary information sources, which compensates for the weaknesses of the LLM itself. Because an LLM's training data is not updated in real time, the agent can use tools to search the Internet for the latest information or use dedicated software to analyze large amounts of data. A large number of digital and intelligent tools already exist on the market, and agents can often use them more smoothly and efficiently than humans. By calling different APIs or tools, agents can complete complex tasks and produce high-quality results. This tool use is an important feature and advantage of agents.

(4) Action. The action module is the part of the agent that actually carries out a decision or response. The agent has a set of action strategies and chooses among them when making decisions for a given task; familiar examples include memory retrieval, reasoning, learning, and programming.

Overall, these four modules work together to enable the agent to take actions and make decisions in a wider range of situations, performing complex tasks in a smarter and more efficient manner.[6]
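
To make the division of labor among the four modules concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the framework from the figure or any vendor's API: the class and function names (Memory, Agent, plan, calculator) are hypothetical, a bag-of-words counter stands in for a real embedding model, a plain list stands in for a vector database, and a lookup table stands in for the LLM call that would normally produce the plan.

```python
import math
import operator
from collections import Counter, deque


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Memory:
    def __init__(self, window=3):
        self.short_term = deque(maxlen=window)  # bounded, like a context window
        self.long_term = []                     # (vector, text) pairs; stands in for a vector database

    def remember(self, text):
        self.short_term.append(text)
        self.long_term.append((embed(text), text))

    def recall(self, query, k=1):
        """Return the k long-term entries most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression):
    """Tool: exact arithmetic on an 'a op b' expression."""
    left, symbol, right = expression.split()
    return OPS[symbol](float(left), float(right))


TOOLS = {"calculator": calculator}


def plan(goal):
    """Planning: split a goal into (tool, input) steps.
    A real agent would ask an LLM; this hypothetical table stands in for that call."""
    plans = {
        "price of 3 tickets at 12.5 each": [("calculator", "3 * 12.5")],
        "translate hello": [("translator", "hello")],  # refers to a tool that does not exist
    }
    return plans.get(goal, [])


class Agent:
    def __init__(self):
        self.memory = Memory()

    def act(self, goal):
        """Action: execute each planned step, using tools and updating memory."""
        results = []
        for tool_name, tool_input in plan(goal):
            tool = TOOLS.get(tool_name)
            if tool is None:
                # Reflection: record the failure so later planning can take it into account.
                self.memory.remember(f"lesson: no tool named {tool_name}")
                continue
            result = tool(tool_input)
            self.memory.remember(f"{goal} -> {result}")
            results.append(result)
        return results


agent = Agent()
print(agent.act("price of 3 tickets at 12.5 each"))
print(agent.act("translate hello"))
print(agent.memory.recall("no tool named translator"))
```

The sketch keeps reflection deliberately thin: a failed step only leaves a note in memory. In a fuller design the planner would read such notes back before producing the next plan.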

AI Agents will bring wider human-machine integration

Agents based on large models will not only give everyone a personal intelligent assistant with enhanced capabilities, but will also change how humans and machines collaborate and bring about broader human-machine integration. As the generative AI revolution has developed, three modes of human-machine collaboration have emerged:

(1) Embedding mode. Users communicate with AI through language, use prompts to set goals, and then AI assists users in achieving these goals. For example, ordinary users input prompts into generative AI to create novels, music works, 3D content, etc. In this mode, AI acts as a tool to execute commands, while humans act as decision makers and commanders.

(2) Copilot mode. In this mode, humans and AI are more like partners, participating in the workflow together and playing their respective roles. AI intervenes in the workflow, from providing advice to assisting in completing each stage of the process. For example, in software development, AI can help programmers write code, detect errors, or optimize performance. Humans and AI work together in this process, complementing each other's capabilities. AI is more like a knowledgeable partner rather than a mere tool.

In fact, Microsoft first introduced the Copilot concept on GitHub in 2021: GitHub Copilot is an AI service that helps developers write code. In May 2023, backed by large models, Microsoft gave Copilot a comprehensive upgrade, launching Dynamics 365 Copilot, Microsoft 365 Copilot, and Power Platform Copilot, and put forward the idea that "Copilot is a new way of working." What holds for work holds for life, which also needs a "Copilot." Li Zhifei, the founder of "Chuwenwen", believes that the best role for large models is to be a "Copilot" for humans.

(3) Agent mode. Humans set goals and provide the necessary resources (such as computing power), the AI independently carries out most of the work, and humans supervise progress and evaluate the final results. In this mode, the AI fully exhibits the interactivity, autonomy, and adaptability of an agent and comes close to being an independent actor, while humans act mainly as supervisors and evaluators.

Three ways humans and AI can collaborate

Judging from the earlier analysis of the agent's four main modules (memory, planning, action, and tool use), the agent mode is undoubtedly more efficient than the embedding mode and the Copilot mode, and may become the main mode of human-machine collaboration in the future.

Under the agent mode of human-machine collaboration, every ordinary individual has the potential to become a "super individual." A super individual has their own AI team and automated task workflows, and builds more intelligent, automated working relationships with other super individuals through agents. The industry is already actively exploring one-person companies and super individuals. On GitHub there are agent-based automated team projects such as GPTeam, which uses large models to create multiple agents with distinct roles and functions that collaborate to achieve predetermined goals. For example, Dev-GPT is a multi-agent team for automated development and operations that includes a product manager agent, a developer agent, and an operations agent. Such a multi-agent team can support the normal operation of a startup marketing company run as a one-person company. Another example is NexusGPT, which claims to be the world's first AI freelancer platform. [8] The platform integrates various AI-native data from open-source databases and hosts more than 800 AI agents with specific skills. On it, employers can find experts in different fields, such as designers, consultants, and sales representatives, and can choose an AI agent at any time to help complete various tasks.

AI Agents will change the rules of the software game and promote AI as infrastructure

AI Agents are redefining software. Bill Gates believes that AI Agents will completely upend the software industry, affecting both how we use software and how we write it.

AI Agents will shift the software architecture paradigm from process-oriented to goal-oriented. Existing software (including apps) fixes its process through a series of predefined instructions, logic, rules, and heuristics so that the results meet the user's expectations; that is, the user follows the built-in logic step by step to reach a goal. Such a process-oriented architecture is highly reliable and deterministic. However, a process-oriented architecture can only be applied to vertical domains and cannot be applied universally. How to balance standardization and customization has therefore become one of the difficulties facing the SaaS industry.

Software architecture paradigm shift

The AI Agent paradigm gradually shifts feature development from being led by humans to being driven by AI. With large models as the technical infrastructure and agents as the core product form, the hierarchy of predefined instructions, logic, rules, and heuristics in traditional software gives way to goal-oriented agents that generate these steps autonomously. The original architecture can only solve tasks within a limited range, while the future architecture can solve tasks across an open-ended range. [11] In the future software ecosystem, the agent will be more than the top-level medium through which everyone interacts with software: the whole industry, from underlying technology, business models, and intermediate components to people's habits and behavior, will change around the agent. This is the beginning of the agent-centric era.

Comparison between RPA (Robotic Process Automation) and APA (Agentic Process Automation)

Take ChatDev, the first "large model + agent" SaaS-level product released by Mianbi Intelligence, as an example. The platform works like a software development company staffed entirely by AI agents, with roles such as CEO, CTO, development manager, product manager, test specialist, and supervisor. Users only need to give the CEO agent clear requirements, and the CEO organizes the entire software development process around them. What is finally delivered to the user includes the software product and the code from the whole development process, and every step is automated. [14] This will let the software industry reduce production costs, improve customization, and enter a "3D printing" era of software.

Prospects and Challenges of AI Agents

AI Agents are an important force driving AI toward becoming infrastructure. Looking back at the history of technology, the end state of a technology is to become infrastructure: electricity, like cloud computing, has become something people barely notice, like air, yet cannot do without. Getting there involves three stages. In the innovation and development stage, new technologies are invented and applied. In the popularization and application stage, the maturing technology is widely adopted across fields and has a profound impact on society and the economy. In the infrastructure stage, the technology is so widespread as to be nearly ubiquitous and becomes an indispensable part of daily life. Almost everyone agrees that artificial intelligence will become the infrastructure of future society, and agents are pushing it in that direction. This is due not only to the low production cost of agent-based software but also to the fact that agents can adapt to different tasks and environments and can learn and improve their performance, so they can be applied across a wide range of fields and become basic support for industries and social activities.

Overview of AI Agent Applications

Agents may next evolve in two directions at once. One is the agent that assists humans by performing various tasks, which emphasizes its role as a tool. The other is the anthropomorphic agent, which can make independent decisions, has long-term memory, and has certain personality traits, which emphasizes human-like or superhuman attributes.

From the perspective of technical optimization, iteration and implementation, the development of AI Agents also faces some bottlenecks:

First, OpenAI's GPTs show that LLMs' complex reasoning ability is still not strong enough and their latency is too high, which holds back the maturity of agent applications. These are the directions in which the industry is pursuing engineering optimization and research breakthroughs.

Second, multi-agent development still faces great difficulties. Multi-agent systems are a very complex academic research direction, and as agents begin to reach the mass market they have become an important technical reality. For example, Stanford's virtual town is a multi-agent study with 25 agents. However, after the town framework was open-sourced, a developer's test found that a single agent consumes about $20 worth of tokens per day, because it has to remember and think extensively about its actions. This is more than many human workers cost, so both the agent framework and the LLM inference side need further optimization.
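
A quick back-of-the-envelope calculation shows what the reported figure implies for the whole town. The sketch below uses only the two numbers given above (25 agents, about $20 of tokens per agent per day); the 30-day month and the assumption that every agent costs the same are simplifications introduced for illustration.

```python
AGENTS = 25                      # agents in the virtual town
COST_PER_AGENT_PER_DAY_USD = 20  # developer-reported token cost per agent
DAYS_PER_MONTH = 30              # assumption: a 30-day month


def town_cost(agents, cost_per_agent_per_day, days):
    """Total token cost, assuming every agent costs the same each day."""
    return agents * cost_per_agent_per_day * days


daily = town_cost(AGENTS, COST_PER_AGENT_PER_DAY_USD, 1)
monthly = town_cost(AGENTS, COST_PER_AGENT_PER_DAY_USD, DAYS_PER_MONTH)
print(f"daily: ${daily}, monthly: ${monthly}")  # daily: $500, monthly: $15000
```

Under these assumptions, running the full town costs about $500 per day, which is why cost reduction on both the framework and inference sides matters before such systems can scale.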

Overcoming the difficulties of multi-agent development is an important prerequisite for a future agent society. Multi-agent collaboration can form the highest form of technological social system, the agent society. The agent society is complex, dynamic, self-organizing, and adaptive, and its members can collaborate, compete, and evolve continuously. In this social system, agents can perform complex and flexible tasks as goals and environments change, and can interact and collaborate with humans and other agents at a high level and in multiple dimensions. The agent society not only helps humans explore and expand the physical and virtual worlds, but also enhances and extends human capabilities and experiences.

At the same time, these development trends indicate that AI Agents may face many challenges such as security and privacy, ethics and responsibility, and economic and social employment impacts.

(1) Security and privacy. These are key requirements for agents, essential both for their stable operation and for protecting users and society. They directly affect how far AI agents can be trusted and controlled. If AI agents have vulnerabilities, are attacked, or leak data, they may harm users or society. For example, OpenAI's GPTs had a security vulnerability shortly after their release, which led to the leakage of user-uploaded data.

(2) Ethics and responsibility. These are core principles for agents: they determine an agent's values and goals, as well as its respect for and protection of users and society. These principles directly affect the credibility and controllability of agents. If agents prove unfair, opaque, or unreliable, users or society may reject the technology. Attribution of responsibility is also a key issue. Unclear or unfair attribution of responsibility in collaboration between humans and agents can also have serious consequences.

(3) Economic and social impact on employment. An important challenge for the future of work is competition between humans and agents. For example, the emergence of the AI freelance platform NexusGPT puts pressure on traditional freelancers. More and more agents will take part in future working relationships, and employers may try to reduce headcount for reasons of efficiency and effectiveness. As agent technology matures, we must think ahead about the long-term impact of these developments on society and on individual careers.

With the release of ChatGPT as a watershed, the number and income of writers and editors on freelance platforms worldwide have dropped off sharply.