Credit: JP Sanday, Steve Sloane, Naomi Pilosof Ionita, Derek Xiao

Compiled by: TechFlow

Every job in the economy can be viewed as a collection of tasks performed by both humans and machines. Over the years, software has gradually taken on more and more tasks, but even today, humans are still responsible for the vast majority of business processes. In every functional area, personnel costs far outweigh software expenditures.

AI agents are poised to decisively change this balance of work. Unlike the software of the past, which primarily handled low-level, sequential, and rote tasks, new cognitive architectures enable agents to dynamically automate end-to-end processes. This is not just AI that can read and write, but AI that can determine the logical flow of an application and take actions on your behalf.

Agents are the biggest opportunity for large language models (LLMs) in the enterprise today. In another post, we discussed the definition of these new “agents” and the design patterns that make them possible. Here, we’ll explore how they can be applied in the enterprise to drive a new era of automation.

Robotic Process Automation (RPA) Redux?

If this sounds familiar to you, it’s because companies like UiPath and Zapier have been selling a similar vision under the name “robotic automation” for the past decade.

UiPath was the first to market. The core business of this Robotic Process Automation (RPA) giant is screen scraping and GUI automation: its "robots" record user actions and mimic those sequential steps to automate processes such as extracting document information, moving folders, filling out forms, and updating databases.

Later, iPaaS providers like Zapier emerged with a more lightweight “API automation” approach to improving productivity. The platform offers more robust automation through pre-built API integrations and webhooks, though this approach limits a company’s scope to web application automation, whereas UiPath is able to automate processes across different software, including those that may not support APIs.

UiPath and Zapier demonstrate the need for composable, horizontal, rules-based automation platforms that can address the enterprise’s long-tail processes within and outside of departmental or industry-specific software systems. However, as enterprises scale robotics-based automation, gaps between the capabilities of these traditional architectures and the autonomy they promise begin to emerge, particularly in the following areas:

  • It’s (still) labor-intensive and manual. Despite all the talk about robotics and automation, the process of building and maintaining automation is still cumbersome. In fact, for every $1 UiPath makes, $7 goes to implementation and consulting partners like EY, making deployment and maintenance cycles long and expensive.

  • UI automation is fragile, and API integrations are limited. UI automation often breaks when the software's UI changes; APIs are more stable but offer fewer integrations, especially for legacy or on-premises software.

  • Difficulty processing unstructured data. Unstructured and semi-structured data make up 80% of enterprise data, but sequence-based automation has little ability to process it intelligently. Intelligent document processing (IDP) solutions like Hyperscience and Ocrolus have attempted to make progress here, but still struggle with edge cases and exception handling, even in simple “extract and transform” document use cases.

Furthermore, even when traditional RPA and iPaaS solutions integrate large language models (LLMs), they remain limited by their deterministic architectures. Currently, UiPath’s Autopilot and Zapier’s AI Actions use LLMs only in sub-agentic design patterns, such as (1) text-to-action, or (2) nodes for semantic search, synthesis, or one-off generation.

These AI capabilities are indeed powerful. They let the business rather than IT own automation rules, enable more robust object detection and recognition via vision transformers rather than OCR, and support data extraction and transformation via RAG. However, they still fall short of LLMs’ more transformative use cases in process automation, which we’ll explore next.

The role of AI agents as decision-making engines

Agents are fundamentally different. They sit at the heart of the application’s control flow as decision engines, in stark contrast to the hard-coded logic of today’s RPA bots and even the RAG applications that defined the first wave of the generative AI revolution. They enable adaptability, multi-step operations, complex reasoning, and robust exception handling for the first time.
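To make "decision engine in the control flow" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any vendor named in this post builds its product: the tool names, the state dictionary, and `stub_policy` (a hand-written stand-in for an LLM call) are all hypothetical. The point is only that the next step is chosen at run time from the current state, rather than fixed in advance as in a recorded RPA script.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]

# Hypothetical tools. In a real system these would call a PDF parser,
# the general ledger, a ticketing queue, and so on.
def extract_fields(state: State) -> State:
    return {**state, "invoice_amount": 120.0}

def lookup_ledger(state: State) -> State:
    return {**state, "ledger_amount": 120.0}

def mark_matched(state: State) -> State:
    return {**state, "status": "matched"}

def escalate(state: State) -> State:
    return {**state, "status": "needs_review"}

TOOLS: Dict[str, Callable[[State], State]] = {
    "extract_fields": extract_fields,
    "lookup_ledger": lookup_ledger,
    "mark_matched": mark_matched,
    "escalate": escalate,
}

def stub_policy(state: State) -> str:
    """Stand-in for an LLM call (assumption): returns the next tool name."""
    if "invoice_amount" not in state:
        return "extract_fields"
    if "ledger_amount" not in state:
        return "lookup_ledger"
    if state["invoice_amount"] == state["ledger_amount"]:
        return "mark_matched"
    return "escalate"

def run_agent(
    policy: Callable[[State], str],
    state: Optional[State] = None,
    max_steps: int = 10,
) -> Tuple[State, List[str]]:
    """Loop until a status is set: the policy, not the script, picks each step."""
    state = dict(state or {})
    trace: List[str] = []
    while "status" not in state and len(trace) < max_steps:
        action = policy(state)
        if action not in TOOLS:
            raise ValueError(f"unknown action: {action}")
        state = TOOLS[action](state)
        trace.append(action)
    return state, trace

final_state, trace = run_agent(stub_policy)
print(trace)  # ['extract_fields', 'lookup_ledger', 'mark_matched']
```

Swapping `stub_policy` for a model-backed function changes which path is taken without touching `run_agent`, which is the structural difference from hard-coded logic that this section describes.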

Let’s use an example of invoice reconciliation to illustrate the impact. Below is a simplified flow chart showing how a new invoice PDF is matched to a company’s general ledger (similar to the visual modeling that implementation engineers do for RPA):

Clearly, the complexity of the workflow grows rapidly, and it is nearly impossible to cover all relevant edge cases and exceptions in the first three decision sets. Often, the RPA robots tasked with mechanically executing this workflow will make mistakes and report partially matched or missing entries to humans, which may explain why most companies today still hire hundreds of employees per month to complete this task rather than automate this highly manual process.

However, when applied to the same workflow, agents perform far better, with the following capabilities:

  • Adapt to new environments. Agents can intelligently recognize and adapt to new data sources, invoice formats, naming conventions, account numbers, and even policy changes based on basic reasoning and business context, all without the need for reprogramming or reliance on explicit standard operating procedures (SOPs).

  • Support for multi-step actions. When an invoice amount does not match, the agent can perform a multi-step investigation, such as scanning the supplier's recent emails for possible price change notifications.

  • Ability to reason in complex ways. For example, a company needs to reconcile an invoice from an international supplier with its ledger. This process involves considering multiple factors, including invoice currency, ledger currency, transaction date, exchange rate fluctuations, cross-border fees, and bank fees, all of which must be retrieved and calculated together to complete the payment reconciliation. An agent is capable of performing this type of intelligent operation, while an RPA robot may simply escalate the issue to a human.

  • Dealing with uncertainty. The agent is able to deal with uncertainty, for example by leveraging contextual clues (such as matching total order value and the timing and frequency of historical invoices) to cope with rounding errors or unreadable numbers for individual items.
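The arithmetic behind two of the capabilities above (cross-currency reconciliation and coping with an unreadable line item) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the one-cent tolerance, and the sample figures are hypothetical, and a real agent would retrieve the exchange rate and fees rather than receive them as arguments.

```python
from decimal import Decimal
from typing import Dict, List, Optional

CENT = Decimal("0.01")

def reconcile(
    invoice_amount: Decimal,   # in the invoice currency
    fx_rate: Decimal,          # invoice currency -> ledger currency
    fees: Decimal,             # cross-border and bank fees, ledger currency
    ledger_amount: Decimal,    # what the ledger actually recorded
    tolerance: Decimal = CENT, # hypothetical allowance for rounding
) -> Dict[str, object]:
    """Convert, add fees, and compare against the ledger within a tolerance."""
    expected = (invoice_amount * fx_rate + fees).quantize(CENT)
    diff = abs(expected - ledger_amount)
    return {"expected": expected, "diff": diff, "matched": diff <= tolerance}

def infer_missing_item(
    order_total: Decimal, items: List[Optional[Decimal]]
) -> List[Optional[Decimal]]:
    """If exactly one line item is unreadable (None), infer it from the total."""
    missing = [i for i, value in enumerate(items) if value is None]
    if len(missing) != 1:
        return list(items)  # nothing to infer, or too ambiguous to guess
    known = sum((v for v in items if v is not None), Decimal("0"))
    filled = list(items)
    filled[missing[0]] = order_total - known
    return filled

# Hypothetical figures: a 1,000.00 invoice at a rate of 1.08 plus 15.00 in fees.
result = reconcile(Decimal("1000.00"), Decimal("1.08"), Decimal("15.00"),
                   Decimal("1095.00"))
print(result["matched"])  # True
```

The calculation itself is simple; what the bullets above attribute to agents is deciding which inputs to retrieve and when an inference like `infer_missing_item` is safe enough to apply without escalating to a human.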

Current state of the AI agent market

Intelligent agents are no longer just a concept in science fiction. Although the field is still developing, companies ranging from startups to Fortune 500 companies are already purchasing and using these systems on a large scale.

The current market landscape of intelligent agents can be presented in two key dimensions:

  • Domain specificity: This ranges from highly specialized agents designed for vertical industries like healthcare or functions like customer support to horizontal agent platforms with broad, general capabilities.

  • Large language model autonomy: This indicates the ability of the language model to independently plan and guide the application logic.

These two factors form the two axes of the AI agent market map we are studying, as shown below.

In the upper right corner of the market map, the most general and scalable agents include:

  • Enterprise-grade agents. Scalable agent platforms enable enterprises to build and manage agents across multiple functions and workflows through natural language SOPs or rules similar to new employee handbooks. These platforms are particularly attractive to centralized IT purchasers who want broadly applicable agent capabilities rather than separate solutions for each business unit. For example, the core processing capabilities of Sema4's invoice reconciliation agent can be used for a variety of data validation tasks in finance, procurement, and operations.

Despite this, most enterprise-grade agents adopt an “agent on rails” architecture, which requires agents to base each new process on a set of predefined actions, business context, and safeguards for a specific workflow. While some data infrastructure can be shared across workflows, the broad nature of these platforms comes more from cumulative use cases than human-like versatility. As a result, some players in the space have begun to focus on specific areas to gain greater product and go-to-market advantages (e.g., Brevian focuses on customer support and security, and Ema focuses on sales and support).

  • Browser Agents. Web agents such as MultiOn, Induced, and Twin represent another broad and generalizable agent type. Most adopt a “general AI agent” design, leveraging visual Transformer models trained on a variety of software interfaces and their underlying code bases. This enables the agent to “understand” web page components and their functions and interactions, thereby automating web browsing, visual user interface manipulation, and text input.

However, while these agents have improved in generality, they often sacrifice consistency. Currently, most focus on simple productivity or e-commerce applications and struggle to achieve enterprise-grade performance. Without a more constrained problem space and appropriate data scaffolding and guardrails, browser agents must overcome several key challenges to become more reliable, such as managing complex action and observation spaces, maintaining context across multiple pages, and interpreting diverse web interfaces.

  • AI-powered services. Enterprise demand for agent capabilities currently outstrips customers’ ability to build them in-house, especially since “agents on rails” designs require extensive data infrastructure and guardrails to be effective in practice. This is where companies like Distyl and Agnetic come in, filling the gap by offering upfront engineering services as a “Palantir for AI.” Similar to Palantir’s Foundry, these companies can reuse modular system infrastructure across different customers to gradually rebalance the platform-to-service ratio.
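The “agent on rails” architecture described above (predefined actions plus safeguards for a specific workflow) can be pictured as a check that sits between what the agent proposes and what actually runs. The following Python sketch is one hypothetical way to express that idea; the class names, the action names, and the adjustment limit are assumptions for illustration and are not taken from any platform mentioned in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

@dataclass(frozen=True)
class Rails:
    """Per-workflow guardrails (hypothetical shape)."""
    allowed_actions: FrozenSet[str]
    max_adjustment: float  # largest ledger adjustment allowed without review

@dataclass
class ProposedAction:
    """What the agent wants to do next, before any safeguard has run."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)

def check_action(action: ProposedAction, rails: Rails) -> Tuple[bool, str]:
    """Reject anything outside the predefined action set or over the limit."""
    if action.name not in rails.allowed_actions:
        return False, f"action '{action.name}' is not allowed in this workflow"
    amount = action.params.get("amount", 0.0)
    if abs(amount) > rails.max_adjustment:
        return False, "amount exceeds limit; route to a human"
    return True, "ok"

# Hypothetical rails for an invoice reconciliation workflow.
invoice_rails = Rails(
    allowed_actions=frozenset({"match_invoice", "adjust_ledger", "email_supplier"}),
    max_adjustment=500.0,
)

print(check_action(ProposedAction("adjust_ledger", {"amount": 120.0}), invoice_rails))
# (True, 'ok')
```

Each new workflow would get its own `Rails` instance, which mirrors the observation above that these platforms grow through accumulated use cases rather than through human-like versatility.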

But not all agents are broad and generalizable. We are increasingly seeing the emergence of domain- and workflow-specific agents that improve reliability by limiting the types of problems they solve:

  • Vertical Agents. The most promising opportunities for vertical agents are in manual, procedure-driven processes that are currently handled by humans following standard operating procedures (SOPs) or rulebooks. Many enterprises have outsourced these functions to business process outsourcing (BPO) firms or contractors. These tasks are often too complex for rules-based automation, but not challenging or differentiated enough to justify in-house knowledge workers. Major categories include customer support, recruiting, certain software development tasks such as code review, testing and maintenance, cold outbound sales calls, and security operations.

  • AI assistants. Another way to narrow the focus of an agent is through task specificity rather than domain specificity. Rather than taking on complex end-to-end processes like enterprise and vertical agents, AI assistants perform simpler, more productivity-focused tasks. Common basic tasks include simple web research, knowledge extraction, summarization, and unstructured data transformation for ad hoc tasks, such as chatting with PDFs or extracting feature requests from Gong transcripts.

Finally, it’s worth noting that there is a wide range of generative AI solutions that, while not agents themselves, compete for the same budgets as agent solutions and sometimes even participate in the same workflows. These solutions are mainly built on RAG architectures and do not sit in the application control flow, so they cannot fully replicate an agent’s human-like reasoning. However, their capabilities can still significantly increase service automation while providing control to the enterprise.

  • Vertical AI. Semantic search and unstructured data transformation are powerful foundational capabilities in vertical workflows. For example, healthcare AI automation platform Tennr extracts unstructured data from faxes, PDFs, phone calls, and other messy sources and feeds it into a clinic’s EHR system to streamline referral processing and reduce the need for staff to manually enter data. Industrial AI is another example, taking a similar approach to automate manufacturers’ quotation processes.

  • RAG as a Service. RAG as a service companies like Danswer and Gradient are the horizontal counterparts to vertical semantic search and unstructured data transformation companies, offering customers the ability to query unstructured data sources (such as PDFs), extract the data, and enter the results into a more structured database or system of record.

  • Enterprise search. Glean, Perplexity, and Sana provide semantic querying for the purpose of indexing and retrieving relevant documents, thereby better managing knowledge within an organization and breaking down enterprise data silos.

The future of enterprise automation

The second wave of generative AI will be defined by agents that can think and act on behalf of humans, not just read and write. As these architectures mature, they will become a powerful catalyst for AI to take over the service industry. At Menlo, we are excited to meet the teams that are building this future. If you are building in the AI agent space, we would love to talk to you.

JP Sanday (jp@menlovc.com)

Steve Sloane (steve@menlovc.com)

Naomi Ionita (naomi@menlovc.com)

Derek Xiao (derek@menlovc.com)