Original title: Which Platform Builds the Best AI Agents? We Test ChatGPT, Claude, Gemini and More
Original author: Jose Antonio Lanz
Original source: https://decrypt.co/
Compiled by: Daisy, Mars Finance
Hands-on comparisons of the five leading platforms reveal which is best suited to host your future AI agent in everyday scenarios.
AI agents can do many things: search your document library, write code, scrape web data, and provide insights and deep analysis of complex data. You can also build a virtual office staffed by a team of AI agents, each focused on a different task and working together like a professional team of digital employees.
But just how hard is it? If an average person wants to create their own AI financial advisor, for example, without relying on APIs, writing arcane code, or using GitHub, which platform provides the best support? We wanted to see how well these top AI companies help ordinary users create AI agents without advanced technical skills.
Of course, you get what you pay for. In this case, we also wanted to see if there is a correlation between how easy it is for an ordinary person to set up an agent and the quality of results delivered by each platform.
Our experiment compared five leading platforms: ChatGPT, Claude, Hugging Face, Mistral AI, and Gemini. Each platform received the same basic instruction to create a financial advisor.
The test focused on the platform's out-of-the-box capabilities. The emphasis was on whether the agents could handle a common scenario—in this case, helping someone balance a $25,000 investment with $30,000 in debt. We also wanted to see their ability to analyze trading charts. We avoided using additional tools to enhance the agents' productivity and instead tried to take the simplest approach.
In short, here are our findings and model rankings:
Platform Rankings
1) OpenAI's GPT (8.5/10)
Ease of Setup: 4/5
Result Quality: 4.5/5
ChatGPT is the most balanced platform. It offers sophisticated agent creation options, both guided and manual, that meet the needs of complete novices as well as users with some experience.
Despite recent interface updates burying some features in menus, the platform excels at translating complex user needs into functional agents. We tested the model by building a financial advisor, and the results showed that this agent has excellent contextual awareness and structured problem-solving capabilities, providing detailed and coherent strategies for debt management and investment allocation.
2) Google Gemini (7/10)
Ease of Setup: 4/5
Result Quality: 3/5
Gemini stands out with its refined, intuitive interface and excellent error handling. While it requires more detailed prompts for the best results, its literal interpretation of instructions creates consistent and predictable outcomes.
The agent's consultative approach when providing financial advice emphasizes the importance of collecting context before making recommendations, similar to professional practice. However, it may be overly conservative in zero-shot responses.
3) HuggingChat (6.5/10)
Ease of Setup: 2/5
Result Quality: 4.5/5
This open-source platform offers unparalleled customization and model selection options. It's a great choice for those seeking granular control over every detail, but may not suit users seeking convenience. (Think of the difference between Linux and macOS.) Its sophisticated handling of timeframes and its practical tool integrations showcase its advanced capabilities.
We built a bare-bones agent without any extra features. We used Nvidia's Nemotron as the underlying large language model, and the output quality was good enough to rival ChatGPT. Not bad for the open-source camp.
4) Claude (5.5/10)
Ease of Setup: 2.5/5
Result Quality: 3/5
Anthropic's platform excels in specific areas, especially in tasks that require extensive context processing and code parsing. Its minimalist interface conceals its complex capabilities, but the 'optional' instruction field may confuse users.
Our agent was very conservative and vague when providing advice, but it demonstrated good risk awareness and strategic thinking. It needs more careful prompting to unleash its full potential. However, tailoring the prompt to one platform would have broken the premise of testing every platform under the same conditions, so it wouldn't have been fair.
5) Mistral AI (5/10)
Ease of Setup: 2.5/5
Result Quality: 2.5/5
This French platform offers unique example-based learning and deep customization options. However, its developer-oriented interface and occasional language-switching issues create barriers for non-technical users. It also requires modifications to the agent's configuration to accommodate different models for tasks like analyzing images or processing code. This is less than ideal.
The financial advisor shows potential in interaction design but struggles with basic mathematical validation, which put it last on output quality. The output is not poor in absolute terms, but in zero-shot testing it was the least satisfying.
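As a quick sanity check on the rankings, the sketch below recomputes each overall score from the two sub-scores listed above. The scoring rule (overall = ease of setup + result quality) is our assumption, inferred from the fact that the published numbers add up that way; the function names are illustrative only.

```python
# Sub-scores as listed in the rankings: (ease of setup, result quality), each out of 5.
SUB_SCORES = {
    "OpenAI's GPT": (4.0, 4.5),
    "Google Gemini": (4.0, 3.0),
    "HuggingChat": (2.0, 4.5),
    "Claude": (2.5, 3.0),
    "Mistral AI": (2.5, 2.5),
}


def overall_scores(sub_scores):
    """Assumed rule: overall score (out of 10) = ease of setup + result quality."""
    return {name: ease + quality for name, (ease, quality) in sub_scores.items()}


def ranked(sub_scores):
    """Return (platform, overall) pairs sorted from highest to lowest overall score."""
    totals = overall_scores(sub_scores)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for position, (name, total) in enumerate(ranked(SUB_SCORES), start=1):
        print(f"{position}) {name}: {total}/10")
```

Under that assumption, the computed totals reproduce the published order, with HuggingChat's high result quality offset by its low ease-of-setup score.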
In-depth Analysis
As the rankings suggest, there is no one-size-fits-all solution: every platform has its own strengths and weaknesses. With focused, careful prompt customization, one platform's results may change and even surpass those of the others. Ultimately, all large language models (LLMs) respond best to their own distinct prompting styles.
If you want to understand more about the reasoning behind our rankings, here is a more in-depth analysis of our experiences and agent results. We configured all agents with the same system prompt, with no additional parameters or features, and asked them the same basic question: 'I have $25K to invest and $30K in debt. Create a financial plan for me.'
OpenAI
ChatGPT's interface was recently updated, which actually made operations more complex. The GPT creation options are now hidden in menus, but once found, it offers two paths: one is a conversational setup where the AI helps build your agent; the other is a manual configuration for those who know exactly what they want.
OpenAI's GPT platform is a fully functional 'Swiss Army Knife'—it can read code, search the web, and handle image generation and analysis. The AI-guided setup process makes it particularly suitable for beginners, although it may feel somewhat limiting for advanced users who need fine control. (For example, if you ask the model to be more specific or detailed, it may change the entire system prompt, resulting in worse outcomes.)
When using the agent, ChatGPT is very straightforward, with a clear and easy-to-understand interface.
These agents can natively read documents and understand images, giving them an edge over other platforms.
Now, let's discuss the quality of the agents you can create with basic prompts. The financial advisor we created, MoneyGPT, delivered a masterclass in structured problem-solving and performed quite impressively.
In addition to its precise fund allocation—'$20,000 for high-interest debt' and detailed portfolio breakdown—the agent also demonstrates complex financial reasoning. It provides a five-step roadmap that is not just a checklist but a coherent strategy, considering short-term needs and long-term planning.
The advantage of this agent lies in its ability to balance detail and context. While it recommends a specific portfolio (40% in S&P 500, 30% in bonds), it also explains the reasoning behind the recommendation: 'Paying off high-interest debt is like getting a guaranteed return on investment.' This contextual awareness extends to long-term planning, recommending regular review cycles and adjusting strategies according to changing circumstances.
However, this richness of information also exposes a potential weakness: users may feel overwhelmed by too much detail provided all at once. While it is technically very comprehensive, the quickly delivered specifics of allocations, investment strategies, and monitoring plans may seem daunting for financial newcomers.
Google
Google's Gemini agent creation platform stands out aesthetically, with a refined, intuitive interface that makes creating an agent feel almost too simple. The system's literal interpretation of instructions helps avoid confusion, and its clean user interface removes the sense of intimidation often associated with AI development.
However, it requires more detailed prompts to produce quality results. It does not fill in gaps on its own: brief prompts yield low-quality responses.
Under the hood, it has powerful capabilities, including Google-powered web search, code analysis, and image processing. These are comparable to ChatGPT's functionality, which mostly relies on Microsoft's technology.
Gemini's user interface feels like it was designed by someone who truly understands user experience. The interface guides users with clear labels, and all information can be displayed on one screen.
This refined approach makes it particularly appealing to novice users, although experienced users may find it lacking in finer control.
We named our agent MoneyGem and asked it to provide a financial plan. Its consultative approach showcases Google’s unique problem-solving method. It didn’t give a direct answer but first asked questions like 'What type of debt is this?' and 'What is your interest rate?'—showing that it understands financial advice is not one-size-fits-all.
It emphasizes collecting background information before providing advice, which aligns with professional financial planning practices, although this might frustrate users seeking quick answers.
The zero-shot response was not helpful. The agent essentially said that it did not know enough about the user to provide good financial advice. After we asked it to make assumptions and pushed it to provide a plan suitable for most scenarios, the agent generated a very conservative draft plan but did not provide specific investment advice.
However, MoneyGem ultimately suggested maximizing tax-advantaged accounts, such as 401(k) or Roth IRA, to reduce tax burdens. Not bad.
You can click here to see our interaction with MoneyGem and try out the model for yourself by clicking this link.
Mistral AI
The setup process for Mistral's agent is somewhat complicated, straying away from simplicity. The agent creation tool is hidden in its developer console, with deep customization options that may confuse beginners but please those who enjoy tinkering.
Its agent-building interface is not part of Le Chat (the chat interface), but once an agent is created, it appears there.
One aspect we particularly liked was the ability to shape the agent’s behavior and response style through example inputs, which is a feature not offered by other platforms at the moment. However, there was a strange bug: during the agent creation, the UI suddenly switched to French, possibly due to the company being French. In any case, we could not switch back to English or Spanish.
Once the agent is created, users must call it from the normal chat interface. That means leaving La Plateforme and opening Le Chat, which is not the most intuitive workflow. However, the UI for using the agent is quite straightforward and feels like other AI chatbots.
We created our agent and named it Le Money, as a nod to Mistral's French roots. Its performance clearly demonstrated Mistral's general approach to problem-solving. It suggested 'keeping $10,000 as emergency funds, using $15,000 to pay off debt, and investing $10,000,' which seems straightforward but also indicates that the agent lacks some basic mathematical validation.
A total of $35,000 exceeded the available funds by $10,000, which is a basic error that certain language models may make when prioritizing conceptual correctness over numerical accuracy.
However, we must point out that the best-performing LLMs have already shown significant improvements and do not make such errors frequently—at least not as frequently as Mistral.
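The arithmetic behind that error is easy to verify. The sketch below sums the three amounts quoted from Le Money's plan and compares the total with the $25,000 stated in the prompt. The function and variable names are hypothetical, and this is only an illustration of the kind of check the agent skipped, not something any of the platforms runs.

```python
def check_allocation(available, allocations):
    """Return (total allocated, amount by which the plan exceeds the available funds).

    A positive second value means the plan spends money the user does not have.
    """
    total = sum(allocations.values())
    return total, total - available


# Amounts quoted from Le Money's suggestion, in US dollars.
LE_MONEY_PLAN = {
    "emergency fund": 10_000,
    "debt repayment": 15_000,
    "investing": 10_000,
}

AVAILABLE_FUNDS = 25_000  # from the test prompt: "I have $25K to invest"

if __name__ == "__main__":
    total, overshoot = check_allocation(AVAILABLE_FUNDS, LE_MONEY_PLAN)
    print(f"Allocated ${total:,} against ${AVAILABLE_FUNDS:,} available")
    if overshoot > 0:
        print(f"Plan exceeds available funds by ${overshoot:,}")
```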
Aside from that, Le Money's plan is not very detailed, but it is the only agent that provides follow-up questions, which can make the interaction smoother and help it better understand the user's needs.
Le Money's complete plan can be viewed here, and the agent can be tested here.
Anthropic
Claude's Projects feature feels less like an agent creation platform and more like a complex task execution system. The interface is minimalist, almost too minimalist, and not very intuitive.
The platform provides a basic setup with an 'optional' instruction field that feels both unimportant and crucial: if the instructions are optional, how does the AI agent know what it should do?
The bare-bones design feels somewhat strange, but Anthropic has never been known for its UI design. The same window used to configure the model is also used to issue prompts to it. Its functionality focuses mainly on text and code interpretation, with no other features. Web search, image processing, and image generation are advanced features that Anthropic leaves to its competitors.
Our agent, named MoneyClaude, cannot be publicly tested because Anthropic does not allow it. It takes a very conservative stance when providing financial advice, and while the responses are technically accurate, the content is very vague—such as, 'Balancing debt reduction and necessary savings.'
It did ask for more information, but even without it, the agent offered a very general strategy up front instead of waiting for further interaction, which seems preferable to Google's approach.
Hugging Face
This open-source platform stands out, being a paradise for advanced users—but a potential nightmare for beginners. It is the only platform that allows users to choose their preferred language model, offering unprecedented control to define the agent's foundation.
Additionally, users can integrate dozens of different tools into their agents, a capability no other platform provides, although only three can be active at a time. This limit forces users to consider carefully which features matter most for each specific use case.
It is the most customizable experience among all interfaces, with many adjustable settings. As a result, this platform can create more powerful and professional agents than its competitors, but only in the hands of someone who fully understands how to operate it.
Users can try their agents on HuggingChat—undoubtedly a dream for advanced users. Once the agent is created, it’s very easy to use. The interface displays a large card containing the agent's name, description, and photo. It also allows users to share the agent's link and adjust its settings, all of which can be done directly on the card.
After putting our HuggingMoney agent to the test, we found its way of handling timeframes demonstrated a deeper understanding of the psychology of financial planning. It divides planning into 'short-term (0-24 months), medium-term (24-60 months), and long-term (over 60 months),' which aligns with professional financial planning practices.
The agent suggests investing '$0-$5,000 in highly liquid, low-risk instruments' while maintaining a monthly 'active debt repayment of $1,000-$1,500.' At first glance, this recommendation shows a detailed understanding of cash flow management.
Another interesting feature is that it combines practical tools with theoretical advice. In addition to recommending the 50/30/20 rule, it also suggests specific budgeting applications and emphasizes tax optimization—bridging the gap between high-level strategy and day-to-day execution. The main downside? It assumes debt interest rates without seeking confirmation.
In its effort to provide useful advice, it assumes too much too readily. This impulse to respond regardless of missing information can be fixed with more precise prompts, but it is worth noting.
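For readers unfamiliar with the 50/30/20 rule that HuggingMoney recommended, here is a minimal sketch of the arithmetic. It uses the rule's common definition (50% of after-tax income to needs, 30% to wants, 20% to savings and debt repayment). The $4,000 monthly income is a made-up figure for illustration and does not come from our test, and the function name is hypothetical.

```python
def budget_50_30_20(monthly_after_tax_income):
    """Split monthly after-tax income using the common 50/30/20 budgeting rule."""
    shares = {"needs": 50, "wants": 30, "savings_and_debt": 20}  # percentages
    return {
        category: monthly_after_tax_income * percent / 100
        for category, percent in shares.items()
    }


if __name__ == "__main__":
    # Hypothetical income, used only to show the split.
    for category, amount in budget_50_30_20(4_000).items():
        print(f"{category}: ${amount:,.2f}")
```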