Caroline Bishop Nov 18, 2024 17:02
Mistral AI introduces Pixtral Large, a 124B multimodal model with advanced capabilities in image and text understanding, outperforming competitors in various benchmarks.
Mistral AI has announced the launch of Pixtral Large, a 124 billion parameter open-weights multimodal model built on top of Mistral Large 2. The model targets image understanding, particularly of documents, charts, and natural images, while maintaining strong text comprehension.
Advanced Performance Metrics
Pixtral Large has been evaluated against leading models on a series of standard multimodal benchmarks. On MathVista, which tests complex mathematical reasoning over visual data, Pixtral Large scored 69.4%, ahead of every other model in the comparison. On ChartQA and DocVQA, which assess reasoning over complex charts and documents, Pixtral Large outperformed prominent models such as GPT-4o and Gemini-1.5 Pro.
The model also demonstrated competitive abilities on the MM-MT-Bench, outperforming Claude-3.5 Sonnet (new), Gemini-1.5 Pro, and GPT-4o (latest). MM-MT-Bench serves as an open-source, judge-based evaluation reflecting real-world applications of multimodal language models.
Model Specifications and Applications
Pixtral Large pairs a 123 billion parameter multimodal decoder with a 1 billion parameter vision encoder, for 124 billion parameters in total. Its 128K context window can accommodate at least 30 high-resolution images.
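The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes that "128K" means 128 × 1024 tokens and that the 30 images would fill the entire window, neither of which the announcement states, so the per-image number is at best a rough upper bound on the average token cost of one image.

```python
# Parameter counts quoted in the article, in billions.
DECODER_PARAMS_B = 123
VISION_ENCODER_PARAMS_B = 1

# Assumption: "128K" is read as 128 * 1024 tokens (not stated in the article).
CONTEXT_TOKENS = 128 * 1024
# "a minimum of 30 high-resolution images"
MIN_IMAGES = 30


def total_params_b() -> int:
    """Total parameter count in billions (decoder plus vision encoder)."""
    return DECODER_PARAMS_B + VISION_ENCODER_PARAMS_B


def max_avg_tokens_per_image(context_tokens: int = CONTEXT_TOKENS,
                             images: int = MIN_IMAGES) -> int:
    """Upper bound on the average tokens per image if `images` images
    must fit in the window with nothing else in it (an assumption)."""
    return context_tokens // images


if __name__ == "__main__":
    print(f"Total parameters: {total_params_b()}B")
    print(f"Average budget per image: at most {max_avg_tokens_per_image()} tokens")
```

In practice any text prompt shares the same window, so the real per-image budget would be smaller than this bound.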
Pixtral Large is available under the Mistral Research License for academic and research purposes, and under a commercial license for business applications. It is aimed at enterprise tasks such as document analysis and chart interpretation.
Real-World Use Cases
In practical applications, Pixtral Large excels in multilingual optical character recognition (OCR) and reasoning tasks. For instance, when analyzing a German receipt, the model accurately calculates totals and incorporates an 18% tip, showcasing its proficiency in handling real-world scenarios.
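To make the receipt example concrete, here is a minimal sketch of the arithmetic the model is described as performing once the line items have been read. The item prices are hypothetical placeholders, not values from the receipt in the announcement, and the function name `total_with_tip` is invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Tip rate taken from the article's example.
TIP_RATE = Decimal("0.18")


def total_with_tip(item_prices, tip_rate=TIP_RATE):
    """Return (subtotal, tip, total) for a list of prices given as strings.

    Decimal is used instead of float so that cent values stay exact.
    """
    subtotal = sum((Decimal(p) for p in item_prices), Decimal("0"))
    # Round the tip to whole cents, half up.
    tip = (subtotal * tip_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return subtotal, tip, subtotal + tip


if __name__ == "__main__":
    # Hypothetical line items, as an OCR step might extract them (in euros).
    prices = ["12.50", "4.90", "3.60"]
    subtotal, tip, total = total_with_tip(prices)
    print(f"Subtotal: {subtotal}  Tip: {tip}  Total: {total}")
```

The hard part of the task is reading the prices correctly from a photographed receipt in another language; the calculation itself is simple once that step succeeds.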
Beyond document processing, the model also handles chart analysis, such as identifying the points at which a training loss curve becomes unstable, which makes it useful in both technical and business settings.
Continued Innovation
Alongside Pixtral Large, Mistral AI has updated its flagship text model, Mistral Large, now available as Mistral Large 24.11. This version offers improvements in long context understanding, a new system prompt, and enhanced function calling, tailored for enterprise use cases such as knowledge exploration, semantic document understanding, and task automation.
Mistral Large 24.11 is set to become available through cloud providers such as Google Cloud and Microsoft Azure.
For more details, visit the Mistral AI website.