According to PANews, OpenAI has launched a new benchmark called MLE-bench, designed to assess how well AI agents can develop machine learning solutions. The benchmark comprises 75 Kaggle competitions, focusing on challenging tasks in current machine learning engineering and comparing agent results against human leaderboard performance. In initial tests, the o1-preview model paired with the AIDE framework performed best, achieving at least a bronze medal in 16.9% of the competitions and surpassing Anthropic's Claude 3.5 Sonnet. When allowed multiple attempts per competition, o1-preview's medal rate doubled to 34.1%. OpenAI believes MLE-bench helps evaluate core ML engineering skills, although it does not cover all areas of AI research.
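To make the reported numbers concrete: a medal rate of this kind counts a competition as a success if any of an agent's attempts reaches at least bronze, so giving more attempts per competition can only raise the rate. The sketch below is a minimal, hypothetical illustration of that scoring logic; the `medal_rate` helper and all competition names and results are invented for illustration, not part of OpenAI's published evaluation harness.

```python
# Hypothetical illustration of MLE-bench-style medal-rate scoring.
# All names and data below are invented, not OpenAI's actual harness.
from typing import Dict, List


def medal_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of competitions where at least one attempt earned a medal.

    `results` maps a competition name to per-attempt booleans
    (True = that attempt reached at least bronze on the leaderboard).
    """
    if not results:
        return 0.0
    successes = sum(any(attempts) for attempts in results.values())
    return successes / len(results)


# Toy data: with one attempt each, 1 of 4 competitions medals (25%);
# allowing several attempts per competition lifts the rate, mirroring
# how o1-preview's 16.9% roughly doubled to 34.1% with more attempts.
single_attempt = {
    "spaceship-titanic": [True],
    "tabular-playground": [False],
    "image-denoising": [False],
    "text-ranking": [False],
}
multi_attempt = {
    "spaceship-titanic": [True, False, True],
    "tabular-playground": [False, False, True],
    "image-denoising": [False, False, False],
    "text-ranking": [False, True, False],
}

print(medal_rate(single_attempt))  # 0.25
print(medal_rate(multi_attempt))   # 0.75
```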