According to PANews, OpenAI has launched a new benchmark called MLE-bench, designed to assess how well AI agents can develop machine learning solutions. The benchmark comprises 75 Kaggle competitions, focusing on challenging tasks in current machine learning engineering and comparing agent results against human leaderboard performance. In initial tests, the o1-preview model paired with the AIDE framework performed best, achieving at least a bronze medal in 16.9% of the competitions and surpassing Anthropic's Claude 3.5 Sonnet. When allowed multiple attempts per competition, o1-preview's medal rate doubled to 34.1%. OpenAI believes MLE-bench helps evaluate core ML engineering skills, although it does not cover all areas of AI research.
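To make the reported numbers concrete: a medal rate of this kind counts a competition as a success if any of an agent's attempts reaches at least bronze, so giving more attempts per competition can only raise the rate. The sketch below is a minimal, hypothetical illustration of that scoring logic; the `medal_rate` helper and all competition names and results are invented for illustration, not part of OpenAI's published evaluation harness.

```python
# Hypothetical illustration of MLE-bench-style medal-rate scoring.
# All names and data below are invented, not OpenAI's actual harness.
from typing import Dict, List


def medal_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of competitions where at least one attempt earned a medal.

    `results` maps a competition name to per-attempt booleans
    (True = that attempt reached at least bronze on the leaderboard).
    """
    if not results:
        return 0.0
    successes = sum(any(attempts) for attempts in results.values())
    return successes / len(results)


# Toy data: with one attempt each, 1 of 4 competitions medals (25%);
# allowing several attempts per competition lifts the rate, mirroring
# how o1-preview's 16.9% roughly doubled to 34.1% with more attempts.
single_attempt = {
    "spaceship-titanic": [True],
    "tabular-playground": [False],
    "image-denoising": [False],
    "text-ranking": [False],
}
multi_attempt = {
    "spaceship-titanic": [True, False, True],
    "tabular-playground": [False, False, True],
    "image-denoising": [False, False, False],
    "text-ranking": [False, True, False],
}

print(medal_rate(single_attempt))  # 0.25
print(medal_rate(multi_attempt))   # 0.75
```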