Author: Kha'Zix

In the middle of the night, OpenAI released the new model it had been holding back for almost half a year.

Without any prior notice, it officially debuted.

The official name is not Strawberry; Strawberry was just an internal code name. The official name is OpenAI o1.

Why is it named o1? OpenAI said:

For complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.

The model is significant enough that OpenAI even set the GPT naming aside and started a new o series.

It exploded. It really exploded.

My scalp is tingling right now, really. The release of OpenAI o1 also marks the moment the AI industry officially entered a new era.

“There is nothing standing in our way of achieving AGI.”

In terms of logic and reasoning ability, I'll just post the picture and you'll know how outrageous this thing is.

On AIME 2024, a high-level mathematics competition, GPT-4o scores 13.4% accuracy, the o1 preview scores 56.7%, and the not-yet-released o1 full version scores 83.3%.

In competitive programming, GPT-4o scores 11.0%, the o1 preview 62%, and the o1 full version 89%.

On the hardest PhD-level science questions (GPQA Diamond), GPT-4o scores 56.1%, human experts average 69.7%, and o1 reaches a terrifying 78%.

I asked Claude to translate o1's results chart. It's a bit ugly, but it's fine as long as you can read what each number means.

This is what total domination looks like.

In particular, on the GPQA Diamond benchmark, which tests expertise in chemistry, physics, and biology, o1's performance comprehensively surpassed that of human PhD experts, making it the first model ever to do so.

The cornerstone of the model's success is self-play RL. If you're not familiar with it, you can read my prediction piece from two days ago: What is the new model Strawberry?

Through self-play RL, o1 learned to hone its chain of thought and refine the strategies it uses. It learned to recognize and correct its own mistakes.

It also learned to break down complex steps into simpler ones.

It also learned to try a different approach when the current one isn't working.
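To make that concrete, here is a tiny conceptual sketch of the "attempt, verify, retry" pattern described above. To be clear, this is my own toy illustration, not OpenAI's training code; `toy_model_attempt` and `verifier` are made-up stand-ins for a model call and a checker.

```python
# Conceptual sketch only: the "attempt, verify, retry" pattern that self-play RL
# is said to teach the model. The "model" here is simulated with a toy list of
# candidate answers; only the control flow is the point.

def toy_model_attempt(problem: str, attempt: int) -> int:
    """Stand-in for a model call: each attempt tries a different approach."""
    candidate_answers = [10, 14, 12]           # pretend reasoning outcomes
    return candidate_answers[attempt % len(candidate_answers)]

def verifier(problem: str, answer: int) -> bool:
    """Stand-in for a checker (unit tests, a math check, a reward model...)."""
    return answer == 12                        # pretend ground truth

def solve_with_retries(problem: str, max_attempts: int = 5) -> int | None:
    for attempt in range(max_attempts):
        answer = toy_model_attempt(problem, attempt)
        if verifier(problem, answer):          # recognize success (or a mistake)
            return answer                      # keep the approach that worked
    return None                                # give up after exhausting attempts

print(solve_with_retries("a toy problem"))     # -> 12, found on the third attempt
```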

What it learned is the core way we humans think: slow thinking.

Daniel Kahneman, the Nobel laureate in Economics, wrote a book called Thinking, Fast and Slow.

It explains in great detail the two ways of human thinking.

The first is fast thinking (System 1), which is characterized by being fast, automatic, intuitive, and unconscious. Here are a few examples:

  • When you see a smiling face, you know that the other person is in a good mood.

  • Doing a trivial calculation like 1 + 1 = 2.

  • If you encounter a dangerous situation while driving, brake immediately.

These are all fast thinking, and that is what traditional large models do: quick responses learned through something close to rote memorization.

The second type is slow thinking (System 2), which is characterized by slowness, effort, logic, and consciousness. Here are a few examples:

  • Solve a complex math problem

  • Fill out your tax return

  • Weigh the pros and cons before making important decisions

This is slow thinking: the core of what makes us humans powerful, and the cornerstone for AI's next step toward AGI.

And now, o1 has finally taken a solid step and acquired this human trait of slow thinking. Before answering, it repeatedly thinks, decomposes the problem, understands, and reasons, and only then gives the final answer.

Honestly, these enhanced reasoning abilities are extremely useful for complex problems in science, coding, math, and similar fields.

For example, o1 can be used by medical researchers to annotate cell sequencing data, by physicists to generate the complex mathematical formulas required for quantum optics, and by developers in various fields to build and execute multi-step workflows, and so on.

o1 is also clearly a new generation of data flywheel: when an answer is correct, the whole reasoning chain becomes a small dataset of training examples with positive and negative rewards.
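To picture what that flywheel could look like in practice, here is a rough sketch. It is entirely my own assumption about the mechanics, not anything OpenAI has published: each solved problem is logged as a (prompt, chain of thought, answer, reward) record that a later training run could consume.

```python
# Hypothetical sketch of a reasoning-trace "data flywheel" (not OpenAI's code):
# each solved problem yields (prompt, chain of thought, answer, reward) records
# that could later be reused as RL training examples.

from dataclasses import dataclass
import json

@dataclass
class ReasoningTrace:
    prompt: str
    chain_of_thought: str
    answer: str
    reward: float  # +1.0 if the final answer checked out, -1.0 otherwise

def record_trace(prompt: str, chain_of_thought: str, answer: str,
                 is_correct: bool, path: str = "traces.jsonl") -> None:
    """Append one trace to a JSONL file that a later training run could consume."""
    trace = ReasoningTrace(prompt, chain_of_thought, answer,
                           reward=1.0 if is_correct else -1.0)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(trace.__dict__, ensure_ascii=False) + "\n")

# Example: a correct solution becomes a positive training example.
record_trace("2 + 2 * 3 = ?", "Multiply first: 2*3=6, then add 2.", "8", True)
```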

Given OpenAI's user level, the speed of future evolution will only be more terrifying.

Writing this, I suddenly sighed. Compared to what o1 will be a year from now, I might be completely useless, really...

Currently, the o1 model is gradually being rolled out to all ChatGPT Plus and Team users, and access for free users will be considered later.

It comes as two models, o1-preview and o1-mini. o1-mini is faster, smaller, and cheaper, and still strong at reasoning; it is particularly well suited to math and code, but its world knowledge is weak, so it fits scenarios that need reasoning rather than broad world knowledge.

o1-preview is limited to 30 messages per week and o1-mini to 50 messages per week.

Note that this isn't the old every-3-hours quota; it's 30 messages per week, period. That alone shows how expensive o1 is to run.

For developers, the API is only open to tier 5 accounts (those that have spent at least $1,000), and it is rate-limited to 20 requests per minute.

That is really not many.

The API feature set is also heavily cut down for now, but it's still early days, so that's understandable.

In terms of API pricing, o1-preview costs $15 per million input tokens and $60 per million output tokens. That inference cost...

o1-mini is cheaper at $3 per million input tokens and $12 per million output tokens.

Output tokens cost four times as much as input tokens; for comparison, GPT-4o charges $5 and $15 per million input and output tokens respectively.
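To get a feel for what those numbers mean per request, here is a quick back-of-the-envelope calculator using the prices quoted above. It is a simplified sketch: it ignores any hidden reasoning tokens that o1 may additionally bill as output.

```python
# Back-of-the-envelope cost comparison using the per-million-token prices above.
# (Simplified: ignores any additional reasoning tokens billed as output.)

PRICES = {                       # USD per 1M input tokens, per 1M output tokens
    "o1-preview": (15.0, 60.0),
    "o1-mini":    (3.0, 12.0),
    "gpt-4o":     (5.0, 15.0),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price_in, price_out = PRICES[model]
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# Example: a 2,000-token prompt with a 1,000-token answer.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
# o1-preview: $0.0900, o1-mini: $0.0180, gpt-4o: $0.0250
```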

o1-mini is at least somewhat economical, but honestly it's probably best to wait for OpenAI's next big price cut.

Since o1 was said to already be available to Plus users, I went straight to my account to check, and sure enough, it was already there.

Naturally, I had to try it right away.

For now it doesn't support the previous features: no image understanding, no image generation, no code interpreter, no web browsing, just a bare model you can chat with.

First, a question that used to be fatal for large models:

"The farmer needs to bring the wolf, the sheep, and the cabbage across the river, but he can only bring one item at a time. Moreover, the wolf and the sheep cannot live alone, and the sheep and the cabbage cannot live alone. How should the farmer cross the river?"

After thinking for 6 seconds, it gave me a perfect answer.
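If you want to double-check that answer yourself, the puzzle is small enough to brute-force. Here is a minimal breadth-first-search script (my own verification toy, nothing to do with how o1 solves it internally):

```python
# Brute-force check of the wolf / sheep / cabbage puzzle with breadth-first search.
# A state is (items still on the left bank, which bank the farmer is on).

from collections import deque

ITEMS = frozenset({"wolf", "sheep", "cabbage"})
FORBIDDEN = [{"wolf", "sheep"}, {"sheep", "cabbage"}]   # pairs that can't be left alone

def unsafe(bank) -> bool:
    return any(pair <= bank for pair in FORBIDDEN)

def solve():
    start = (ITEMS, "left")                    # everything and the farmer start on the left
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer), path = queue.popleft()
        if not left and farmer == "right":     # everything made it across
            return path
        here = left if farmer == "left" else ITEMS - left
        for cargo in [None, *here]:            # cross empty-handed or carry one item
            new_left = set(left)
            if cargo is not None:
                if farmer == "left":
                    new_left.remove(cargo)
                else:
                    new_left.add(cargo)
            new_farmer = "right" if farmer == "left" else "left"
            frozen_left = frozenset(new_left)
            unattended = frozen_left if new_farmer == "right" else ITEMS - frozen_left
            state = (frozen_left, new_farmer)
            if unsafe(unattended) or state in seen:
                continue
            seen.add(state)
            queue.append((state, path + [f"take {cargo or 'nothing'} to the {new_farmer} bank"]))

for step in solve():
    print(step)   # prints the classic 7-move crossing
```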

There's also the Chinese holiday-adjustment problem that has plagued every large model before:

"This is China's holiday arrangement from September 9, 2024 (Monday) to October 13: Work 6, rest 3, work 3, rest 2, work 5, rest 1, work 2, rest 7, then work 5, rest 1.

Could you please tell me how many days of rest I got because of the holiday, in addition to the weekends I was supposed to take off? "

After o1 thought for a full 30 seconds, it gave an answer that was accurate to the day.
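If you want to sanity-check a question like this yourself, a short script can just walk the calendar. Under one reasonable interpretation (weekday rest days count as "extra", weekend make-up workdays count against you), it looks like this; the schedule is copied from the question above:

```python
# Walk the calendar from 2024-09-09 and compare the adjusted schedule against
# ordinary weekends. Interpretation (one of several possible): "extra rest" =
# rest days that fall on weekdays, minus weekend days that become workdays.

from datetime import date, timedelta

pattern = [("work", 6), ("rest", 3), ("work", 3), ("rest", 2), ("work", 5),
           ("rest", 1), ("work", 2), ("rest", 7), ("work", 5), ("rest", 1)]

day = date(2024, 9, 9)                        # a Monday
extra_rest, weekend_work = 0, 0
for kind, length in pattern:
    for _ in range(length):
        is_weekend = day.weekday() >= 5       # Saturday or Sunday
        if kind == "rest" and not is_weekend:
            extra_rest += 1                   # a weekday off thanks to the holiday
        if kind == "work" and is_weekend:
            weekend_work += 1                 # a weekend sacrificed as a make-up workday
        day += timedelta(days=1)

assert day == date(2024, 10, 14)              # the pattern covers exactly Sep 9 - Oct 13
print(extra_rest, weekend_work, extra_rest - weekend_work)   # -> 7 3 4
```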

Invincible, really invincible.

Here's an even harder one, the math problem from the competition Jiang Ping took part in:

Don't ask me what the question means; I don't understand it myself, I'm useless. This question has slaughtered every big model before it. This time, let's give it to o1 too.

After o1 thought for more than a minute, it gave the answer.

...

All of it... correct...

I'm shattered.

After trying it myself, I feel prompting may need to be re-explored from scratch. In the fast-thinking era of GPT-style models we accumulated all kinds of tricks like telling the model to think step by step, but with o1 those are now useless and can even hurt performance.

The best guidance on how to prompt it comes from OpenAI itself:

  • Keep prompts simple and direct: Models excel at understanding and responding to short, clear instructions without requiring extensive coaching.

  • Avoid chain-of-thought prompts: since these models reason internally, they don't need to be told to “think step by step” or “explain your reasoning.”

  • Use delimiters for clarity: Use delimiters such as triple quotes, XML tags, or section headings to clearly indicate different parts of the input and help the model interpret the different parts appropriately.

  • Limit additional context in retrieval-augmented generation (RAG): when providing extra context or documents, include only the most relevant information so the model doesn't over-complicate its response.
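Putting those tips together, a call to o1 through the official OpenAI Python SDK might look roughly like the sketch below. The client call and model name follow the standard SDK, but treat the snippet as illustrative rather than official sample code.

```python
# Illustrative use of the prompting advice above with the official OpenAI Python SDK.
# Short, direct instruction; no "think step by step"; delimiters mark the input text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = """Summarize the key risks in the contract excerpt below in three bullet points.

\"\"\"
<paste only the relevant contract excerpt here - keep extra context minimal>
\"\"\"
"""

response = client.chat.completions.create(
    model="o1-preview",          # or "o1-mini" for math / code-heavy tasks
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```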

Finally, I would like to talk about the length of time this thinking takes.

Right now o1 thinks for about a minute, but honestly, if this is the road to real AGI, the longer the thinking, the more exciting it gets.

What if it could really prove mathematical theorems, develop cancer drugs, or do astronomical research?

Each bout of thinking might then last hours, days, or even weeks.

The final result may shock everyone beyond belief.

Now, no one can imagine what AI will be like at that time.

In my opinion, o1's future is definitely about much more than being another ordinary ChatGPT.

Rather, it is the greatest cornerstone for us to move into the next era.

“There is nothing standing in our way of achieving AGI.”

Now, I firmly believe this sentence without hesitation.

The next era of star-studded brilliance.

Today.

It’s officially arrived.