OpenAI on Monday revealed its latest flagship model, GPT-4o (“o” for “omni”), and it is seemingly the closest we have come to an intelligent assistant like Jarvis from the Iron Man movies.

The selling point is that GPT-4o handles multiple modalities in a single model, which most existing AI models cannot do. This means GPT-4o can accept any combination of text, audio, and image inputs and generate any combination of text, audio, and image outputs.

The staged demos the team posted on X (formerly Twitter) drew a lot of excitement. One notable feat is that GPT-4o responds to audio inputs in as little as 232 milliseconds, which is similar to human response time in conversation.

“It feels like AI from the movies; and it’s still a bit surprising to me that it’s real,” OpenAI’s CEO Sam Altman wrote in a blog post Monday. “Getting to human-level response times and expressiveness turns out to be a big change.”

OpenAI has started rolling out GPT-4o’s text and image features to users. In the coming weeks, the audio and video capabilities will be released to “a small group of trusted partners in the API,” the company said. 

In the meantime, here are some of the things you can do with GPT-4o.

Things You Can Do With GPT-4o

Create Images with Legible Text

Some AI image generators, such as Midjourney, still struggle to produce images with readable text. OpenAI said GPT-4o understands text descriptions much better and can render legible text in images.

Image Source: OpenAI

Real-Time Translation

When a translator is needed, GPT-4o can act as one. In a video demonstration, OpenAI’s team showed GPT-4o repeating in Spanish what was said in English, and then translating the Spanish reply back into English. It can presumably do the same for other languages.

Realtime translation with GPT-4o pic.twitter.com/J1BsrxwYdE

— OpenAI (@OpenAI) May 13, 2024

Look and Tell

For people who are visually impaired, or just for the fun of it, GPT-4o can look through the phone camera and describe what is happening around you. In one case, the model was able to tell that someone was having a birthday celebration when it noticed a cake and a candle in the room.

@BeMyEyes with GPT-4o pic.twitter.com/nWb6sEWZlo

— OpenAI (@OpenAI) May 13, 2024

Solve Math Problems

GPT-4o can also look at math problems on a sheet of paper or a screen and give the answers. Beyond that, it can act as a tutor and guide you through how to solve the problem yourself.

Math problems with GPT-4o and @khanacademy pic.twitter.com/RfKaYx5pTJ

— OpenAI (@OpenAI) May 13, 2024

AI in Virtual Meetings

GPT-4o can join virtual meetings and hold conversations with participants. It can also help users prepare for job interviews.

Meeting AI with GPT-4o pic.twitter.com/rHkQ316MYj

— OpenAI (@OpenAI) May 13, 2024