Odaily Planet Daily News: OpenAI has released its latest flagship model, GPT-4o, which can reason across audio, vision, and text in real time. The core concept is an anthropomorphic, highly natural, ultra-low-latency personal voice assistant. According to OpenAI's official website and official account, GPT-4o accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output. It can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human response times in conversation. It matches GPT-4 Turbo performance on English text and code, with significant improvement on non-English text, while the API is faster and 50% cheaper. GPT-4o is especially strong at visual and audio understanding compared to existing models. Text and image input are rolling out to the API and ChatGPT today, with voice and video input to follow in the coming weeks.
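For readers working with the API rollout mentioned above, here is a minimal sketch of sending combined text and image input to the model via OpenAI's Python SDK chat-completions interface; the prompt text and image URL are hypothetical placeholders, and audio/video input is, per the announcement, not yet available at launch.

```python
# Minimal sketch: combined text + image input to GPT-4o using
# OpenAI's Python SDK (pip install openai). The image URL below
# is a hypothetical placeholder, not from the announcement.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # Content is a list of parts, mixing text and image inputs
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
            ],
        }
    ],
)

# Print the model's text reply
print(response.choices[0].message.content)
```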