ChainCatcher news, according to Cointelegraph: OpenAI made four updates to its models in October to help them hold more natural conversations and improve their image recognition capabilities.

The first major update is the Realtime API, which lets developers build AI-generated voice applications from a single prompt, enabling natural conversations similar to ChatGPT's Advanced Voice Mode. Previously, developers had to "stitch together" multiple models to create these experiences: audio input typically had to be fully uploaded and processed before a response could be generated, which meant high latency for real-time uses such as speech-to-speech conversation. With the Realtime API's streaming capabilities, developers can now deliver instant, natural interactions, much like a voice assistant. The API runs on GPT-4o, released in May 2024, which can reason across audio, vision, and text in real time.
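As an illustration of that streaming flow, here is a minimal Python sketch that opens a Realtime API WebSocket session and requests a response; the endpoint URL, beta header, model name, and event shapes follow OpenAI's beta documentation at launch and should be treated as assumptions that may have changed since.

```python
# Minimal Realtime API sketch: open a WebSocket session, request a
# response, and print streamed server events until the response finishes.
import asyncio
import json
import os

import websockets  # pip install websockets


async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # beta header required at launch
    }
    # Note: websockets >= 13 renamed this keyword to additional_headers.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a spoken and written reply in one event.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the user in one sentence.",
            },
        }))
        # Audio and text arrive incrementally as typed events, which is
        # what removes the upload-then-wait latency described above.
        async for message in ws:
            event = json.loads(message)
            print(event["type"])  # e.g. response.audio.delta
            if event["type"] == "response.done":
                break


asyncio.run(main())
```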
Another update is a fine-tuning tool that lets developers improve the responses the AI generates from combined image and text inputs. Image-based fine-tuning helps the model understand images better, strengthening capabilities such as visual search and object detection. The process incorporates human feedback: people supply examples of good and bad responses for training.
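To make the training format concrete, here is a hedged sketch of how such a vision fine-tuning job might be submitted with the openai Python SDK; the JSONL schema, the model snapshot name, and the example image URL are assumptions based on OpenAI's published docs, not details from this report.

```python
# Vision fine-tuning sketch: write one image+text training example to
# JSONL, upload it, and start a fine-tuning job against a GPT-4o snapshot.
import json

from openai import OpenAI  # pip install openai

client = OpenAI()

# One training example: image + question in, the "good" answer out.
# A real job requires many such lines (OpenAI's docs specify a minimum).
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What traffic sign is shown?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sign.jpg"}},
        ]},
        {"role": "assistant", "content": "A stop sign."},
    ]
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

upload = client.files.create(file=open("train.jsonl", "rb"),
                             purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id,
                                     model="gpt-4o-2024-08-06")
print(job.id)
```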
In addition to the speech and vision updates, OpenAI also launched "model distillation," which lets smaller models learn from the outputs of larger ones, and "prompt caching," which reduces development costs and response time by reusing already-processed prompt text (a layout sketch follows below). OpenAI expects revenue to rise to $11.6 billion next year, up from a projected $3.7 billion in 2024, according to Reuters.
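On the caching side, no special flag is involved: per OpenAI's docs, prompts that repeat a sufficiently long prefix are cached automatically, so the main design choice is putting static text first and variable input last. The sketch below illustrates that layout; the model name and the usage field for inspecting cache hits are assumptions based on the docs at the time of writing.

```python
# Prompt-caching-friendly layout: reusable instructions go first so
# repeated calls share a cacheable prefix; only the question varies.
from openai import OpenAI  # pip install openai

client = OpenAI()

# Long, static prefix (system prompt, policies, few-shot examples).
# Caching only kicks in past a minimum prompt length (~1,024 tokens).
STATIC_PREFIX = "You are a support agent.\n" + "Policy: ...\n" * 300


def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any caching-enabled model works
        messages=[
            {"role": "system", "content": STATIC_PREFIX},  # cacheable prefix
            {"role": "user", "content": question},         # varies per call
        ],
    )
    # Reused tokens are reported in the usage block
    # (usage.prompt_tokens_details.cached_tokens per OpenAI's docs).
    print(resp.usage.prompt_tokens_details.cached_tokens)
    return resp.choices[0].message.content
```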