CoinVoice recently learned that, according to Cointelegraph, OpenAI shipped four updates to its models in October to help them conduct conversations more naturally and improve their image-recognition capabilities. The first major update is the Realtime API, which lets developers build AI-generated voice applications from a single prompt, enabling natural conversations similar to ChatGPT's Advanced Voice Mode. Previously, developers had to "stitch together" multiple models to create these experiences: audio input typically had to be fully uploaded and processed before a response could be generated, which meant high latency for real-time use cases such as back-and-forth voice conversations. With the Realtime API's streaming capabilities, developers can now achieve instant, natural interactions, just like a voice assistant. The API runs on GPT-4o, released in May 2024, which can reason across audio, vision, and text in real time.
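For illustration, here is a minimal sketch of what a Realtime API session could look like over a single WebSocket connection. The endpoint URL, beta header, model name, and event types ("response.create", "response.text.delta", "response.done") are taken from OpenAI's public beta documentation rather than from this article, so treat them as assumptions:

```python
# Hedged sketch only: endpoint, beta header, and event names follow OpenAI's
# public beta docs for the Realtime API; none of them appear in the article.
import asyncio
import json
import os

import websockets  # pip install "websockets<14" (uses the extra_headers kwarg)


async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # required while the API is in beta
    }
    async with websockets.connect(url, extra_headers=headers) as ws:
        # One client event starts a response; output streams back as server
        # events instead of arriving in a single blocking reply.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the user in one short sentence.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)  # streamed chunk
            elif event["type"] == "response.done":
                break


asyncio.run(main())
```

The same connection can also carry audio in and out; the text-only session above just keeps the sketch short.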

Another update is a fine-tuning tool for developers that lets them improve AI responses generated from image and text inputs. Image-based fine-tuning helps the model better understand images, enhancing visual search and object-detection capabilities. The process incorporates human feedback: people supply examples of good and bad responses for training.
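As a hedged illustration of that workflow, the sketch below builds a one-example training file in the JSONL message format described in OpenAI's fine-tuning docs and starts a job on a vision-capable model; the file name, image URL, and answer text are hypothetical placeholders:

```python
# Hedged sketch: the JSONL message format and the files / fine-tuning calls
# follow OpenAI's fine-tuning docs; the file name, image URL, and answer
# text are hypothetical placeholders.
import json

from openai import OpenAI

client = OpenAI()

# One training example: an image-plus-text prompt paired with the desired
# answer, i.e. a human-provided "good response" for the model to imitate.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What product is shown here?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sneaker.jpg"}},
            ],
        },
        {"role": "assistant", "content": "A white low-top sneaker."},
    ]
}
with open("vision_train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the dataset, then start a fine-tuning job on a vision-capable model.
upload = client.files.create(file=open("vision_train.jsonl", "rb"),
                             purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id,
                                     model="gpt-4o-2024-08-06")
print(job.id)
```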

In addition to the speech and vision updates, OpenAI also launched "model distillation" and "prompt caching": distillation lets smaller models learn from the outputs of larger ones, while prompt caching reduces development cost and time by reusing recently processed input text (a sketch of both appears below). According to Reuters, OpenAI expects revenue to rise to $11.6 billion next year, up from an estimated $3.7 billion in 2024.
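A minimal sketch of how the two features surface in the API, assuming the `store`/`metadata` parameters and the `cached_tokens` usage field documented by OpenAI at launch (neither detail comes from the article itself):

```python
# Hedged sketch: the store/metadata parameters and the cached_tokens usage
# field follow OpenAI's launch docs; treat them as assumptions, not details
# reported in the article.
from openai import OpenAI

client = OpenAI()

# Model distillation: persist a large model's completions so they can later
# serve as training data when fine-tuning a smaller model.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain prompt caching briefly."}],
    store=True,                       # keep this completion for distillation
    metadata={"purpose": "distill"},  # tag it for filtering later
)

# Prompt caching: repeated prompt prefixes beyond a minimum length (about
# 1,024 tokens per the docs) are cached automatically; usage reports how
# many input tokens were served from the cache.
details = resp.usage.prompt_tokens_details
print(details.cached_tokens if details else 0)
```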