Odaily Planet Daily News: OpenAI made four updates to its models in October, aimed at making its AI models better conversationalists and improving their image recognition capabilities.

The first major update is the Realtime API, which lets developers build AI-generated voice applications from a single prompt, enabling natural conversations similar to ChatGPT's Advanced Voice Mode. Previously, developers had to "stitch together" multiple models to create these experiences: audio input typically had to be fully uploaded and processed before a response could be generated, which meant high latency for real-time use cases such as speech-to-speech conversation. With the Realtime API's streaming capabilities, developers can now deliver instant, natural interactions, much like a voice assistant. The API runs on GPT-4o, released in May 2024, which can reason across audio, vision, and text in real time.

Another update gives developers fine-tuning tools to improve AI responses generated from image and text inputs. Fine-tuning with image inputs lets a model better understand images, strengthening visual search and object detection capabilities. The process incorporates human feedback: people supply examples of good and bad responses for training.

Beyond the voice and vision updates, OpenAI also launched "model distillation," which lets smaller models learn from larger ones, and "prompt caching," which cuts development cost and latency by reusing recently processed input text.

According to Reuters, OpenAI expects revenue to rise to $11.6 billion next year, up from an estimated $3.7 billion in 2024. (Cointelegraph)
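For illustration, here is a minimal sketch of how a developer might open a streaming session with the Realtime API over a WebSocket, based on the beta-era documentation; the endpoint URL, `OpenAI-Beta` header, event names, and the third-party `websockets` package are assumptions and may differ from the current API.

```python
# Minimal sketch: open a Realtime API session and request a spoken response.
# Endpoint, header, and event names follow the beta docs; verify against
# OpenAI's current documentation before use.
import asyncio
import json
import os

import websockets  # third-party package: pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",  # beta opt-in header used at launch
}

async def main():
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # A single prompt configures the voice agent for the whole session.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"instructions": "You are a friendly voice assistant."},
        }))
        # Ask the model to start responding; output streams back in chunks
        # instead of waiting for a fully processed upload.
        await ws.send(json.dumps({"type": "response.create"}))
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))  # e.g. response.audio.delta events
            if event.get("type") == "response.done":
                break

asyncio.run(main())
```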
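Prompt caching, by contrast, requires no new API surface: OpenAI applies it automatically to sufficiently long repeated prompt prefixes, so the practical technique is simply to put stable content (such as a long system prompt) first and vary only the tail. A brief sketch, with a stand-in system prompt and the `gpt-4o-mini` model chosen for illustration:

```python
# Sketch: prompt caching is automatic for repeated prompt prefixes, so
# keep the stable part of the prompt first and vary only the user turn.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LONG_SYSTEM_PROMPT = "You are a support agent. " + "Policy details... " * 200

for question in ["How do I reset my password?", "How do I close my account?"]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # Identical prefix across calls: eligible for prompt caching,
            # which reduces cost and latency on the repeated tokens.
            {"role": "system", "content": LONG_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    print(response.choices[0].message.content)
```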