According to Cointelegraph, artificial intelligence developer OpenAI introduced several updates to its models at the beginning of October, aimed at improving conversational abilities and image recognition. On October 1, OpenAI announced four new tools designed to help developers build on its AI models.
One significant update is the Realtime API, which allows developers to create AI-generated voice applications from a single prompt. The tool supports low-latency, multimodal experiences by streaming audio inputs and outputs, enabling natural conversations similar to ChatGPT’s Advanced Voice Mode. Previously, developers had to stitch together multiple models to achieve these experiences, which added latency to real-time applications such as speech-to-speech conversation. With the Realtime API’s streaming capability, developers can now enable immediate, natural interactions, much like voice assistants. The API runs on GPT-4o, the model released in May 2024 that can reason across audio, vision and text in real time.
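For illustration, the snippet below sketches how a developer might open a Realtime API session over a WebSocket and request a spoken response. The endpoint, headers and event names follow OpenAI’s October 2024 beta announcement and should be treated as assumptions to check against current documentation, not a definitive integration.

```python
# A minimal sketch of a Realtime API session over a raw WebSocket.
# Endpoint, headers and event names follow OpenAI's October 2024 beta
# announcement and may have changed; verify against current docs.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    # `extra_headers` is the keyword in websockets<=13; newer releases
    # renamed it to `additional_headers`.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # A single prompt configures the whole voice agent for the session.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": "You are a concise, friendly voice assistant.",
                "modalities": ["text", "audio"],
            },
        }))
        # Request a spoken response. In a real app, microphone audio would
        # first be streamed in via input_audio_buffer.append events.
        await ws.send(json.dumps({"type": "response.create"}))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.audio.delta":
                pass  # event["delta"] is a base64 audio chunk: decode and play
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```

Because the audio streams in both directions over one persistent connection, there is no separate transcription and synthesis round trip, which is what removes the latency of the older multi-model pipelines.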
Another update is a fine-tuning tool that lets developers improve the responses the AI generates from image and text inputs. Image-based fine-tuning strengthens the model’s ability to understand images, enhancing capabilities such as visual search and object detection. The process relies on human feedback: developers supply examples of good and bad responses for the model to learn from. OpenAI also introduced “model distillation,” which lets smaller models learn from larger ones, and “prompt caching,” which cuts development costs and time by reusing text the model has already processed.
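As a rough illustration, vision fine-tuning uses the same JSONL chat format as text fine-tuning, with images referenced inside user messages. The record below is a sketch with a hypothetical image URL and label, based on the format OpenAI documented at launch; the fine-tuning guide remains the authoritative reference for the schema.

```python
# Sketch of one vision fine-tuning training record (hypothetical image
# URL and label), written in the JSONL chat format OpenAI documented
# for image fine-tuning in October 2024.
import json

record = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What object does this photo show?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/images/stop_sign_001.jpg"}},
        ]},
        # The "good response" the model should learn to reproduce.
        {"role": "assistant", "content": "A stop sign at an intersection."},
    ]
}

# Fine-tuning files are JSONL: one such record per line.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```

Curating which assistant replies go into the file is where the human feedback comes in: only responses judged good become training targets.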
The advanced capabilities of OpenAI's models are a significant selling point, as a substantial portion of the company's revenue comes from businesses building their own applications on top of OpenAI’s technology. According to Reuters, OpenAI projects its revenue to rise to $11.6 billion next year, up from an estimated $3.7 billion in 2024.