According to Foresight News, OpenAI plans to roll out voice and image conversation features in ChatGPT to Plus and Enterprise users over the next two weeks. Voice conversations will be available in the iOS and Android apps, while image conversations will be available on all platforms.
The voice feature is powered by a new text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech. The user's spoken input is transcribed into text by Whisper, OpenAI's open-source speech recognition system. ChatGPT then generates a reply, which is converted back into speech and played to the user.

The image feature is powered by multimodal versions of GPT-3.5 and GPT-4, which apply their language reasoning skills to a wide range of images, such as photos, screenshots, and documents containing both text and images. Users can show ChatGPT one or more images, and it will try to identify what the user is asking about and respond accordingly. Examples include looking at the contents of a refrigerator to help plan a meal or analyzing a complex chart of work-related data.