OpenAI can reproduce human voices—but it hasn't released the technology yet
Speech synthesis technology has come a long way since the introduction of the Speak & Spell toy in 1978. Now, using deep learning models, software can not only create realistic-sounding voices but also convincingly imitate existing voices from small audio samples. This week OpenAI announced Voice Engine, a text-to-speech AI model that creates synthetic speech from a 15-second clip of recorded audio.

But OpenAI isn't ready for a broad release of the technology yet. The company initially planned to launch a pilot program earlier this month in which developers could sign up for the Voice Engine API. After thinking more about the ethical implications, it decided to scale back its ambitions for now. The company said it hopes the preview demonstrates the potential of Voice Engine and underscores the need to build societal resilience against the challenges posed by increasingly convincing generative models.

Voice cloning technology isn't particularly new, but the fact that OpenAI is inching toward letting anyone use its particular brand of voice technology is noteworthy. The company says the benefits include providing reading assistance through natural-sounding voices, giving creators global reach, offering personalized voice options to non-verbal individuals, and helping patients regain their voices after surgery.

But it also means that anyone with 15 seconds of someone's recorded voice can effectively clone it, which has obvious implications for potential abuse. So OpenAI is warning everyone about this already existing technology and calling for steps such as phasing out voice-based authentication for bank accounts, educating the public about the "potential for deceptive AI content," and accelerating the development of techniques that can trace the origin of audio content.