OpenAI began rolling out ChatGPT's Advanced Voice Mode on Tuesday, giving users their first access to GPT-4o's hyperrealistic audio responses. The alpha version is currently available to a small group of ChatGPT Plus users, with plans to extend access to all Plus users in fall 2024.
When OpenAI unveiled GPT-4o's voice in May, audiences were impressed by its swift responses and its striking resemblance to a real human voice. The voice, named Sky, closely resembled that of Scarlett Johansson, the actress who voiced the AI assistant in the movie "Her." After the demo, Johansson raised concerns about the resemblance and hired legal counsel to protect her likeness. OpenAI denied using Johansson's voice and subsequently removed it from the demo. In June, OpenAI announced it would delay the launch of Advanced Voice Mode to strengthen its safety measures.
One month later, OpenAI is beginning to roll out the feature, though the video and screen-sharing capabilities showcased in the Spring Update are not part of the initial alpha release. For now, select Plus users will get access to the voice feature demonstrated earlier this year.
Advanced Voice Mode differs from the existing Voice Mode in that it uses GPT-4o, a multimodal model that handles speech-to-text, text processing, and text-to-speech within a single model, resulting in lower-latency conversations. OpenAI says GPT-4o can also perceive emotional cues in the user's voice, such as sadness or excitement, and can recognize singing.
During this pilot stage, ChatGPT Plus users will get first access to Advanced Voice Mode's hyperrealistic capabilities. OpenAI is rolling the feature out gradually so it can monitor usage closely. Users in the alpha group will be notified in the ChatGPT app and will then receive detailed instructions by email.
Since the May demo, OpenAI has tested GPT-4o's voice capabilities with more than 100 external red teamers speaking 45 different languages. A report on these safety efforts is expected in early August. Advanced Voice Mode will offer only ChatGPT's four preset voices—Juniper, Breeze, Cove, and Ember—created in collaboration with professional voice actors. The Sky voice demonstrated in May has been discontinued. According to OpenAI spokesperson Lindsay McCallum, ChatGPT cannot impersonate the voices of specific individuals and will block outputs that deviate from the preset voices.