OpenAI has introduced its new artificial intelligence model, GPT-4o, which significantly enhances speech processing compared to GPT-4. The rollout will be gradual: GPT-4o will come to all of the company's developer and consumer products over the coming weeks, and it is already available through the API.
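For developers who want to try it right away, a minimal sketch of calling GPT-4o through OpenAI's official Python SDK might look like this (assuming the openai package is installed and an OPENAI_API_KEY environment variable is set; the prompt is purely illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4o a question through the Chat Completions API.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarize what multimodality means in one sentence."},
    ],
)

print(response.choices[0].message.content)
```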
During the announcement, OpenAI CTO Mira Murati emphasized that GPT-4o extends the capabilities of the previous GPT-4 model through multimodality, i.e., training not only on text but also on video, audio, images, and other visual data. This is what has significantly improved GPT-4o's speech processing capabilities.
ChatGPT previously had a voice mode that converted the chatbot's text responses to speech using a separate text-to-speech model. With GPT-4o, this feature has been significantly improved, turning ChatGPT into a more dynamic tool, closer to a true virtual assistant.
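ChatGPT's internals are not public, but conceptually the old voice mode chained two separate models: one to generate a text reply, another to read it aloud. A rough sketch of that kind of two-step pipeline using OpenAI's public API (the gpt-4 and tts-1 model names and the alloy voice are the public API's identifiers, not necessarily what ChatGPT itself used):

```python
from openai import OpenAI

client = OpenAI()

# Step 1: get a text reply from the chat model.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a fun fact about space."}],
).choices[0].message.content

# Step 2: convert the text reply to audio with a separate text-to-speech model.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
with open("reply.mp3", "wb") as f:
    f.write(speech.content)  # raw MP3 bytes
```

The latency of running two models back to back is part of why the old voice mode felt sluggish; GPT-4o's native audio handling removes that hand-off.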
Users can now interact with ChatGPT conversationally, even interrupting it mid-response, and the model adapts in real time. In addition, GPT-4o can recognize emotional nuances in the user's voice and respond in different emotional styles, adding a level of personalization to the interaction.
Murati also announced that OpenAI will release a desktop version of ChatGPT along with an updated user interface. The company believes that this will simplify user interaction with increasingly complex AI models.
All in all, given that OpenAI is actively negotiating a deal with Apple to integrate ChatGPT into the iPhone, today's demo offered a glimpse of what next-generation voice assistants can do. If the deal goes through, Apple could finally give Siri a significant upgrade, turning its virtual assistant into a genuinely useful tool.