
OpenAI Launches gpt-realtime and Expands Realtime API with New Production-Ready Features


OpenAI announced the general availability of its enhanced Realtime API and unveiled gpt-realtime, the company’s most advanced speech-to-speech model, designed for developers and enterprises building production-ready voice agents.

OpenAI’s gpt-realtime marks a significant leap forward in voice AI, delivering substantial advances in audio quality, instruction following, intelligence, and function calling.

The upgraded Realtime API includes new capabilities that bolster developer flexibility and agent intelligence.


OpenAI emphasized that both gpt-realtime and the Realtime API are built with enterprise-grade safety and privacy measures.

Since the Realtime API’s initial public beta launch in October 2024, thousands of developers have contributed feedback that shaped today’s production-ready release. The API’s single-model architecture, which processes audio directly rather than splitting the work into separate speech-to-text and text-to-speech pipelines, reduces latency and preserves speech nuance, enabling more authentic conversational experiences.
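
To make the latency argument concrete, here is a minimal sketch of why a cascaded voice pipeline tends to respond more slowly than a single speech-to-speech model: stages that run back to back add their delays together. The stage names and millisecond values below are illustrative placeholders chosen for this example, not measurements of gpt-realtime or any other OpenAI system, and `Stage` and `total_latency_ms` are hypothetical helpers, not part of the Realtime API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One processing hop in a voice agent (hypothetical helper)."""
    name: str
    latency_ms: float  # placeholder value, NOT a real measurement


def total_latency_ms(stages: list[Stage]) -> float:
    """Stages run one after another, so their delays simply add up."""
    return sum(stage.latency_ms for stage in stages)


# Cascaded design: audio -> text -> reply text -> audio (three hops).
cascaded = [
    Stage("speech-to-text", 300.0),
    Stage("text model", 400.0),
    Stage("text-to-speech", 250.0),
]

# Single-model design: audio in, audio out, one hop.
speech_to_speech = [Stage("speech-to-speech model", 500.0)]

if __name__ == "__main__":
    print(f"cascaded:     {total_latency_ms(cascaded):.0f} ms")
    print(f"single model: {total_latency_ms(speech_to_speech):.0f} ms")
```

The sketch only models delay. The other benefit the article mentions, preserving speech nuance, comes from never reducing the audio to plain text, which a cascade does at its first hop.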

Experts see gpt-realtime as a major advancement in voice AI. Josh Weisberg, Head of AI at Zillow, noted: “The new speech-to-speech model in OpenAI’s Realtime API shows stronger reasoning and more natural speech allowing it to handle complex, multi-step requests like narrowing listings by lifestyle needs or guiding affordability discussions with tools like our BuyAbility score. This could make searching for a home on Zillow or exploring financing options feel as natural as a conversation with a friend, helping simplify decisions like buying, selling, and renting a home.”
