OpenAI Launches Three Audio Models to Power Real-Time Voice Applications

Mintesinot Niggusie

OpenAI has released three new audio models for its developer platform, extending its capabilities to real-time voice agents that can translate, transcribe and respond during live conversations.

The models, GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, are available for testing in the company’s developer playground, according to OpenAI. The company said GPT-Realtime-Translate supports speech translation across more than 70 languages.

GPT-Realtime-Whisper is designed for live speech-to-text processing, enabling real-time captions, meeting notes and workflow updates as users speak. GPT-Realtime-2 is positioned as the most advanced of the three, capable of handling complex requests, managing interruptions and calling external tools.

The release extends OpenAI’s application programming interface (API) beyond traditional chat and transcription tools, moving toward agents that can actively process spoken input in real time. The company said early testers of the models include Zillow, Priceline and Deutsche Telekom.

OpenAI also outlined pricing for the new tools. GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate is priced at $0.034 per minute. GPT-Realtime-Whisper costs $0.017 per minute.
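The two pricing schemes work differently: the Translate and Whisper models are billed by the minute, while GPT-Realtime-2 is billed by audio input token. The sketch below is a rough estimate built only on the rates reported above. The helper names and the 250,000-token session are hypothetical illustrations. It covers only the quoted audio input price for GPT-Realtime-2, since the article gives no output-token or other charges. It also makes no assumption about how many tokens a minute of audio produces.

```python
# Rates as reported in the article (USD). GPT-Realtime-2 "starts at" this
# input price; output tokens and any other charges are NOT included here.
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017
REALTIME2_PER_MILLION_AUDIO_INPUT_TOKENS = 32.0


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Hypothetical helper: cost of audio billed at a flat per-minute rate."""
    return minutes * rate_per_minute


def audio_input_token_cost(tokens: int, rate_per_million: float) -> float:
    """Hypothetical helper: cost of audio input tokens billed per million."""
    return tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    hour = 60  # one hour of live audio, in minutes
    print(f"Translate, 1 hour: ${per_minute_cost(hour, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 1 hour:   ${per_minute_cost(hour, WHISPER_PER_MINUTE):.2f}")
    # Hypothetical token count; real usage depends on audio length and content.
    tokens = 250_000
    cost = audio_input_token_cost(tokens, REALTIME2_PER_MILLION_AUDIO_INPUT_TOKENS)
    print(f"GPT-Realtime-2, {tokens:,} audio input tokens: ${cost:.2f}")
```

At these rates, an hour of live translation comes to about $2.04 and an hour of transcription to about $1.02. The GPT-Realtime-2 figure depends entirely on the token count supplied.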