Replit: Audio Models Added to AI Integrations


Replit added four GPT-4o audio models to its AI Integrations platform, enabling developers to build speech-to-text and text-to-speech applications without setting up separate API keys. The two transcription models (gpt-4o-transcribe and gpt-4o-transcribe-mini) handle speech-to-text conversion, while two audio output models (gpt-4o-audio and gpt-4o-audio-mini) generate spoken content. Use cases include podcast transcription tools, voice assistants, and speech-enabled app interfaces.


Four GPT-4o Audio Models Now Available

Replit's January 23, 2026 update brought audio capabilities to its AI Integrations platform, adding four OpenAI GPT-4o audio models that developers can use immediately without managing separate API credentials. The addition extends Replit's zero-setup AI tooling into the voice and audio space, an area of growing developer interest following widespread adoption of voice-enabled AI assistants.

The Four Models

The release adds two input models and two output models:

  • gpt-4o-transcribe – Full-quality speech-to-text transcription
  • gpt-4o-transcribe-mini – A smaller, faster transcription model for latency-sensitive applications
  • gpt-4o-audio – Full audio generation for text-to-speech use cases
  • gpt-4o-audio-mini – A compact audio output model optimized for performance

Developers can access all four through the same AI Integrations interface that already provides access to over 300 models, meaning no new accounts, no API key configuration, and no infrastructure setup.
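To make the full/mini trade-off concrete, here is a minimal sketch of how an app might choose between the two transcription models and assemble a request. The helper function, its parameters, and the commented-out client call are illustrative assumptions; only the model names come from the release.

```python
# Hypothetical sketch: selecting between the full and mini
# transcription models based on latency sensitivity. The request
# shape and any client setup are assumptions, not a documented API.

def build_transcription_request(audio_path: str, low_latency: bool = False) -> dict:
    """Choose a transcription model and assemble request arguments.

    The mini variant trades some accuracy for speed, so it is
    selected when the caller flags the request as latency-sensitive.
    """
    model = "gpt-4o-transcribe-mini" if low_latency else "gpt-4o-transcribe"
    return {"model": model, "file": audio_path}

# Usage (the actual call below is illustrative, not verified):
# client.audio.transcriptions.create(**build_transcription_request("meeting.mp3"))
```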

What Developers Can Build

The transcription models open up use cases like podcast and meeting transcription tools, audio file processors, and accessibility features for existing apps. The audio output models support voice assistant interfaces, text-to-speech narration engines, and speech-enabled interaction layers for web and mobile applications.
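The audio output side can be sketched the same way. The payload shape below (a chat-style message plus a modalities field) is an assumption modeled on common multimodal APIs; only the gpt-4o-audio and gpt-4o-audio-mini model names are from the release.

```python
# Hypothetical sketch of a text-to-speech request against the new
# audio output models. The message/modalities payload is an
# assumption; only the model names come from the release.

def build_speech_request(text: str, compact: bool = False) -> dict:
    """Pair narration text with an audio output model.

    gpt-4o-audio-mini is the compact variant optimized for
    performance, so it backs the `compact` option.
    """
    model = "gpt-4o-audio-mini" if compact else "gpt-4o-audio"
    return {
        "model": model,
        "modalities": ["text", "audio"],
        "messages": [{"role": "user", "content": f"Read aloud: {text}"}],
    }
```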

For vibe coding specifically, this is notable: a developer can now describe a podcast transcription app or a voice-controlled to-do list in natural language, and Replit's Agent can scaffold the entire application (including the audio API calls) without the developer needing to know the details of the OpenAI audio API.

Also in the January 23 Release

The same release also added web search capabilities to Agent Automations (allowing scheduled agents to search the web natively without separate API keys) and added RulesSync support for replit.md configuration files, enabling synchronization of AI agent instructions across multiple projects.
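Since replit.md files hold agent instructions, a synced file might look something like the following. The contents are a hypothetical sketch of what a shared instruction file could contain, not an example from the release.

```markdown
<!-- Hypothetical replit.md shared across projects via RulesSync -->
# Agent Instructions

- Use TypeScript for all new code.
- Prefer the built-in AI Integrations models over external API keys.
- Run the test suite before marking any task complete.
```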
