Cursor: Upgraded Voice Input with Batch Speech-to-Text

Cursor, Apr 13, 2026

Cursor 3.1 upgrades voice dictation in the Agents Window by switching to a batch speech-to-text approach, delivering significantly more accurate transcriptions. Users press and hold Ctrl+M to record, and a new UI displays a waveform, a timer, and cancel/confirm controls. The system now records the full voice clip before transcribing it, replacing the previous streaming approach and producing higher-quality text for agent prompts.

Sources & Mentions

4 external resources covering this update:

Cursor 3.1: Tiled Layout and Upgraded Voice Input – Release Discussions (Cursor Community Forum)

Cursor 3 (Hacker News)

New Cursor 3 ditches the classic IDE layout for an agent-first interface built around parallel AI fleets (The Decoder)

What Is Cursor 3? Agents, Worktrees, and What's New (DataCamp)

Upgraded Voice Input: More Accurate Dictation in the Agents Window

Cursor 3.1 ships a meaningful improvement to voice dictation in the Agents Window, replacing its previous speech-to-text implementation with a batch processing approach that produces noticeably more accurate transcriptions.

What Changed

Previously, voice input in Cursor used streaming speech-to-text, which transcribes audio incrementally as it is spoken. Streaming is fast, but it is more prone to mid-sentence errors and corrections. With the 3.1 upgrade, Cursor captures the full audio clip first and then sends it for transcription in a single batch, an approach that yields higher-quality output for longer or more complex dictation.
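The difference between the two approaches can be illustrated with a minimal Python sketch. This is not Cursor's implementation: the Transcriber type and both function names are hypothetical, and the streaming version is deliberately simplified to one independent request per chunk (real streaming recognizers typically carry state between chunks).

```python
from typing import Callable, Iterable, List

# Hypothetical type: any function that turns raw audio bytes into text.
# It stands in for whatever speech-to-text service is actually used.
Transcriber = Callable[[bytes], str]


def streaming_transcribe(chunks: Iterable[bytes], transcribe: Transcriber) -> List[str]:
    """Streaming style (simplified): transcribe each chunk as it arrives."""
    # Each call sees only its own chunk, so it has no later context
    # to resolve ambiguous words in the middle of a sentence.
    return [transcribe(chunk) for chunk in chunks]


def batch_transcribe(chunks: Iterable[bytes], transcribe: Transcriber) -> str:
    """Batch style: buffer the whole clip, then transcribe it in one request."""
    clip = b"".join(chunks)  # the full recording, assembled before any transcription
    return transcribe(clip)
```

In the batch version the transcriber is called exactly once and receives the complete recording, which is the property the 3.1 change relies on. The trade-off is that no text appears until recording has finished.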

The interaction model is straightforward: press and hold Ctrl+M to begin recording. A dedicated UI appears while recording is active, showing a live waveform so developers can confirm the microphone is picking up audio, a timer indicating recording duration, and buttons to either cancel the recording or confirm and transcribe it.
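The recording flow described above can be modeled as a small state machine. The sketch below is an assumption-laden illustration, not Cursor's code: the State and VoiceSession names, the M:SS timer format, and the exact transitions are all hypothetical, and the waveform display is omitted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"


@dataclass
class VoiceSession:
    """Hypothetical model of one dictation session."""
    state: State = State.IDLE
    started_at: Optional[float] = None
    chunks: List[bytes] = field(default_factory=list)

    def start(self, now: float) -> None:
        """Shortcut pressed: begin a new recording if none is active."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
            self.started_at = now
            self.chunks = []

    def add_chunk(self, chunk: bytes) -> None:
        """Buffer microphone audio while recording; ignore it otherwise."""
        if self.state is State.RECORDING:
            self.chunks.append(chunk)

    def timer_label(self, now: float) -> str:
        """Elapsed recording time as M:SS (assumed format) for the timer."""
        if self.state is not State.RECORDING or self.started_at is None:
            return "0:00"
        elapsed = int(now - self.started_at)
        return f"{elapsed // 60}:{elapsed % 60:02d}"

    def cancel(self) -> None:
        """Cancel button: discard the buffered audio and return to idle."""
        if self.state is State.RECORDING:
            self.state = State.IDLE
            self.started_at = None
            self.chunks = []

    def confirm(self) -> bytes:
        """Confirm button: stop recording and return the full clip."""
        if self.state is not State.RECORDING:
            raise RuntimeError("no active recording to confirm")
        self.state = State.TRANSCRIBING
        return b"".join(self.chunks)
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test. The clip returned by confirm() is what a batch transcription request would receive.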

Why This Matters

Voice input is particularly useful for longer, more nuanced agent prompts, where speaking is faster than typing and the exact wording matters for getting a useful response. An inaccurate transcription in these cases can send an agent off in the wrong direction. The switch to batch speech-to-text (STT) directly reduces that failure mode.

The new UI also addresses a common pain point: with the waveform and timer visible, users have immediate feedback that recording is working, eliminating the uncertainty of "did it hear me?" that plagued the previous implementation.

Part of Cursor 3.1

This voice input upgrade ships as part of Cursor 3.1 alongside the Tiled Layout for the Agents Window, branch selection before cloud agent launch, diff-to-file navigation, and a set of Agents Window performance and stability improvements.
