Cursor: Upgraded Voice Input with Batch Speech-to-Text
Cursor 3.1 upgrades voice dictation in the Agents Window by switching to a batch speech-to-text approach, delivering significantly more accurate transcriptions. Users press and hold Ctrl+M to record, with a new UI displaying a waveform, timer, and cancel/confirm controls. The system records the full voice clip before transcribing, replacing the previous streaming approach, which results in higher-quality text output for agent prompts.
Upgraded Voice Input: More Accurate Dictation in the Agents Window
Cursor 3.1 ships a meaningful improvement to voice dictation in the Agents Window, replacing its previous speech-to-text implementation with a batch processing approach that produces noticeably more accurate transcriptions.
What Changed
Previously, voice input in Cursor used streaming speech-to-text, transcribing audio incrementally as it was spoken. While fast, this approach is more prone to mid-sentence errors and corrections. With the 3.1 upgrade, voice recording now captures the full audio clip first, then sends it for transcription in a single batch, a technique that consistently yields higher-quality output for longer or more complex dictation.
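The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the general technique, not Cursor's implementation: the `Transcriber` type and both function names are hypothetical, and the speech-to-text engine is stood in for by any callable that turns audio bytes into text.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for a speech-to-text engine: audio bytes in, text out.
Transcriber = Callable[[bytes], str]


def transcribe_streaming(chunks: Iterable[bytes], transcribe: Transcriber) -> List[str]:
    """Streaming style: transcribe each audio chunk as soon as it arrives."""
    return [transcribe(chunk) for chunk in chunks]


def transcribe_batch(chunks: Iterable[bytes], transcribe: Transcriber) -> str:
    """Batch style: buffer the whole clip, then transcribe it in one call."""
    clip = b"".join(chunks)
    return transcribe(clip)
```

In the batch version the engine is called once and receives the complete clip, so it has the full sentence available as context; in the streaming version it must produce text for each fragment on its own.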
The interaction model is straightforward: press and hold Ctrl+M to begin recording. A dedicated UI appears while recording is active, showing a live waveform so developers can confirm the microphone is picking up audio, a timer indicating recording duration, and buttons to either cancel the recording or confirm and transcribe it.
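One way to picture this interaction is as a small state machine. The sketch below is an assumption-laden model rather than Cursor's actual code: `RecordingSession`, `State`, and every method name are hypothetical, the waveform display is left out, and the transcriber is again any callable from audio bytes to text.

```python
import time
from enum import Enum
from typing import Callable, List


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


class RecordingSession:
    """Hypothetical model of one press-and-hold recording (not Cursor's code)."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.state = State.IDLE
        self._transcribe = transcribe
        self._clock = clock
        self._chunks: List[bytes] = []
        self._started_at = 0.0

    def start(self) -> None:
        """Shortcut pressed: begin buffering audio and start the timer."""
        self._chunks.clear()
        self._started_at = self._clock()
        self.state = State.RECORDING

    def add_chunk(self, chunk: bytes) -> None:
        """Microphone callback: keep audio only while recording."""
        if self.state is State.RECORDING:
            self._chunks.append(chunk)

    def elapsed(self) -> float:
        """Seconds since recording started, for the on-screen timer."""
        if self.state is not State.RECORDING:
            return 0.0
        return self._clock() - self._started_at

    def cancel(self) -> None:
        """Cancel button: discard the audio without transcribing."""
        if self.state is State.RECORDING:
            self._chunks.clear()
            self.state = State.CANCELLED

    def confirm(self) -> str:
        """Confirm button: transcribe the whole clip in one call."""
        if self.state is not State.RECORDING:
            raise RuntimeError("confirm() requires an active recording")
        self.state = State.CONFIRMED
        return self._transcribe(b"".join(self._chunks))
```

Passing the clock in as a parameter keeps the timer logic testable without waiting in real time, and routing both buttons through explicit states makes it clear that a cancelled recording is never sent for transcription.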
Why This Matters
Voice input is particularly useful when writing longer, more nuanced prompts for agents, where speaking is faster than typing and the exact wording matters for getting a useful response. Inaccurate transcription in these cases can send an agent off in the wrong direction. The switch to batch STT directly reduces that failure mode.
The new UI also addresses a common pain point: with the waveform and timer visible, users have immediate feedback that recording is working, eliminating the uncertainty of "did it hear me?" that plagued the previous implementation.
Part of Cursor 3.1
This voice input upgrade ships as part of Cursor 3.1 alongside the Tiled Layout for the Agents Window, branch selection before cloud agent launch, diff-to-file navigation, and a set of Agents Window performance and stability improvements.