GPT-5.3-Codex-Spark

OpenAI released a research preview of GPT-5.3-Codex-Spark on February 12, 2026: a smaller, faster variant of GPT-5.3-Codex and the first model designed explicitly for real-time coding. Running on Cerebras' Wafer Scale Engine 3 chip, the model delivers more than 1,000 tokens per second and represents the first production milestone in OpenAI's multi-year partnership with Cerebras. The preview is available exclusively to ChatGPT Pro users via the Codex app, CLI, and IDE extension, with a 128k context window and text-only input at launch.


Overview

On February 12, 2026, OpenAI introduced GPT-5.3-Codex-Spark in research preview: a lighter, faster sibling of GPT-5.3-Codex built from the ground up for real-time coding interaction. The model is described as OpenAI's "first model designed for real-time coding," with an emphasis on near-instant responsiveness that changes how developers experience AI-assisted workflows.

Speed and Hardware

The defining characteristic of Codex-Spark is its inference speed. The model is optimized to feel near-instant, delivering more than 1,000 tokens per second, a figure made possible by running on Cerebras' Wafer Scale Engine 3 (WSE-3) hardware. The WSE-3 is the largest AI chip ever built, measuring 46,225 mm² with 4 trillion transistors and 900,000 AI-optimized cores capable of delivering 125 petaflops of compute. This enables dramatically lower latency than GPT-5.3-Codex running on conventional GPU infrastructure.

This release marks the first output of OpenAI's partnership with Cerebras, a collaboration announced in early 2026. Codex-Spark is the first OpenAI model deployed in production on non-Nvidia hardware, a notable strategic development that signals OpenAI's intent to diversify its inference infrastructure.

Performance

Despite its smaller size, Codex-Spark demonstrates strong performance on agentic software engineering benchmarks. On SWE-Bench Pro and Terminal-Bench 2.0, it completes tasks in a fraction of the time required by GPT-5.3-Codex while maintaining competitive accuracy. Early testers report tasks completing in 20–41 seconds where larger models previously took minutes.
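As a back-of-envelope illustration of what sustained decode speed means for wall-clock time (the 1,000 tokens-per-second figure comes from the announcement; the task sizes below are hypothetical examples, not benchmark data):

```python
# Back-of-envelope: wall-clock time to stream a response at a
# sustained decode rate. The >1,000 tok/s figure is from the
# Codex-Spark announcement; the output sizes are hypothetical.

def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a sustained decode rate."""
    return output_tokens / tokens_per_second

SPARK_TPS = 1_000  # announced throughput floor on Cerebras WSE-3

for tokens in (2_000, 10_000, 40_000):
    secs = generation_seconds(tokens, SPARK_TPS)
    print(f"{tokens:>6} tokens -> {secs:.0f} s")
# prints:
#   2000 tokens -> 2 s
#  10000 tokens -> 10 s
#  40000 tokens -> 40 s
```

On this arithmetic, even a 40,000-token agentic transcript streams in well under a minute, which is consistent with the 20–41 second task times reported by early testers.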

Availability

During the research preview, Codex-Spark is accessible to ChatGPT Pro subscribers through three surfaces: the Codex desktop app, the Codex CLI (via codex --model gpt-5.3-codex-spark), and the IDE extension. The model uses a 128k context window and accepts text-only input in this initial phase. Usage during the preview is tracked under separate limits. OpenAI notes that the preview phase allows it to work with Cerebras to ramp up datacenter capacity and harden the end-to-end experience before a broader rollout.
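For repeated use, the model can also be pinned as the CLI's default rather than passed on every invocation. A minimal sketch, assuming the Codex CLI's standard ~/.codex/config.toml file and its model key (this configuration path is not mentioned in the announcement):

```toml
# ~/.codex/config.toml
# Pin the Spark preview so plain `codex` invocations use it
# without repeating the --model flag each time.
model = "gpt-5.3-codex-spark"
```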