In a move toward ultra-fast AI tooling, the San Francisco-based artificial intelligence firm OpenAI has announced GPT-5.3-Codex-Spark, a coding model that promises “near-instant coding.” Released as a research preview, Codex-Spark is a major step forward for the Codex line of coding assistants, built to meet the developer community’s demand for speed.
GPT-5.3-Codex-Spark is being introduced as a smaller, latency-focused variant of the powerful GPT-5.3-Codex model and is the first result of OpenAI’s collaboration with Cerebras, the AI hardware innovator. Optimized to deliver more than 1,000 tokens per second, Codex-Spark lets developers interact with code in near real time, making it ideal for tasks like targeted edits, logic refinement, interface enhancements, and rapid iteration during development cycles.
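To make the interaction model concrete, here is a minimal Python sketch of a streaming request, assuming the preview is exposed through OpenAI’s standard chat completions API. The model string “gpt-5.3-codex-spark” is an assumption; the announcement names the model but not its API identifier.

```python
# Minimal sketch of a streaming Codex-Spark request.
# The model identifier "gpt-5.3-codex-spark" is an assumption;
# the announcement names the model but not its API string.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed identifier
    messages=[
        {
            "role": "user",
            "content": "Rename `tmp` to `buffer` in this function: ...",
        }
    ],
    stream=True,  # receive tokens as they are generated
)

# At 1,000+ tokens per second, the full response appears
# almost as soon as the first token does.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```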
Real-Time Coding for Today’s Development Workflows
Unlike traditional coding AI models, which balance depth of reasoning against throughput, Codex-Spark prioritizes latency without sacrificing capability. It is optimized for interactive coding sessions: developers can stop, redirect, and refine outputs on the fly as they work. That makes collaborative coding with AI feel far more fluid and responsive, above all in dynamic environments where iteration speed matters most.
At launch, Codex-Spark offers a 128k context window and supports text-only interactions. During its research preview phase, usage is subject to a dedicated rate limit that does not count against standard model quotas. High demand may at times lead to temporary access pacing as OpenAI scales infrastructure to ensure consistent performance.
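The announcement does not say how that pacing surfaces to clients. Assuming it arrives as ordinary HTTP 429 rate-limit responses, which the official OpenAI Python SDK raises as openai.RateLimitError, a generic client-side backoff sketch might look like this:

```python
# Generic retry-with-backoff sketch for temporary access pacing.
# Assumes pacing surfaces as standard HTTP 429 responses, raised by
# the OpenAI Python SDK as openai.RateLimitError; the preview's
# actual pacing mechanism is not documented in the announcement.
import random
import time

import openai
from openai import OpenAI

client = OpenAI()

def create_with_backoff(max_retries: int = 5, **request_kwargs):
    """Retry a chat completion with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**request_kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid retry stampedes.
            time.sleep(2 ** attempt + random.random())
```

Called as, for example, create_with_backoff(model="gpt-5.3-codex-spark", messages=[...]), it forwards the request arguments unchanged and simply waits out any temporary pacing.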
Under the Hood: Architecture and Latency Improvements
To achieve ultra-fast performance, Codex-Spark runs on Cerebras’ Wafer Scale Engine 3, a specialized AI accelerator designed for high-speed inference. This partnership enabled OpenAI to add a new low-latency serving tier alongside its existing GPU-based infrastructure, allowing seamless integration of rapid response capabilities.
OpenAI also made significant latency optimizations across its entire inference pipeline: client-to-server round-trip overhead is down 80%, per-token cost is down 30%, and time to first token is down 50%. Because these changes live in the shared serving stack, they improve not only Codex-Spark but every Codex model over time.
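Both of the client-visible metrics here, time to first token and streaming throughput, are easy to measure from the calling side of a streaming request. A rough sketch, reusing the assumed model identifier from above:

```python
# Sketch: measuring time to first token (TTFT) and streaming
# throughput from the client side. Model identifier assumed, as above.
import time

from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
first_token_at = None
n_chunks = 0

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed identifier
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_chunks += 1

end = time.perf_counter()
if first_token_at is not None and end > first_token_at:
    print(f"TTFT: {first_token_at - start:.3f}s")
    # Each streamed chunk usually carries one token-sized delta,
    # so chunks/sec is a rough proxy for tokens/sec.
    print(f"~{n_chunks / (end - first_token_at):.0f} chunks/sec after first token")
```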
Developer-Centric Experiences and Performance
Codex-Spark completes key software development benchmarks, SWE-Bench Pro and Terminal-Bench 2.0, in far less time than the full GPT-5.3-Codex model, helping users stay in a state of flow during their programming sessions.
The model’s focus on speed also means it defaults to minimal, targeted edits, offering lightweight responses that developers can immediately act on. Additional functionality such as running tests or deeper analysis can be invoked at the developer’s request.
Perspectives from Cerebras
“What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible: new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.” – Sean Lie, CTO and Co-Founder of Cerebras
This collaboration highlights the potential of hardware-software co-design in pushing the boundaries of real-time AI interaction, particularly for complex coding tasks where developer flow and responsiveness are paramount.
Availability and Future Expansion
Beginning today, GPT-5.3-Codex-Spark is rolling out as a research preview for ChatGPT Pro users and is accessible within the latest versions of the Codex app, CLI, and VS Code extension. OpenAI is also making Codex-Spark available via the API to a select group of design partners to explore integration possibilities and gather early feedback. Broader availability and additional enhancements, such as multimodal capabilities and expanded context support, are expected as OpenAI scales deployment and refines the user experience.
Throughout the preview, Codex-Spark will maintain the same safety training and standards used across OpenAI’s model family. Safety and cyber-relevant safeguards have been evaluated to ensure the model does not reach high-capability thresholds in sensitive domains such as cybersecurity or biological risk applications.
What’s Next for Codex
Codex-Spark marks the first step toward a coding ecosystem that blends ultra-fast real-time interaction with deep, long-running reasoning and execution. As developers adopt this new paradigm, OpenAI anticipates that future models will further merge these complementary modes, enabling rapid back-and-forth feedback loops while also supporting extensive background processing of ambitious tasks.
By removing latency as a barrier to productivity, GPT-5.3-Codex-Spark promises to reshape how software engineers, product teams, and creators collaborate with AI, unlocking faster innovation and smoother coding workflows than ever before.


