OpenAI has announced a major partnership with Cerebras Systems aimed at integrating ultra-low-latency AI compute capacity into its global AI platform. The collaboration brings 750 megawatts (MW) of dedicated AI compute to OpenAI’s infrastructure, enhancing performance for real-time applications such as interactive chat, code generation, image creation, and AI agents.
Cerebras Systems, known for its purpose-built AI systems that consolidate compute, memory, and bandwidth on a single giant chip, will deliver low-latency inference capacity designed to accelerate long outputs from large AI models. By addressing bottlenecks found in conventional hardware, this partnership is expected to enable faster and more natural user experiences across OpenAI’s suite of services.
The integration will proceed in phases across OpenAI’s inference stack, allowing the company to extend these capabilities to diverse workloads through 2028.
“OpenAI’s compute strategy is to build a resilient portfolio that matches the right systems to the right workloads. Cerebras adds a dedicated low-latency inference solution to our platform. That means faster responses, more natural interactions, and a stronger foundation to scale real-time AI to many more people,” said Sachin Katti, Head of Infrastructure at OpenAI.
From Cerebras’ perspective, this collaboration marks an important leap in how high-performance AI inference is delivered at scale. The company brings cutting-edge hardware designed to reduce latency and improve responsiveness for AI workloads that demand real-time feedback.
“We are delighted to partner with OpenAI, bringing the world’s leading AI models to the world’s fastest AI processor. Just as broadband transformed the internet, real-time inference will transform AI, enabling entirely new ways to build and interact with AI models,” said Andrew Feldman, Co-Founder and CEO of Cerebras.
The move aligns with a broader industry trend of diversifying AI compute beyond traditional GPUs. By adding inference-optimized silicon to its portfolio, OpenAI aims to balance cost, performance, and scalability as its global customer base grows.
As the new capacity comes online over the coming years, organizations building on OpenAI’s technology can expect faster response times for workloads involving complex reasoning, real-time interaction, and application orchestration.