OpenAI announces significant progress in its collaborations with the US Center for AI Standards and Innovation (CAISI) and the UK AI Security Institute (UK AISI) to enhance the security of AI systems. These partnerships reflect OpenAI’s commitment to developing and deploying AI that is both secure and useful, ensuring that advanced AI (AGI) benefits all of humanity.
OpenAI entered into voluntary agreements with CAISI and UK AISI early on, believing that frontier AI development must occur in close collaboration with allied governments that possess deep expertise in machine learning, national security, and metrology. The company reports concrete security improvements resulting from these collaborations, including:
- Joint red-teaming of safeguards against biological misuse
- End-to-end testing of products for security issues
- Rapid feedback loops to resolve vulnerabilities
These improvements have strengthened safeguards in widely used AI products, elevated industry standards, increased AI adoption, and demonstrated how government and industry can partner to evaluate and improve AI security.
Collaborations with CAISI
For over a year, OpenAI has collaborated with CAISI to evaluate its models’ capabilities in cybersecurity, chemical-biological, and other domains relevant to national security. The partnership expanded recently to include emerging product security challenges and to red-team the security of OpenAI’s agentic AI systems.
In July, CAISI explored how external evaluators can help find and fix security vulnerabilities in agentic systems, including OpenAI’s ChatGPT Agent product. An expert team at CAISI combined cybersecurity and AI agent security expertise to investigate new vulnerabilities. CAISI was given early access to ChatGPT Agent to build an understanding of the system architecture, and later red-teamed the released system.
CAISI identified two novel security vulnerabilities in ChatGPT Agent that under certain circumstances could have permitted a sophisticated attacker to bypass security protections, remotely control computer systems accessible to the agent in that session, and impersonate the user on other websites.
Initially, CAISI believed those vulnerabilities were unexploitable and therefore of little use to attackers. Further analysis, however, revealed a way to bypass OpenAI’s security protections by combining traditional cyber vulnerabilities with an AI agent hijacking attack.
CAISI’s proof-of-concept exploit chain bypassed several AI-based security protections, with a success rate of approximately 50%. The CAISI team took a multidisciplinary approach, combining traditional software vulnerabilities with AI vulnerabilities, and used OpenAI’s ChatGPT Agent itself to aid in discovering them.
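The article does not describe CAISI’s tooling, but the general idea of measuring an exploit chain’s reliability can be sketched briefly. The hypothetical Python harness below replays a candidate injection payload against a stubbed agent session and estimates a success rate by checking for a canary token in the output; the function names and the stubbed agent are assumptions, not CAISI’s or OpenAI’s actual code.

```python
import random

def run_agent_session(payload: str, canary: str) -> str:
    """Hypothetical stand-in for driving one end-to-end agent session against
    a test page that embeds the candidate injection payload. Stubbed here so
    the sketch is self-contained: the fake agent 'follows' the injected
    instructions about half the time."""
    return canary if random.random() < 0.5 else "benign transcript"

def estimate_success_rate(payload: str, canary: str, trials: int = 40) -> float:
    """Replay the same payload many times and count how often the canary
    token appears in the agent's output, i.e. how often the injected
    instructions were actually followed."""
    successes = sum(
        canary in run_agent_session(payload, canary) for _ in range(trials)
    )
    return successes / trials

if __name__ == "__main__":
    rate = estimate_success_rate(
        payload="<!-- injected instructions that reference the canary -->",
        canary="CANARY-7f3a",
    )
    print(f"Estimated exploit success rate: {rate:.0%}")
```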
OpenAI states that these attacks were immediately reported to the company and fixed within one business day. The collaboration with CAISI builds on OpenAI’s research and evaluation efforts. OpenAI asserts that finding these vulnerabilities required CAISI to innovate in chaining together multiple exploits and combining attacks, drawing on both cybersecurity and machine learning methods.
OpenAI says these efforts benefit end users, and emphasizes that the intersection of AI agent security and traditional cybersecurity demands new best practices.
Collaborations with UK AISI
Working with UK AISI, OpenAI has red-teamed its safeguards against biological misuse as defined by OpenAI’s policies. This includes safeguards in both ChatGPT Agent and GPT-5. UK AISI received in-depth access to OpenAI systems, supported by bespoke work to enable deeper customization and security testing. Components made available to UK AISI included the following (a simplified configuration sketch appears after the list):
- Non-public prototypes of safeguard systems
- “Helpful-only” model variants with certain guardrails removed
- OpenAI’s internal policy guidance around biological misuse
- Access to the chain of thought of OpenAI’s internal safety monitor models to better identify vulnerabilities
- Selective disabling of certain mitigations and enforcement during testing
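To make the access described above concrete, here is a minimal, purely illustrative sketch of what a red-team test configuration along these lines might look like. The class name, fields, and values are assumptions for illustration; they are not OpenAI’s actual configuration schema.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RedTeamTestConfig:
    """Hypothetical description of one safeguard-testing run."""
    model_variant: str                  # e.g. a "helpful-only" variant with certain guardrails removed
    expose_monitor_cot: bool            # share the safety monitor's chain of thought with testers
    disable_account_enforcement: bool   # suspend account banning so testers can iterate freely
    disabled_mitigations: Tuple[str, ...]  # individual safeguards switched off for this run

# Example resembling the access listed above (values are invented).
biosecurity_redteam_run = RedTeamTestConfig(
    model_variant="helpful-only-prototype",
    expose_monitor_cot=True,
    disable_account_enforcement=True,
    disabled_mitigations=("output-blocking",),
)
```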
A multidisciplinary UK AISI team combined AI red-teaming techniques with biosecurity expertise to find universal jailbreaks against OpenAI’s biosecurity safeguards. The collaboration covered all aspects of deployed systems, from individual model responses through the full product experience.
Beginning in May, OpenAI and UK AISI held frequent ad-hoc meetings, approximately weekly, through to the release of ChatGPT Agent in July and GPT-5 in August. OpenAI created custom test configurations based on UK AISI feedback to enable better testing.
During this period UK AISI submitted more than a dozen detailed vulnerability reports. Some reports led to engineering-based product fixes, others to policy enforcement changes, and others to focused training to improve certain aspects of classifiers.
OpenAI improved the robustness of its monitoring stack by testing it against universal jailbreaks identified by UK AISI. The end-to-end testing component proved especially beneficial. OpenAI also made product configuration changes to address vulnerabilities identified by UK AISI in which malicious content could be input or exfiltrated without triggering moderation.
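One way to think about that class of fix is to moderate content both as it enters and as it leaves an agent. The sketch below uses the publicly documented Moderations endpoint of the OpenAI Python SDK; the gating wrapper itself is an assumption for illustration and is not OpenAI’s internal monitoring stack.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_flagged(text: str) -> bool:
    """Run one span of text through the public Moderations endpoint."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return result.results[0].flagged

def guarded_agent_call(user_request: str, agent) -> str:
    """Hypothetical wrapper: moderate content on the way into and out of an
    agent call, so malicious text is neither injected nor exfiltrated
    without triggering moderation."""
    if is_flagged(user_request):
        return "Request blocked before reaching the agent."
    output = agent(user_request)
    if is_flagged(output):
        return "Agent output withheld pending review."
    return output
```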
Close collaboration gave UK AISI access to non-public resources, such as the monitor’s chains of thought and non-public information about safeguard design, as well as temporary disabling of OpenAI’s account enforcement banning system. These resources allowed UK AISI to surface failures that real-world attackers with less access would find much more difficult to detect.
UK AISI’s assessment indicates that safeguards across the full moderation system were substantially strengthened over the course of the collaboration. Towards the end of the rapid iteration period, finding universal attacks that evaded the full moderation stack required sophisticated techniques and generated many moderation flags during development; for malicious actors, those conditions would likely trigger monitoring and result in a ban from the platform.
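The article does not spell out how flag-based enforcement works; as a purely hypothetical illustration, the sketch below escalates an account once its moderation flags exceed a threshold within a rolling time window. The class name, threshold, and window size are invented for the example.

```python
import time
from collections import deque

class FlagRateMonitor:
    """Hypothetical sketch of flag-based account enforcement: escalate an
    account for review once it accumulates too many moderation flags
    within a rolling time window."""

    def __init__(self, max_flags: int = 5, window_seconds: float = 3600.0):
        self.max_flags = max_flags
        self.window_seconds = window_seconds
        self._flag_times = deque()

    def record_flag(self, now=None) -> bool:
        """Record one moderation flag; return True if the account should be
        escalated (for example, suspended pending review)."""
        now = time.time() if now is None else now
        self._flag_times.append(now)
        # Drop flags that have aged out of the rolling window.
        while self._flag_times and now - self._flag_times[0] > self.window_seconds:
            self._flag_times.popleft()
        return len(self._flag_times) >= self.max_flags
```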
These efforts complement internal and external stress-testing already underway for biological safeguards, including 5,000+ hours of testing across three separate safeguards testing campaigns, and work with third-party partners.
Why it Matters
OpenAI believes that these collaborations represent some of the deepest public-private efforts to evaluate real-world frontier AI systems for security and misuse. Such collaborations enable the company to draw on government expertise while keeping governments informed about the state of frontier technology. The hands-on experience helps partner governments develop technically grounded best practices that accelerate both innovation and safety across the AI industry.
External expert analysis helps OpenAI discover issues it might otherwise miss and promotes accountability and trust. Ongoing collaborations provide deeper value than one-off evaluations.
The technical expertise contributed by CAISI and UK AISI has been critical to recent improvements in OpenAI’s safeguards and product security. Close technical partnerships with organizations that have both the resources and the incentives to rigorously evaluate AI systems strengthen confidence in their security.