Wednesday, April 2, 2025

Scale AI Partners with DoD’s Chief Digital and Artificial Intelligence Office to Test and Evaluate Large Language Models


Scale AI, the leading test and evaluation (T&E) partner for frontier artificial intelligence companies, is partnering with the U.S. Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Office (CDAO) to create a comprehensive T&E framework for the responsible use of large language models (LLMs) within the DoD.

Through this partnership, Scale will develop benchmark tests tailored to DoD use cases, integrate them into Scale’s T&E platform, and support CDAO’s T&E strategy for using LLMs. The outcomes will give the CDAO a framework to deploy AI safely by measuring model performance, offering real-time feedback for warfighters, and creating specialized public sector evaluation sets to test AI models for military support applications, such as organizing the findings from after-action reports.

This work will enable the DoD to mature its T&E policies for generative AI by combining quantitative benchmark measurements with qualitative feedback from users. The evaluation metrics will help identify generative AI models that are ready to support military applications with accurate and relevant results using DoD terminology and knowledge bases.

The rigorous T&E process aims to enhance the robustness and resilience of AI systems, enabling the adoption of LLM technology in classified and other secure environments.

Alexandr Wang, founder and CEO of Scale AI, emphasized Scale’s commitment to protecting the integrity of future AI applications for defense and solidifying the U.S.’s global leadership in the adoption of safe, secure, and trustworthy AI. “Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework,” said Wang.

For decades, T&E has been standard in product development across industries, ensuring products meet safety requirements for market readiness, but AI safety standards have yet to be codified. Scale’s methodology, published last summer, is the industry’s first comprehensive technical methodology for LLM T&E. Its adoption by the DoD reflects Scale’s commitment to understanding the opportunities and limitations of LLMs, mitigating risks, and meeting the unique needs of the military.

SOURCE: BusinessWire
