Tuesday, March 18, 2025

Patronus AI Debuts Multimodal LLM-as-a-Judge for Images

E-commerce giant Etsy is already using the technology to reduce AI hallucinations in product image captions

Patronus AI announced the launch of the industry’s first Multimodal LLM-as-a-Judge (MLLM-as-a-Judge), a groundbreaking evaluation capability that enables developers to score and optimize multimodal AI systems for image-to-text applications.

The new Judge-Image tool, powered by Google Gemini, allows AI engineers to iteratively measure and improve the quality of their multimodal AI applications by scanning for text presence, grid structure, spatial orientation, and object identification.

“Our mission has always been to advance scalable oversight of AI,” said Anand Kannappan, CEO and Co-founder of Patronus AI. “With the release of GPT-4o, Claude Opus, and Google’s Gemini over the last year, organizations have invested heavily in image generation to drive customer value. However, as these AI experiences scale, so does the unpredictability of LLM systems. Our MLLM-as-a-Judge addresses this critical challenge by providing transparent, reliable evaluation of multimodal systems.”

The Judge-Image tool offers several out-of-the-box evaluation criteria, including:

  • Caption hallucination detection (standard and strict)
  • Primary and non-primary object description verification
  • Object location accuracy

Beyond validating image caption correctness, Judge-Image can test OCR extraction accuracy for tabular data, AI-generated brand asset accuracy, and scene description validity.

Prior research suggests that Google Gemini can serve as a more reliable MLLM judge than alternatives such as OpenAI’s GPT-4V, exhibiting less egocentricity and a more equitable approach to judgment. Evaluations on Patronus AI’s internal datasets confirmed that the Gemini backbone outperformed other multimodal LLMs.

Source: PRNewswire
