AI Insights - Ep.3: Rethinking AI Performance Metrics

26/03/2026

Cisco Podcast Network

27:26

In the latest episode of the Cisco AI Insights podcast, hosts Rafael Herrera and Sonia Marques are joined by Dr. Catarina Carvalho, a Cisco leader in machine learning engineering. Together, they unpack the academic paper "Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following," developed by researchers from the University of Maryland and the University of Waterloo. As the industry moves toward more reliable multimodal models, traditional pass-or-fail evaluation is no longer sufficient. The paper introduces a hierarchical framework that uses "LLM-as-a-judge" to evaluate outputs across five distinct criteria: visual grounding, logical coherence, factuality, reflection, and conciseness. Dr. Carvalho guides the discussion through the nuances of this "judge of judges" approach, exploring why human alignment remains the gold standard even as evaluation processes are automated. A special thank you to the teams at the University of Waterloo and the University of Maryland, College Park, for developing this month's paper. If you are interested in reading the paper yourself, please visit this link: https://arxiv.org/pdf/2511.21662.