
Off-the-Shelf Large Language Models Are Unreliable Judges – Jonathan Choi (USC / WashU)
With the rapid rise of artificial intelligence, large language models (LLMs) are increasingly being considered for tasks once thought to be uniquely human—including legal interpretation. The idea of “AI judges” suggests appealing possibilities: consistent, fast, and ostensibly unbiased answers to legal questions. But how reliable are these models? Can their judgments truly be trusted? And do they withstand careful empirical scrutiny?
In this episode of the CLE Vlog Series, Prof. Jonathan Choi (University of Southern California & Washington University in St. Louis) joins Alessandro Tacconelli (ETH Zurich) to discuss his paper, “Off-the-Shelf Large Language Models Are Unreliable Judges.” Prof. Choi presents findings from a series of empirical experiments designed to test how well LLMs perform as legal interpreters. His results reveal that model judgments are highly sensitive to prompt phrasing, output processing methods, and training choices. Moreover, post-training adjustments in today’s most widely used models can push LLMs’ assessments far from empirically grounded predictions of language use. These insights raise serious questions about the credibility of LLMs in legal interpretation and cast doubt on their ability to capture the “ordinary meaning” of legal texts.
Paper Reference:
Jonathan Choi – University of Southern California / Washington University in St. Louis
Off-the-Shelf Large Language Models Are Unreliable Judges
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188865
Audio Credits for Trailer:
AllttA by AllttA
https://youtu.be/ZawLOcbQZ2w