AI Lawyers Are Already Better Than Law Professors at Reasoning—Say Law Professors

Law professors preferred answers generated by artificial intelligence over answers written by fellow professors, according to a recent study led by Stanford University that examined how large language models perform on legal reasoning tasks.

In the study, 16 professors from 14 U.S. law schools—including Stanford, Yale, New York University, the University of Chicago, Georgetown, UCLA, and the University of Virginia—created 40 contract law questions covering legal doctrine, case law, hypotheticals, and policy issues. Researchers saw it as an ideal way to test the capabilities of modern AI.

“Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth,” the researchers wrote. “Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test.”

In 2,918 blinded comparisons, professors selected the answer they would rather give a student. Google’s Gemini 2.5 Pro won 75.92% of its matchups against human instructors, while the tech giant’s NotebookLM won 74.75% of the time, meaning professors preferred the AI-generated answer over the human-written one in roughly three-quarters of those matchups.

To determine whether the results reflected a broader professional consensus, the researchers analyzed how often professors agreed when evaluating the same answer pairs.

“Observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs’ success reflects alignment with common disciplinary criteria,” they wrote.
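The comparison the researchers describe can be illustrated with a small sketch. The article does not say which agreement statistic the study used, so the version below is an assumption: it measures raw pairwise agreement among raters who judged the same answer pair and compares it with the agreement expected if each rater chose independently at the overall preference rate. The votes in `toy_votes` are invented for illustration and are not data from the study.

```python
from itertools import combinations

def observed_agreement(votes_by_pair):
    """Share of rater pairs that made the same choice on the same answer pair."""
    agree = total = 0
    for votes in votes_by_pair:
        for a, b in combinations(votes, 2):
            agree += (a == b)
            total += 1
    return agree / total

def chance_agreement(votes_by_pair):
    """Agreement expected if every rater picked independently at the pooled rate."""
    all_votes = [v for votes in votes_by_pair for v in votes]
    p_llm = all_votes.count("llm") / len(all_votes)
    return p_llm ** 2 + (1 - p_llm) ** 2

# Hypothetical votes: each inner list holds three raters' picks for one answer pair.
toy_votes = [
    ["llm", "llm", "llm"],
    ["llm", "llm", "human"],
    ["human", "human", "human"],
    ["llm", "llm", "llm"],
]

observed = observed_agreement(toy_votes)
expected = chance_agreement(toy_votes)
print(f"observed={observed:.3f} expected_by_chance={expected:.3f}")
```

In this toy data the raters agree more often than independent choices would predict, which is the kind of gap the researchers point to as evidence of shared disciplinary criteria rather than idiosyncratic taste.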

The study found that AI models also outperformed human instructors across multiple question categories: recall questions about cases, codes, or doctrine; hypotheticals; and policy discussions.

“To probe whether any LLM advantage might be driven by surface-level writing style rather than substantive content, we additionally engineered a set of lexico-syntactic features—answer length, structural organization, reasoning nuance, legal anchors, confidence tone, clarity, and pedagogical support—and tested how much of the preference pattern they could explain,” the study said.

AI-generated answers were also flagged as harmful less often than those written by professors, with Gemini recording a 3.41% harmfulness rate and NotebookLM 3.64%, compared with 12.06% for human instructors. In a separate analysis of additional models, Anthropic’s Claude Opus 4.7 ranked first, followed by OpenAI’s ChatGPT 5.4 and Gemini 2.5 Pro, while every AI model evaluated outperformed human instructors on average.

The researchers cautioned that the study did not measure whether the answers matched each professor's individual teaching preferences, leaving open the possibility that AI-generated responses were viewed as generally acceptable rather than tailored to any one instructor's approach.

“While LLM responses are generally preferred over those of human instructors, our evaluation setting does not allow us to directly measure the extent to which instructor preferences are satisfied,” the study said. “It is at least theoretically possible that LLMs, although generally delivering stronger responses, still generate answers that are merely viewed as ‘good enough.’”

The study comes as courts, law firms, and law schools increasingly grapple with how artificial intelligence should be used in the legal profession.

In March, the Los Angeles Superior Court began testing AI tools to help judges manage growing caseloads, while law schools are adding AI training programs.

“The potential benefits of these new technologies as a force multiplier in the practice of law just can’t be ignored,” Mississippi College School of Law Dean John P. Anderson previously told Decrypt. “Whether our students plan to be litigators or transactional attorneys, their future employers will expect familiarity with these AI tools. We want the firms hiring our students to be confident that every MC Law grad is competent in AI technologies.”

At the same time, however, law firms continue to confront cases undermined by hallucinations and other AI-generated errors. In April, law firm Sullivan & Cromwell admitted to a U.S. bankruptcy court that a recent filing in a high-profile case contained fake citations generated by AI.

