🍔 Your Takeaways

  • Why asking "What's the best AI?" is the wrong question (and what to ask instead)

  • The workflow-specific model selection framework that matches tools to your actual legal work

  • Where context window matters more than accuracy scores in contract review

  • The hidden hallucination crisis in legal research (and the verification system that solves it)

  • How to use all the best AI models without buying all the subscriptions

When we conduct audits for firms and in-house teams, we are inevitably asked, "What's the best AI for us?"

I know there's no one-size-fits-all answer, but I realize that's not obvious to anyone who isn't in the weeds of AI day in, day out.

So in this week’s edition, I’m going to provide an AI model recommendation for three common legal processes, along with implementation guidelines we have learned through our work with clients.

Liam Barnes

AI MODELS
1️⃣ Is there such a thing as a “best” legal AI?

You're in a partner meeting when someone asks, "Should we be using Harvey or Claude?"

Your ops team forwarded a Gemini demo invite yesterday. Someone from the executive committee wants to know which AI platform justifies the budget allocation.

You've sat through vendor demos. You've read the case studies. And you still don't have a decision framework that doesn't feel like guessing.

Here's the thing: you're asking the wrong question.

The same model that crushes contract review might fail spectacularly at legal research.

The "best" model for intake automation is completely different from the "best" for due diligence. In my experience, most firms optimize for the wrong variable—they chase the highest-scoring or even the most popular model instead of matching tools to specific workflows.

LegalBench benchmarks from VALS.ai (updated through December 2025) show the leading models clustered within a few points of each other. And an 87% accurate model solving 90% of your workflow beats a 90% accurate model solving 30% of it.
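A quick back-of-the-envelope calculation makes the point. Here's a minimal sketch (the "coverage" figures are the illustrative numbers from the sentence above, not benchmark data):

```python
# Rough "effective accuracy" heuristic: accuracy on the tasks a model can
# actually handle, times the share of your workflow it covers.

def effective_coverage(accuracy: float, workflow_coverage: float) -> float:
    """Fraction of your total workflow completed correctly by the model."""
    return accuracy * workflow_coverage

model_a = effective_coverage(0.87, 0.90)  # 87% accurate, fits 90% of the work
model_b = effective_coverage(0.90, 0.30)  # 90% accurate, fits 30% of the work

print(f"Model A handles {model_a:.0%} of the workflow correctly")  # ~78%
print(f"Model B handles {model_b:.0%} of the workflow correctly")  # 27%
```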

Let's start with the most commonly implemented workflow…

CONTRACT REVIEW
Where Context Window Wins

Many firms that deal with lengthy contracts lose hours on contract review not because their AI lacks accuracy, but because it can't see the whole document at once.

Here's the pattern: You've got a 50-page MSA with exhibits and a complex indemnification schedule. 

You feed it into an AI tool. The model chunks the document into sections and processes them separately.

It misses how the indemnification clause in Section 12 connects to the liability caps in Section 8. 

You've spent hours reconciling disconnected outputs.

This is the hidden problem: many firms don't realize they're solving it incorrectly.

| Model | Context Window | Accuracy | Cost per 1M tokens | Best For |
|---|---|---|---|---|
| Gemini 3 Pro | 1M tokens | 87% | $1.25 | Large, complex multi-exhibit contracts |
| Claude Sonnet 4.5 | 200K tokens | 88% | $3.00 | High-stakes multi-party deals |
| GPT-5.1 | 128K tokens | 85% | $1.00 | High-volume routine templates |

My pick: Gemini 3 Pro

The key insight isn't raw accuracy scores. It's the context window: the amount of text the model can process at once without breaking it into smaller pieces.

Gemini 3 Pro's 1 million token context window often matters more than marginal performance differences.

Think of it this way: you carry the entire document into the room at once instead of bringing pages one at a time.

That context window means the model sees the whole contract, understands how indemnification connects to liability caps, and catches dependencies human reviewers sometimes miss in first passes.
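If you want a rough sense of whether a contract fits in a given window in one pass, a simple token estimate goes a long way. A minimal sketch, assuming the common rule of thumb of roughly 4 characters per token (the window sizes come from the table above; the file name is hypothetical):

```python
# Rough check: will this contract fit in a model's context window in one pass?
# Assumes ~4 characters per token, a common English-text rule of thumb.

CONTEXT_WINDOWS = {          # sizes from the comparison table above
    "gemini-3-pro": 1_000_000,
    "claude-sonnet-4.5": 200_000,
    "gpt-5.1": 128_000,
}

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def fits_in_one_pass(text: str, model: str) -> bool:
    # Leave ~20% headroom for instructions and the model's response.
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model] * 0.8

contract = open("msa_with_exhibits.txt").read()  # hypothetical 50-page MSA
for model in CONTEXT_WINDOWS:
    verdict = "single pass" if fits_in_one_pass(contract, model) else "needs chunking"
    print(f"{model}: {verdict}")
```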

Real-world results from hybrid AI plus human review: 98% accuracy when using proper verification systems, according to Concord.app deployment data.

Performance: 87% on LegalBench legal reasoning tasks, according to VALS.ai benchmarks from December 2025.

Safeguards:

Use RAG (Retrieval-Augmented Generation, which is AI with a direct line to verified sources instead of making stuff up from memory) with your firm's precedent contracts. Don't rely solely on the model's training knowledge of clause patterns.

Three-layer check: AI flags issues, cross-check against your clause database, then human partner review on anything flagged as high-risk.

Log every analysis with model version and flagged clauses for an audit trail. This tends to become a competitive advantage when clients ask how you maintain consistency.
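That audit trail can be as simple as an append-only log. A minimal sketch (field names and file paths are assumptions to adapt to your DMS):

```python
import json
from datetime import datetime, timezone

def log_analysis(log_path: str, model_version: str, document_id: str,
                 flagged_clauses: list[str], reviewer: str | None = None) -> None:
    """Append one contract-review run to a JSON-lines audit log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,        # e.g. "gemini-3-pro"
        "document_id": document_id,
        "flagged_clauses": flagged_clauses,    # layer 1: what the AI flagged
        "reviewer": reviewer,                  # layer 3: partner sign-off, filled in later
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_analysis("contract_audit.jsonl", "gemini-3-pro", "MSA-2025-041",
             ["Section 12 indemnification", "Section 8 liability cap"])
```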

LEGAL RESEARCH
🔍 The Hallucination Problem You Need to Understand

Legal research carries higher risk than contract review. One hallucinated citation in a court filing can trigger sanctions.

Let me lay the data out clearly, because this distinction matters, and it resolves what look like contradictory findings.

The Three Types of AI Legal Tasks:

When large language models (LLMs, which are basically sophisticated AI trained on massive amounts of text) are asked to recall specific legal facts from memory (like "Who wrote this opinion?" or "What's the holding?"), they can make stuff up - let’s just call it what it is. 

Models frequently invent answers rather than admit uncertainty (Tip: good prompting and safeguards can help protect against this).

When you give an LLM guardrails so that it only retrieves cases from verified databases (using RAG), it tends to hallucinate far less often than when it recalls from its general knowledge.

When models reason about information you provide in the query (like "Here's the case facts and law, what's the analysis?"), accuracy hits 87% according to VALS.ai LegalBench benchmarks.

The key insight: hallucinations are most likely when you ask AI to remember. Accuracy climbs when you ask AI to reason over facts you provide.

Process tends to matter as much as model selection.
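The difference shows up directly in how you prompt. A minimal sketch of the two patterns (the case name, file names, and prompt wording are all illustrative):

```python
# Recall prompt: asks the model to remember. Highest hallucination risk.
# (The case name below is invented for illustration.)
recall_prompt = "What did the court hold in Smith v. Jones (9th Cir. 2019)?"

# Reasoning prompt: supplies the facts and the law, asks only for analysis.
# This is where the ~87% LegalBench reasoning accuracy applies.
case_facts = open("case_facts.txt").read()              # hypothetical file
verified_authority = open("verified_cases.txt").read()  # e.g. retrieved via RAG

reasoning_prompt = f"""Using ONLY the materials below, analyze whether the
indemnification claim survives. Cite only the provided authorities, and answer
"not in the materials" if something is missing.

FACTS:
{case_facts}

AUTHORITIES:
{verified_authority}
"""
```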

Model winner: Perplexity (with real-time web search)

This isn't a pure LLM. It's a hybrid retrieval plus generation system. Every citation includes a visible link to the original source. You can verify immediately instead of discovering hallucinations during discovery.

Why this matters: for legal research specifically, verifiable sources often beat raw model accuracy.

| Tool/Model | Hallucination Rate | Verifiability | Cost | Best For |
|---|---|---|---|---|
| Perplexity | Varies (shows sources) | High (live links) | $20/month per user | Initial research with source verification |
| Gemini 3 Pro Deep Think | 13% (reasoning tasks) | Medium | $1.25 per 1M tokens | Complex legal reasoning with provided materials |
| Lexis+ AI / Westlaw | 17-34% | High (database-backed) | $300-500/month per user | Comprehensive precedent research |
| Generic LLMs (recall) | 69-88% | Low | $0.50-3.00 per 1M tokens | Not recommended for legal research |

Safeguards:

For EVERY citation from any model: Verify against original source (Google Scholar, official court database, jurisdiction-specific system) before using in client work.

Check not just existence: Verify it's still "good law," applies to your jurisdiction, and hasn't been overruled. Even verified sources can retrieve inapplicable authority.

Build a monthly hallucination-tracking system. Use this data to identify which types of queries your chosen model struggles with most.
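A minimal sketch of what that tracking could look like (the CSV columns and query-type labels are assumptions; adapt them to your own log):

```python
import csv
from collections import defaultdict

# Each row records one citation the model produced and whether it survived
# verification. query_type might be "recall", "rag", or "reasoning",
# matching the three task types above.

def monthly_hallucination_rates(log_file: str) -> dict[str, float]:
    totals, failed = defaultdict(int), defaultdict(int)
    with open(log_file) as f:
        for row in csv.DictReader(f):   # columns: query_type, citation, verified
            totals[row["query_type"]] += 1
            if row["verified"].strip().lower() != "yes":
                failed[row["query_type"]] += 1
    return {qt: failed[qt] / totals[qt] for qt in totals}

for query_type, rate in monthly_hallucination_rates("citations_march.csv").items():
    print(f"{query_type}: {rate:.0%} of citations failed verification")
```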

CLIENT INTAKE
🕴️ The Right Balance of Accuracy, Empathy, and Cost

Client intake has lower legal risk than research or contracts, but getting it right matters. 

Poor classification delays cases. Missed urgency signals can blow deadlines. And an inappropriate tone can set the client relationship back before it has even started.

The challenge isn't finding the fastest model; any of these models is dramatically faster than manual intake.

What matters is balancing practice area classification accuracy, urgency detection, client communication quality, and cost efficiency for high-volume scenarios.

Model winner: GPT-5.1

For most firms, GPT-5.1 hits the sweet spot. Its 85-92% accuracy on practice area classification is often sufficient when downstream human review catches errors.

It excels at urgency detection (reliably flagging "tomorrow," "arrested," "emergency" scenarios), and its low cost per interaction makes economic sense for high-volume intake.

Alternative: Claude Sonnet 4.5 tends to be better for practices where client communication quality matters more than cost—think family law, immigration, or criminal defense where distressed callers need empathetic, nuanced responses.

| Model | Practice Area Accuracy | Urgency Detection | Tone/Empathy | Cost per 1M tokens | Best For |
|---|---|---|---|---|---|
| GPT-5.1 | 85-92% | Excellent | Good | $1.00 | High-volume intake, cost efficiency |
| Claude Sonnet 4.5 | 89-93% | Very Good | Excellent | $3.00 | Distressed clients, empathy-critical practices |
| Gemini 3 Pro | 86-91% | Good | Good | $1.25 | Balanced approach |

Safeguards:

Set confidence thresholds: Route low-confidence classifications to paralegal review rather than auto-assigning to attorney.

Log misclassifications monthly: Use patterns to retrain intake prompts and improve accuracy over time.
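In practice, the confidence-threshold routing is only a few lines. A sketch (the threshold, queue names, and urgency keywords are placeholders to tune against your own misclassification log):

```python
# Route an intake message based on the model's classification confidence.
# Assumes your intake model returns (practice_area, confidence in [0, 1]).

CONFIDENCE_THRESHOLD = 0.85   # placeholder: tune against your misclassification data

URGENT_SIGNALS = ("tomorrow", "arrested", "emergency", "deadline")

def route_intake(message: str, practice_area: str, confidence: float) -> str:
    if any(signal in message.lower() for signal in URGENT_SIGNALS):
        return "urgent-attorney-queue"       # urgency overrides everything
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"auto-assign:{practice_area}"
    return "paralegal-review-queue"          # low confidence goes to human triage

print(route_intake("My husband was arrested last night", "criminal-defense", 0.91))
# -> urgent-attorney-queue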

Real-world impact: legal intake AI implementations report 35% higher client conversion rates with AI classification, according to legal intake AI analyses from LawPractice.ai and golawhustle.com (November 2025).

HOW TO USE ALL THE BEST MODELS
🏡 The Secret to Using All The Best Models (Without the Chaos)

You might be thinking: "Does this mean I need a bunch of different subscriptions? Do my lawyers need to log into Gemini for contracts, Perplexity for research, and GPT for intake?"

Absolutely not. That would be a productivity nightmare.

The most sophisticated firms aren't asking their lawyers to toggle between tools. They are building unified workflows that route the work to the right model automatically.

Here is how it works in practice:

Imagine a single "Contract Review" portal on your firm's intranet.

  1. An attorney uploads the PDF.

  2. The system automatically sends it to Gemini 3 Pro because it detects a 50-page document (leveraging that context window).

  3. It then routes specific flagged clauses to a reasoning model for a second opinion.

  4. The final output appears in the attorney's inbox or DMS.

The lawyer never touches the models directly. They just touch the process.
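Under the hood, the routing logic doesn't need to be exotic. A minimal sketch (the model names match the tables above; the workflow labels and size cutoff are assumptions):

```python
# Route work to a model based on workflow type and rough document size.
# The 100K-token cutoff is illustrative; calibrate it to your document mix.

def pick_model(workflow: str, estimated_tokens: int = 0) -> str:
    if workflow == "contract_review":
        # Long multi-exhibit contracts need the big context window.
        return "gemini-3-pro" if estimated_tokens > 100_000 else "gpt-5.1"
    if workflow == "legal_research":
        return "perplexity"   # retrieval plus visible source links
    if workflow == "client_intake":
        return "gpt-5.1"      # cheap, strong urgency detection
    raise ValueError(f"No routing rule for workflow: {workflow}")

print(pick_model("contract_review", estimated_tokens=180_000))  # gemini-3-pro
```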

Why this approach wins:

  • Simplicity: Your team uses one interface, not five.

  • Cost Control: You pay for the specific API calls you use, often 10x cheaper than per-seat enterprise licenses.

  • Automation: You aren't just swapping a human for a chatbot; you are automating the handoffs, the logging, and the filing—speeding up the entire cycle, not just the reading part.

This is the difference between "buying AI tools" and "building AI capabilities." One adds complexity to your day; the other removes it.

When we help firms and in-house teams implement this, we don't start by buying subscriptions. We start by mapping the workflow, then we wire the right models behind the scenes so the technology feels invisible.

Related Legal AI News:

  • Aderant Enters Strategic Partnership with Harvey Read more

  • AI Firms Face Legal Wrath of Copyright Holders Read more

  • Australian Lawyers Face Regulators After AI Hallucinations Read more

🛠️ 10 Second Explainers - AI Tools & Tech

  • Context Window: The amount of text a model can process at once without breaking it into smaller pieces. Larger windows mean the AI sees more connections across documents, like reading an entire contract versus reading it page by page.

  • Hallucination: When AI invents information that isn't true, like citing cases that don't exist or misrepresenting holdings. Occurs most frequently when models try to recall facts from memory rather than retrieve from verified sources.

  • Reasoning vs Retrieval: Reasoning is when AI analyzes information you provide to it (87% accuracy on LegalBench reasoning tasks). Retrieval is when AI fetches information from a verified database.

"The greatest impact of AI on the law will not be in simply automating or replacing tasks currently undertaken by human lawyers. It will be in delivering outcomes through entirely new channels."

Richard Susskind, Author & Legal Futurist
READER POLL

How do you currently choose which AI model to use for different legal workflows?

A) We use one model for everything
B) We match different models to different workflows
C) We're still evaluating and haven't committed to any models yet
D) We rely on whatever our enterprise vendor provides
E) We don't use AI for legal work yet

[Reply with your letter choice] - I'll share the results in the next edition.

My Final Take…

The firms and in-house teams making model selection decisions deliberately (not just defaulting to the most expensive option) will inevitably get better outcomes.

But the real winners won't just be the ones picking the right models. They will be the ones who make those models invisible.

By building workflows that route work to the best tool automatically, you stop asking lawyers to be "prompt engineers" and let them go back to being lawyers—just faster and more accurate ones.

Of course, getting there requires context. When we build this for clients, we look at your existing tech stack, your budget, and exactly where your current process is bleeding time.

The key is treating AI not as a software purchase, but as a workflow design challenge.

— Liam Barnes

Need help building these invisible workflows?

We help legal teams move beyond "chatbots" to build automated systems that match the right tool to the right task.

Grab some time to chat

(if you don’t see a suitable time, just shoot me an email [email protected])

How Did We Do?

Your feedback shapes what comes next.
Let us know if this edition hit the mark or missed.

Too vague? Too detailed? Too long? Too Short? Too pink?

Was this week’s newsletter forwarded to you?

Sign up, it’s free.