TECH 2026-04-16 22 min

Opus + RAG vs Fine-tuned LLM + RAG: Two Approaches to Legal AI — LEX vs Harvey

Harvey spent $100M+ fine-tuning a case law model with OpenAI on 10B tokens of legal data. We connected Opus to 100M+ court decisions from EDRSR via RAG. Both paths work, but for different realities.

Harvey spent $100M+ and trained a custom model on the entire US case law corpus. We connected Claude Opus to 100M+ court decisions from EDRSR via RAG. Both work. But these are fundamentally different engineering and business decisions.

When an ordinary AI startup from Ukraine applies to the Google for Startups Cloud Program and receives a five-figure dollar grant, that's not luck. It's validation of the approach. Google saw the same thing we see: 100M+ court decisions, an open data corpus unmatched in scale anywhere in Europe, and a team that has already built a production RAG system on top of it. Google Cloud resources (TPU pods, compute credits, engineering support) are not charity. They are an investment in Ukraine's jurisdiction becoming the first proving ground for open-weight legal AI based on DeepSeek v3, trained on real data from a real legal system. Harvey spent $100M on a partnership with OpenAI for US case law. We're doing the same for Ukraine, with a grant from Google, an open model, and a corpus assembled from public registries.

Context: Why This Comparison Matters

Harvey AI is the most prominent legal AI company in the world. $5B+ valuation, 42% of the US top-100 law firms as clients, a partnership with OpenAI at the level of custom model training. Their approach is the industry benchmark.

LEX AI is a Ukrainian legal AI platform built on a fundamentally different architecture: a foundation model (Claude Opus) + RAG over the complete corpus of the Unified State Register of Court Decisions (EDRSR) — 100+ million documents.

Both systems solve the same problem: help a lawyer find relevant case law, analyze it, and apply it. But their architectural approaches are diametrically opposed.

Harvey's Approach: Fine-tuned LLM + RAG

Architecture

Harvey built a three-tier system:

1. Foundation Layer — GPT-4/GPT-5 as the base model, deployed on Azure

2. Domain Fine-tuning Layer — pre-training and post-training on 10 billion tokens of legal data:

The complete US case law corpus (starting with Delaware, then expanding nationwide)
Legal reasoning patterns
Specialized terminology and citation formats

3. Client Customization Layer — adaptation for specific firms:

Firm document templates
Style guides
Internal precedents

Search System

Separately from the model, Harvey built a custom retrieval system:

Voyage AI embeddings (voyage-law-2-harvey) — trained on 20B+ tokens of case law
Custom legal embeddings achieved 25% reduction in irrelevant results compared to generic embeddings
Hybrid search (vector + keyword)
Legal-specific preprocessing and postprocessing
Integration with LexisNexis for Shepardization (checking whether a precedent is still good law)

Results

97% — the rate at which lawyers in blind testing chose the fine-tuned model's response over GPT-4
0.2% hallucination rate (vs. 17-33% for generic models)
Every sentence backed by a citation to an actual case
Multi-model orchestration: different models for drafting, research, and jurisdiction-specific queries

Cost of This Approach

$100M+ in investment (Series C from Sequoia, Google Ventures, et al.)
Partnership with OpenAI at the level of custom model training
Team of 200+ engineers
Months of training and verification per iteration
Lock-in to a single jurisdiction (US case law) with enormous effort required to scale

LEX's Approach: Opus + RAG

Architecture

Our approach is fundamentally different — we don't train the model, we build infrastructure around it:

1. Foundation Model — Claude Opus (as-is, no fine-tuning)

1M context window
Strongest reasoning among publicly available models
Native understanding of Ukrainian language

2. RAG over the complete EDRSR corpus:

100+ million court decisions
Full-text search (PostgreSQL GIN indexes with the 'simple' text search configuration, which indexes Cyrillic text without language-specific stemming)
Semantic search (Qdrant + OpenAI embeddings)
Semantic Sectionizer — splits documents into logical sections (articles, parts, clauses)
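
The sectioning step can be illustrated with a minimal sketch. This is not the LEX implementation: the `Section` type, the `split_into_articles` function, and the assumption that every article opens with a line starting with "Стаття N." are introduced here only for illustration. A production sectionizer would also need to handle parts, clauses, and the structure of court decisions.

```python
import re
from dataclasses import dataclass

# Assumed heading pattern: each article starts a line with "Стаття <number>."
# (optionally a hyphenated number such as "Стаття 12-1.").
ARTICLE_HEADING = re.compile(r"^Стаття\s+(\d+(?:-\d+)?)\.", re.MULTILINE)


@dataclass
class Section:
    article: str  # article number as written in the heading, e.g. "124"
    text: str     # full text of the article, heading included


def split_into_articles(document: str) -> list[Section]:
    """Split a statute into one Section per article heading."""
    matches = list(ARTICLE_HEADING.finditer(document))
    sections = []
    for i, match in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(document)
        sections.append(Section(match.group(1), document[match.start():end].strip()))
    return sections
```

Splitting on structural boundaries rather than fixed character windows keeps each chunk a self-contained legal unit, so a retrieved chunk can be cited by its article number.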

3. MCP (Model Context Protocol) — structured interface between model and data:

QueryPlanner classifies intent and selects search strategy
DocumentService retrieves and caches documents
LegislationService handles legislation (understands "Article 124 of the Constitution")
EdsrFtsService — full-text search across the entire EDRSR
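
To make the intent-classification step concrete, here is a rule-based sketch of a planner that routes a query to one of three strategies. The `Strategy` enum, the `plan` function, and the regular expressions are hypothetical. The real QueryPlanner may well use the model itself or a richer set of intents.

```python
import re
from enum import Enum
from typing import Optional


class Strategy(Enum):
    LEGISLATION = "legislation_lookup"
    FULL_TEXT = "full_text_search"
    SEMANTIC = "semantic_search"


# Matches "Article 124", "стаття 124" or "ст. 124" (case-insensitive).
ARTICLE_REF = re.compile(r"\b(?:article|стаття|ст\.)\s*(\d+)", re.IGNORECASE)
# A quoted phrase suggests the lawyer wants an exact textual match.
QUOTED_PHRASE = re.compile(r"\"[^\"]+\"|«[^»]+»")


def plan(query: str) -> tuple[Strategy, Optional[str]]:
    """Return the chosen strategy and, for legislation, the article number."""
    article = ARTICLE_REF.search(query)
    if article:
        return Strategy.LEGISLATION, article.group(1)
    if QUOTED_PHRASE.search(query):
        return Strategy.FULL_TEXT, None
    # Default: free-form questions go to semantic search.
    return Strategy.SEMANTIC, None
```

For example, `plan("Article 124 of the Constitution")` selects the legislation lookup and extracts "124", while a free-form question falls through to semantic search.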

Search System

Lawyer's query
    │
    ▼
QueryPlanner (intent classification)
    │
    ├── Semantic Search (Qdrant)
    │   └── embeddings: text-embedding-ada-002
    │
    ├── Full-text Search (PostgreSQL)
    │   └── GIN indexes, 'simple' language config
    │
    └── Legislation Lookup (RADA API)
        └── intelligent sectioning
    │
    ▼
Context Assembly (relevant chunks)
    │
    ▼
Claude Opus (reasoning + generation)
    │
    ▼
Response with source citations
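
The diagram does not specify how the semantic and full-text result lists are combined before context assembly. One common option is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration and does not describe what LEX actually does. The function name, the document ids, and the constant `k = 60` (a conventional default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents found by more than one search branch rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]   # hypothetical ids from the vector branch
full_text_hits = ["b", "d", "a"]  # hypothetical ids from the keyword branch
fused = reciprocal_rank_fusion([semantic_hits, full_text_hits])
```

RRF needs only ranks, not comparable scores, which is convenient when one branch returns cosine similarities and the other returns text-search relevance.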

Results

Full coverage of Ukrainian jurisdiction (100M+ decisions — the entire EDRSR)
Citations with references to specific cases
Understanding of martial law context, mobilization, new legislation
Real-time corpus updates (new decisions enter the system automatically)
Legislation, registries, and parliamentary data in a single interface

Cost of This Approach

Team: 1 developer + Claude Code (735 commits in 25 days)
Zero model training costs
API costs: pay-per-use (Opus + embeddings)
Infrastructure: 1 prod server, Docker Compose, PostgreSQL + Qdrant
Time to production: weeks, not months

Comparison: What Actually Differs

1. Where Legal Knowledge Lives

A fine-tuned model "knows" jurisprudence at an intuitive level. It has seen millions of cases during training and developed patterns of legal reasoning. When a lawyer asks about piercing the corporate veil, the model doesn't just search — it "remembers" the key precedents.

Opus + RAG "knows" jurisprudence through context. The model receives relevant case fragments via RAG and applies its generic reasoning to analyze them. Opus doesn't "remember" case law — but it can read and analyze it better than any specialized model of smaller scale.

2. Hallucinations and Reliability

Harvey achieved a 0.2% hallucination rate through:

Fine-tuning on real cases (the model has "seen" them)
Post-processing with citation verification
Shepardization via LexisNexis

LEX minimizes hallucinations through:

Grounding — the model responds only based on provided context
Explicit instructions — the system prompt requires source citations
Verification — QueryPlanner checks that real documents were found
Constitutional constraints — the model is explicitly instructed not to draw conclusions beyond the provided data
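
The grounding and verification steps listed above can be sketched as two small functions: one that refuses to build a prompt when nothing was retrieved, and one that flags citations pointing at documents that were never in the context. The function names, the prompt wording, and the `[doc-id]` citation format are assumptions made for this example, not the LEX system prompt.

```python
import re

# Assumed citation format: a source id in square brackets, e.g. [case-1].
CITATION = re.compile(r"\[([^\[\]]+)\]")


def build_grounded_prompt(question: str, chunks: dict[str, str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved chunks."""
    if not chunks:
        # Verification step: no real documents found, so do not ask the model.
        raise ValueError("no documents retrieved; refusing to answer without sources")
    sources = "\n\n".join(f"[{doc_id}]\n{text}" for doc_id, text in chunks.items())
    return (
        "Answer using only the sources below. Cite a source id in square "
        "brackets after every claim. If the sources are insufficient, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def unknown_citations(answer: str, chunks: dict[str, str]) -> set[str]:
    """Return cited ids that were not part of the retrieved context."""
    return {cited for cited in CITATION.findall(answer) if cited not in chunks}
```

An answer for which `unknown_citations` returns a non-empty set can be rejected or regenerated before it reaches the lawyer.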

3. Updatability

This is the biggest advantage of the RAG approach.

A fine-tuned model is a snapshot of the corpus at the time of training. A new Supreme Court decision handed down yesterday doesn't exist for the model until the next fine-tuning cycle (weeks to months).

A RAG system updates in near real time: a decision entered into EDRSR this morning is available for search by tonight. For a jurisdiction under martial law, where new legislation appears every week, this is critical.

4. Scaling to New Jurisdictions

Harvey scales with difficulty: each new jurisdiction means a new cycle of data collection, training, and verification. US case law ≠ EU case law ≠ Ukrainian judicial practice. Reasoning patterns differ. Legal terminology differs. The hierarchy of sources differs.

RAG scales easily: connect a new document corpus, configure embeddings, update the search pipeline. We've already connected:

EDRSR (100M+ decisions)
Legislation via RADA API
OpenReyestr (business entity registry)
Parliamentary data (deputies, bills, votes)

5. Reasoning Customization

Fine-tuning lets you embed legal reasoning into the model:

The model "understands" legal argumentation
It can independently build chains of precedents
Less dependent on search quality

Prompt engineering + RAG lets you control reasoning:

Transparent logic (you can read the prompt)
Easy to change strategy (update the prompt, not retrain the model)
Constitutional constraints via RLHF principles in the prompt

Why We Chose RAG Over Fine-tuning

1. Economic Reality

Fine-tuning a legal model is a $10M+ project even for a minimum viable product. Harvey raised $100M+ and has a team of 200+ people. For the Ukrainian market, where the entire legal tech TAM is a fraction of what a single Am Law 100 firm earns, such investment makes no economic sense.

The RAG approach let us ship to production with a one-person team and a budget for API calls.

2. Iteration Speed

Fine-tuning cycle: collect data → clean → train → evaluate → deploy. Weeks to months.

RAG cycle: update the prompt → deploy. Minutes.

When the Grand Chamber of the Supreme Court adopts a new legal position that changes interpretation across an entire field — a RAG system adapts in hours, not months.

3. Foundation Model Quality

In 2023, when Harvey started fine-tuning, GPT-4 was the best model available, and its reasoning on legal tasks was "good but not sufficient." Fine-tuning made sense.

In 2026, Claude Opus has a 1M context window and reasoning that surpasses specialized models. The gap between "generic Opus + the right context" and "fine-tuned GPT + retrieval" has narrowed significantly. Foundation models have caught up with fine-tuned specialized models on reasoning quality — and continue improving with every release.

4. Ukrainian Jurisdiction

Ukrainian law is not common law. There is no stare decisis (binding precedent). Case law is advisory in nature. This means:

Precise precedent citation is less critical than in US law
Knowing current legislation + Supreme Court legal positions matters more
The corpus changes constantly (martial law, new statutes every week)
RAG with real-time updates is a perfect fit for this context

5. Transparency and Control

A fine-tuned model is a black box. You don't know why it generated a particular response. Which weights fired? Which cases did it "recall"?

RAG is transparent. You can see:

Which documents were found (search results)
What entered the context (retrieved chunks)
What the model received as input (prompt)
How it arrived at the answer (reasoning in output)
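
One way to make these four layers inspectable is to record them together for every request. The `RagTrace` record below is a hypothetical sketch whose field names simply mirror the list above. It is not a description of how LEX stores its logs.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RagTrace:
    """Everything needed to audit one answer after the fact."""
    query: str
    search_results: list[str] = field(default_factory=list)    # ids of documents found
    retrieved_chunks: list[str] = field(default_factory=list)  # text that entered the context
    prompt: str = ""                                           # exact model input
    answer: str = ""                                           # model output with its reasoning

    def to_json(self) -> str:
        # ensure_ascii=False keeps Ukrainian text readable in the stored trace.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


trace = RagTrace(query="стаття 124", search_results=["case-1"])
```

Because the trace holds the exact prompt, a disputed answer can be replayed against the same context.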

For a legal system where every response can affect a person's fate, transparency is not a nice-to-have — it's a requirement.

Where Fine-tuning Still Wins

To be honest, there are tasks where Harvey's fine-tuned model is objectively better:

1. Legal reasoning without context — when a lawyer asks a general legal question without a specific case, a fine-tuned model gives a better answer because it "knows" jurisprudence. RAG depends on search quality.

2. Chains of precedent — a fine-tuned model can independently build an argument through a series of related precedents because it "saw" those connections during training. RAG may miss a precedent if the search didn't find it.

3. Legal document stylistics — a model trained on millions of legal texts better mimics the style of legal writing. A generic model requires more prompt engineering.

4. Scale — when processing hundreds of contracts at once (due diligence), a fine-tuned model is more efficient because it doesn't need retrieval at every step.

The Future: Convergence of Approaches

The boundary between RAG and fine-tuning is blurring:

Harvey is building RAG on top of its fine-tuned model (their case law search is RAG)
We are exploring domain-specific embeddings (an analogue of voyage-law, but for Ukrainian jurisprudence)
Both are moving toward agentic workflows — multi-step systems where the model decides what to search for

The truth is that "fine-tuning vs RAG" is a false dichotomy. Harvey uses both fine-tuning and RAG. We use RAG and will be adding elements of domain adaptation (custom embeddings, constitutional RLHF).

The ultimate architecture for legal AI is a spectrum:

Pure RAG ←──────────────────────────────────→ Pure Fine-tuning
  │                                                    │
  LEX (Opus + EDRSR)            Harvey (custom GPT + RAG)
  │                                                    │
  Cheap, fast,                          Expensive, slow,
  transparent, updatable                deep, precise

The optimum for each jurisdiction, team, and budget lies somewhere between these poles.

LEX + Google + DeepSeek v3: Fine-tuning for Ukrainian Jurisdiction

We're not just comparing approaches — we're moving toward fine-tuning ourselves. LEX AI is working with Google on a task analogous to Harvey + OpenAI, but for Ukrainian law.

Why DeepSeek v3

DeepSeek v3 is an open-weight model with a Mixture-of-Experts architecture (671B total parameters, 37B active per token). For fine-tuning on Ukrainian jurisdiction, it's the ideal foundation:

Open weights — full control over training, no API provider lock-in
MoE efficiency — inference cost is several times lower than dense models of comparable scale
Strong multilingual capabilities — quality Cyrillic and Ukrainian language support out of the box
Legal reasoning — baseline reasoning on par with GPT-4o, providing a high starting point for domain adaptation
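
The MoE efficiency point can be checked with back-of-the-envelope arithmetic from the 671B and 37B figures. The sketch assumes the common rule of thumb of roughly 2 FLOPs per active parameter per generated token. That rule is an approximation, and the variable names are illustrative.

```python
TOTAL_PARAMS = 671e9   # all experts, from the figures above
ACTIVE_PARAMS = 37e9   # parameters actually used in one forward step

# Share of the network that does work at each step.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule-of-thumb compute per token (assumption: ~2 FLOPs per active parameter).
moe_flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS  # hypothetical dense model of equal size
compute_ratio = dense_flops_per_token / moe_flops_per_token

print(f"active fraction: {active_fraction:.1%}")       # about 5.5%
print(f"compute ratio vs dense: {compute_ratio:.1f}x")  # about 18x
```

The raw compute ratio overstates the practical saving: all 671B parameters must still be held in memory, so serving cost falls by less than the compute ratio alone suggests, which is consistent with the "several times lower" estimate.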

What We're Training

The fine-tuning corpus: 100M+ court decisions from EDRSR, Ukrainian legislation, Supreme Court legal positions. This is the same dataset that currently lives in our RAG system — but instead of feeding it into context every time, we're embedding legal knowledge directly into the model weights.

Key directions:

Pre-training on the full EDRSR corpus — the model will "see" all of Ukraine's case law
Post-training on "lawyer query → quality response" pairs with legal annotators
Constitutional RLHF — reward signal based on the Constitution of Ukraine (described in our previous article)
Custom embeddings for Ukrainian legal text (analogous to Harvey's voyage-law-2-harvey)

Google's Role

Google Cloud provides training infrastructure: TPU pods for pre-training on hundreds of millions of documents, distributed training tools, and expertise in optimizing MoE models. The partnership enables us to do work that previously required a team of 200+ engineers.

How This Changes LEX

The final LEX architecture will be hybrid:

Lawyer's query
    │
    ▼
Fine-tuned DeepSeek v3 (legal reasoning in weights)
    +
RAG (current decisions, new legislation)
    +
Constitutional RLHF (ethical constraints)
    │
    ▼
Response with deep legal reasoning
+ current sources
+ constitutional guarantees

This is what Harvey built for US common law at $100M+ with OpenAI. We're building the same for Ukrainian jurisdiction with Google and DeepSeek — on open data, with an open model, for a market where access to justice is not a business metric but a matter of survival.

Conclusions

For Ukrainian legal tech in 2026, RAG + Opus is the right choice. Not because fine-tuning is bad. But because:

Foundation models have become smart enough for RAG to perform on par with fine-tuned specialized models
Ukrainian jurisdiction demands real-time updates that fine-tuning cannot provide
The economics of the Ukrainian market don't allow spending $100M on model training
RAG transparency is critical for a legal system where an error is not a bug but a human rights violation

Harvey took the right path for their context: US common law, $500B market, $100M in investment. We're taking the right path for ours: Ukrainian law, martial law, a team of one person and an AI partner.

Different realities — different architectures. But the goal is one: to make justice more accessible.

Registration: legal.org.ua