Opus + RAG vs Fine-tuned LLM + RAG: Two Approaches to Legal AI — LEX vs Harvey

Harvey spent $100M+ and 10B tokens fine-tuning a case law model with OpenAI. We connected Opus to 100M+ court decisions from EDRSR via RAG. Both paths work — but for different realities.

Harvey spent $100M+ and trained a custom model on the entire US case law corpus. We connected Claude Opus to 100M+ court decisions from EDRSR via RAG. Both work. But these are fundamentally different engineering and business decisions.

When an ordinary AI startup from Ukraine applies to the Google for Startups Cloud Program and receives a five-figure dollar grant, that's not luck. It's validation of the approach. Google saw the same thing we see: 100M+ court decisions, an open data corpus unmatched in scale anywhere in Europe, and a team that has already built a production RAG system on top of it. Google Cloud resources (TPU pods, compute credits, engineering support) are not charity. They are an investment in Ukraine's jurisdiction becoming the first proving ground for open-weight legal AI based on DeepSeek v3, trained on real data from a real legal system. Harvey spent $100M+ on a partnership with OpenAI for US case law. We're doing the same for Ukraine, with a grant from Google, an open model, and a corpus assembled from public registries.


Context: Why This Comparison Matters

Harvey AI is the most prominent legal AI company in the world. $5B+ valuation, 42% of the US top-100 law firms as clients, a partnership with OpenAI at the level of custom model training. Their approach is the industry benchmark.

LEX AI is a Ukrainian legal AI platform built on a fundamentally different architecture: a foundation model (Claude Opus) + RAG over the complete corpus of the Unified State Register of Court Decisions (EDRSR) — 100+ million documents.

Both systems solve the same problem: help a lawyer find relevant case law, analyze it, and apply it. But their architectural approaches are diametrically opposed.


Harvey's Approach: Fine-tuned LLM + RAG

Architecture

Harvey built a three-tier system:

1. Foundation Layer — GPT-4/GPT-5 as the base model, deployed on Azure

2. Domain Fine-tuning Layer — pre-training and post-training on 10 billion tokens of legal data:

3. Client Customization Layer — adaptation for specific firms:

Search System

Separately from the model, Harvey built a custom retrieval system:

Results

Cost of This Approach


LEX's Approach: Opus + RAG

Architecture

Our approach is fundamentally different — we don't train the model, we build infrastructure around it:

1. Foundation Model — Claude Opus (as-is, no fine-tuning)

2. RAG over the complete EDRSR corpus:

3. MCP (Model Context Protocol) — structured interface between model and data:

Search System

Lawyer's query
    │
    ▼
QueryPlanner (intent classification)
    │
    ├── Semantic Search (Qdrant)
    │   └── embeddings: text-embedding-ada-002
    │
    ├── Full-text Search (PostgreSQL)
    │   └── GIN indexes, 'simple' language config
    │
    └── Legislation Lookup (RADA API)
        └── intelligent sectioning
    │
    ▼
Context Assembly (relevant chunks)
    │
    ▼
Claude Opus (reasoning + generation)
    │
    ▼
Response with source citations
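
The diagram does not say how hits from the semantic and full-text branches are merged before context assembly. One common option is reciprocal rank fusion (RRF). The sketch below assumes that technique; the function name, the constant k=60, and the document IDs are illustrative and not taken from LEX.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two search branches.
semantic_hits = ["case-17", "case-03", "case-42"]
fulltext_hits = ["case-03", "case-99", "case-17"]

fused = reciprocal_rank_fusion([semantic_hits, fulltext_hits])
print(fused)
```

Rank-based fusion avoids having to normalize vector similarity scores against full-text relevance scores, which live on different scales.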

Results

Cost of This Approach


Comparison: What Actually Differs

1. Where Legal Knowledge Lives

| | Harvey (Fine-tuned) | LEX (Opus + RAG) |
|---|---|---|
| In model weights | Yes: 10B tokens of case law baked into the model | No: the model is generic |
| In retrieval | Yes: custom embeddings + search | Yes: Qdrant + PostgreSQL FTS |
| In context | Partially: reasoning is already trained | Fully: everything via prompt |

A fine-tuned model "knows" jurisprudence at an intuitive level. It has seen millions of cases during training and developed patterns of legal reasoning. When a lawyer asks about piercing the corporate veil, the model doesn't just search — it "remembers" the key precedents.

Opus + RAG "knows" jurisprudence through context. The model receives relevant case fragments via RAG and applies its generic reasoning to analyze them. Opus doesn't "remember" case law — but it can read and analyze it better than any specialized model of smaller scale.
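
A minimal sketch of what "knowing through context" can look like: retrieved fragments are numbered, placed in the prompt, and the model is told to answer only from them. The `Chunk` type, the `build_prompt` function, the character budget, and the placeholder case numbers are all assumptions for illustration, not the LEX implementation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    case_number: str  # registry identifier of the decision (placeholder format)
    text: str


def build_prompt(question, chunks, max_chars=4000):
    """Assemble a grounded prompt: numbered sources first, then the question."""
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] Case {chunk.case_number}\n{chunk.text}\n"
        if used + len(block) > max_chars:
            break  # stay inside the (assumed) context budget
        parts.append(block)
        used += len(block)
    sources = "\n".join(parts)
    return (
        "Answer using only the sources below. "
        "Cite sources as [n]. If the sources are insufficient, say so.\n\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


# Placeholder fragments; real chunks would come from the retrieval step.
chunks = [
    Chunk("000/0001/24", "The court held that the lease was terminated lawfully."),
    Chunk("000/0002/24", "The appeal was dismissed for missing the filing deadline."),
]
prompt = build_prompt("Was the lease terminated lawfully?", chunks)
print(prompt)
```

The resulting string would be sent to the model as the user message. Everything the model "knows" about these cases is in that string.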

2. Hallucinations and Reliability

Harvey achieved a 0.2% hallucination rate through:

LEX minimizes hallucinations through:

3. Updatability

This is the biggest advantage of the RAG approach.

A fine-tuned model is a snapshot of the corpus at the time of training. A new Supreme Court decision handed down yesterday doesn't exist for the model until the next fine-tuning cycle (weeks to months).

A RAG system updates as soon as new documents are indexed. A decision entered into EDRSR this morning is searchable by tonight. For a jurisdiction under martial law, where new legislation appears every week, this is critical.
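
The difference can be illustrated with a toy index: adding one document makes it searchable immediately, with no retraining step. The `DecisionIndex` class below is a hypothetical in-memory stand-in for a real backend such as Qdrant or PostgreSQL, and the decision IDs and texts are invented for the example.

```python
from collections import defaultdict


class DecisionIndex:
    """Toy in-memory inverted index standing in for a real search backend."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of decision IDs
        self._docs = {}                    # decision ID -> text

    def upsert(self, decision_id, text):
        """Add or replace one decision without rebuilding the index."""
        if decision_id in self._docs:
            self._remove(decision_id)
        self._docs[decision_id] = text
        for token in set(text.lower().split()):
            self._postings[token].add(decision_id)

    def _remove(self, decision_id):
        for token in set(self._docs[decision_id].lower().split()):
            self._postings[token].discard(decision_id)
        del self._docs[decision_id]

    def search(self, query):
        """Return IDs of decisions containing every query token."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


index = DecisionIndex()
index.upsert("d-1", "lease termination dispute")
before = index.search("mobilization deferral")
index.upsert("d-2", "mobilization deferral appeal")  # newly published decision
after = index.search("mobilization deferral")
print(before, after)
```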

4. Scaling to New Jurisdictions

Harvey scales with difficulty: each new jurisdiction means a new cycle of data collection, training, and verification. US case law ≠ EU case law ≠ Ukrainian judicial practice. Reasoning patterns differ. Legal terminology differs. The hierarchy of sources differs.

RAG scales easily: connect a new document corpus, configure embeddings, update the search pipeline. We've already connected:

5. Reasoning Customization

Fine-tuning lets you embed legal reasoning into the model:

Prompt engineering + RAG lets you control reasoning:


Why We Chose RAG Over Fine-tuning

1. Economic Reality

Fine-tuning a legal model is a $10M+ project even for a minimum viable product. Harvey raised $100M+ and has a team of 200+ people. For the Ukrainian market, where the entire legal tech TAM is a fraction of what a single Am Law 100 firm earns, such investment makes no economic sense.

The RAG approach let us ship to production with a one-person team and a budget for API calls.

2. Iteration Speed

Fine-tuning cycle: collect data → clean → train → evaluate → deploy. Weeks to months.

RAG cycle: update the prompt → deploy. Minutes.

When the Grand Chamber of the Supreme Court adopts a new legal position that changes interpretation across an entire field — a RAG system adapts in hours, not months.

3. Foundation Model Quality

In 2023, when Harvey started fine-tuning, GPT-4 was the best model available, and its reasoning on legal tasks was "good but not sufficient." Fine-tuning made sense.

In 2026, Claude Opus has a 1M context window and reasoning that surpasses specialized models. The gap between "generic Opus + the right context" and "fine-tuned GPT + retrieval" has narrowed significantly. Foundation models have caught up with fine-tuned specialized models on reasoning quality — and continue improving with every release.

4. Ukrainian Jurisdiction

Ukrainian law is not common law. There is no stare decisis (binding precedent). Case law is advisory in nature. This means:

5. Transparency and Control

A fine-tuned model is a black box. You don't know why it generated a particular response. Which weights fired? Which cases did it "recall"?

RAG is transparent. You can see:

For a legal system where every response can affect a person's fate, transparency is not a nice-to-have — it's a requirement.
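
Part of this transparency can be checked mechanically. The sketch below assumes the model was asked to cite retrieved sources as [n] (a convention chosen for this example) and flags any citation that points to a source that was never retrieved. The function name and the sample answer are hypothetical.

```python
import re


def check_citations(answer, num_sources):
    """Return (cited, invalid): source numbers cited in the answer, and those
    that do not correspond to any retrieved source."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    invalid = [n for n in cited if not 1 <= n <= num_sources]
    return cited, invalid


# Two sources were retrieved, but the answer cites a third.
answer = "The lease was terminated lawfully [1]. The appeal was late [3]."
cited, invalid = check_citations(answer, num_sources=2)
print(cited, invalid)
```

A check like this only confirms that a citation refers to a retrieved document. It does not confirm that the document supports the claim, which still needs review by a lawyer.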


Where Fine-tuning Still Wins

In fairness, there are tasks where Harvey's fine-tuned model is objectively better:

1. Legal reasoning without context — when a lawyer asks a general legal question without a specific case, a fine-tuned model gives a better answer because it "knows" jurisprudence. RAG depends on search quality.

2. Chains of precedent — a fine-tuned model can independently build an argument through a series of related precedents because it "saw" those connections during training. RAG may miss a precedent if the search didn't find it.

3. Legal document stylistics — a model trained on millions of legal texts better mimics the style of legal writing. A generic model requires more prompt engineering.

4. Scale — when processing hundreds of contracts at once (due diligence), a fine-tuned model is more efficient because it doesn't need retrieval at every step.


The Future: Convergence of Approaches

The boundary between RAG and fine-tuning is blurring:

The truth is that "fine-tuning vs RAG" is a false dichotomy. Harvey uses both fine-tuning and RAG. We use RAG and will be adding elements of domain adaptation (custom embeddings, constitutional RLHF).

The ultimate architecture for legal AI is a spectrum:

Pure RAG ←──────────────────────────────────→ Pure Fine-tuning
  │                                                    │
  LEX (Opus + EDRSR)            Harvey (custom GPT + RAG)
  │                                                    │
  Cheap, fast,                          Expensive, slow,
  transparent, updatable                deep, precise

The optimum for each jurisdiction, team, and budget lies somewhere between these poles.


LEX + Google + DeepSeek v3: Fine-tuning for Ukrainian Jurisdiction

We're not just comparing approaches — we're moving toward fine-tuning ourselves. LEX AI is working with Google on a task analogous to Harvey + OpenAI, but for Ukrainian law.

Why DeepSeek v3

DeepSeek v3 is an open-weight model with a Mixture-of-Experts architecture (671B parameters, 37B active per query). For fine-tuning on Ukrainian jurisdiction, it's the ideal foundation:

What We're Training

The fine-tuning corpus: 100M+ court decisions from EDRSR, Ukrainian legislation, Supreme Court legal positions. This is the same dataset that currently lives in our RAG system — but instead of feeding it into context every time, we're embedding legal knowledge directly into the model weights.

Key directions:

Google's Role

Google Cloud provides training infrastructure: TPU pods for pre-training on hundreds of millions of documents, distributed training tools, and expertise in optimizing MoE models. The partnership enables us to do work that previously required a team of 200+ engineers.

How This Changes LEX

The final LEX architecture will be hybrid:

Lawyer's query
    │
    ▼
Fine-tuned DeepSeek v3 (legal reasoning in weights)
    +
RAG (current decisions, new legislation)
    +
Constitutional RLHF (ethical constraints)
    │
    ▼
Response with deep legal reasoning
+ current sources
+ constitutional guarantees

This is what Harvey built for US common law at $100M+ with OpenAI. We're building the same for Ukrainian jurisdiction with Google and DeepSeek — on open data, with an open model, for a market where access to justice is not a business metric but a matter of survival.


Conclusions

| Criterion | Harvey (Fine-tuned + RAG) | LEX (Opus + RAG) |
|---|---|---|
| Reasoning quality | Embedded legal reasoning | Generic reasoning + context |
| Hallucinations | 0.2% (verified) | Low (grounded RAG) |
| Updatability | Weeks to months | Hours |
| New jurisdictions | New training cycle | New document corpus |
| Launch cost | $10M+ | $10K |
| Transparency | Black box | Full transparency |
| Time to production | Months | Weeks |
| Reasoning customization | Via training (slow) | Via prompt (fast) |

For Ukrainian legal tech in 2026, RAG + Opus is the right choice. Not because fine-tuning is bad. But because:

  1. Foundation models have become smart enough for RAG to perform on par with fine-tuned specialized models
  2. Ukrainian jurisdiction demands real-time updates that fine-tuning cannot provide
  3. The economics of the Ukrainian market don't allow spending $100M on model training
  4. RAG transparency is critical for a legal system where an error is not a bug but a human rights violation

Harvey took the right path for their context: US common law, a $500B market, $100M in investment. We're taking the right path for ours: Ukrainian law, martial law, a team of one person and an AI partner.

Different realities — different architectures. But the goal is one: to make justice more accessible.


Registration: legal.org.ua