Opus + RAG vs Fine-tuned LLM + RAG: Two Approaches to Legal AI — LEX vs Harvey
Harvey spent $100M+ and 10B tokens fine-tuning a case law model with OpenAI. We connected Opus to 100M+ court decisions from EDRSR via RAG. Both paths work — but for different realities.
Harvey spent $100M+ and trained a custom model on the entire US case law corpus. We connected Claude Opus to 100M+ court decisions from EDRSR via RAG. Both work. But these are fundamentally different engineering and business decisions.
When an ordinary AI startup from Ukraine applies to the Google for Startups Cloud Program and receives a five-figure dollar grant, that's not luck. It's validation of the approach. Google saw the same thing we see: 100M+ court decisions, an open data corpus unmatched in scale anywhere in Europe, and a team that has already built a production RAG system on top of it. Google Cloud resources (TPU pods, compute credits, engineering support) are not charity. They are an investment in Ukraine's jurisdiction becoming the first proving ground for open-weight legal AI based on DeepSeek v3, trained on real data from a real legal system. Harvey spent $100M on a partnership with OpenAI for US case law. We're doing the same for Ukraine, with a grant from Google, an open model, and a corpus assembled from public registries.
Context: Why This Comparison Matters
Harvey AI is the most prominent legal AI company in the world. $5B+ valuation, 42% of the US top-100 law firms as clients, a partnership with OpenAI at the level of custom model training. Their approach is the industry benchmark.
LEX AI is a Ukrainian legal AI platform built on a fundamentally different architecture: a foundation model (Claude Opus) + RAG over the complete corpus of the Unified State Register of Court Decisions (EDRSR) — 100+ million documents.
Both systems solve the same problem: help a lawyer find relevant case law, analyze it, and apply it. But their architectural approaches are diametrically opposed.
Harvey's Approach: Fine-tuned LLM + RAG
Architecture
Harvey built a three-tier system:
1. Foundation Layer — GPT-4/GPT-5 as the base model, deployed on Azure
2. Domain Fine-tuning Layer — pre-training and post-training on 10 billion tokens of legal data:
- The complete US case law corpus (starting with Delaware, then expanding nationwide)
- Legal reasoning patterns
- Specialized terminology and citation formats
3. Client Customization Layer — adaptation for specific firms:
- Firm document templates
- Style guides
- Internal precedents
Search System
Separately from the model, Harvey built a custom retrieval system:
- Voyage AI embeddings (voyage-law-2-harvey), trained on 20B+ tokens of case law
- Custom legal embeddings achieved a 25% reduction in irrelevant results compared to generic embeddings
- Hybrid search (vector + keyword)
- Legal-specific preprocessing and postprocessing
- Integration with LexisNexis for Shepardization (checking whether a precedent is still good law)
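Harvey has not published how it merges vector and keyword results, so the following is only a sketch of one common option, reciprocal rank fusion (RRF). The function name, the constant `k=60`, and the case IDs are illustrative assumptions, not Harvey's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending); break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector index and a keyword index.
vector_hits = ["case-12", "case-7", "case-31"]
keyword_hits = ["case-7", "case-44", "case-12"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF only needs rank positions, so it avoids having to normalize cosine similarities against keyword relevance scores, which live on different scales.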
Results
- 97% — the rate at which lawyers in blind testing chose the fine-tuned model's response over GPT-4
- 0.2% hallucination rate (vs. 17-33% for generic models)
- Every sentence backed by a citation to an actual case
- Multi-model orchestration: different models for drafting, research, and jurisdiction-specific queries
Cost of This Approach
- $100M+ in investment (Series C from Sequoia, Google Ventures, et al.)
- Partnership with OpenAI at the level of custom model training
- Team of 200+ engineers
- Months of training and verification per iteration
- Lock-in to a single jurisdiction (US case law) with enormous effort required to scale
LEX's Approach: Opus + RAG
Architecture
Our approach is fundamentally different — we don't train the model, we build infrastructure around it:
1. Foundation Model — Claude Opus (as-is, no fine-tuning)
- 1M context window
- Strongest reasoning among publicly available models
- Native understanding of Ukrainian language
2. RAG over the complete EDRSR corpus:
- 100+ million court decisions
- Full-text search (PostgreSQL GIN indexes with the 'simple' language configuration for Cyrillic)
- Semantic search (Qdrant + OpenAI embeddings)
- Semantic Sectionizer — splits documents into logical sections (articles, parts, clauses)
3. MCP (Model Context Protocol) — structured interface between model and data:
- QueryPlanner classifies intent and selects search strategy
- DocumentService retrieves and caches documents
- LegislationService handles legislation (understands "Article 124 of the Constitution")
- EdsrFtsService — full-text search across the entire EDRSR
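To make the Semantic Sectionizer idea from the list above concrete, here is a minimal sketch that splits a legislative text into articles. It assumes headings of the form "Стаття N." at the start of a line; the function name and the sample text are invented for illustration, and the production sectionizer (which also handles parts and clauses) is certainly more involved.

```python
import re

# Assumed heading format: "Стаття 124." ("Article 124.") at the start of a line.
# The optional "-N" suffix covers inserted articles such as "Стаття 15-1."
ARTICLE_HEADING = re.compile(r"^Стаття\s+(\d+(?:-\d+)?)\.", re.MULTILINE)


def split_into_articles(text):
    """Return a list of (article_number, article_text) tuples."""
    matches = list(ARTICLE_HEADING.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((match.group(1), text[match.start():end].strip()))
    return sections


# Placeholder text, not a quotation from any real act.
sample = (
    "Стаття 1. Текст першої статті.\n"
    "Стаття 2. Текст другої статті.\n"
)
articles = split_into_articles(sample)
```

Splitting on logical boundaries rather than fixed character windows means a retrieved chunk corresponds to a unit a lawyer would actually cite.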
Search System
Lawyer's query
      │
      ▼
QueryPlanner (intent classification)
      │
      ├── Semantic Search (Qdrant)
      │     └── embeddings: text-embedding-ada-002
      │
      ├── Full-text Search (PostgreSQL)
      │     └── GIN indexes, 'simple' language config
      │
      └── Legislation Lookup (RADA API)
            └── intelligent sectioning
      │
      ▼
Context Assembly (relevant chunks)
      │
      ▼
Claude Opus (reasoning + generation)
      │
      ▼
Response with source citations
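The routing step at the top of this pipeline can be sketched with a few rules. This is an assumption-heavy illustration: the real QueryPlanner's classification logic is not described here, and the regular expressions, intent names, and backend labels below are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed pattern for explicit legislation references such as "Article 124".
ARTICLE_REF = re.compile(r"\b(?:article|art\.)\s*\d+", re.IGNORECASE)
# Assumed case-number shape such as "910/1234/22"; real formats may vary.
CASE_NUMBER = re.compile(r"\b\d{1,4}/\d{1,6}/\d{2}\b")


@dataclass
class Plan:
    intent: str
    backends: list


def plan_query(query):
    """Pick an intent and the search backends to call for a lawyer's query."""
    if ARTICLE_REF.search(query):
        return Plan("legislation_lookup", ["legislation"])
    if CASE_NUMBER.search(query):
        # An exact case number is best served by keyword search.
        return Plan("case_lookup", ["full_text"])
    # Open-ended research questions use both retrievers.
    return Plan("case_law_research", ["semantic", "full_text"])


plan = plan_query("What does Article 124 of the Constitution say?")
```

A production planner would more likely use the model itself for classification; fixed rules are shown only because they are easy to read and test.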
Results
- Full coverage of Ukrainian jurisdiction (100M+ decisions — the entire EDRSR)
- Citations with references to specific cases
- Understanding of martial law context, mobilization, new legislation
- Real-time corpus updates (new decisions enter the system automatically)
- Legislation, registries, and parliamentary data in a single interface
Cost of This Approach
- Team: 1 developer + Claude Code (735 commits in 25 days)
- Zero model training costs
- API costs: pay-per-use (Opus + embeddings)
- Infrastructure: 1 prod server, Docker Compose, PostgreSQL + Qdrant
- Time to production: weeks, not months
Comparison: What Actually Differs
1. Where Legal Knowledge Lives
| Where knowledge lives | Harvey (Fine-tuned) | LEX (Opus + RAG) |
|---|---|---|
| In model weights | Yes: 10B tokens of case law baked into the model | No: the model is generic |
| In retrieval | Yes: custom embeddings + search | Yes: Qdrant + PostgreSQL FTS |
| In context | Partially: reasoning is already trained | Fully: everything via prompt |
A fine-tuned model "knows" jurisprudence at an intuitive level. It has seen millions of cases during training and developed patterns of legal reasoning. When a lawyer asks about piercing the corporate veil, the model doesn't just search — it "remembers" the key precedents.
Opus + RAG "knows" jurisprudence through context. The model receives relevant case fragments via RAG and applies its generic reasoning to analyze them. Opus doesn't "remember" case law — but it can read and analyze it better than any specialized model of smaller scale.
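What "knowing through context" means in practice can be shown with a small sketch of context assembly: retrieved fragments are ranked, labeled with their source, and packed into the prompt under a size budget. The `Chunk` type, the character budget (a crude stand-in for a token budget), and the label format are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    case_id: str
    text: str
    score: float  # retrieval relevance, higher is better


def assemble_context(chunks, max_chars=2000):
    """Pack the highest-scoring chunks into one labeled context string."""
    parts, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        # Each block carries its source label so the model can cite it.
        block = f"[{chunk.case_id}]\n{chunk.text}\n\n"
        if used + len(block) > max_chars:
            continue  # too large for the remaining budget; a smaller one may fit
        parts.append(block)
        used += len(block)
    return "".join(parts)


chunks = [
    Chunk("A", "x" * 50, 0.9),
    Chunk("B", "y" * 500, 0.8),
    Chunk("C", "z" * 20, 0.5),
]
context = assemble_context(chunks, max_chars=100)
```

Here chunk B is dropped because it exceeds the budget, while the lower-ranked but shorter chunk C still fits.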
2. Hallucinations and Reliability
Harvey achieved a 0.2% hallucination rate through:
- Fine-tuning on real cases (the model has "seen" them)
- Post-processing with citation verification
- Shepardization via LexisNexis
LEX minimizes hallucinations through:
- Grounding — the model responds only based on provided context
- Explicit instructions — the system prompt requires source citations
- Verification — QueryPlanner checks that real documents were found
- Constitutional constraints — the model is explicitly instructed not to draw conclusions beyond the provided data
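One way to make the grounding requirement checkable is to verify, after generation, that every cited case was actually retrieved. The sketch below assumes the system prompt asks the model to cite sources as `[case-number]`; that format, the function names, and the sample case numbers are hypothetical.

```python
import re

# Assumed citation format: sources appear in square brackets, e.g. [910/1234/22].
CITATION = re.compile(r"\[([^\[\]]+)\]")


def find_unsupported_citations(answer, retrieved_ids):
    """Return cited IDs that were not among the retrieved documents."""
    cited = CITATION.findall(answer)
    return sorted(set(cited) - set(retrieved_ids))


def is_grounded(answer, retrieved_ids):
    """Pass only if the answer cites something and every citation was retrieved."""
    cited = CITATION.findall(answer)
    return bool(cited) and not find_unsupported_citations(answer, retrieved_ids)


retrieved = {"910/1234/22", "757/555/21"}
answer = "The claim was dismissed [910/1234/22]. A similar position appears in [123/9/20]."
unsupported = find_unsupported_citations(answer, retrieved)
```

A check like this catches invented case numbers, but not a real citation attached to a claim the case does not support; that still needs human or model review.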
3. Updatability
This is the biggest advantage of the RAG approach.
A fine-tuned model is a snapshot of the corpus at the time of training. A new Supreme Court decision handed down yesterday doesn't exist for the model until the next fine-tuning cycle (weeks to months).
A RAG system updates within hours: a decision entered into EDRSR this morning is searchable by tonight. For a jurisdiction under martial law, where new legislation appears every week, this is critical.
4. Scaling to New Jurisdictions
Harvey scales with difficulty: each new jurisdiction means a new cycle of data collection, training, and verification. US case law ≠ EU case law ≠ Ukrainian judicial practice. Reasoning patterns differ. Legal terminology differs. The hierarchy of sources differs.
RAG scales easily: connect a new document corpus, configure embeddings, update the search pipeline. We've already connected:
- EDRSR (100M+ decisions)
- Legislation via RADA API
- OpenReyestr (business entity registry)
- Parliamentary data (deputies, bills, votes)
5. Reasoning Customization
Fine-tuning lets you embed legal reasoning into the model:
- The model "understands" legal argumentation
- It can independently build chains of precedents
- Less dependent on search quality
Prompt engineering + RAG lets you control reasoning:
- Transparent logic (you can read the prompt)
- Easy to change strategy (update the prompt, not retrain the model)
- Constitutional constraints stated as explicit principles in the prompt rather than trained in via RLHF
Why We Chose RAG Over Fine-tuning
1. Economic Reality
Fine-tuning a legal model is a $10M+ project even for a minimum viable product. Harvey raised $100M+ and has a team of 200+ people. For the Ukrainian market, where the entire legal tech TAM is a fraction of what a single Am Law 100 firm earns, such investment makes no economic sense.
The RAG approach let us ship to production with a one-person team and a budget for API calls.
2. Iteration Speed
Fine-tuning cycle: collect data → clean → train → evaluate → deploy. Weeks to months.
RAG cycle: update the prompt or index the new documents → deploy. Minutes to hours.
When the Grand Chamber of the Supreme Court adopts a new legal position that changes interpretation across an entire field — a RAG system adapts in hours, not months.
3. Foundation Model Quality
In 2023, when Harvey started fine-tuning, GPT-4 was the best model available, and its reasoning on legal tasks was "good but not sufficient." Fine-tuning made sense.
In 2026, Claude Opus has a 1M context window and much stronger reasoning. The gap between "generic Opus + the right context" and "fine-tuned GPT + retrieval" has narrowed significantly: foundation models have largely caught up with fine-tuned specialized models on reasoning quality, and they continue improving with every release.
4. Ukrainian Jurisdiction
Ukrainian law is not common law. There is no stare decisis (binding precedent). Case law is advisory in nature. This means:
- Precise precedent citation is less critical than in US law
- Knowing current legislation + Supreme Court legal positions matters more
- The corpus changes constantly (martial law, new statutes every week)
- RAG with real-time updates is a perfect fit for this context
5. Transparency and Control
A fine-tuned model is a black box. You don't know why it generated a particular response. Which weights fired? Which cases did it "recall"?
RAG is transparent. You can see:
- Which documents were found (search results)
- What entered the context (retrieved chunks)
- What the model received as input (prompt)
- How it arrived at the answer (reasoning in output)
For a legal system where every response can affect a person's fate, transparency is not a nice-to-have — it's a requirement.
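The four inspection points listed above map naturally onto a per-request trace record. This is a minimal sketch of that idea; the `RagTrace` class and its field names are hypothetical, not the LEX schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RagTrace:
    query: str
    search_results: list = field(default_factory=list)  # IDs returned by search
    context_chunks: list = field(default_factory=list)  # text placed in the prompt
    prompt: str = ""                                     # exact model input
    answer: str = ""                                     # model output with reasoning

    def to_json(self):
        # ensure_ascii=False keeps Cyrillic text readable in the log.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


trace = RagTrace(
    query="dismissal during mobilization",
    search_results=["910/1234/22"],
    context_chunks=["[910/1234/22] ..."],
    prompt="Answer only from the context below.",
    answer="The court held ... [910/1234/22]",
)
logged = trace.to_json()
```

Storing one such record per response lets a reviewer reconstruct exactly what the model saw when an answer is later questioned.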
Where Fine-tuning Still Wins
Honesty demands acknowledging that there are tasks where Harvey's fine-tuned model is objectively better:
1. Legal reasoning without context — when a lawyer asks a general legal question without a specific case, a fine-tuned model gives a better answer because it "knows" jurisprudence. RAG depends on search quality.
2. Chains of precedent — a fine-tuned model can independently build an argument through a series of related precedents because it "saw" those connections during training. RAG may miss a precedent if the search didn't find it.
3. Legal document stylistics — a model trained on millions of legal texts better mimics the style of legal writing. A generic model requires more prompt engineering.
4. Scale — when processing hundreds of contracts at once (due diligence), a fine-tuned model is more efficient because it doesn't need retrieval at every step.
The Future: Convergence of Approaches
The boundary between RAG and fine-tuning is blurring:
- Harvey is building RAG on top of its fine-tuned model (their case law search is RAG)
- We are exploring domain-specific embeddings (an analogue of voyage-law, but for Ukrainian jurisprudence)
- Both are moving toward agentic workflows — multi-step systems where the model decides what to search for
The truth is that "fine-tuning vs RAG" is a false dichotomy. Harvey uses both fine-tuning and RAG. We use RAG and will be adding elements of domain adaptation (custom embeddings, constitutional RLHF).
The ultimate architecture for legal AI is a spectrum:
Pure RAG ←──────────────────────────────────→ Pure Fine-tuning
   │                                                 │
LEX (Opus + EDRSR)                    Harvey (custom GPT + RAG)
   │                                                 │
Cheap, fast,                                Expensive, slow,
transparent, updatable                         deep, precise
The optimum for each jurisdiction, team, and budget lies somewhere between these poles.
LEX + Google + DeepSeek v3: Fine-tuning for Ukrainian Jurisdiction
We're not just comparing approaches — we're moving toward fine-tuning ourselves. LEX AI is working with Google on a task analogous to Harvey + OpenAI, but for Ukrainian law.
Why DeepSeek v3
DeepSeek v3 is an open-weight model with a Mixture-of-Experts architecture (671B parameters, 37B active per token). For fine-tuning on Ukrainian jurisdiction, it's the ideal foundation:
- Open weights — full control over training, no API provider lock-in
- MoE efficiency — inference cost is several times lower than dense models of comparable scale
- Strong multilingual capabilities — quality Cyrillic and Ukrainian language support out of the box
- Legal reasoning — baseline reasoning on par with GPT-4o, providing a high starting point for domain adaptation
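The MoE efficiency claim can be sanity-checked with back-of-the-envelope arithmetic using the parameter counts quoted above. The "2 FLOPs per active parameter per token" figure is a common rule of thumb, used here as an assumption; real inference cost also depends on memory, routing overhead, and batching.

```python
TOTAL_PARAMS = 671e9   # total parameters
ACTIVE_PARAMS = 37e9   # parameters activated for each token

# Share of the network that does work on any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule of thumb (approximation): a forward pass costs about
# 2 FLOPs per active parameter per token.
flops_per_token_moe = 2 * ACTIVE_PARAMS
flops_per_token_dense = 2 * TOTAL_PARAMS  # hypothetical dense model of equal size

compute_ratio = flops_per_token_dense / flops_per_token_moe

print(f"active fraction: {active_fraction:.1%}")
print(f"dense/MoE compute per token: {compute_ratio:.1f}x")
```

Roughly 5.5% of the weights are active per token. The theoretical compute gap is an upper bound on savings, since all 671B parameters must still be held in memory to serve the model.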
What We're Training
The fine-tuning corpus: 100M+ court decisions from EDRSR, Ukrainian legislation, Supreme Court legal positions. This is the same dataset that currently lives in our RAG system — but instead of feeding it into context every time, we're embedding legal knowledge directly into the model weights.
Key directions:
- Pre-training on the full EDRSR corpus — the model will "see" all of Ukraine's case law
- Post-training on "lawyer query → quality response" pairs with legal annotators
- Constitutional RLHF — reward signal based on the Constitution of Ukraine (described in our previous article)
- Custom embeddings for Ukrainian legal text (analogous to Harvey's voyage-law-2-harvey)
Google's Role
Google Cloud provides training infrastructure: TPU pods for pre-training on hundreds of millions of documents, distributed training tools, and expertise in optimizing MoE models. The partnership enables us to do work that previously required a team of 200+ engineers.
How This Changes LEX
The final LEX architecture will be hybrid:
Lawyer's query
│
▼
Fine-tuned DeepSeek v3 (legal reasoning in weights)
+
RAG (current decisions, new legislation)
+
Constitutional RLHF (ethical constraints)
│
▼
Response with deep legal reasoning
+ current sources
+ constitutional guarantees
This is what Harvey built for US common law at $100M+ with OpenAI. We're building the same for Ukrainian jurisdiction with Google and DeepSeek — on open data, with an open model, for a market where access to justice is not a business metric but a matter of survival.
Conclusions
| Criterion | Harvey (Fine-tuned + RAG) | LEX (Opus + RAG) |
|---|---|---|
| Reasoning quality | Embedded legal reasoning | Generic reasoning + context |
| Hallucinations | 0.2% (verified) | Low (grounded RAG) |
| Updatability | Weeks to months | Hours |
| New jurisdictions | New training cycle | New document corpus |
| Launch cost | $10M+ | $10K |
| Transparency | Black box | Full transparency |
| Time to production | Months | Weeks |
| Reasoning customization | Via training (slow) | Via prompt (fast) |
For Ukrainian legal tech in 2026, RAG + Opus is the right choice. Not because fine-tuning is bad. But because:
- Foundation models have become smart enough for RAG to perform on par with fine-tuned specialized models
- Ukrainian jurisdiction demands real-time updates that fine-tuning cannot provide
- The economics of the Ukrainian market don't allow spending $100M on model training
- RAG transparency is critical for a legal system where an error is not a bug but a human rights violation
Harvey took the right path for their context: US common law, a $500B market, $100M in investment. We're taking the right path for ours: Ukrainian law, martial law, a team of one person and an AI partner.
Different realities — different architectures. But the goal is one: to make justice more accessible.
Registration: legal.org.ua