Articles on AI in law, legal technology, court decision analysis, fine-tuning LLMs on case law, and digital transformation of legal practice.

Few-Shot Degradation in Morphologically Rich Languages: Cross-Domain and Cross-Lingual Evidence from Ukrainian

Follow-up to our tokenizer fertility study. Five experiments across SIB-200, EU Acts (24 languages), and ULP datasets. Tokenizer fertility is domain-invariant (1.63x on news vs 1.60x on legal). Few-shot degradation is task-dependent, not language-intrinsic. Ukrainian costs 20-40% more to tokenize than cognate Slavic languages.

ACADEMIC 2026-05-14 15 min read (experiments in progress)

#Few-Shot Learning #Tokenizer #Ukrainian NLP #Cross-Lingual #SIB-200 #Slavic Languages

Persistent Memory Architecture for Long-Horizon Autonomous Missions with Operator Rotation

Cross-domain validation on three independent datasets confirms: context redundancy of ~50-60% is systemic in autonomous agents; operator rotation costs +136% dialogue rounds (Hedges' g=0.81); context completeness predicts correction need (r=-0.60, g=1.73).

ACADEMIC 2026-05-14 PDF, 22 pages

#Problems in Programming #UAV

Automatic Construction of a Legal Citation Graph from 100 Million Ukrainian Court Decisions: Large-Scale Extraction, Topological Analysis, and Ontology-Driven Clustering

Half a billion citation edges extracted from 100.7 million Ukrainian court decisions reveal that judicial citation structure encodes legal domain boundaries without supervision and predicts future legislative importance with near-perfect accuracy (AUC = 0.9984).

ACADEMIC 2026-05-13 30 min read (full paper)

#Citation Graph #Legal NLP #EDRSR #Ontology #Network Analysis

Edit-Trace Oversight: Scalable Alignment Signal from Agentic Workflows

Edit-traces from production agentic workflows produce alignment signal that is denser, more outcome-predictive, and distributionally unlike conventional RLHF preference data. 80.7% of edits are substantive rewrites; binary rejection correlates with 78% positive outcomes — the strongest oversight signal.

ACADEMIC 2026-05-11 45 min read (full paper)

#arXiv preprint #RLHF #Edit-Trace #Alignment #Agentic Workflows

From Ontology-Controlled Systems to Oversight-Controlled Training: Formal Foundations for Human–LLM Alignment Signal Validation

Ontology-based filtering of human oversight signal predicts downstream outcome quality: sessions classified as full oversight by a formal domain constitution exhibit a 3-6x higher rejection rate, concentrating the most informative alignment action.

ACADEMIC 2026-05-11 40 min read (full paper)

#Cybernetics & Systems Analysis #Ontology #OWL 2 DL #Alignment #Formal Methods

Workflow Memory for Long-Horizon Agentic Composition: Architecture, Dual-Mode Retrieval, and Retrieval-Correction Signal

Sixty percent of context tokens in current LLM agentic sessions are wasted — redundant re-explanation of decisions already made in prior sessions. The key insight: the memory layer produces alignment data (retrieval-correction signal), not just consumes it.

ACADEMIC 2026-05-10 50 min read (full paper)

#arXiv preprint #Memory Architecture #Agentic AI #RAG #Oversight

Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study

Tokenizer fertility varies 1.6x across foundation models on Ukrainian legal text, yet this cost-critical dimension is absent from model selection practice. Qwen 3 consumes 60% more tokens than Llama-family; NVIDIA Nemotron Super 3 (120B) outperforms Mistral Large 3 at 1/3 the cost.

ACADEMIC 2026-05-10 35 min read (full paper)

#arXiv preprint #Tokenizer #Ukrainian NLP #Foundation Models #Legal AI

RAG Highlights, Training Orients: What to Do About Heterogeneity in Court Practice

A comment under the previous article nailed it: "the problem has shifted from access to practice to managing its heterogeneity." Precise framing. We break down why authority weights in RAG are only half the answer, what training your own model actually adds, and why production needs both layers.

TECH 2026-04-18 8 min

#RAG #DPO #MoE #EDRSR #Legal AI #ML Training

2 TB of Ukrainian Law + DeepSeek V3 860B on GCP: What We'd Get

We have ~1.5 TB of EDRSR with vectors + ~550 GB of registries, legislation, Spanish sources, and EUR-Lex running in prod. If we push all of this through an MoE model the size of DeepSeek V3, scaled to 860B on TPU v5p, what comes out? We break down the dataset, architecture, compute cost, and model properties.

TECH 2026-04-18 9 min

#DeepSeek V3 #MoE #TPU v5p #GCP #EDRSR #ML Training

How We Vectorize 33.7M Ukrainian Court Decisions via Voyage AI

EDRSR is the open-access Unified State Register of all Ukrainian court decisions. 44M+ vectors in Qdrant, 14.3M civil cases already processed out of 33.7M. Here's the pipeline: chunking, concurrency, checkpoint/resume, a dedicated EC2 for Qdrant, and the cost math.

TECH 2026-04-18 7 min

#Voyage AI #Qdrant #EDRSR #RAG #Vector Search #PostgreSQL

SneakyPiper: 16.7M Entities, 31K Dark-Web Subjects, 30+ OSINT Sources in Production

Our OSINT product SneakyPiper.com runs due diligence for US businesses. Under the hood: 16.7M OpenSanctions entities, 31K AI-classified dark-web forum subjects, a live feed of ransomware victims and GitHub credential leaks. Here's what lives in production — by the numbers.

TECH 2026-04-17 10 min

#OSINT #Due Diligence #Sanctions #Dark Web #Open Data #Panoptic

ML Engineer Competencies We Look For: 9 Things We Want to See on the Resume

Google Cloud asks 5 questions before allocating GPUs. We break them down into 9 ML competencies — from LoRA on 70B and continued pre-training DeepSeek-V3 685B to RLHF with constitutional alignment and capacity planning for a $200K+ training run. Concrete examples from our real stack.

TECH 2026-04-17 12 min

#Machine Learning #LLM #Hiring #RLHF #Fine-tuning #Vertex AI

What We Delegate to Independent Developers: a PR Instead of an Interview, Claude Code Welcome

Concrete task buckets waiting for contributors: OpenData adapters, ML experiments, frontend, performance, tests. Our only "interview" is your first pull request. AI-assisted code is welcome — we write with Claude Code every day.

TECH 2026-04-17 8 min

#Open Source #Hiring #Community #Claude Code #Contributing

Open Doors: Looking for Independent AI/ML Engineers and Open-Source Contributors

LEX AI is opening its platform as open source. We welcome strong engineers — AI/ML, backend, data, frontend — to contribute or join the team. What's already open, who we're looking for, and how to get involved.

TECH 2026-04-17 6 min

#Open Source #Hiring #Community #AI/ML #Careers

Fast Builds in AWS: Moving CI/CD Runners to the Cloud and Saying Goodbye to Laptop OOM

Your laptop is not a 32-CPU machine. npm install competes with Docker for disk. TypeScript OOMs on a large monorepo, and Playwright cannot exploit parallelism. We break down how to move GitHub Actions runners to AWS — from c7g Spot to actions-runner-controller on EKS — and get a 3-5× build speedup without local hell.

TECH 2026-04-17 12 min

#AWS #CI/CD #GitHub Actions #DevOps #Performance

Opus + RAG vs Fine-tuned LLM + RAG: Two Approaches to Legal AI — LEX vs Harvey

Harvey spent $100M+ and 10B tokens fine-tuning a case law model with OpenAI. We connected Opus to 100M+ court decisions from EDRSR via RAG. Both paths work — but for different realities.

TECH 2026-04-16 22 min

#LLM #Fine-tuning #RAG #Claude Opus #Harvey AI #OpenAI #Google #DeepSeek #EDRSR #Legal AI

How I Made 1,200+ Commits in 50 Days: Claude Code as a Full Engineering Partner

800+ sessions, 10,000+ messages, 1,200+ commits, 328,000 lines of code, 40,000+ bash commands — and zero hired developers. Real usage statistics of 50 days of continuous work with Claude Code building a legal tech platform.

TECH 2026-04-12 15 min

#Claude Code #AI #Productivity #Startups #DevOps #MCP

AI Model Safety on Open Registries: Asimov's Laws as an Ethical Framework

How to ensure that a model with access to 50M+ records doesn't become a tool for pressuring the innocent? Asimov's Three Laws adapted for legal AI, threat scenarios, and architectural solutions for RLHF training on GCP.

LEGAL 2026-04-02 18 min

The Long Tail Problem in RLHF Training of a Legal AI Model

5 categories cover 90% of the EDRSR corpus. How the long tail destroys RLHF, why the model becomes a "civilist" (a civil-law specialist), and what strategies we are implementing on GCP for $240K over 6 months.

TECH 2026-04-02 16 min

Constitution of Ukraine as Reward Signal: Constitutional RLHF

How Articles 3, 28, 32, 62 of the Constitution become reward functions in RLHF training. Presumption of innocence as a hardcoded rule, constitutional collisions, and a benchmark of 500+ scenarios.

LEGAL 2026-04-02 20 min

Experimental AI Court: Simulating Legal Proceedings Across All Instances

Three separate models — judge, prosecutor, advocate — with information isolation reproduce adversarial proceedings. Instance specialization, result trees, and adversarial training on GCP.

TECH 2026-04-02 22 min

LegalTech LLM Constitution: A Rulebook for Legal AI Models

30 articles, 9 sections, open license. LEX AI proposes an industry standard for LegalTech models, from presumption of innocence to wartime protections, with direct implementation in the reward model.

LEGAL 2026-04-02 24 min

Distributed Monolith: When Microservices Are Just a Monolith with Network Latency

3 services, 1 PostgreSQL, shared Redis, one docker-compose — and the illusion of independence. How to spot a distributed monolith in your own architecture, when it's actually useful, and when it's time for real separation.

TECH 2026-04-01 14 min

#Architecture #Microservices #Monolith #Scaling #DevOps

How We Sync 380M+ Records from 40+ Data Sources That Keep Crashing

Multi-IP import, automated scheduler, freshness monitoring, international expansion — data pipeline engineering for open data across 6 jurisdictions. From the first 404 to stable nightly updates of 110+ tables.

TECH 2026-03-28 15 min

#OpenData #Data Pipeline #DevOps #Monitoring #API #PostgreSQL

CI/CD with Blue-Green Preview and Self-Healing Tests

How we built a pipeline that doesn't crash at 3 AM: blue-green with approval gate, prod safety guard, and 8 PRs in 3 hours to tame Vitest OOM.

TECH 2026-03-28 18 min

#CI/CD #Blue-Green #Vitest #GitHub Actions #DevOps

Analysis of Grand Chamber of the Supreme Court Case Law for March 2026: What the Review Missed

An in-depth analysis of 5 Grand Chamber of the Supreme Court cases and TCC fine rulings based on full decision texts and separate opinions of justices. Found factual errors, overlooked separate opinions by Justices Mazur, Pohribnyi, and Yemets, a key proportionality finding, and inaccuracies regarding party composition.

LEGAL 2026-03-28 20 min

#Case Law #Grand Chamber of the Supreme Court #TCC #Land Law #Gas #Prosecutor

LEX AI Security: GDPR Audit, 10 Fixes, and 7 Layers of Protection

5 parallel white-hat agents audited the platform for GDPR and OWASP Top 10 compliance. Found 23 vulnerabilities — from SQL injection to Google Ads firing before consent. Fixed 10 critical issues in one session. Full security architecture breakdown: Cloudflare, TLS 1.3, CSP, rate limiting, WebAuthn, E2EE.

TECH 2026-03-26 15 min

#Security #GDPR #OWASP #Cloudflare #WebAuthn #E2EE

340 Million Records and 64 Tools: The Complete Data Map of LEX AI

EDRSR, sanctions, patents, attorneys, judges, legislation, parliament, registries — every open data source currently running in production. What we have, how to use it, and what's coming next.

TECH 2026-03-26 12 min

#OpenData #MCP #EDRSR #Sanctions #IP #Rada

86 Ready-Made Queries for LEX AI: One Per Tool

We compiled 66 queries, each activating a specific platform tool — from court decision search to trademark verification. Plus 20 complex queries using 2–3 tools simultaneously. All designed for minimal LLM usage — maximum precision, minimum cost.

TECH 2026-03-26 12 min

#MCP #Tools #Prompts #LegalTech #Registries #EDRSR

How AI Is Changing the Work of Ukrainian Lawyers in 2026

56 tools instead of 12 browser tabs. Semantic search across 45M decisions. Full-text analysis in seconds. Due diligence in one query. Not a replacement for a lawyer — an exoskeleton for their mind.

LEGAL 2026-03-24 10 min

#AI #LegalTech #Attorney #Automation

Entering the Spanish Market: How a Ukrainian LegalTech Platform Adapts to European Law

Importing Spanish legal data from BOE and CENDOJ. Geo-detection of locale. Automatic localization in 4 languages. New MCP tools for Spanish legislation. From Kyiv to Madrid — one codebase.

TECH 2026-03-24 8 min

#Spain #i18n #Expansion #EU #LegalTech

API for Developers: How to Integrate 56+ Legal MCP Tools into Your Product

6 documentation tabs: Overview, 56-tool catalog, authentication, code examples (curl/TS/Python/SSE), MCP client configs (Claude Desktop/Cursor/VS Code), pricing. From registration to first request — 5 minutes.

TECH 2026-03-24 9 min

#API #Documentation #MCP #Developer #Integration

Diia.Sign for Business: Technical Challenges of Government Service Integration

ECDSA signing with SHA-256 hashing. Redis key mismatch between start and verify. QR code and deep link. Business data updates on every login. 4 fixes in 24 hours. A real integration story, unfiltered.

TECH 2026-03-24 8 min

#Diia #Auth #Integration #ECDSA #Government

41.8 Million Records from Ukrainian State Registries — Now Available via AI

11 state registries from data.gov.ua imported into the platform: enforcement proceedings, debtors, notaries, bankruptcy, legal acts and more — all accessible to lawyers through AI chat.

TECH 2026-03-22 7 min

#OpenData #NAIS #MCP #data.gov.ua

Developer Platform: 56 Legal AI Tools via a Single API

We launched platform.legal.org.ua — a portal for developers who want to integrate legal AI into their products. API keys, usage analytics, documentation for 56 tools, examples for Python and TypeScript. MCP SSE, REST, batch — three transports to choose from. From signup to first request — 5 minutes.

TECH 2026-03-21 7 min

#API #DeveloperPlatform #MCP #Integration

AI for Military Lawyers: Searching 273K+ Decisions in Seconds

126,934 decisions under Art. 407 of the Criminal Code. 26,926 cases on draft evasion. 1,721 cassation rulings. Full-text search across 110M+ documents. Legislative texts in 2 seconds. Appeal chains. All on one platform.

LEGAL 2026-03-21 8 min

#MilitaryLaw #AI #CourtPractice #EDRSR #CriminalLaw

EDRSR: Data Pipeline for 60 Million Court Decisions

60 million full texts. 283 GB across 4 shards. Custom RTF parser with depth-tracking for Windows-1251 Cyrillic. Two-phase ETL with idempotent upsert via temp tables. Application-level sharding by doc_id with independent backup domains. PostgreSQL shared memory exhaustion and three layers of defense. All on open government data.

TECH 2026-03-12 15 min

#EDRSR #OpenData #PostgreSQL #DataPipeline #Python #Sharding

How We Cut Chat Latency: 7 Phases of Optimization

From 12 seconds to 2.8: how we turned a slow legal chat into a tool that is a pleasure to use.

TECH 2026-03-12 9 min

#Performance #Chat #SSE #Optimization

AWS Bedrock as LLM Provider: From OpenAI Fallback to Claude + Nova Pro

One SDK instead of two libraries. IAM instead of API keys. Data in the EU instead of the US. A single bill instead of two invoices. Here is how we moved the entire fallback layer to AWS Bedrock — and why it changed more than we expected.

TECH 2026-03-12 7 min

#AWS #Bedrock #LLM #CostOptimization

Debtors Registry and NBU Banks: New Tools for Due Diligence

LEX AI now checks counterparties in the Unified Debtors Registry and verifies banks through the NBU registry — automatically, in a single request. 18 registries instead of 16.

LEGAL 2026-03-12 5 min

#DueDiligence #Registry #Compliance #LegalTech

Server-Side Evidence Extraction: How We Moved Evidence Analysis to the Backend

The frontend parsed evidence from response text using regex — mobile Safari froze for a second. We moved evidence extraction to the backend, added an SSE evidence event, and now the client simply renders ready-made objects. Time to first evidence: from 2.1s to 0.8s.

TECH 2026-03-12 6 min

#Architecture #Evidence #SSE #Performance

From a Single Server to the Cloud: How We Scale legal.org.ua on Google Cloud

Cloud Run with autoscaling to zero. Cloud SQL with automatic backups. Qdrant on a dedicated VM. All infrastructure at $280-430/mo with the ability to scale from 10 to 10,000 users without architecture changes.

TECH 2026-03-08 11 min

#GCP #Cloud Run #Infrastructure #Scaling

Legal Consultation Marketplace: From the Unified Attorney Registry to Monobank Payments

Attorney verification via the Unified Attorney Registry (ERAU) in 2 seconds. 3-step onboarding. Consultation request with documents from the vault. Real-time chat between client and attorney. Escrow payment via Monobank. 10% platform commission. Full cycle — from "I need a lawyer" to a paid consultation.

TECH 2026-03-07 9 min

#Marketplace #LegalTech #Payments #ERAU

MCP Tokens and Claude Desktop Integration: Legal AI on Your Desktop

One token. One command. 56 legal AI tools right in Claude Desktop. Court practice search, legislation analysis, counterparty verification — without opening a browser. Create a token in your profile, paste a command in the terminal, and LEX AI becomes an extension of your desktop.

LEGAL 2026-03-05 5 min

#MCP #ClaudeDesktop #Integration #Productivity

Why We Ditched Round-Robin Between OpenAI and Anthropic

We integrated OpenAI and Anthropic with round-robin routing. On the architecture diagram it looked perfect. In production it nearly killed our product. The same prompt produced different results depending on the provider. Debugging a 5-step agentic cycle? That is not engineering — it is archaeology. We ripped it all out. Hardcoded a single provider. Best line of code all year.

TECH 2026-02-28 8 min

#LLM #Architecture #OpenAI #AWS Bedrock

How We Built an MCP Server with 56 Tools for Legal AI

One endpoint. Three services. 58 MCP tools. Triple transport: stdio for Claude Desktop, HTTP REST for web apps, SSE for streaming. Every tool call goes through an 11-step pipeline with cost tracking at each stage. The number of tools will grow. The architecture does not care.

TECH 2026-02-25 10 min

#MCP #Architecture #TypeScript #BuildInPublic

Semantic Search Across 5,000+ Legislation Articles: Embeddings, Chunking, and Qdrant

Keywords find what you already know. Semantic search finds what you need. We split 12 Ukrainian codes into 5,191 articles, vectorized each one using VoyageAI embeddings, and now the query "liability for poor-quality repairs" finds articles that contain none of those words.

TECH 2026-02-20 7 min

#Embeddings #Qdrant #SemanticSearch #NLP

RAG for Legal Documents: HallucinationGuard and CitationValidator in Production

AI confidently cites non-existent articles and fabricates case numbers. In the legal domain, this is not just an error — it is malpractice. We built two layers of protection: HallucinationGuard verifies every claim, CitationValidator validates every citation. Zero tolerance for fabrication.

TECH 2026-02-15 7 min

#RAG #Hallucinations #LegalAI #Validation

From Monolith to MCP: How Model Context Protocol Transformed Our Architecture

We started as a REST API with 10 endpoints. Now we have 70 MCP tools across 3 services with triple transport. MCP gave us what REST could not: a standard way for AI to discover and use tools on its own. AI becomes the client, not you.

TECH 2026-02-10 6 min

#MCP #Migration #Architecture #REST

Authentication via Diia: How We Integrated National Digital Identity into a Legal Platform

A passport on your smartphone is now the key to legal AI. We integrated Diia.Signature for authentication: deep link on mobile, QR code on desktop, ECDSA signing with SHA-256 hashing, and lawyers verify their identity with the same app they use to show documents at checkpoints. No passwords. No registration. One tap and you are in.

TECH 2026-02-05 7 min

#Diia #Auth #DigitalIdentity #Ukraine

MCP Connect: How We Connected Nextcloud, Google Drive, and 1,400+ Open Datasets to Legal AI

A lawyer stores contracts in Nextcloud, correspondence in Google Drive, and searches court practice in EDRSR. Three different systems, three different windows, zero connection between them. MCP Connect unifies everything in one interface: AI analyzes your contract from Nextcloud, finds relevant practice from EDRSR, and verifies the counterparty in registries — in a single request.

LEGAL 2026-01-30 6 min

#MCP #Nextcloud #OpenData #Integration

AI Will Not Replace Lawyers — But a Lawyer with AI Will Replace One Without

AI will not replace lawyers. But the lawyer across the street who uses AI? That is your real competition. Their practice analysis covers 300 cases instead of 30. Their due diligence checks 16 registries in 2 seconds. They are not billing fewer hours — they are billing the same hours for a dramatically better outcome.

LEGAL 2026-01-25 9 min

#LegalInnovation #FutureOfLaw #LawyersOfLinkedIn

Searching Court Decisions by Meaning, Not by Keywords

You search for "compensation for apartment flooding" and miss the case where the court writes about "tortious liability for property damage resulting from engineering infrastructure failure." Keywords find words. Semantic search finds meaning.

LEGAL 2026-01-20 5 min

#SemanticSearch #CourtPractice #LegalResearch

How AI Analyzes Millions of Court Decisions — and What It Means for Your Practice

A human reviews 30-40 decisions per session. AI processes 200-300 per minute. But it is not about speed — it is about completeness. When you see the full picture rather than a fragment, strategic decisions become qualitatively different.

LEGAL 2026-01-15 6 min

#AI #CourtPractice #BigData #LegalAnalytics

Due Diligence with AI: From Registries to Beneficiaries in a Single Request

Counterparty verification: 4 registry websites, 30 minutes of manual work, and you can still miss enforcement proceedings. Or: one request, 2 seconds, 18 registries, full picture — EDRPOU, founders, beneficiaries, debtors, enforcement proceedings, bankruptcy, NBU banks.

LEGAL 2026-01-10 5 min

#DueDiligence #Registry #Compliance #LegalTech

Confidentiality and AI: How We Protect Client Data on a Legal Platform

Lawyers cannot use ChatGPT for client matters — data ends up on OpenAI servers. We built a platform where every matter is isolated, every action is in an audit trail, legal holds block deletion, and GDPR is not a checkbox — it is architecture.

LEGAL 2026-01-05 6 min

#GDPR #DataPrivacy #Compliance #Security

LEX AI Blog