Server-Side Evidence Extraction: How We Moved Evidence Analysis to the Backend
The frontend used to parse evidence out of the response text with regex, and mobile Safari froze for a second. We moved evidence extraction to the backend and added an SSE evidence event, so the client now simply renders ready-made objects. Time to first evidence dropped from 2.1 s to 0.8 s.
When client-side parsing could no longer keep up, we moved evidence processing to where it belongs.
The Problem
LEX AI returns more than just text to the user. Every response contains evidence: fragments of court decisions, legislation articles, document excerpts. Previously, this entire stream arrived as a single text block, and the frontend had to parse it into structured cards on its own.
On desktop, this worked acceptably. On mobile devices, it did not.
Symptoms we observed:
| Problem | Cause |
|---|---|
| UI freezes for 300-800 ms | Parsing large responses blocked the main thread |
| Incorrect evidence highlighting | Regex heuristics did not cover all formats |
| Logic duplication | Each client (web, mobile, MCP) wrote its own parser |
| Degradation at scale | More evidence = slower rendering |
When a response contained 15-20 pieces of evidence (a typical situation for court practice analysis), mobile Safari simply froze for a second. Users noticed.
The Architectural Decision
Instead of optimizing the client-side parser, we reframed the question: why should the client parse out what the backend already knows?
When ChatService calls tools (search_court_decisions, get_legislation_section, vault_search), it receives structured data. Then the LLM generates a text response, and the client tries to extract the same structure back from the text. This is a redundant cycle.
Solution: the backend extracts evidence during response generation and sends each item as a separate SSE event.
Data Flow: Before and After
Before:
Backend: LLM generates text with evidence mixed in
-> SSE: answer (one large block)
-> Frontend: regex parsing, card construction
-> Render
After:
Backend: LLM generates text
-> EvidenceExtractor classifies tool_result
-> SSE: evidence { type, title, source, content, relevance_score }
-> SSE: answer (clean text without embedded evidence)
-> Frontend: render ready-made objects
SSE Protocol
We extended the existing SSE stream with a new evidence event. The full set of events now looks like this:
| Event | Purpose | Payload |
|---|---|---|
| thinking | Processing indicator | { stage: string } |
| tool_result | Tool call result | { tool, result, cost } |
| evidence | Structured evidence | { type, title, source, content, relevance_score } |
| answer | Text fragment of response | { delta: string } |
| complete | Stream completion | { total_cost, evidence_count } |
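To make the event table concrete, here is a minimal client-side sketch that models the five events as a discriminated union and parses a single SSE frame into it. The event names and payload fields come from the table; `StreamEvent`, `parseSseFrame`, and the field types of `tool_result` and `complete` (which the table leaves unspecified) are assumptions made for illustration, not the actual LEX AI client code.

```typescript
// Event names and payload fields follow the table; the type names and the
// concrete types for tool_result/complete fields are assumptions.
type StreamEvent =
  | { event: 'thinking'; data: { stage: string } }
  | { event: 'tool_result'; data: { tool: string; result: unknown; cost: number } }
  | {
      event: 'evidence';
      data: {
        type: 'court_decision' | 'legislation' | 'document' | 'legal_position';
        title: string;
        source: string;
        content: string;
        relevance_score: number;
      };
    }
  | { event: 'answer'; data: { delta: string } }
  | { event: 'complete'; data: { total_cost: number; evidence_count: number } };

const KNOWN_EVENTS = ['thinking', 'tool_result', 'evidence', 'answer', 'complete'];

// Parses one SSE frame (the text between two blank lines) into a typed event.
// Returns null for unknown event names or malformed JSON so callers can skip it.
// Note: this checks only the event name, not the shape of the payload.
function parseSseFrame(frame: string): StreamEvent | null {
  let eventName = 'message'; // SSE default when no "event:" field is present
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line.startsWith('event:')) {
      eventName = line.slice('event:'.length).trim();
    } else if (line.startsWith('data:')) {
      // Per the SSE format, a single leading space after the colon is dropped.
      dataLines.push(line.slice('data:'.length).replace(/^ /, ''));
    }
  }
  if (!KNOWN_EVENTS.includes(eventName)) return null;
  try {
    return { event: eventName, data: JSON.parse(dataLines.join('\n')) } as StreamEvent;
  } catch {
    return null;
  }
}
```

Because the union is discriminated on `event`, a `switch (parsed.event)` in the rendering code narrows `parsed.data` to the right payload for each case.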
The evidence object has strict typing:
interface EvidenceBlock {
  type: 'court_decision' | 'legislation' | 'document' | 'legal_position';
  title: string;
  source: string;
  content: string;
  relevance_score: number;
}
The relevance_score field (0-1) allows the frontend to sort evidence by relevance and collapse less important items by default.
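As an illustration of how the frontend might use `relevance_score`, the sketch below sorts blocks in descending order and flags low-scoring ones as collapsed. The `EvidenceCard` type, the `prepareEvidenceCards` name, and the 0.5 threshold are assumptions for the example; the article does not state which cutoff the EvidencePanel uses.

```typescript
interface EvidenceBlock {
  type: 'court_decision' | 'legislation' | 'document' | 'legal_position';
  title: string;
  source: string;
  content: string;
  relevance_score: number;
}

// Hypothetical view model: the block plus a flag for the initial collapsed state.
interface EvidenceCard extends EvidenceBlock {
  collapsed: boolean;
}

// Sorts by relevance (highest first) and collapses blocks below the threshold.
// The default threshold of 0.5 is an assumed value, not taken from the article.
function prepareEvidenceCards(blocks: EvidenceBlock[], collapseBelow = 0.5): EvidenceCard[] {
  return blocks
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.relevance_score - a.relevance_score)
    .map((block) => ({ ...block, collapsed: block.relevance_score < collapseBelow }));
}
```

Since evidence events arrive one at a time, a function like this would be re-run on the accumulated list each time a new event comes in.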
Backend Evidence Extraction
EvidenceExtractor operates at the tool_result processing stage. When ChatService receives a result from a tool, it passes it to the extractor before the LLM begins generating the final response.
For classification (court_decision vs legislation vs document), we use an LLM at the quick-model level (gpt-4o-mini). This adds 50-100 ms per piece of evidence but saves significantly more time on the client and classifies far more accurately than the regex heuristics did (see Results).
The critical point: extraction happens in parallel with response generation. While the LLM writes text, evidence is already flying to the client. The user sees cards in the EvidencePanel even before the text response is complete.
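The parallelism described above can be sketched roughly as follows: classification is started for every tool result without being awaited, answer generation runs at the same time, and `complete` is emitted only after both have finished. Everything here is hypothetical scaffolding (`ToolResult`, `Emit`, `streamWithParallelEvidence`, and the injected `classify` and `generate` callbacks), since the article does not show the internals of ChatService or EvidenceExtractor.

```typescript
// Hypothetical shapes; the real ChatService/EvidenceExtractor code is not shown in the article.
interface ToolResult {
  tool: string;
  result: unknown;
}

interface EvidenceBlock {
  type: 'court_decision' | 'legislation' | 'document' | 'legal_position';
  title: string;
  source: string;
  content: string;
  relevance_score: number;
}

type Emit = (event: string, data: unknown) => void;

async function streamWithParallelEvidence(
  toolResults: ToolResult[],
  classify: (result: ToolResult) => Promise<EvidenceBlock>, // e.g. a quick-model LLM call
  generate: (onDelta: (delta: string) => void) => Promise<void>, // main answer generation
  emit: Emit,
): Promise<void> {
  let evidenceCount = 0;

  // Start classification for every tool result without awaiting it here,
  // so it overlaps with answer generation instead of delaying it.
  const extraction = Promise.all(
    toolResults.map(async (result) => {
      try {
        const block = await classify(result);
        emit('evidence', block);
        evidenceCount += 1;
      } catch {
        // A failed classification drops that one card but never breaks the answer stream.
      }
    }),
  );

  const generation = generate((delta) => emit('answer', { delta }));

  await Promise.all([extraction, generation]);
  // total_cost from the real `complete` payload is omitted in this sketch.
  emit('complete', { evidence_count: evidenceCount });
}
```

Swallowing classifier errors per item is a design choice of this sketch: it keeps one bad tool result from blocking the remaining cards or the text answer.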
Fallback Mechanism
We did not remove the client-side parser. It remains as a fallback:
if (receivedEvidenceEvents.length > 0) {
  // Use server-side evidence
  renderStructuredEvidence(receivedEvidenceEvents);
} else {
  // Fallback: parse from response text
  const extracted = parseEvidenceFromText(fullAnswer);
  renderStructuredEvidence(extracted);
}
This protects against three scenarios: the backend has not been updated yet (gradual deploy), the extractor crashed with an error, or the connection broke mid-stream and evidence events were lost.
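The check shown above switches to the fallback only when no evidence events arrived at all. One possible refinement, offered as a sketch and not as something the article says was implemented, is to compare the number of received events with `evidence_count` from the `complete` event and top up from the text parser when delivery was partial. The function name `resolveEvidence`, the `origin` labels, and de-duplication by `source` are all assumptions.

```typescript
interface EvidenceBlock {
  type: 'court_decision' | 'legislation' | 'document' | 'legal_position';
  title: string;
  source: string;
  content: string;
  relevance_score: number;
}

type EvidenceOrigin = 'server' | 'fallback' | 'merged';

function resolveEvidence(
  received: EvidenceBlock[],
  expectedCount: number | null, // evidence_count from `complete`, or null if it never arrived
  fullAnswer: string,
  parseEvidenceFromText: (text: string) => EvidenceBlock[],
): { origin: EvidenceOrigin; evidence: EvidenceBlock[] } {
  // No server evidence at all: same behavior as the original fallback.
  if (received.length === 0) {
    return { origin: 'fallback', evidence: parseEvidenceFromText(fullAnswer) };
  }
  // The stream completed and every announced evidence event arrived.
  if (expectedCount !== null && received.length >= expectedCount) {
    return { origin: 'server', evidence: received };
  }
  // Partial delivery: keep server evidence and add parsed items with an unseen source.
  // Assumption: `source` identifies a piece of evidence uniquely. The parser may find
  // nothing extra if the answer text no longer embeds evidence.
  const seen = new Set(received.map((block) => block.source));
  const missing = parseEvidenceFromText(fullAnswer).filter((block) => !seen.has(block.source));
  return { origin: 'merged', evidence: received.concat(missing) };
}
```

Returning the `origin` alongside the evidence would also make it straightforward to log how often each path is taken during a gradual deploy.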
Results
| Metric | Before | After |
|---|---|---|
| Time to first evidence in UI | 2.1 sec | 0.8 sec |
| Main thread blocking (mobile) | 300-800 ms | < 50 ms |
| Classification accuracy | ~82% | ~96% |
| Client bundle size | baseline | -4 KB (removed regex patterns) |
The biggest gain is on mobile. UI jank virtually disappeared because the frontend no longer does heavy parsing. EvidencePanel simply renders ready-made objects.
Conclusions
This migration confirmed a principle we follow at LEX AI: data should be structured as close to the source as possible. The backend knows what it returned from the tool. Forcing the client to guess from text is architectural debt that we have finally paid off.
The fallback layer makes the migration safe: even if server-side extraction is temporarily unavailable, the user will see evidence. Just a bit slower.