LEX — AI Legal Platform for Law Firms

AI-powered legal analysis platform for law firms and corporate counsel.

TECH 8 min

Why We Ditched Round-Robin Between OpenAI and Anthropic

We integrated OpenAI and Anthropic with round-robin routing. On the architecture diagram it looked perfect. In production it nearly killed our product. The same prompt produced different results depending on the provider. Debugging a 5-step agentic cycle? That is not engineering — it is archaeology. We ripped it all out. Hardcoded a single provider. Best line of code all year.

Why We Ditched Round-Robin Between OpenAI and Anthropic — and What We Use Instead

Building a legal AI platform taught us: multi-provider LLM routing looks great on architecture diagrams but breaks in production.


The Idea That Made Perfect Sense

When we started building LEX AI — a platform for analyzing millions of Ukrainian court decisions — we did what every AI-first team does: integrated multiple LLM providers.

OpenAI for structured output. Anthropic for deep legal analysis. Round-robin between them for resilience and cost optimization.

On paper it looked elegant. In production it was a nightmare.

What Went Wrong

1. Response Format Fragmentation

Our agentic pipeline runs up to 5 iterations of tool-calling per user request. Each iteration expects a normalized response: tool_calls, finish_reason, structured JSON.

OpenAI and Anthropic return these differently. We built a normalization layer. It handled 90% of cases. The remaining 10% — empty responses, incomplete JSON, unexpected stop reasons — caused silent failures deep in the loop.

One bug took us three days to find. Anthropic occasionally returned an otherwise valid response with stop_reason: "end_turn" where we expected "tool_use". Our normalizer passed it through unchanged, and the agent loop treated it as a final answer. The user got a half-baked analysis with no indication that anything had gone wrong.
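One way to keep this class of failure from staying silent is to make the normalizer reject responses whose stop reason and content disagree, instead of passing them through. The sketch below is an illustration, not our production normalizer: the StepOutcome and NormalizationError names are hypothetical, and the AnthropicResponse type models only the fields used here. It also would not catch every premature end_turn, only the ones that arrive empty or with stray tool blocks.

```typescript
// Unified outcome consumed by the agent loop (hypothetical names).
interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

type StepOutcome =
  | { kind: "tool_calls"; calls: ToolCall[] }
  | { kind: "final"; text: string };

// Minimal subset of an Anthropic Messages API response.
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface AnthropicResponse {
  stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null;
  content: AnthropicBlock[];
}

class NormalizationError extends Error {}

function normalizeAnthropic(res: AnthropicResponse): StepOutcome {
  const calls: ToolCall[] = [];
  let text = "";
  for (const block of res.content) {
    if (block.type === "tool_use") {
      calls.push({ id: block.id, name: block.name, args: block.input });
    } else {
      text += block.text;
    }
  }

  if (res.stop_reason === "tool_use") {
    if (calls.length === 0) {
      throw new NormalizationError("stop_reason is tool_use but no tool_use block was returned");
    }
    return { kind: "tool_calls", calls };
  }

  if (res.stop_reason === "end_turn") {
    // A final answer must carry text and must not carry tool calls.
    if (calls.length > 0) {
      throw new NormalizationError("stop_reason is end_turn but tool_use blocks are present");
    }
    if (text.trim() === "") {
      throw new NormalizationError("stop_reason is end_turn but the response text is empty");
    }
    return { kind: "final", text };
  }

  // max_tokens, stop_sequence, null: surface these instead of guessing.
  throw new NormalizationError(`unexpected stop_reason: ${String(res.stop_reason)}`);
}
```

With this shape the agent loop switches on kind and never inspects provider-specific fields, so an inconsistent response fails loudly at the boundary rather than several iterations later.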

2. One Prompt — Two Different Behaviors

Legal AI lives and dies by prompt precision. Our system prompt instructs the model to act as a Ukrainian legal assistant, classify intents, select tools, and respond in a structured format.

Claude followed Ukrainian-language instructions more accurately. GPT generated cleaner JSON tool calls. When the model changed on each iteration of the agentic cycle, the result quality became a coin flip.

3. Debugging Became Archaeology

When a user reported a bad result, we looked at the trace:

Which step broke? The model or the normalization? Can we reproduce? No — the next run routes differently.

4. The "Cost Optimization" That Wasn't

Round-robin was supposed to balance costs. Instead:

5. Two Sets of Everything

Each provider has its own rate limits, retry strategies, error formats, and SDK updates. Our "unified" retry layer was actually two retry layers in a trench coat.

What We Do Now

We switched to strategy-based provider selection, with OpenAI as the primary and AWS Bedrock as the alternative, and put the effort we saved into budget-aware model selection:

| Budget | OpenAI | AWS Bedrock | Use Case |
|---|---|---|---|
| quick | gpt-5-nano | Amazon Nova Micro | classification, routing |
| standard | gpt-5-mini | Amazon Nova Lite | tool execution, summarization |
| deep | gpt-5.1 | Amazon Nova Pro | legal analysis, pattern extraction |

The LLM_PROVIDER_STRATEGY variable controls selection. It accepts openai-first (the default) or bedrock-first (if AWS credentials are available). One API format. One error handler. One retry policy. Predictable costs. Reproducible results.
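A minimal sketch of how such a selection could look, assuming the strategy value and a credentials check are passed in by the caller. The function names, the hasAwsCredentials flag, and the rule that bedrock-first silently degrades to openai-first without credentials are assumptions for illustration. The model strings are the labels from the table above, not necessarily the exact API model IDs.

```typescript
type Provider = "openai" | "bedrock";
type Strategy = "openai-first" | "bedrock-first";
type Budget = "quick" | "standard" | "deep";

// Labels copied from the budget table; real API model IDs may differ.
const MODELS: Record<Budget, Record<Provider, string>> = {
  quick: { openai: "gpt-5-nano", bedrock: "Amazon Nova Micro" },
  standard: { openai: "gpt-5-mini", bedrock: "Amazon Nova Lite" },
  deep: { openai: "gpt-5.1", bedrock: "Amazon Nova Pro" },
};

// Assumption: anything other than a usable "bedrock-first" means the default.
function resolveStrategy(raw: string | undefined, hasAwsCredentials: boolean): Strategy {
  if (raw === "bedrock-first" && hasAwsCredentials) {
    return "bedrock-first";
  }
  return "openai-first";
}

interface ModelRef {
  provider: Provider;
  model: string;
}

interface ModelChoice {
  primary: ModelRef;
  alternative: ModelRef;
}

// The strategy fixes the provider order once; the budget picks the tier.
function selectModels(strategy: Strategy, budget: Budget): ModelChoice {
  const first: Provider = strategy === "bedrock-first" ? "bedrock" : "openai";
  const second: Provider = first === "openai" ? "bedrock" : "openai";
  return {
    primary: { provider: first, model: MODELS[budget][first] },
    alternative: { provider: second, model: MODELS[budget][second] },
  };
}
```

In a Node.js service the first argument would typically come from process.env.LLM_PROVIDER_STRATEGY, resolved once at startup so that every iteration of a request sees the same provider order.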

How to Properly Use Multiple Providers

Task routing, not round-robin — assign each provider specific task types permanently.

Fallback, not alternation — Provider B activates only when Provider A returns 429 or 500.

Multi-key single provider — multiple API keys from a single provider with rotation to bypass rate limits.
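The fallback rule above can be sketched in a few lines. This is an illustration under assumptions: the Complete type and function names are hypothetical, and the code assumes provider errors expose a numeric status field, which you would need to adapt to your SDK's actual error shape. It falls back only on 429 or 500, as described above, and rethrows everything else.

```typescript
// A provider call reduced to its essentials (hypothetical signature).
type Complete = (prompt: string) => Promise<string>;

// Assumption: provider errors carry a numeric HTTP status in `status`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Only rate limiting (429) and server errors (500) justify switching provider.
function shouldFallback(err: unknown): boolean {
  const status = statusOf(err);
  return status === 429 || status === 500;
}

interface Completion {
  text: string;
  servedBy: "primary" | "fallback";
}

async function completeWithFallback(
  primary: Complete,
  fallback: Complete,
  prompt: string,
): Promise<Completion> {
  try {
    return { text: await primary(prompt), servedBy: "primary" };
  } catch (err) {
    // Bad requests, auth failures and bugs should surface, not be masked.
    if (!shouldFallback(err)) {
      throw err;
    }
    return { text: await fallback(prompt), servedBy: "fallback" };
  }
}
```

Recording servedBy in the trace keeps the debugging story simple: every step says which provider answered, and the fallback path is the exception rather than every second call.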

Why AWS Bedrock Is a Game Changer

| | Direct API Key | AWS Bedrock |
|---|---|---|
| Models | Single provider | Claude + Llama + Mistral via one SDK |
| Security | API key in .env | IAM roles, no keys in code |
| Data | Goes to provider's cloud | Stays in your AWS region |
| Billing | Separate invoices | Single AWS bill |
| Rate limits | Hard, per-key | Provisioned Throughput |

The @deprecated tag on our getNextProvider() method is the best line of code we wrote all year.


Epilogue: March 2026

When we wrote this article, the Anthropic API fallback was a temporary solution. In March 2026 we finally closed this chapter: PR #722 replaced the direct Anthropic API integration with AWS Bedrock.

What did this mean in practice? One SDK (@aws-sdk/client-bedrock-runtime) instead of two client libraries. IAM authentication instead of API key rotation. Data stays in eu-central-1 — our DPO finally stopped worrying. Single billing through AWS Cost Explorer instead of separate invoices from OpenAI and Anthropic.

The budget tiers we dreamed about now work through Bedrock: quick goes to Nova Micro, standard to Nova Lite, deep to Nova Pro. OpenAI remains the primary for the main pipeline, but the entire fallback chain is now on AWS.
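As a rough sketch of what the tier mapping could look like against the Bedrock Converse API, the function below builds a request payload per budget tier. It deliberately avoids importing the SDK so it stays self-contained. The model IDs, the buildConverseInput name, and the default maxTokens and temperature values are assumptions, and some regions require a regional inference profile ID instead of the base model ID, so check what is available in your account.

```typescript
type Budget = "quick" | "standard" | "deep";

// Assumed Bedrock model IDs for the Nova family; verify them for your region.
const NOVA_MODEL_IDS: Record<Budget, string> = {
  quick: "amazon.nova-micro-v1:0",
  standard: "amazon.nova-lite-v1:0",
  deep: "amazon.nova-pro-v1:0",
};

// Subset of the Converse request shape used in this sketch.
interface ConverseInput {
  modelId: string;
  system: { text: string }[];
  messages: { role: "user" | "assistant"; content: { text: string }[] }[];
  inferenceConfig: { maxTokens: number; temperature: number };
}

function buildConverseInput(
  budget: Budget,
  systemPrompt: string,
  userText: string,
  maxTokens: number = 1024, // illustrative default
): ConverseInput {
  return {
    modelId: NOVA_MODEL_IDS[budget],
    system: [{ text: systemPrompt }],
    messages: [{ role: "user", content: [{ text: userText }] }],
    inferenceConfig: { maxTokens, temperature: 0 },
  };
}

// Sending the request with @aws-sdk/client-bedrock-runtime (not imported here):
//   const client = new BedrockRuntimeClient({ region: "eu-central-1" });
//   const res = await client.send(new ConverseCommand(input));
//   const text = res.output?.message?.content?.[0]?.text;
```

Because the client picks up IAM credentials from the environment, nothing in this path needs an API key in code or in .env.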

Turns out, the decision to ditch round-robin was right not just tactically, but strategically. We didn't just pick a single provider — we chose an infrastructure platform that scales with the product. That @deprecated tag is still in the code. As a reminder.