From a Single Server to the Cloud: How We Scale legal.org.ua on Google Cloud
Cloud Run with autoscaling to zero. Cloud SQL with automatic backups. Qdrant on a dedicated VM. All infrastructure at $280-430/mo with the ability to scale from 10 to 10,000 users without architecture changes.
How we migrated a legal AI platform from Docker Compose on a single server to full-fledged cloud infrastructure with automatic scaling.
Why Migration Became Necessary
legal.org.ua is a platform for lawyers with AI analysis of court decisions, semantic search across legislation, and registries. Under the hood — 3 microservices, PostgreSQL, Redis, Qdrant (vector DB), MinIO, and a React frontend.
The initial infrastructure was a single VPS with Docker Compose. It worked for the MVP but created risks:
| Problem | Consequence |
|---------|-------------|
| Single server | Server goes down = total downtime |
| Fixed resources | Cannot scale under load |
| Manual deploys | SSH → git pull → docker compose up |
| Manual backups | Risk of data loss |
We needed infrastructure that scales automatically, backs up data automatically, and comes at a reasonable cost for a startup.
Choosing a Cloud: Why Google Cloud
We considered AWS, GCP, and Hetzner Cloud. We chose GCP for several reasons:
Cloud Run — the main argument. Serverless containers with pay-per-use pricing and the ability to scale to zero. For a legal platform with daytime traffic (lawyers work 9 to 6), this means we pay almost nothing at night and on weekends.
Cloud SQL — managed PostgreSQL with automatic backups, point-in-time recovery, and one-click vertical scaling.
Region europe-west1 (Belgium): reasonably close to Ukraine and one of the lower-priced European GCP regions.
Architecture: Hybrid Approach
The key decision: not everything goes serverless. We split services by their nature:
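```text
              Cloudflare (DNS + CDN + WAF)
                           |
              Cloud Load Balancer (HTTPS)
                           |
        +------------------+------------------+
        |                  |                  |
    Cloud Run          Cloud Run          Cloud Run
  (mcp_backend)        (mcp_rada)       (openreyestr)
        |                  |                  |
        +------------------+------------------+
                           |
     +--------------+------+-------+--------------+
     |              |              |              |
 Cloud SQL     Memorystore      GCE VM           GCS
  (PG 15)       (Redis 7)      (Qdrant)        (files)
```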
Stateless Services → Cloud Run
Our 4 backend services keep no state between requests, which makes them ideal candidates for Cloud Run:
| Service | What It Does | CPU | RAM | Autoscaling |
|---------|--------------|-----|-----|-------------|
| mcp-backend | Court decisions, AI chat, 36 tools | 2 vCPU | 4 GiB | 1 → 4 instances |
| mcp-rada | Deputies, bills, voting | 1 vCPU | 1 GiB | 0 → 2 instances |
| mcp-openreyestr | State register, beneficiaries | 1 vCPU | 1 GiB | 0 → 2 instances |
| document-service | Document processing | 2 vCPU | 4 GiB | 0 → 3 instances |
Note the min instances: the main backend always has at least 1 instance (cold start is unacceptable for AI chat with SSE streaming), while auxiliary services scale to zero when nobody is using them.
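The per-service settings in the table map directly onto gcloud run deploy flags. Below is a minimal Python sketch that turns the table into deploy commands. It assumes a hypothetical Artifact Registry path (PROJECT_ID and REPO are placeholders) and a latest tag, because the article does not show its actual deploy commands:

```python
from dataclasses import dataclass

REGION = "europe-west1"
# Hypothetical Artifact Registry path; PROJECT_ID and REPO are placeholders.
IMAGE_PREFIX = "europe-west1-docker.pkg.dev/PROJECT_ID/REPO"


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    cpu: int            # vCPUs
    memory: str         # Cloud Run memory string, e.g. "4Gi"
    min_instances: int
    max_instances: int


# Values copied from the Cloud Run table.
SERVICES = [
    ServiceSpec("mcp-backend", 2, "4Gi", 1, 4),
    ServiceSpec("mcp-rada", 1, "1Gi", 0, 2),
    ServiceSpec("mcp-openreyestr", 1, "1Gi", 0, 2),
    ServiceSpec("document-service", 2, "4Gi", 0, 3),
]


def deploy_args(spec: ServiceSpec, tag: str = "latest") -> list[str]:
    """Build the argv for one `gcloud run deploy` call."""
    return [
        "gcloud", "run", "deploy", spec.name,
        f"--image={IMAGE_PREFIX}/{spec.name}:{tag}",
        f"--region={REGION}",
        f"--cpu={spec.cpu}",
        f"--memory={spec.memory}",
        f"--min-instances={spec.min_instances}",
        f"--max-instances={spec.max_instances}",
    ]


if __name__ == "__main__":
    for spec in SERVICES:
        print(" ".join(deploy_args(spec)))
```

Only mcp-backend gets --min-instances=1. This accepts a small always-on cost so that the SSE chat path never starts cold.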
Stateful Services → Managed or VM
- PostgreSQL → Cloud SQL (managed, automatic backups, point-in-time recovery)
- Redis → Memorystore (managed, sub-millisecond latency)
- Qdrant → GCE VM (no first-party managed option on GCP, needs persistent storage)
- MinIO → GCS (Google Cloud Storage with S3-compatible API)
Networking: Security by Default
All infrastructure lives in a private VPC network. No service has a public IP except the Load Balancer.
```text
VPC: secondlayer-vpc
 +-- services-subnet   10.0.0.0/20    (Cloud Run VPC Connector)
 +-- data-subnet       10.0.16.0/20   (Cloud SQL, Qdrant VM)
 +-- VPC Connector     10.8.0.0/28    (Cloud Run → private network)
```
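This address plan is easy to sanity-check. Serverless VPC Access connectors need an unused /28 range, and none of the three ranges may overlap. Here is a small check that uses Python's standard ipaddress module:

```python
import ipaddress
from itertools import combinations

RANGES = {
    "services-subnet": ipaddress.ip_network("10.0.0.0/20"),
    "data-subnet": ipaddress.ip_network("10.0.16.0/20"),
    "vpc-connector": ipaddress.ip_network("10.8.0.0/28"),
}


def overlapping_pairs(ranges: dict) -> list[tuple[str, str]]:
    """Return every pair of named ranges that overlap (should be empty)."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(ranges.items(), 2)
        if net_a.overlaps(net_b)
    ]


for name, net in RANGES.items():
    print(f"{name:16} {str(net):14} {net.num_addresses:5} addresses")
print("overlaps:", overlapping_pairs(RANGES))
```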
Cloud NAT provides outbound internet access for VMs without public IPs. IAP (Identity-Aware Proxy) provides SSH access to VMs through Google authentication instead of an open port 22.
Firewall rules are simple: only internal traffic between subnets, SSH via IAP, and health checks from Google Load Balancer are allowed.
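The three allow rules can be modeled in code to reason about what is able to reach a VM. The source ranges for IAP TCP forwarding (35.235.240.0/20) and Google health checks (35.191.0.0/16, 130.211.0.0/22) are the ones Google documents. The port choices, including 80/443 for health checks, are assumptions. This is a sketch for reasoning about the rules, not the actual firewall configuration:

```python
import ipaddress


def nets(*cidrs: str) -> list:
    return [ipaddress.ip_network(c) for c in cidrs]


# (rule name, allowed source ranges, allowed TCP ports; None = any port)
ALLOW_RULES = [
    ("allow-internal", nets("10.0.0.0/20", "10.0.16.0/20", "10.8.0.0/28"), None),
    ("allow-iap-ssh", nets("35.235.240.0/20"), {22}),
    # Health-check ports are an assumption for this sketch.
    ("allow-health-checks", nets("35.191.0.0/16", "130.211.0.0/22"), {80, 443}),
]


def is_allowed(source_ip: str, port: int) -> bool:
    """True if some allow rule matches; otherwise the implied deny applies."""
    ip = ipaddress.ip_address(source_ip)
    for _name, sources, ports in ALLOW_RULES:
        if any(ip in net for net in sources) and (ports is None or port in ports):
            return True
    return False


print(is_allowed("35.235.240.10", 22))  # True: SSH through IAP
print(is_allowed("203.0.113.7", 22))    # False: port 22 is not open to the internet
```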
Cloud SQL: Two Instances
We deliberately split PostgreSQL into two instances:
secondlayer-main (db-custom-2-8192) — main backend and parliament data:
- Database secondlayer_prod: court decisions, documents, AI analytics, users
- Database rada_prod: deputies, bills, voting
openreyestr-db (db-custom-1-4096) — State Register of legal entities:
- Pre-imported database with millions of records
- Read-heavy workload, rarely written
- A separate instance keeps its CPU, memory, and I/O load from competing with the main database
Both instances have:
- Private IP only (not accessible from the internet)
- Automatic nightly backups at 3:00
- Point-in-time recovery
- max_connections=500 (sufficient for Cloud Run with connection pooling)
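Whether 500 connections is enough comes down to simple arithmetic: every Cloud Run instance at maximum scale, each holding a full pool. Here is a sketch for secondlayer-main. It assumes a hypothetical fixed pool of 20 connections per instance and treats mcp-backend and mcp-rada as the instance's only clients:

```python
MAX_CONNECTIONS = 500   # Cloud SQL flag on both instances
POOL_SIZE = 20          # hypothetical per-instance pool size

# Max instances per service (from the Cloud Run table); assumed clients of secondlayer-main.
CLIENTS = {"mcp-backend": 4, "mcp-rada": 2}


def worst_case_connections(clients: dict[str, int], pool_size: int) -> int:
    """All services at max instances, every pool full."""
    return sum(max_instances * pool_size for max_instances in clients.values())


def max_safe_pool_size(clients: dict[str, int], max_connections: int,
                       reserved: int = 10) -> int:
    """Largest per-instance pool that still fits, keeping `reserved`
    connections free for admin and migration sessions (assumed headroom)."""
    return (max_connections - reserved) // sum(clients.values())


print(worst_case_connections(CLIENTS, POOL_SIZE))    # 120
print(max_safe_pool_size(CLIENTS, MAX_CONNECTIONS))  # 81
```

Raising mcp-backend to 8 max instances, as planned for the 1,000-user stage, would give (8 + 2) × 20 = 200 connections under the same assumptions. That is still well under the limit.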
Qdrant on a Dedicated VM
Qdrant is the vector database for semantic search. GCP has no first-party managed Qdrant service, so we deployed it on a separate VM:
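- e2-standard-4 (4 vCPU, 16 GiB RAM): sufficient for millions of vectors, depending on their dimensionality
- 100 GB persistent disk (pd-balanced), attached as a separate disk that is not auto-deleted with the VM, so data survives VM deletion
- Docker container with --restart=always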
Persistent disk is the key detail. Even if the VM crashes or needs an upgrade, data stays on the disk. We can change the VM type in 5 minutes without losing indexes.
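How far 16 GiB goes depends on vector dimensionality. Qdrant's capacity-planning guidance gives a rule of thumb for vectors held in RAM: vectors × dimensions × 4 bytes × 1.5. Here is a quick estimate. It assumes 1536-dimensional embeddings, since the article does not state the embedding size:

```python
GIB = 1024 ** 3
VM_RAM_BYTES = 16 * GIB  # e2-standard-4


def qdrant_ram_bytes(num_vectors: int, dim: int, overhead: float = 1.5) -> int:
    """float32 vectors (4 bytes per dimension) times an overhead factor
    for the index and metadata."""
    return int(num_vectors * dim * 4 * overhead)


for n in (1_000_000, 2_000_000, 3_000_000):
    need = qdrant_ram_bytes(n, dim=1536)  # 1536 dims is an assumption
    print(f"{n:>9,} vectors: {need / GIB:5.1f} GiB, fits: {need < VM_RAM_BYTES}")
```

Under these assumptions, in-RAM storage reaches the 16 GiB limit somewhere between one and two million vectors, and that is before leaving room for the OS. Beyond that point, the usual levers are Qdrant's on-disk vector storage, quantization, or the e2-standard-8 upgrade.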
GCS Instead of MinIO: Zero Code Changes
One of the most elegant decisions: Google Cloud Storage has an S3-compatible API. Our code uses the AWS S3 SDK to work with MinIO, so for the migration we only changed connection settings (the endpoint, plus the access keys, which for GCS are HMAC keys):
```bash
# Before (MinIO)
MINIO_ENDPOINT=minio-stage
MINIO_PORT=9000

# After (GCS)
MINIO_ENDPOINT=storage.googleapis.com
MINIO_PORT=443
MINIO_USE_SSL=true
```
Not a single line of code was changed. The same upload pipeline, the same presigned URLs, the same logic.
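The storage client is configured entirely from environment variables, so switching backends is purely a config change. Below is a minimal Python sketch of how a client might derive its endpoint URL from the MINIO_* variables. The helper and its defaults (port 9000, SSL off) are hypothetical, not the project's code:

```python
import os
from collections.abc import Mapping


def s3_endpoint_url(env: Mapping[str, str] = os.environ) -> str:
    """Build an S3-compatible endpoint URL from MINIO_* settings."""
    host = env["MINIO_ENDPOINT"]
    port = int(env.get("MINIO_PORT", "9000"))
    use_ssl = env.get("MINIO_USE_SSL", "false").lower() == "true"
    scheme = "https" if use_ssl else "http"
    default_port = 443 if use_ssl else 80
    # Omit the port when it is the scheme's default.
    if port == default_port:
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    print(s3_endpoint_url({"MINIO_ENDPOINT": "minio-stage", "MINIO_PORT": "9000"}))
    print(s3_endpoint_url({"MINIO_ENDPOINT": "storage.googleapis.com",
                           "MINIO_PORT": "443", "MINIO_USE_SSL": "true"}))
```

An S3 SDK takes the resulting URL as its custom endpoint (endpoint_url in boto3, for example). When the target is GCS, the access key and secret are HMAC keys created for a service account.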
Secrets: Secret Manager Instead of .env Files
On the VPS, secrets lived in .env files. It works, but:
- The file could end up in git
- No audit of who accessed what when
- Key rotation = manual update on the server
GCP Secret Manager solves all three problems. Every secret has versions and access auditing, and Secret Manager integrates directly with Cloud Run via --set-secrets.
We created 12 secrets: OpenAI API keys, ZakonOnline tokens, JWT secret, database passwords, and others.
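When a service is deployed with a flag such as --set-secrets=JWT_SECRET=jwt-secret:latest, Cloud Run exposes the secret's value as an environment variable. A service can therefore check all of its secrets once at startup and fail fast. Here is a sketch of that idea. The environment variable names are hypothetical stand-ins for some of the 12 secrets:

```python
import os
from collections.abc import Mapping

# Hypothetical env var names; Cloud Run maps each to a Secret Manager secret.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "ZAKONONLINE_TOKEN", "JWT_SECRET", "DB_PASSWORD")


class MissingSecretError(RuntimeError):
    pass


def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return all required secrets or raise before the service starts serving."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        # Report names only; never log secret values.
        raise MissingSecretError("missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```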
Cost: $280 to $430/mo
Full breakdown:
| Component | Specification | $/mo |
|-----------|---------------|------|
| Cloud Run (4 services) | Autoscaling | $76 |
| Cloud SQL (2 instances) | PG 15, SSD, auto backups | $150 |
| Memorystore Redis | 2 GiB, Basic | $50 |
| GCE VM (Qdrant) | e2-standard-4, 100 GB disk | $105 |
| GCS + CDN | ~50 GB of files | $8 |
| Networking (LB, NAT, VPC) | | $33 |
| Artifact Registry | Docker images | $3 |
| Total | | ~$430 |
Optimization to $280/mo
- Consolidate Cloud SQL — openreyestr as a separate database in the main instance: -$55
- 1-year commitment on Cloud SQL: -$37
- Spot VM for Qdrant (if restart is acceptable): -$60
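Here is a quick check of the arithmetic. The line items in the cost table sum to $425, which is quoted as ~$430. Applying all three optimizations gives $273, close to the quoted ~$280 because the headline figures are rounded:

```python
BASELINE = {
    "Cloud Run (4 services)": 76,
    "Cloud SQL (2 instances)": 150,
    "Memorystore Redis": 50,
    "GCE VM (Qdrant)": 105,
    "GCS + CDN": 8,
    "Networking (LB, NAT, VPC)": 33,
    "Artifact Registry": 3,
}
SAVINGS = {
    "Consolidate Cloud SQL": 55,
    "1-year commitment on Cloud SQL": 37,
    "Spot VM for Qdrant": 60,
}

baseline = sum(BASELINE.values())
optimized = baseline - sum(SAVINGS.values())
print(f"baseline:  ${baseline}/mo")   # baseline:  $425/mo
print(f"optimized: ${optimized}/mo")  # optimized: $273/mo
```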
Scaling Strategy
Horizontal (Automatic)
Cloud Run scales automatically based on request concurrency. When load rises, it adds instances; when load drops, it shuts down the excess ones, down to the configured minimum.
```text
08:00  mcp-backend: 1 instance   (quiet morning)
10:00  mcp-backend: 2 instances  (workday)
14:00  mcp-backend: 4 instances  (peak activity)
22:00  mcp-backend: 1 instance   (evening)
02:00  mcp-rada:    0 instances  (nobody searches for deputies at night)
```
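The horizontal scaling above can be approximated with a simplified model: run enough instances to serve the requests in flight at a given per-instance concurrency, clamped to the service's min/max. The real Cloud Run autoscaler uses more signals than this, CPU utilization among them, so treat the model as an approximation. The value 80 is Cloud Run's default maximum concurrency per instance:

```python
import math


def desired_instances(in_flight: int, concurrency: int,
                      min_instances: int, max_instances: int) -> int:
    """Instances needed for `in_flight` concurrent requests, clamped to the
    configured range. Requests beyond max capacity would have to wait."""
    needed = math.ceil(in_flight / concurrency)
    return max(min_instances, min(max_instances, needed))


# mcp-backend: min 1, max 4; mcp-rada: min 0, max 2; concurrency 80 (default)
print(desired_instances(0, 80, 1, 4))     # 1 (night: the warm instance stays)
print(desired_instances(150, 80, 1, 4))   # 2
print(desired_instances(1000, 80, 1, 4))  # 4 (capped at max instances)
print(desired_instances(0, 80, 0, 2))     # 0 (mcp-rada scales to zero)
```

For mcp-backend, each open SSE chat stream occupies one concurrent request for its entire duration. Long conversations therefore count toward concurrency for far longer than ordinary API calls do.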
Vertical (Manual, As Needed)
| Trigger | Action |
|---------|--------|
| Cloud SQL CPU > 80% | Upgrade to db-custom-4-16384 |
| Redis > 85% RAM | Resize to 4 GiB |
| Qdrant VM > 80% RAM | Upgrade to e2-standard-8 |
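Expressing the vertical-scaling triggers as data makes them easy to wire into monitoring later. Here is a sketch that uses hypothetical metric names, with values given as fractions of capacity. In practice, these checks would typically live in Cloud Monitoring alerting policies rather than in application code:

```python
from typing import NamedTuple


class Trigger(NamedTuple):
    metric: str       # hypothetical metric name
    threshold: float  # fraction of capacity, from the table above
    action: str


TRIGGERS = [
    Trigger("cloudsql_cpu", 0.80, "Upgrade Cloud SQL to db-custom-4-16384"),
    Trigger("redis_memory", 0.85, "Resize Memorystore Redis to 4 GiB"),
    Trigger("qdrant_vm_memory", 0.80, "Upgrade Qdrant VM to e2-standard-8"),
]


def pending_actions(metrics: dict[str, float]) -> list[str]:
    """Actions whose metric is strictly above its threshold; missing metrics are skipped."""
    return [t.action for t in TRIGGERS if metrics.get(t.metric, 0.0) > t.threshold]
```

A sensible refinement is to evaluate each metric over a sustained window instead of a single sample, so that a short spike does not trigger an upgrade.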
What Changes as You Grow
10 → 100 users: current architecture handles it without changes.
100 → 1,000 users: add Cloud SQL read replica ($95/mo), increase Cloud Run max instances to 8.
1,000+ users: migrate to GKE Autopilot for more granular control, Qdrant cluster (3 nodes), Cloud SQL HA.
Frontend: GCS + Cloud CDN
React SPA (Vite build) is just static files. Instead of a Cloud Run container, we host them on GCS with Cloud CDN:
- Cost: ~$1/mo (instead of ~$15 for a Cloud Run container)
- Latency: files served from the nearest edge to the user
- Cache hit ratio: >95% for JS/CSS bundles
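A high cache hit ratio depends on cache headers that match Vite's output. By default, Vite writes content-hashed files to dist/assets/, so those files can be cached for a year. index.html, by contrast, must be revalidated so that a new deploy is picked up. Here is a sketch of such a policy. The one-hour default for other files is an assumption:

```python
ONE_YEAR = 31_536_000  # seconds


def cache_control(path: str) -> str:
    """Cache-Control value for a file path relative to Vite's dist/ directory."""
    if path.startswith("assets/"):
        # Content-hashed by Vite: a new build produces new file names.
        return f"public, max-age={ONE_YEAR}, immutable"
    if path == "index.html":
        # Always revalidate so clients discover the newest hashed bundles.
        return "no-cache"
    return "public, max-age=3600"  # other static files: assumed 1 hour
```

These headers can be applied at upload time, for example with gsutil -h "Cache-Control:..." cp. Files copied from Vite's public/ directory are not hashed, which is why they fall through to the shorter default.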
Cloudflare Stays
We did not replace Cloudflare with GCP Cloud Armor. Cloudflare remains the first layer of protection:
- Free WAF — protection from SQL injection, XSS
- DDoS protection — automatic attack absorption
- Edge caching — static assets served from the Kyiv PoP
- Origin CA — SSL certificate already configured
Cloudflare DNS A-record points to the Google Cloud Load Balancer IP. Traffic: user → Cloudflare edge → GCP LB → Cloud Run.
CI/CD: Automated Deployment
GitHub Actions workflow on merge to main:
- Build packages/shared (shared types)
- In parallel: build 4 Docker images → push to Artifact Registry
- Deploy each service to Cloud Run
- gsutil rsync the frontend to GCS
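The workflow itself runs in GitHub Actions. Its ordering constraints can still be modeled as a Python dry run that records commands instead of executing them: the shared package builds first, the image builds run in parallel, deploys start only after every image is pushed, and the frontend goes last. The repository layout, npm workspace command, registry path, and bucket name are hypothetical placeholders:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

SERVICES = ["mcp-backend", "mcp-rada", "mcp-openreyestr", "document-service"]
REGISTRY = "europe-west1-docker.pkg.dev/PROJECT_ID/REPO"  # hypothetical
BUCKET = "gs://FRONTEND_BUCKET"                            # hypothetical

_log: list[str] = []
_lock = threading.Lock()


def run(cmd: str) -> None:
    """Dry run: record the command instead of executing it."""
    with _lock:
        _log.append(cmd)


def build_and_push(service: str, tag: str) -> str:
    image = f"{REGISTRY}/{service}:{tag}"
    run(f"docker build -t {image} -f services/{service}/Dockerfile .")
    run(f"docker push {image}")
    return image


def pipeline(tag: str = "GIT_SHA") -> list[str]:
    _log.clear()
    # 1. Shared types first: every image depends on them.
    run("npm run build --workspace=packages/shared")
    # 2. Build and push all images in parallel; map() waits for all of them.
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        images = list(pool.map(lambda s: build_and_push(s, tag), SERVICES))
    # 3. Deploy only after every image is in the registry.
    for service, image in zip(SERVICES, images):
        run(f"gcloud run deploy {service} --image={image} --region=europe-west1")
    # 4. Sync the static frontend build to the bucket.
    run(f"gsutil -m rsync -r frontend/dist {BUCKET}")
    return list(_log)


if __name__ == "__main__":
    print("\n".join(pipeline()))
```

Rollback amounts to pointing traffic back at an earlier revision, for example with gcloud run services update-traffic SERVICE --to-revisions=REVISION=100.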
Rollback is one command: Cloud Run lets you switch traffic to a previous revision in seconds.
What Is Next
This architecture is the foundation we build on. Next steps:
- Cloud Scheduler — automatically reduce min-instances at night
- Cloud SQL Insights — slow query monitoring
- Prometheus + Grafana on the Qdrant VM — custom metrics
- Workload Identity Federation — GitHub Actions without service account keys
The goal — infrastructure that scales with the product, rather than becoming its limitation.
If you are building a legal or any other SaaS on microservices — Cloud Run + Cloud SQL is an excellent start. You pay for what you actually use, not for idle servers.