Monolith to extracted services
Problem
The client’s platform had grown organically over five years into a single Node.js monolith handling everything from user authentication to report generation to real-time event processing. The report generation module was the critical failure point: it was CPU-intensive, occasionally memory-hungry, and during peak hours it would saturate the process pool and cause latency spikes across completely unrelated features like login and dashboard loads. Every part of the system shared the same deployment unit, so a fix to a billing bug required re-testing and re-deploying the entire application.
Deploys had become a weekly ritual with a dedicated war-room slot. The team had accumulated a checklist of manual verification steps because previous deploys had caused cascading failures in unexpected places — a change to the report scheduler had once taken down the authentication flow because they shared an in-process queue. Engineers had started treating the deploy process as inherently dangerous, which created a self-reinforcing cycle: fewer deploys meant bigger batches, bigger batches meant higher risk, higher risk meant fewer deploys. Feature work was outpacing the team’s confidence in shipping it.
What we did
We applied the strangler fig pattern rather than proposing a rewrite. The goal was to extract the report generation module into a standalone Cloud Run service while keeping the monolith fully operational throughout — no flag days, no cutover weekends. We started by mapping every call site inside the monolith that touched report generation: direct function calls, shared database writes, in-process event emissions. This gave us a clear boundary for the API contract we needed to define before writing a single line of new infrastructure.
The extracted service communicated with the monolith through two channels: a synchronous REST API for request/response workflows where the caller needed an immediate result, and a Pub/Sub topic for fire-and-forget jobs where the monolith could enqueue a report request and move on.
Cloud Run's concurrency model let us configure the service to handle CPU-bound work without stealing resources from the rest of the platform: we set conservative concurrency limits and let Cloud Run scale out horizontally under load rather than trying to tune a shared process pool. Terraform managed the full infrastructure definition, which meant the entire extraction (Cloud Run service, Pub/Sub topics and subscriptions, IAM bindings, and VPC connector config) was reviewable as code and reproducible across environments.
We ran the old and new code paths in parallel during a two-week shadow phase, comparing outputs before fully cutting over traffic.
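The shadow phase can be sketched roughly as follows — this is an assumed shape, not the actual implementation, and `oldPath`, `newPath`, and `onMismatch` are hypothetical names. The key property is that the legacy path stays authoritative: its result (or failure) is what the caller sees, while the new service runs in parallel and any divergence is recorded for investigation instead of being surfaced to users.

```javascript
// Illustrative shadow-phase comparison (an assumed sketch, not the actual
// implementation). The legacy path remains the source of truth; the new
// service is exercised in parallel and mismatches are only reported.
async function shadowCompare(request, { oldPath, newPath, onMismatch }) {
  const [oldResult, newResult] = await Promise.allSettled([
    oldPath(request),
    newPath(request),
  ]);

  if (oldResult.status === "rejected") {
    // The legacy path is authoritative: propagate its failure unchanged.
    throw oldResult.reason;
  }

  const matches =
    newResult.status === "fulfilled" &&
    JSON.stringify(newResult.value) === JSON.stringify(oldResult.value);

  if (!matches) {
    // A shadow failure or divergence must never affect the caller;
    // record it for offline comparison instead.
    onMismatch({ request, oldResult, newResult });
  }

  return oldResult.value;
}

module.exports = { shadowCompare };
```

Once the mismatch log stays empty for long enough, confidence in the new path is backed by production traffic rather than test fixtures, which is what allows cutting over without a maintenance window.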
Result
The extraction went live without a maintenance window. Because we had kept the monolith intact throughout and used the strangler fig to route traffic incrementally, there was no moment of high risk — just a gradual shift in the routing layer until the old code path was handling zero traffic and could be deleted. The team removed roughly 4,000 lines from the monolith in the cleanup pass, which was more satisfying than they expected.
Deploy frequency increased across the board almost immediately, and not just for the extracted service. Once teams saw that changes to report generation no longer required a full monolith deploy, they started questioning other implicit coupling in their release process. The report service itself went from deploying once a week to multiple times per day because the risk surface was narrow and the deployment was fast — a Cloud Run deploy with no dependencies on other teams’ code. During the next peak load event, the extracted service scaled from two to eleven instances under Pub/Sub backpressure without any manual intervention and without any visible latency impact on the monolith. The on-call team noted it was the first peak period in over a year where they did not have to manually intervene in the deployment or restart any processes.
Key highlights
- Deploy frequency increased from weekly to 5+ per day
- Extracted service handles 3x peak load independently
- Zero downtime during extraction — strangler fig pattern
- Reduced blast radius of failures to single service
Tech stack
- Node.js
- Google Cloud Run
- Cloud Pub/Sub
- Terraform