CI/CD · 2 weeks

Task queues and event-driven processing

Problem

The client’s API server was doing far too much work inside the request/response cycle. When a user triggered a report export, the handler would synchronously query multiple database tables, aggregate rows, render a PDF, upload it to Cloud Storage, and email a download link — all before returning an HTTP response. On a good day this took eight to twelve seconds. On a bad day, with the database under load or Cloud Storage momentarily slow, it hit the gateway timeout at 30 seconds and the user saw a 504. They’d retry. The handler would run again from scratch, duplicating work, hammering the database a second time, occasionally producing two copies of the same report.

Webhook delivery had the same structural problem. Outbound webhooks to customer endpoints were fired inline, meaning a slow or unresponsive third-party server could hold a thread open for the full timeout window. The retry logic was bolted on top of the synchronous path — if the first attempt failed, the API would try again immediately, in the same request context, with no backoff.

Data import jobs were the worst offender. CSV uploads from enterprise customers could contain tens of thousands of rows. The endpoint would parse, validate, and write every row before acknowledging the upload. Memory pressure was unpredictable; a particularly large file could push the API instance into OOM territory, killing in-flight requests from other users entirely. The team had tried bumping instance memory and increasing timeouts, but that just masked the root cause: synchronous processing has no place in a latency-sensitive request handler.

What we did

We started by drawing a clear boundary between work that must happen before a response and work that merely needs to happen eventually. Anything that crossed that boundary — report generation, bulk imports, outbound webhooks — became a task. The API handler’s only job became creating a well-formed task record, enqueuing it, and returning a 202 Accepted with a job ID the client could poll. We used Google Cloud Tasks for operations that needed explicit scheduling, rate limiting, or per-task retry configuration, and Pub/Sub for fan-out scenarios like webhook delivery where multiple subscribers might care about the same event.
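The handler pattern described above can be sketched in a few lines. This is a minimal illustration, not the client's actual code: an in-memory queue and dict stand in for Cloud Tasks and the job-status store, and names like `handle_report_export` are hypothetical.

```python
import uuid
from queue import Queue

# In-memory stand-ins for the task queue (Cloud Tasks) and job store.
task_queue: Queue = Queue()
job_store: dict[str, dict] = {}

def handle_report_export(user_id: str, params: dict) -> tuple[int, dict]:
    """Handler sketch: record the job, enqueue it, return 202 immediately.

    The heavy work (query, aggregate, render the PDF, upload, email)
    happens later in a worker, not inside the request/response cycle.
    """
    job_id = str(uuid.uuid4())
    job_store[job_id] = {"status": "queued", "user_id": user_id}
    task_queue.put({
        "job_id": job_id,            # the ID the client polls on
        "idempotency_key": job_id,   # lets workers retry safely
        "type": "report_export",
        "payload": params,
    })
    return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}
```

In the real system the `task_queue.put` call would be a `CloudTasksClient.create_task` or Pub/Sub publish, but the contract is the same: the only synchronous work is writing one small record and one enqueue.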

The worker layer ran on Cloud Run, which gave us clean horizontal scaling tied directly to queue depth. Each worker type was its own service with its own scaling parameters — the report worker scaled conservatively because jobs were CPU-intensive, the webhook dispatcher scaled aggressively because jobs were mostly network I/O and fast. Dead-letter topics were configured on every queue with a threshold of five consecutive failures; anything that landed in the DLQ triggered a PagerDuty alert and was logged with full context for manual inspection. Terraform managed the entire queue topology — queue definitions, IAM bindings between Cloud Tasks and the Cloud Run service accounts, DLQ subscriptions, and the alerting policy on DLQ message count. We also added idempotency keys to every task payload so that workers could safely retry without producing duplicate side effects — a lesson learned from the old inline-retry behavior that had been creating duplicate reports.
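The worker-side behavior — retry up to five consecutive failures, dead-letter after that, and skip redeliveries whose idempotency key has already been processed — can be sketched as follows. The in-memory set and list are illustrative stand-ins for a durable idempotency store and the real dead-letter topic.

```python
MAX_ATTEMPTS = 5                 # DLQ threshold: five consecutive failures
processed_keys: set[str] = set() # idempotency record (durable store in prod)
dead_letter: list[dict] = []     # stand-in for the dead-letter topic

def run_task(task: dict, do_work) -> str:
    """Execute a task at most once per idempotency key, with DLQ fallback."""
    key = task["idempotency_key"]
    if key in processed_keys:
        # Redelivery of a completed task: the side effect already happened,
        # so acknowledge without running it again (no duplicate reports).
        return "duplicate-skipped"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            do_work(task)
        except Exception:
            if attempt == MAX_ATTEMPTS:
                dead_letter.append(task)  # alert + keep context for inspection
                return "dead-lettered"
            continue  # the queue would redeliver with backoff here
        processed_keys.add(key)
        return "done"
```

In production the retry loop belongs to Cloud Tasks or the Pub/Sub subscription rather than the worker process, but the invariants are the ones shown: a task either completes exactly once or lands in the DLQ with its payload intact.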

Result

The immediate impact on API response times was dramatic. Endpoints that had been averaging eight to twelve seconds at the 95th percentile dropped to under 200 milliseconds once the heavy work was offloaded. Users got instant feedback that their job was queued, and a status endpoint let the frontend poll for completion. The perceived experience improved more than the raw numbers suggest, because a spinner with meaningful progress is far less frustrating than a blank screen with no indication of whether anything is happening.
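The polling flow mentioned above is deliberately simple. A hypothetical sketch, with a dict standing in for the job store and illustrative field names:

```python
jobs: dict[str, dict] = {}

def get_job_status(job_id: str) -> tuple[int, dict]:
    """What a GET /jobs/{job_id} handler would return to the frontend."""
    job = jobs.get(job_id)
    if job is None:
        return 404, {"error": "unknown job"}
    return 200, {
        "job_id": job_id,
        "status": job["status"],                 # "queued" | "running" | "done"
        "download_url": job.get("download_url"), # set once the report exists
    }

def mark_done(job_id: str, url: str) -> None:
    """Called by the worker after the report is uploaded."""
    jobs[job_id]["status"] = "done"
    jobs[job_id]["download_url"] = url
```

The frontend polls until `status` flips to `done`, then swaps the spinner for the download link — which is where the improved perceived experience comes from.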

Throughput scaled without requiring any changes to the API tier itself. During a load test we ran before go-live, we pushed ten times the previous peak request rate through the API layer and saw queue depth grow, workers spin up within seconds, and job completion times stay flat. Webhook reliability moved from 94% to 99.8% delivery within the first two weeks of operation. The remaining 0.2% were endpoints that were consistently unreachable, all of which surfaced in the DLQ with enough context to contact the affected customers. Before this architecture, those failures were invisible. The dead-letter queue turned unknown unknowns into known, actionable problems — and that change in operational visibility was arguably as valuable as the performance improvement.

Key highlights

  • API P95 response time dropped from 12s to 180ms
  • 10x throughput increase without scaling the API tier
  • Dead-letter queue catches failures — zero silent drops
  • Webhook delivery success rate went from 94% to 99.8%

Tech stack

Cloud Tasks · Pub/Sub · Cloud Run · Terraform

Have a similar challenge?

Book a call