AI + Infrastructure · March 5, 2026 · 9 min read

AI Agents Need Infrastructure Context, Not Just Code

Your AI coding assistant is blind to production. Here's how to give it sight.

The gap between code and production

Modern AI coding tools are impressive in their domain: application code. They understand functions, classes, types, tests. But production systems are more than code. They're code running on specific infrastructure, with specific configurations, talking to specific databases through specific network paths.

When a production incident occurs, the question is rarely "what does this function do?" It's "why is this function failing right now, in this environment, with this data?" Answering that question requires context that lives outside the codebase: infrastructure state, recent deployments, runtime metrics, and distributed traces.

The three context layers AI agents need

To be genuinely useful for production operations, an AI agent needs three layers of context:

1. Infrastructure context (IaC): What services exist, how they're connected, what resources they use, how environments differ. This comes from your Terraform/Pulumi/CDK codebase and state files.

2. Change context (CI/CD): What changed recently, who changed it, what tests passed or failed, what the diff was. This comes from your deployment pipeline's event stream.

3. Runtime context (Observability): What's happening right now — request traces, error rates, latency distributions, resource utilization. This comes from your OpenTelemetry instrumentation and monitoring stack.
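The three layers above can be thought of as a small, queryable schema. Here is a minimal sketch of what each layer's records might look like; the field names (`resources`, `deployed_at`, `p99_latency_ms`, and so on) are illustrative assumptions, not an established schema:

```python
from dataclasses import dataclass, field

@dataclass
class InfraContext:
    """From Terraform/Pulumi/CDK state: what exists and how it's configured."""
    service: str
    resources: dict = field(default_factory=dict)  # e.g. {"db_pool_size": 20}

@dataclass
class ChangeContext:
    """From the CI/CD event stream: what changed, when, and by whom."""
    service: str
    deployed_at: str   # ISO-8601 timestamp of the deploy
    author: str
    diff_summary: str  # human-readable summary of the change

@dataclass
class RuntimeContext:
    """From OpenTelemetry and the monitoring stack: what's happening now."""
    service: str
    error_rate: float      # fraction of failing requests
    p99_latency_ms: float  # tail latency
```

The point of modeling all three with a shared `service` field is that it gives the agent a join key: any runtime anomaly can be linked back to recent changes and current infrastructure state for the same service.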

Most teams have bits and pieces of each layer, but they're not connected. The infrastructure is in Terraform but it's not linked to the CI/CD pipeline. The CI/CD pipeline produces deploy events but they're not correlated with traces. The traces exist but they don't reference infrastructure state.

Connecting these three layers is what turns an AI agent from a code assistant into an operations partner.

A connected example: from alert to resolution

Here's how this works in practice when the layers are connected:

1. An alert fires: "Error rate on /api/checkout exceeded 5% for the last 5 minutes."

2. The AI agent queries the observability layer: pulls recent error traces for /api/checkout. Finds that 90% of errors are timeouts on a call to the inventory-service.

3. The AI agent queries the change layer: checks the CI/CD pipeline for recent deploys. Finds that inventory-service was redeployed 12 minutes ago.

4. The AI agent queries the infrastructure layer: reads the Terraform state to understand the inventory-service's configuration. Notices the deploy changed the database connection pool size from 20 to 5 (a configuration error in the Terraform variables).

5. The AI agent produces a diagnosis: "The error rate spike on /api/checkout is caused by timeouts to inventory-service. The inventory-service was redeployed 12 minutes ago with a reduced database connection pool (20 to 5 connections). Under current traffic, this causes connection exhaustion and query timeouts. Recommended fix: revert the connection pool size in terraform/modules/inventory/variables.tf and re-apply."

Total time from alert to actionable diagnosis: under 2 minutes. No human had to open CloudWatch, no one had to search git logs, no one had to read Terraform diffs. The AI agent traversed the connected context layers and produced a specific, actionable answer.
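The traversal in steps 2 through 4 can be sketched as a single correlation function. This is a simplified illustration under assumed input shapes (plain dicts for traces, deploy events, and infrastructure diffs), not a real agent API:

```python
from datetime import datetime, timedelta, timezone

def diagnose(alert_service, traces, deploys, infra_diffs, window_minutes=30):
    """Correlate an alert with recent deploys and infra changes.

    traces:      [{"callee": str, "status": "ok"|"error"}, ...]
    deploys:     [{"service": str, "at": datetime}, ...]
    infra_diffs: {service: "summary of the infra change"}
    All input shapes are illustrative assumptions.
    """
    # Step 2: find the dominant failure target in recent error traces.
    failing_calls = [t["callee"] for t in traces if t["status"] == "error"]
    suspect = max(set(failing_calls), key=failing_calls.count)

    # Step 3: check whether the suspect service was deployed recently.
    now = datetime.now(timezone.utc)
    recent = [d for d in deploys
              if d["service"] == suspect
              and now - d["at"] < timedelta(minutes=window_minutes)]
    if not recent:
        return f"{alert_service} errors point at {suspect}; no recent deploy found."

    # Step 4: pull the infrastructure diff that shipped with that deploy.
    diff = infra_diffs.get(suspect, "no infra change recorded")
    return (f"{alert_service} errors trace to {suspect}, "
            f"redeployed at {recent[0]['at']:%H:%M} UTC. Infra change: {diff}.")
```

A real agent would add more signals (error budgets, traffic levels, rollback history), but the core move is the same: join observability, change, and infrastructure data on service name and time.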

Building the connected context layer

This isn't science fiction — the pieces exist today. The work is in connecting them:

- Export your Terraform state to a queryable format (S3 + a thin API, or tools like Spacelift/Env0)
- Emit structured deploy events from your CI/CD pipeline (GitHub Actions outputs, webhook events)
- Instrument your services with OpenTelemetry and export to a trace backend with an API (Tempo, Honeycomb)
- Build an agent integration layer that can query all three sources and correlate events by timestamp and service name
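As one concrete piece of the second item, a structured deploy event can be a small JSON payload emitted at the end of each pipeline run. The field names here are an assumption to illustrate the shape, not a fixed standard:

```python
import json
from datetime import datetime, timezone

def deploy_event(service, git_sha, infra_diff):
    """Build a structured deploy event for the agent's change layer.

    Field names are illustrative; adapt them to your own pipeline.
    infra_diff might be a summarized `terraform plan` output.
    """
    return {
        "type": "deploy",
        "service": service,
        "git_sha": git_sha,
        "at": datetime.now(timezone.utc).isoformat(),
        "infra_diff": infra_diff,
    }

# Example: the event an agent would later correlate with error traces.
event = deploy_event("inventory-service", "a1b2c3d",
                     "db_pool_size: 20 -> 5")
print(json.dumps(event, indent=2))
```

Posting this payload to a webhook or appending it to an event log is enough; the value comes from the timestamp and service name, which give the agent its correlation keys.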

The investment is measured in weeks, not months. And unlike hiring a platform team, the system gets better as your infrastructure grows — more services means more context for the agent, not more work for humans.

The competitive advantage is compounding

Teams that build this connected context layer today are gaining an advantage that compounds over time. Every incident resolved by the AI agent generates data that makes future resolutions faster. Every infrastructure change logged in Terraform makes the context richer. Every trace collected makes the debugging surface more complete.

Meanwhile, teams without this foundation are stuck in the same loop: alert fires, engineer wakes up, spends 45 minutes gathering context, finds the issue, fixes it manually, goes back to sleep. Repeat.

The infrastructure investment that makes AI agents effective isn't optional overhead. It's the highest-leverage technical investment a startup can make right now.

We build the infrastructure foundation that makes AI agents actually useful for operations. IaC, CI/CD, and observability — connected and AI-ready.

Book a call