Your AI Agent Has Root Access and No Supervision
An AI coding agent at Amazon decided the best way to fix a production environment was to delete it and recreate it from scratch. This took down AWS Cost Explorer for 13 hours.
Let that sink in. Not a junior developer. Not a misconfigured Terraform run. An AI agent — Amazon’s own Kiro tool — looked at a live production system, decided “delete and recreate” was a reasonable approach, and nobody stopped it.
This is not a bug. This is the logical conclusion of every decision that led to this point.
The Kiro Incident Is Not an Anomaly
In February, the Financial Times reported that Amazon’s AI coding tool Kiro caused a 13-hour outage affecting AWS services in one of their China regions. The tool, which Amazon describes as an “agentic coding service” that turns prompts into working code, decided to delete and recreate a customer-facing environment.
Amazon’s official response? User error. Specifically, “misconfigured access controls.”
But here’s the uncomfortable truth that undercuts that defense: AWS only implemented safeguards — including mandatory peer review for production access — after the incident. If your security controls only exist because an AI already broke things, you don’t get to retroactively blame configuration.
Multiple Amazon employees noted this was “at least” the second time in recent months their AI tools caused a service disruption. Amazon had been pushing employees toward an 80% weekly usage goal for Kiro. The incentive structure was: use the AI agent more. The guardrail structure was: we’ll figure that out later.
They figured it out later.
4.5x More Incidents. Same Root Cause.
Here’s where it gets systemic. A 2026 research report from Teleport found that enterprises deploying AI systems with excessive permissions experience 4.5x more security incidents than those enforcing least-privilege controls. Fifty-nine percent of organizations report having experienced — or strongly suspect — an AI-related security incident.
This isn’t surprising. It’s the exact same failure mode we’ve been warning about in DevOps for two decades, just wearing a different hat.
We spent years learning that CI/CD pipelines shouldn’t have admin access to production. That service accounts should be scoped to the minimum required permissions. That automated systems need circuit breakers, approval gates, and blast radius limits. Infrastructure-as-code evolved entire frameworks around the principle that automated changes to production should be reviewable, reversible, and constrained.
Then AI agents showed up and we threw all of that out the window.
The Permission Inheritance Problem
The core issue is deceptively simple: AI agents inherit the permissions of the human who invoked them.
When a senior engineer asks an AI agent to “fix the deployment configuration,” that agent operates with the engineer’s full credentials. It can read secrets. It can modify infrastructure. It can — as Amazon discovered — delete production environments.
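A toy model makes the inheritance problem concrete. Nothing here is a real SDK — the permission names and `Agent` class are purely illustrative — but it captures the mechanics: an agent handed the caller's credential object passes every check the caller would pass.

```python
# Illustrative permission model, not a real cloud SDK.
ENGINEER_PERMISSIONS = {"read:secrets", "write:infra", "delete:environment"}

class Agent:
    def __init__(self, permissions: set[str]):
        self.permissions = permissions

    def can(self, action: str) -> bool:
        # Deny anything not explicitly granted.
        return action in self.permissions

# Invoked under the engineer's credentials, nothing stops a destructive action:
inherited = Agent(ENGINEER_PERMISSIONS)
assert inherited.can("delete:environment")

# A dedicated, narrowly scoped identity denies it by construction:
scoped = Agent({"read:deploy-config", "write:deploy-config"})
assert not scoped.can("delete:environment")
```

The fix is structural, not behavioral: the scoped agent can't delete the environment no matter what it infers.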
This is insane. We don’t give junior developers unrestricted production access on day one. We don’t let CI/CD bots push to prod without approvals. But AI agents? Here’s the keys, go wild.
The defense is always the same: “The human is responsible for reviewing the output.” But infrastructure changes aren’t like code changes — they’re often executed immediately, with no PR, no diff, no review window. The agent acts, the environment changes, and you find out about it when your pager goes off.
The “human in the loop” is a rubber stamp when reviewers are already drowning in AI-generated code. For infrastructure, it’s often not even that — there’s no loop at all.
The Three Laws of AI Agent Access (That Nobody Follows)
If you’re deploying AI agents with any kind of infrastructure access, there are three non-negotiable rules. Almost nobody follows them.
1. Agents Get Their Own Identity, Not Yours
Every AI agent should operate under a dedicated service identity with explicitly scoped permissions. Not your credentials. Not your team’s shared credentials. A purpose-built identity that can do exactly what it needs to do and nothing more.
This is IAM 101. We solved this for microservices a decade ago. The fact that we’re not applying the same principle to AI agents is pure negligence.
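For concreteness, here is a sketch of what a purpose-built agent policy might look like, expressed in AWS IAM's policy grammar (the account ID, cluster, and service names are invented; the action names are real ECS actions). The point is the shape: an explicit allow list, no action wildcards, no resources outside the agent's job.

```python
import json

# Illustrative IAM-style policy for a dedicated agent identity.
# Explicit actions, one scoped resource ARN, deny-by-default for everything else.
agent_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ecs:DescribeServices", "ecs:UpdateService"],
            "Resource": "arn:aws:ecs:us-east-1:123456789012:service/staging/*",
        }
    ],
}

# No iam:*, no *:Delete*, no production ARNs anywhere in the document.
print(json.dumps(agent_policy, indent=2))
```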
2. Destructive Actions Require Human Approval
Any agent action that modifies, deletes, or creates infrastructure resources in production must require explicit human approval before execution. Not after. Not with a “you can review the logs later” escape hatch. Before.
This means your AI agent needs a planning phase and an execution phase, with a gate between them. The agent proposes changes, a human reviews and approves, then the agent executes. If this sounds like how Terraform plan/apply works, congratulations — you’ve rediscovered infrastructure-as-code principles from 2015.
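The plan/approve/apply pattern can be sketched in a few lines. This is an assumed shape, not any real framework's API — the names `propose`, `approve`, and `execute` are hypothetical — but it shows the essential property: execution physically refuses to run without the human gate.

```python
from dataclasses import dataclass

@dataclass
class ChangePlan:
    actions: list[str]
    approved: bool = False

def propose(actions: list[str]) -> ChangePlan:
    """Planning phase: the agent records what it intends to do."""
    return ChangePlan(actions=actions)

def approve(plan: ChangePlan) -> ChangePlan:
    """Human gate: a reviewer flips the bit after reading the plan."""
    plan.approved = True
    return plan

def execute(plan: ChangePlan) -> list[str]:
    """Execution phase: refuses any plan that never passed the gate."""
    if not plan.approved:
        raise PermissionError("plan not approved; refusing to execute")
    return [f"applied: {a}" for a in plan.actions]

plan = propose(["scale web service to 3 replicas"])
# execute(plan) here would raise PermissionError; only this succeeds:
result = execute(approve(plan))
```

The gate lives in `execute`, not in policy documents or training — the agent cannot talk its way past a raised exception.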
3. Blast Radius Must Be Bounded
AI agents should not be able to affect resources outside their designated scope. If an agent is working on a deployment configuration, it shouldn’t be able to touch the database. If it’s modifying a single service, it shouldn’t be able to nuke an entire environment.
Scoped permissions. Resource boundaries. Blast radius limits. This is the same containment strategy we use for containers, for service meshes, for everything else in modern infrastructure. AI agents are not special. They are automated systems. Treat them like it.
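A blast-radius boundary can be as simple as a prefix check enforced before any call leaves the agent. The scope string and function names below are illustrative, not a real API:

```python
# Illustrative scope guard: this agent may only touch the checkout service.
AGENT_SCOPE = "svc/checkout/"

def within_scope(resource: str) -> bool:
    return resource.startswith(AGENT_SCOPE)

def guarded_delete(resource: str) -> str:
    """Reject out-of-scope resources before the destructive call is made."""
    if not within_scope(resource):
        raise PermissionError(f"out of scope: {resource}")
    return f"deleted {resource}"

assert within_scope("svc/checkout/deploy-config")
# guarded_delete("db/orders-primary") would raise: the database is out of
# scope, regardless of what the agent decides is a "reasonable approach".
```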
“But It Slows Things Down”
Yes. That’s the point.
The entire value proposition of AI agents is speed. They’re faster than humans at writing code, generating configurations, executing changes. And that speed is exactly why they need constraints.
A human making infrastructure changes is slow. They read the docs. They second-guess themselves. They run the command in staging first. That slowness is a feature — it’s the time during which mistakes get caught.
An AI agent makes changes at the speed of API calls. When it’s right, it’s impressively fast. When it’s wrong, it’s catastrophically fast. A 13-hour outage is what “fast and wrong” looks like in production.
The organizations getting this right aren’t the ones with the most aggressive AI adoption. They’re the ones that spent time building guardrails before handing over the keys. They have approval workflows, scoped permissions, audit trails, and kill switches. They treat AI agents the way you’d treat any powerful automated system: with respect for what happens when it goes wrong.
The Uncomfortable Conclusion
We are in the “move fast and break things” phase of AI agent adoption. Companies are racing to deploy agents with production access because the productivity gains are real and the competitive pressure is intense.
But “move fast and break things” was always a bad philosophy for infrastructure. Facebook coined it, then spent a decade building the most sophisticated deployment infrastructure in the world specifically to stop things from breaking. The phrase was a lie from the start — what they actually meant was “move fast and build really good rollback mechanisms.”
AI agents need rollback mechanisms. They need approval gates. They need scoped permissions. They need the same boring, unglamorous operational discipline that we apply to every other automated system that touches production.
Amazon learned this the hard way. The question is whether you’ll learn from their incident, or wait for your own.
The AI agent has root access. It has no supervision. And right now, across thousands of organizations, it’s one bad inference away from the kind of morning that starts with a PagerDuty alert and ends with an incident postmortem nobody wants to write.
Fix your permissions. Scope your agents. Add the approval gate.
Do it before the agent decides to “delete and recreate” something you can’t afford to lose.
And the Kiro incident wasn’t only a permissions story — it was a code quality story too: AI-generated code has 1.7x more bugs than human-written code, and the review pipeline can’t keep up.