
The Machines Are Writing the Outages Now

Nearly 40% of Your Codebase Was Written by Something That Can’t Explain It

Here’s a number that should make every engineering leader uncomfortable: a December 2025 analysis of 470 open-source GitHub pull requests found that code co-authored by generative AI contained approximately 1.7 times more major issues than human-written code.

And yet, nearly 40% of committed code is now AI-generated. We’ve multiplied the volume of code flowing through the pipeline while degrading its quality. That’s not a productivity story. That’s a pipeline bomb.

The AWS Kiro outage was a permissions failure — an AI agent with too much access and too little oversight. But there’s a quieter, more pervasive failure happening every day: AI-generated code that looks right, passes CI, gets a cursory review, and ships with bugs that humans wouldn’t have written.

The Quality Numbers Are Grim

The breakdown from that analysis is worth staring at:

  • 75% more logic and correctness errors — wrong control flow, broken dependencies, flawed conditionals
  • 2.74x higher security vulnerability rate — the kind of stuff that shows up on a CISA advisory six months later
  • 194 issues per 100 PRs — nearly two per pull request

This isn’t a tooling problem. It’s a systemic quality problem. And the pipeline we’ve built to catch it — code review — was never designed for this volume or this failure profile.

The Review Pipeline Is the Bottleneck Nobody’s Scaling

Here’s the thing that kills me. We keep talking about AI code generation like it’s a production problem. It’s not. It’s a review problem.

AWS themselves have admitted that “review capacity, not developer output, is the limiting factor in delivery.” Think about that. The constraint isn’t writing code. It hasn’t been for a while. The constraint is having enough qualified humans who can read, understand, and validate what the machines are producing.

And we’re making it worse every day.

Before AI coding tools, a senior engineer might review 3-5 pull requests per day. Each one was written by a human colleague who understood the system, followed team conventions, and could explain their reasoning in a code review comment.

Now that same senior engineer is reviewing 10-15 PRs per day, half of which were generated by an AI that has no understanding of the system’s invariants, no knowledge of last month’s incident, and no ability to explain why it chose one approach over another. The AI doesn’t attend the post-mortem. It doesn’t remember the outage. It will cheerfully regenerate the exact same pattern that caused it.

You haven’t 3x’d your engineering output. You’ve 3x’d the load on your most senior people while giving them worse material to work with.

The Comprehension Gap

Over 40% of junior developers admit to deploying AI-generated code they don’t fully understand. This creates a compounding problem: if the author doesn’t understand the code, who reviews it?

In traditional development, the author is the first line of defense. They know why they made each decision. They can explain trade-offs during review. When a reviewer asks “why did you do it this way?” the author has an answer.

With AI-generated code, the author often is the reviewer — and neither of them understands the code. The original prompt becomes obsolete the moment the code is generated. The code itself becomes the only source of truth, and code is terrible at explaining why it does what it does.

The Compounding Problem

Here’s what really scares me. AI-generated bugs don’t behave like human bugs.

Human bugs cluster around the hard parts. Complex state machines, race conditions, edge cases in business logic. Senior engineers develop an intuition for where human bugs live, because humans make predictable mistakes.

AI bugs are distributed uniformly across the codebase. They show up in the easy parts. Off-by-one errors in pagination logic. Wrong HTTP status codes. Missing null checks on paths that obviously need them. The kind of bugs that make a reviewer think “surely this is correct, it’s so simple” — and then skip the close read.
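To make the off-by-one pagination case concrete, here’s a minimal, hypothetical sketch (the function name and shape are illustrative, not from any real PR) of a bug that sails past a quick review precisely because it looks too simple to be wrong:

```python
def paginate(items, page, page_size):
    """Return one page of items. Pages are 1-indexed."""
    # The plausible-looking buggy version:
    #   start = page * page_size      # silently skips the entire first page
    # The correct version:
    start = (page - 1) * page_size
    return items[start:start + page_size]

items = list(range(10))
assert paginate(items, 1, 3) == [0, 1, 2]  # first page begins at index 0
assert paginate(items, 4, 3) == [9]        # last, partial page
```

Both versions compile, both return a list of the right length for middle pages, and only a test that checks page 1 (or the final page) catches the difference.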

This means traditional code review heuristics fail. You can’t just focus your review energy on the complex parts anymore. Everything is suspect. Every line needs the same level of scrutiny, and nobody has time for that when you’re drowning in AI-generated PRs.

What Actually Works

I’m not saying stop using AI coding tools. That ship has sailed. But I am saying the industry needs to stop pretending this is a pure productivity win and start treating it like what it is: a risk management problem.

1. Separate generation from integration. AI can write the first draft. A human must own the integration. Not “review and approve” — actually own. Rewrite sections. Add the comments the AI didn’t. Delete the code that doesn’t need to exist.

2. Invest in review tooling, not generation tooling. Every dollar spent making AI write code faster without a corresponding dollar on AI-assisted review is making the problem worse. Automated invariant checking, property-based testing, and semantic diff tools should be table stakes.

3. Treat AI PRs as untrusted input. Same as you’d treat a third-party API response. Validate. Sanitize. Don’t assume it’s correct because it compiled.

4. Cap the ratio. If more than 50% of your merged code in a sprint was AI-generated, your review process probably isn’t keeping up. That’s not a productivity flex. That’s a leading indicator of an incident.

5. Run the post-mortem before the outage. Look at your AI-generated code from last month. Audit it like an external dependency. You might be surprised what you find.
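On point 2, property-based testing is the cheapest of these to adopt. Instead of asserting on hand-picked examples, you assert that an invariant holds for many generated inputs. A minimal hand-rolled sketch below (a real library like Hypothesis does this far better, with input shrinking and smarter generation; `slugify` and `check_property` are illustrative names, not from any particular codebase):

```python
import random

def slugify(title):
    """Example function under test: lowercase, collapse whitespace, join with '-'."""
    return "-".join(title.lower().split())

def check_property(prop, make_input, trials=500, seed=0):
    """Run a predicate against many randomly generated inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = make_input(rng)
        assert prop(x), f"property failed for input: {x!r}"

alphabet = "abc XYZ \t"
def make_title(rng):
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))

# Invariants a reviewer wants to hold for ANY input, not just the happy path:
check_property(lambda t: slugify(slugify(t)) == slugify(t), make_title)  # idempotent
check_property(lambda t: " " not in slugify(t), make_title)              # no spaces survive
```

The point is that invariants like these survive regeneration: when an AI rewrites the function next month, the properties still have to hold, even though the hand-picked example tests it was prompted with may no longer apply.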

The Next AWS Outage Is Already Written

Somewhere right now, an AI coding agent is generating a pull request that will pass CI, get a cursory review from an overloaded senior engineer, merge to main, deploy to production, and bring down a service. Maybe it’s a mishandled retry that creates a thundering herd. Maybe it’s a connection pool misconfiguration. Maybe it’s a subtle type coercion bug that only manifests under load.
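The thundering-herd retry is a good example of the pattern. A naive retry loop sleeps a fixed interval, so every client that failed together retries together, hammering the recovering service in lockstep. The standard fix is exponential backoff with full jitter. A minimal sketch (function and parameter names are illustrative):

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, seed=None):
    """'Full jitter' exponential backoff: each client sleeps a random
    duration in [0, min(cap, base * 2**attempt)], so synchronized
    failures spread their retries out instead of retrying in lockstep."""
    rng = random.Random(seed)
    return [rng.uniform(0, min(cap, base * (2 ** a))) for a in range(attempts)]

# What a naive generated retry loop does instead -- every client
# hits the recovering service at the same instant:
#   for _ in range(5):
#       try: call()
#       except Exception: time.sleep(1)   # fixed delay, no jitter

delays = backoff_delays(5, seed=0)
assert all(0 <= d <= 30.0 for d in delays)
assert len(set(delays)) > 1  # delays are spread out, not identical
```

Both loops retry five times. Both pass any test that checks "retries on failure." Only one of them takes the service down when ten thousand clients fail at once.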

The code is already written. It’s sitting in a PR queue. The only question is whether someone catches it before it ships.

The Kiro outage got the headlines because it was dramatic — an AI agent deleting a production environment. But the slower, quieter failure is worse: thousands of AI-generated PRs merging every day with bugs that humans wouldn’t have written, reviewed by engineers who don’t have time to catch them.

We optimized for code generation velocity while systematically underinvesting in code comprehension, review, and validation.

We built the machines that write the code. We forgot to build the machines that check it.