
The Review Bottleneck Is Eating Your Engineering Team

Your team is shipping twice as many pull requests as last quarter. Your engineering manager is thrilled. Your dashboards are green. Your velocity metrics have never looked better.

Nobody mentions that review times have nearly doubled too. Or that three production incidents last month traced back to AI-generated code that passed review in under four minutes. Or that your senior engineers — the ones who actually understand the system — are drowning in a tidal wave of machine-generated diffs they’re expected to rubber-stamp between standups.

The bottleneck didn’t disappear. It moved. And it moved to the one place your team can least afford to fail.

The Numbers Nobody Wants to Talk About

Recent data from teams with high AI adoption tells a clear story: 98% more PRs merged, 91% longer review times. Read that again. Your team is producing almost double the output, but the time to actually verify that output has nearly doubled too.

This isn’t a tooling problem. This is a fundamental shift in where engineering effort lives. For decades, the hard part was writing code. Now the hard part is reading it — and reading it well enough to catch the subtle bugs that LLMs are spectacularly good at hiding behind clean syntax and reasonable-looking variable names.

AI-generated code has a signature. Not in the syntax — that’s usually fine. The signature is in the architecture. It tends toward the obvious solution. It reaches for the pattern it’s seen most often in training data. It writes code that looks correct, passes linting, satisfies the type checker, and quietly introduces assumptions about state management that contradict the rest of your codebase.

Good luck catching that in a four-minute review.

The Comprehension Debt Crisis

Technical debt has a cousin nobody named until recently: comprehension debt. It’s the gap between the code that exists in your repository and your team’s understanding of what that code actually does.

Before AI coding tools, comprehension debt accumulated slowly. A developer writes code and understands it. Their reviewer reads it and mostly understands it. Knowledge diffuses through the team organically. It's imperfect, but it works.

Now? A developer prompts an AI, gets 400 lines of working code, glances at it, submits a PR. The reviewer skims it — it’s syntactically clean, tests pass, the description says what it does. Approved. Merged.

Nobody understands it. Not deeply. Not the way you need to understand code when it breaks at 3 AM and the on-call engineer is staring at a stack trace that points into the middle of a function nobody remembers writing, because nobody did write it.

Comprehension debt compounds faster than technical debt. Technical debt slows you down linearly. Comprehension debt creates a cliff — everything seems fine until someone needs to debug, refactor, or extend a module, and discovers that the team’s mental model of the system diverged from reality six months ago.

What Actually Works

I’ve watched teams handle this well and teams handle this catastrophically. The difference isn’t whether they use AI — it’s whether they restructured their review process to account for the new reality.

The teams that are winning do three things:

1. They Treat AI-Generated Code as Untrusted by Default

Not in a paranoid way. In the same way you’d treat code from a new contractor who’s brilliant but has never seen your codebase. The code might be excellent. It also might make assumptions that are wrong in your specific context.

This means reviews of AI-assisted PRs take longer, not shorter. If your review times went down after adopting AI tools, that’s not efficiency — that’s negligence.

2. They Enforce Authorship Understanding

The developer who submits the PR must be able to explain every line. Not “the AI wrote it and tests pass.” Actually explain the approach, the tradeoffs, and why this solution fits the existing architecture.

Some teams do this with mandatory PR descriptions that include a “design decisions” section. Others do synchronous code walkthroughs for anything over 200 lines. The mechanism matters less than the principle: if you can’t explain it, you can’t ship it.
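The "design decisions" requirement is easy to enforce mechanically. Here's a minimal sketch of a CI gate that fails a PR whose description lacks a substantive section under a hypothetical `## Design Decisions` heading; the heading name, the character floor, and the stdin-piping convention are all assumptions to adapt to your own template and CI system.

```python
import re
import sys

REQUIRED_SECTION = "## Design Decisions"  # hypothetical heading from your PR template
MIN_SECTION_CHARS = 80  # arbitrary floor so the section can't be left as "N/A"


def has_design_decisions(pr_body: str) -> bool:
    """Return True if the PR description contains a non-trivial
    section under the required heading."""
    # Capture everything between the required heading and the next
    # heading (a line starting with '#') or the end of the text.
    match = re.search(
        rf"{re.escape(REQUIRED_SECTION)}\s*(.*?)(?=\n#|\Z)",
        pr_body,
        flags=re.DOTALL,
    )
    return bool(match) and len(match.group(1).strip()) >= MIN_SECTION_CHARS


if __name__ == "__main__":
    # Example wiring: your CI job pipes the PR body in on stdin.
    body = sys.stdin.read()
    if not has_design_decisions(body):
        print("PR description is missing a substantive Design Decisions section.")
        sys.exit(1)
```

The point isn't the check itself; it's that the check makes the principle non-negotiable. A bot can't verify that the explanation is honest, but it can guarantee the author had to write one.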

3. They Cap AI-Generated Scope per PR

This is the unsexy one, and it’s the most effective. Instead of letting AI generate an entire feature in one shot, teams break the work into small, reviewable chunks. Each chunk is small enough that a human can hold the full context in their head during review.

Yes, this means you ship the 400-line feature as four 100-line PRs. Yes, this is slower on paper. But it’s faster in practice, because each PR actually gets reviewed instead of rubber-stamped.
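A scope cap is also cheap to automate. Below is a sketch of a size gate that counts added lines in a unified diff and fails over a threshold; the cap value and the `git diff ... | python` invocation are assumptions, not a prescription.

```python
import sys

MAX_ADDED_LINES = 200  # hypothetical cap; tune to what your reviewers can hold in their heads


def count_added_lines(diff_text: str) -> int:
    """Count added lines in a unified diff, skipping '+++' file headers."""
    return sum(
        1
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


if __name__ == "__main__":
    # Example wiring: git diff origin/main...HEAD | python check_pr_size.py
    added = count_added_lines(sys.stdin.read())
    if added > MAX_ADDED_LINES:
        print(f"PR adds {added} lines (cap is {MAX_ADDED_LINES}). Split it up.")
        sys.exit(1)
```

Teams that want nuance can exempt generated files or lockfiles from the count, but the blunt version already changes behavior: developers who know the gate exists start decomposing work before they prompt, not after.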

The Vibe Coding Hangover

Andrej Karpathy coined “vibe coding” in early 2025. By early 2026, he was already calling it passé, pushing “agentic engineering” as the replacement term. The rebranding is telling — the industry realized that “just vibes” wasn’t a great pitch for production software.

But renaming the approach doesn’t fix the underlying problem. Whether you call it vibe coding, agentic engineering, or AI-assisted development, the fundamental challenge is the same: someone still has to understand what the code does.

The open-source world figured this out the hard way. Daniel Stenberg shut down cURL’s bug bounty after AI-generated submissions became unmanageable. Mitchell Hashimoto banned AI-generated code from Ghostty. Steve Ruiz closed all external PRs to tldraw. These aren’t Luddites — these are maintainers who realized that the cost of reviewing low-quality AI submissions exceeded the value of accepting them.

Enterprise teams are hitting the same wall; they just don't have the luxury of closing PRs from their own developers.

The Real Skill Shift

The industry narrative is that AI is replacing developers. That’s wrong, but not in the comforting way people think.

AI isn’t replacing developers. It’s replacing the easy part of development and concentrating all the effort on the hard part. Writing code was never the bottleneck for experienced engineers. Understanding systems, making architectural decisions, debugging complex interactions, reviewing code for subtle correctness issues — that’s where the actual work lives. And AI just doubled the volume of that work.

The developers who thrive in this environment aren’t the ones who generate the most code. They’re the ones who can read, understand, and evaluate code at scale. The skill that matters most in 2026 isn’t prompting — it’s reviewing.

If your team is optimizing for generation speed and ignoring review quality, you’re building on sand. The production incidents are coming. The only question is whether you’ll fix your process before or after they arrive.

What To Do Monday Morning

Stop celebrating PR velocity. Start measuring review quality. Track how many production incidents trace back to insufficiently reviewed AI-generated code. Make that number visible.

Assign your strongest engineers to review, not generation. This feels backwards — why would you put your best people on “just reading code”? Because reading code is the hard part now, and getting it wrong is expensive.

Invest in tooling that helps reviewers, not generators. AI-powered code review that flags architectural inconsistencies, assumption mismatches, and pattern deviations from your existing codebase. The irony of using AI to review AI-generated code isn’t lost on me, but it’s where the leverage actually is.

And for the love of everything — stop approving PRs you didn’t actually read. The green checkmark is supposed to mean something.