Rethinking Software Engineering Teams

AI tools aren't the bottleneck. Your org structure is.

Mar 02, 2026

Something interesting is happening across the industry right now.

Teams gave their devs Claude Code. Engineers started writing code at 5x the old pace. Features that used to take a sprint were getting drafted in a day.

But then a quieter question started surfacing:

“If devs are writing code 5x faster... why aren’t we shipping 5x faster?”

Because writing code faster and shipping products faster are two very different things. And that gap is precisely the problem.

I’ve been thinking about this as four phases. Once you see them, you can’t unsee them.

Phase 1: The Production Line (Where We All Started)

This is the world we all grew up in. Software gets built like a factory assembly line. PM writes the spec. Design makes the mocks. Dev builds the thing. QA breaks the thing. Ship.

Each lane is a different team. Each team hands off to the next. The cycle time is the sum of all the lanes.

  ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
  │  PM      │────▶│  DESIGN  │────▶│   DEV    │────▶│    QA    │──▶ Ship
  │  (human) │     │  (human) │     │  (human) │     │  (human) │
  │ ████████ │     │ ████████ │     │ ████████ │     │ ████████ │
  └──────────┘     └──────────┘     └──────────┘     └──────────┘
    ~2 weeks         ~3 weeks         ~4 weeks         ~3 weeks

  Time per lane:

  PM      ████████████████████████████   ~2 wk
  Design  ████████████████████████████   ~3 wk
  Dev     ████████████████████████████   ~4 wk
  QA      ████████████████████████████   ~3 wk
          ──────────────────────────────
          Balanced. Predictable.           Total: ~12 weeks

It’s slow. Everyone knows it’s slow. But it’s predictably slow. Every lane takes roughly the same amount of time. The bottleneck is everywhere, which means the bottleneck is nowhere.

This is the world that AI was supposed to fix.

Phase 1.5: The Trap (Where A Lot of Teams Are Right Now)

AI tools rolled out. But they didn’t roll out evenly.

Dev got Claude Code, Cursor — the works. Suddenly a senior dev is generating code at 5x the old pace. Features that used to take a sprint are getting drafted in a day.

Design got some help too. AI-assisted prototyping, concept generation. Maybe a 1.5x improvement.

PM? About the same. QA? Barely touched.

Now look what happens to the pipeline:

  ┌──────────┐  ┌────────┐  ┌────┐         ┌───────────────────────┐
  │  PM      │─▶│ DESIGN │─▶│DEV │────────▶│          QA           │──▶ Ship
  │  (human) │  │ (human │  │(h+ │         │  (human, no AI help)  │
  │          │  │  + AI) │  │AI) │         │                       │
  │ ████████ │  │ ██████ │  │ ██ │         │ ████████████████████  │
  └──────────┘  └────────┘  └────┘         └───────────────────────┘
    ~2 wk        ~2 wk      ~1 wk     ⚠ BOTTLENECK   ~5 wk
                  (1.5x)     (5x)                      (swamped)

Dev went from 4 weeks to 1 week. But QA didn’t get faster — it got slower. It’s absorbing 5x the volume with the same headcount, the same manual processes, the same regression suite that takes three days to run.

  PM      ████████████████████████████   ~2 wk  (no change)
  Design  ████████████████████░░░░░░░░   ~2 wk  (modest gains)
  Dev     ████░░░░░░░░░░░░░░░░░░░░░░░   ~1 wk  (5x faster)
  QA      ████████████████████████████████████   ~5 wk  ◀── CONSTRAINT
          ─────────────────────────────────────

End-to-end: ~10 weeks. Not 5x faster. Barely faster at all.

This is the Theory of Constraints playing out in real time. You didn’t eliminate the bottleneck; you just moved it downstream. And the faster dev ships, the worse the QA bottleneck gets.
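
To make the constraint concrete, here is a minimal sketch of a pipeline whose shipping rate is capped by its slowest lane. The rates are hypothetical (features per week); only the 5x Dev ratio comes from the text, and the backlog model looks only at the Dev-to-QA handoff.

```python
def pipeline_throughput(rates: dict[str, float]) -> float:
    """A sequential pipeline ships no faster than its slowest lane."""
    return min(rates.values())

def qa_backlog(dev_rate: float, qa_rate: float, weeks: int) -> list[float]:
    """Items waiting for QA at the end of each week (never below zero)."""
    backlog = 0.0
    history = []
    for _ in range(weeks):
        backlog = max(0.0, backlog + dev_rate - qa_rate)
        history.append(backlog)
    return history

# Hypothetical rates in features per week. Only the 5x ratio is from the text.
before = {"PM": 1.0, "Design": 1.0, "Dev": 1.0, "QA": 1.0}
after = {**before, "Dev": 5.0}

print(pipeline_throughput(before), pipeline_throughput(after))  # 1.0 1.0
print(qa_backlog(dev_rate=5.0, qa_rate=1.0, weeks=4))  # [4.0, 8.0, 12.0, 16.0]
```

In this toy model, speeding up Dev leaves the shipping rate unchanged and only grows the pile waiting for QA.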

I hear this from teams constantly. “Dev is shipping so fast now, but we can’t get anything through QA.” Or: “We have 47 PRs waiting for review.” Or: “We’re shipping faster but quality is dropping because we’re cutting corners on testing to keep up.”

Phase 1.5 is a mirage. The velocity charts look great. The end-to-end delivery doesn’t.

Phase 2: Give AI to Everyone (Where Smart Teams Are Heading)

The natural next step: if uneven AI adoption created bottlenecks, give AI tools to every lane. PM gets AI. Design gets AI. Dev has AI. QA gets AI. Level the playing field.

This is a real improvement. It works. It’s worth doing.

  ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
  │  PM      │────▶│  DESIGN  │────▶│   DEV    │────▶│    QA    │──▶ Ship
  │ (human   │     │ (human   │     │ (human   │     │ (human   │
  │  + AI)   │     │  + AI)   │     │  + AI)   │     │  + AI)   │
  │ ██████   │     │ ██████   │     │ ██████   │     │ ██████   │
  └──────────┘     └──────────┘     └──────────┘     └──────────┘
    ~1 week          ~1.5 weeks       ~1 week          ~1.5 weeks

  Time per lane:

  PM      ██████████████░░░░░░░░░░░░░░   ~1 wk    (2x faster)
  Design  ████████████████░░░░░░░░░░░░   ~1.5 wk  (2x faster)
  Dev     ██████████████░░░░░░░░░░░░░░   ~1 wk    (4x faster)
  QA      ████████████████░░░░░░░░░░░░   ~1.5 wk  (2x faster)
          ──────────────────────────────
          Balanced again. Every lane improved.  Total: ~5 weeks

A genuine 2-3x improvement. You should absolutely pursue this.
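
The totals are just the lane durations added up, since each lane waits for the one before it. A quick check of that arithmetic, using the approximate week figures from the three pipeline diagrams (the names `PHASES`, `end_to_end_weeks`, and `speedup_vs_phase_1` are illustrative):

```python
# Approximate lane durations in weeks, read off the diagrams for each phase.
PHASES = {
    "Phase 1":   {"PM": 2, "Design": 3, "Dev": 4, "QA": 3},
    "Phase 1.5": {"PM": 2, "Design": 2, "Dev": 1, "QA": 5},
    "Phase 2":   {"PM": 1, "Design": 1.5, "Dev": 1, "QA": 1.5},
}

def end_to_end_weeks(lanes: dict[str, float]) -> float:
    """Sequential pipeline: cycle time is the sum of the lanes."""
    return sum(lanes.values())

def speedup_vs_phase_1(phase: str) -> float:
    """How many times faster a phase is than the Phase 1 baseline."""
    return end_to_end_weeks(PHASES["Phase 1"]) / end_to_end_weeks(PHASES[phase])

for name, lanes in PHASES.items():
    print(f"{name}: {end_to_end_weeks(lanes):g} weeks, {speedup_vs_phase_1(name):.1f}x")
# Phase 1: 12 weeks, 1.0x
# Phase 1.5: 10 weeks, 1.2x
# Phase 2: 5 weeks, 2.4x
```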

But here’s the thing most people miss: it’s still a production line.

You still have four separate teams. Four separate handoffs. Four queues. Four sets of context lost in translation. AI made each station faster, but the architecture of how work flows is exactly the same.

  Phase 2 is still sequential:

  PM ━━━▶ Design ━━━━▶ Dev ━━━▶ QA ━━━━▶ Ship
  (fast)  (fast)       (fast)   (fast)
          ║                      ║
          ║  still a handoff     ║  still a handoff
          ║  still waiting       ║  still waiting

Phase 2 is like putting faster engines in every car on a single-lane road. The cars are faster. The road is still one lane.

5 weeks is great. But the real question is: why do we need four separate teams at all?

But First: What The Diagrams Don’t Show You

Every diagram above is a lie. A generous, best-case lie.

Real software development is never a clean left-to-right pipeline.

  What the diagram shows:

  PM ──▶ Design ──▶ Dev ──▶ QA ──▶ Ship


  What actually happens:

  PM ──▶ Design ──▶ Dev ──▶ QA ──┐
              ▲        ▲         │
              │        └─────────┤  "This doesn't match the spec"
              │                  │
              └──────────────────┤  "This flow doesn't work,
                                 │   we need to redesign"
                                 │
         PM ◀────────────────────┘  "Users are hitting a bug
                                     in production, we need to
                                     rethink the whole approach"

QA finds a bug — back to dev. Dev discovers the design breaks at edge cases — back to design. A production incident forces PM to reprioritize everything. Design and dev go back and forth three times before the interaction feels right.

These rework loops cross team boundaries every time. And that’s where three things break:

Context has to be re-hydrated at every boundary. The designer who made that mock is on a different project now. The dev has to write up a ticket, attach screenshots, and hope the designer can reload the mental model she had three weeks ago. Both sides pay this tax. Every loop. Every time.

Capacity utilization collapses. Dev is blocked waiting for design. Design is idle waiting for QA to surface issues. QA has nothing for two weeks, then gets slammed with five features at once. The chunks of work don’t fit neatly across teams. A bug that takes two hours to fix takes five days to schedule.

  Theoretical capacity:

  PM      ████████████████████████████  100%
  Design  ████████████████████████████  100%
  Dev     ████████████████████████████  100%
  QA      ████████████████████████████  100%


  Actual capacity:

  PM      ████░░██░░░░████░░░░██░░░░░░  ~50%
  Design  ░░████░░░░██░░░░████░░░░░░██  ~45%
  Dev     ██░░████░░░░██░░░░░░████░░██  ~55%
  QA      ░░░░░░██░░████░░██░░░░████░░  ~40%
          ──────────────────────────────
          ░░ = idle / blocked / context-switching / wrong priority

Coordination eats the remaining capacity. Every rework loop requires: file a ticket, triage it, assign it, wait for a sprint slot, re-explain context, review the fix, re-test. For a two-hour fix, you spend eight hours on coordination.
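
As a rough illustration of that ratio, the share of effort that goes into the fix itself can be computed directly. The function name `share_of_effort_on_fix` is illustrative; the two-hour and eight-hour figures are the ones from the paragraph above.

```python
def share_of_effort_on_fix(fix_hours: float, coordination_hours: float) -> float:
    """Fraction of total effort spent on the fix rather than on coordinating it."""
    return fix_hours / (fix_hours + coordination_hours)

print(f"{share_of_effort_on_fix(2, 8):.0%}")  # 20%
```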

  Where time actually goes on a "12-week" feature:

  Productive work:   ██████████████████████░░░░░░░░░░░░░░  ~35%
  Rework:            ████████████░░░░░░░░░░░░░░░░░░░░░░░░  ~20%
  Waiting/blocked:   ██████████████░░░░░░░░░░░░░░░░░░░░░░  ~25%
  Coordination:      ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░  ~20%
                     ──────────────────────────────────────
                     Only ~35% of the 12 weeks is actual building.
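
Translated into calendar time, those shares look like this. This is only a sketch of the arithmetic: the percentages are the approximate ones from the chart, and the names `SHARES` and `weeks_by_category` are illustrative.

```python
TOTAL_WEEKS = 12

# Approximate percentages from the chart above.
SHARES = {
    "Productive work": 35,
    "Rework": 20,
    "Waiting/blocked": 25,
    "Coordination": 20,
}
assert sum(SHARES.values()) == 100  # the four categories cover all of the time

def weeks_by_category(total_weeks: float, shares: dict[str, int]) -> dict[str, float]:
    """Convert percentage shares into weeks of a fixed-length project."""
    return {name: total_weeks * pct / 100 for name, pct in shares.items()}

for name, weeks in weeks_by_category(TOTAL_WEEKS, SHARES).items():
    print(f"{name:<16} {weeks:.1f} weeks")
```

On these figures, about 4.2 of the 12 weeks go to building, and the remaining 7.8 go to rework, waiting, and coordination.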

This is true in Phase 1, Phase 1.5, and Phase 2. Even in Phase 2 — where every lane is faster — the rework loops still cross team boundaries, context still has to be re-hydrated, and scheduling is still a problem in itself. Phase 2 shrinks the loops from weeks to days, but they’re still there. Roughly 40% of Phase 2’s 5 weeks is still structural waste.

The production line was never just slow because the stations were slow. It was slow because the architecture — sequential handoffs, cross-team rework, context loss, scheduling overhead — was eating most of the time.

Which is exactly why Phase 3 changes everything.

Phase 3: One Person, A Team of AI Agents (The Real Unlock)

Phase 3 isn’t about giving AI tools to existing roles. It’s about merging the roles entirely — because AI agents can now own each function, and a single person can orchestrate all of them.

One person. A PM agent that writes specs and user stories. A design agent that generates mocks and flows. A dev agent that writes, reviews, and refactors code. A QA agent that writes tests, runs regressions, and flags issues.

You’re not a worker on the assembly line. You’re the conductor of an AI orchestra.

                      ┌───────────────────┐
                      │                   │
                      │    YOU            │
                      │    (orchestrator) │
                      │                   │
                      └─────────┬─────────┘
                                │
                  ┌─────────────┼──────────────┐
                  │             │              │
           ┌──────┴─────┐ ┌─────┴──────┐ ┌─────┴──────┐
           │            │ │            │ │            │
           ▼            ▼ ▼            ▼ ▼            ▼
   ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
   │  PM Agent  │ │Design Agent│ │ Dev Agent  │ │  QA Agent  │
   │            │ │            │ │            │ │            │
   │  Writes    │ │  Generates │ │  Writes    │ │  Writes &  │
   │  specs,    │ │  mocks,    │ │  code,     │ │  runs      │
   │  user      │ │  flows,    │ │  reviews,  │ │  tests,    │
   │  stories,  │ │  assets    │ │  refactors │ │  flags     │
   │  priorities│ │            │ │            │ │ regressions│
   └──────┬─────┘ └──────┬─────┘ └──────┬─────┘ └──────┬─────┘
          │              │              │              │
          └──────────────┴──────┬───────┴──────────────┘
                                │
                                ▼
                            Ship it
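
One way to read the diagram is as a small control loop. The sketch below is hypothetical throughout: an "agent" is just a Python function, the class name `Orchestrator` and its fields are made up for illustration, and a real setup would call whatever LLM or agent tooling you use, which this post does not specify. It also runs the agents one after another to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in: an agent is a function from a text brief to text output.
Agent = Callable[[str], str]

@dataclass
class Orchestrator:
    pm: Agent
    design: Agent
    dev: Agent
    qa: Callable[[str], list[str]]  # returns a list of issues; empty means pass
    log: list[str] = field(default_factory=list)  # context stays in one place

    def ship(self, idea: str, max_rounds: int = 3) -> str:
        spec = self.pm(idea)
        mock = self.design(spec)
        build = self.dev(f"{spec}\n{mock}")
        for round_no in range(1, max_rounds + 1):
            issues = self.qa(build)
            self.log.append(f"round {round_no}: {len(issues)} issue(s)")
            if not issues:
                return build
            # Rework stays inside the loop: no ticket, no sprint boundary.
            build = self.dev(f"{build}\nfix: {'; '.join(issues)}")
        raise RuntimeError("QA still failing; the human needs to step in")

# Stub agents so the sketch runs without any external service.
def stub_qa(build: str) -> list[str]:
    return [] if "fix:" in build else ["login button misaligned"]

orchestrator = Orchestrator(
    pm=lambda idea: f"spec({idea})",
    design=lambda spec: f"mock({spec})",
    dev=lambda brief: f"code[{brief}]",
    qa=stub_qa,
)
result = orchestrator.ship("password reset")
print(orchestrator.log)  # ['round 1: 1 issue(s)', 'round 2: 0 issue(s)']
```

The `max_rounds` cap is a design choice in this sketch: when the agents cannot converge, the loop hands the problem back to the person orchestrating rather than spinning forever.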

This doesn’t just speed up the stations. It dissolves every structural problem we just talked about.

Context never leaves. You talked to the PM agent this morning, reviewed the design agent’s output over lunch, checked the dev agent’s code after that, and read the QA agent’s test results before dinner. The full context lives in one head. No re-hydration. No tickets explaining what happened three weeks ago. No designer who’s already moved on.

Rework loops shrink from weeks to minutes. Rework isn’t the enemy — slow rework across team boundaries is. When the loop is tight and the context is shared, rework is just iteration.

  Phase 1/2 rework loop (multi-team):

  QA finds bug
    └──▶ File ticket in dev backlog
           └──▶ Wait for next sprint (~1-2 weeks)
                  └──▶ Dev picks it up, needs design input
                         └──▶ Ping design team, wait (~3-5 days)
                                └──▶ Design responds
                                       └──▶ Dev fixes
                                              └──▶ Back to QA queue (~3-5 days)
                                                     └──▶ QA re-tests

  Elapsed time for one rework loop: 2-4 weeks


  Phase 3 rework loop (one orchestrator):

  QA agent flags bug
    └──▶ You see it immediately
           └──▶ Tell dev agent to fix, ping design agent to review
                  └──▶ Both respond in minutes
                         └──▶ QA agent re-tests

  Elapsed time for one rework loop: minutes to hours
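
The multi-team figure can be sanity-checked by adding up just the waits in that loop. This is a sketch under one stated assumption: "1-2 weeks" is read as 7-14 calendar days. Hands-on time for the fix and the re-test is not counted, and the names are illustrative.

```python
# Waiting steps from the multi-team loop above, as (min_days, max_days).
# Assumption: "1-2 weeks" means 7-14 calendar days.
MULTI_TEAM_WAITS = {
    "wait for next sprint": (7, 14),
    "wait for design input": (3, 5),
    "wait in QA queue": (3, 5),
}

def total_wait_days(waits: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the best-case and worst-case waits separately."""
    low = sum(lo for lo, _ in waits.values())
    high = sum(hi for _, hi in waits.values())
    return low, high

low, high = total_wait_days(MULTI_TEAM_WAITS)
print(f"{low}-{high} days, about {low / 7:.1f}-{high / 7:.1f} weeks")
# 13-24 days, about 1.9-3.4 weeks
```

Waiting alone accounts for most of the 2-4 weeks, before anyone has touched the bug.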

Capacity utilization jumps dramatically. No idle time waiting for another team. If the QA agent finds a bug, you route it to the dev agent right now. No scheduling. No sprint boundaries. No “we’ll get to it next week.” Sure, agents still iterate, still have gaps — but 80% utilization at machine speed is a different universe from 50% utilization at human speed.

  Phase 1/2 actual capacity:

  PM      ████░░██░░░░████░░░░██░░░░░░  ~50%
  Design  ░░████░░░░██░░░░████░░░░░░██  ~45%
  Dev     ██░░████░░░░██░░░░░░████░░██  ~55%
  QA      ░░░░░░██░░████░░██░░░░████░░  ~40%


  Phase 3 actual capacity:

  PM Agent      ████████░░████████░░██████  ~80%
  Design Agent  ██████░░████████░░████████  ~80%
  Dev Agent     ████████████░░██████████░░  ~85%
  QA Agent      ██████░░██████████░░██████  ~80%
                ──────────────────────────────
                Agents still iterate. Still idle sometimes.
                But the loops are minutes, not weeks.
                And 80% at machine speed beats 50% at human speed
                by orders of magnitude.

And here’s what most people underestimate: AI agents don’t work at human speed. They work at machine speed. A PM agent doesn’t need two weeks to write a spec — it needs two hours. A QA agent doesn’t need a week to run regressions — it needs an afternoon.
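
A back-of-the-envelope reading of those two examples, under stated assumptions: a 40-hour working week and a 4-hour afternoon (neither figure is in the text), combined with the ~80% and ~50% utilization figures from the capacity charts.

```python
HOURS_PER_WEEK = 40   # assumption: one working week
AFTERNOON_HOURS = 4   # assumption: what "an afternoon" means

def speed_ratio(human_hours: float, agent_hours: float) -> float:
    """How many times faster the agent finishes the same task."""
    return human_hours / agent_hours

def effective_gain(ratio: float, agent_util: float = 0.80, human_util: float = 0.50) -> float:
    """Speed ratio weighted by how much of the time each side is actually busy."""
    return ratio * agent_util / human_util

spec = speed_ratio(2 * HOURS_PER_WEEK, 2)                        # two weeks -> two hours
regressions = speed_ratio(1 * HOURS_PER_WEEK, AFTERNOON_HOURS)   # one week -> an afternoon

print(f"spec: {spec:.0f}x raw, {effective_gain(spec):.0f}x effective")
print(f"regressions: {regressions:.0f}x raw, {effective_gain(regressions):.0f}x effective")
# spec: 40x raw, 64x effective
# regressions: 10x raw, 16x effective
```

Under these assumptions the gain lands between one and two orders of magnitude; different assumptions about working hours would shift it.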

Parallel execution. Machine speed. Dramatically higher capacity utilization. Rework loops that cost minutes, not weeks.

  Phase 3 (parallel, machine speed, one orchestrator):

            Day 1      Day 2      Day 3      Day 4
         ┌──────────┬──────────┬──────────┬──────────┐
  PM     │ ░░░░░░░░ │ ░░░░░░░░ │          │          │
  Agent  │          │          │          │          │
         ├──────────┼──────────┼──────────┼──────────┤
  Design │ ░░░░░░░░ │ ░░░░░░░░ │ ░░░░░░░░ │          │
  Agent  │          │          │          │          │
         ├──────────┼──────────┼──────────┼──────────┤
  Dev    │          │ ░░░░░░░░ │ ░░░░░░░░ │ ░░░░░░░░ │
  Agent  │          │          │          │          │
         ├──────────┼──────────┼──────────┼──────────┤
  QA     │          │ ░░░░░░░░ │ ░░░░░░░░ │ ░░░░░░░░ │
  Agent  │          │          │          │          │
         └──────────┴──────────┴──────────┴──────────┘
                      Continuous, not gated.
                      One person driving all four.
                      Machine speed, not human speed.

  PM Agent      ██░░░░░░░░░░░░░░░░░░░░░░░   ~1 day
  Design Agent  ██░░░░░░░░░░░░░░░░░░░░░░░   ~1 day
  Dev Agent     ██░░░░░░░░░░░░░░░░░░░░░░░   ~1 day
  QA Agent      ██░░░░░░░░░░░░░░░░░░░░░░░   ~1 day
                ──────────────────────────
                Overlapping work, ~1 day of agent time each.   Total: ~4 days elapsed
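
The elapsed total comes from overlap, not from adding the lanes up. A small sketch of the difference, reading each agent's shaded days off the timeline grid above (the names `SPANS`, `elapsed_days`, and `sequential_days` are illustrative):

```python
# (first_day, last_day) each agent is active, read off the timeline grid.
SPANS = {"PM": (1, 2), "Design": (1, 3), "Dev": (2, 4), "QA": (2, 4)}

def elapsed_days(spans: dict[str, tuple[int, int]]) -> int:
    """Calendar span when work overlaps: last finish minus first start, inclusive."""
    first_start = min(start for start, _ in spans.values())
    last_end = max(end for _, end in spans.values())
    return last_end - first_start + 1

def sequential_days(spans: dict[str, tuple[int, int]]) -> int:
    """The same spans laid end to end, as a gated pipeline would run them."""
    return sum(end - start + 1 for start, end in spans.values())

print(elapsed_days(SPANS), sequential_days(SPANS))  # 4 11
```

The same agent-days that fit into 4 calendar days when overlapped would take 11 if each agent waited for the previous one to finish.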

4 days. Not 4 weeks. Not 5 weeks. Not 12 weeks. Four days.

And it’s not just the speed — it’s the quality of the time:

  Where time goes — Phase 1 vs Phase 3:

  Phase 1 (12 weeks):
  Productive work:   ██████████████████████░░░░░░░░░░░░░░  ~35%
  Rework:            ████████████░░░░░░░░░░░░░░░░░░░░░░░░  ~20%
  Waiting/blocked:   ██████████████░░░░░░░░░░░░░░░░░░░░░░  ~25%
  Coordination:      ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░  ~20%

  Phase 3 (4 days):
  Productive work:   ████████████████████████████████░░░░   ~80%
  Rework/iteration:  ██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   ~12%
  Waiting/blocked:   ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   ~5%
  Coordination:      █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   ~3%
                     ──────────────────────────────────────

What used to be a quarter’s worth of work (three months of cross-team coordination, standups, sprint planning, handoff meetings, QA cycles, and release trains) collapses into a single work week. One person, one laptop, four agents, shipping on Friday what used to ship in March.

Karl Marx wrote in 1859 that “at a certain stage of development, the material productive forces of society come into conflict with the existing relations of production... From forms of development of the productive forces these relations turn into their fetters.”

In plain language: the way work is organized initially helps technology grow — but eventually the old organizational model becomes the thing holding it back. The structure that once enabled progress becomes the constraint on it.

That’s exactly what’s happening right now.

The production line — PM, Design, Dev, QA as separate teams with sequential handoffs — was the right structure when every function required deep human specialization. It enabled massive scale. But AI agents have fundamentally changed what’s possible, and the org structure hasn’t caught up. The very structure that enabled the last era of productivity is now the fetter on the next one.

The steam engine existed for decades before it transformed manufacturing. The breakthrough wasn’t a better engine. It was the factory — a new organizational model designed around what the engine made possible. The resistance was never technical. It was structural. People who organized work around the old constraints couldn’t imagine organizing it differently.

12 weeks → 10 weeks → 5 weeks is what you get from better tools inside the old structure.

5 weeks → 4 days is what you get when you redesign the structure itself.

The tools are not the bottleneck. The org is.

Phase 1 → 1.5 was about giving some humans AI tools. Phase 1.5 → 2 was about giving every human AI tools. Phase 2 → 3 is about redesigning the organization around what AI makes possible.

The tools are ready. The question is whether the org is.

Haoyu’s Substack
