Over the last two years, I've spent most of my time building AI-native products from scratch—ChatStats (an AI assistant over my iMessage history), multi-agent game systems with self-improving strategy loops, agentic business document generators—solving real context and orchestration problems at depth.
Before that, I built ad platforms at Amazon and Apple handling 5M requests per second with sub-120ms latency. I know what production systems look like at scale, and I've personally gone through these AI adoption stages repeatedly across multiple codebases.
What surprised me is how predictable the journey looks once you've seen it enough times:
- The same early wins on tests, docs, glue code, and small greenfield services.
- The same frustration when agents touch large, weird, legacy codebases.
- The same “ohhh, this changes everything” moments when you finally fix the context bottleneck.
You can think of this as an AI adoption curve for real engineering work. It's not about buying a particular tool. It's about:
How long an AI agent can work autonomously in your system before it loses the plot—and how you systematically extend that duration by binding the agent tighter to your intent.
This article is my attempt to write that curve down as a roadmap:
- How to read the stages (unit of assessment, productivity ranges, codebase complexity)
- What each stage feels like in practice, from 0 → 6
- What needs to be true to graduate from one stage to the next
- Where deeper articles fit in if you want to go down the rabbit hole
This is a living document. I'm consolidating my experience here as I continue building and learning. As I move through these stages myself and work with more teams, I'll update this roadmap to reflect what actually works in practice.
How to read this: Take the quiz below to identify your current stage, then read that section plus the one before and after. Unless you're very curious, reading every stage will be overwhelming. Focus on closing gaps where you're partially in the previous stage and understanding how to reach the next one.
How to Read This Roadmap
Before we dive into stages, a few important definitions.
One Engineer, One Codebase
When I say "Stage 3" or "Stage 4", I'm talking about:
One engineer working on one codebase or service.
That matters, because your org probably looks like this:
- A couple of newer services where people are doing advanced stuff (Stage 3-ish).
- A big legacy monolith where everyone secretly still feels like Stage 1 or 2.
- One principal or staff engineer who is personally operating at Stage 3–4 with AI even if the rest of the team isn’t.
That’s normal. The point of this roadmap is not to stamp a single number on your entire company. It’s a lens, not a compliance checklist.
Productivity Ranges
Each stage has a range like 3×–30× next to it.
These ranges are wide because productivity depends on engineer skill, task complexity, and domain knowledge working together.
Writing unit tests with AI? Trivially easy. Design research or a sweeping refactor across five services? Much harder. Your strongest engineer who knows the codebase will extract far more value than a junior onboarding to unfamiliar code.
In early stages (1-2), the gap between best and worst case is enormous. As you systematize context (stages 3-4), the floor rises—average engineers on hard tasks start seeing what top engineers saw earlier.
So when you see:
Stage 2 — Agentic AI (3×–30×)
Read that as: "Depending on the engineer, the task, and the codebase, you'll see somewhere in this range."
Codebase Complexity
You've seen the vibecoding demos: AI spins up entire apps in minutes, ships features in one prompt, makes coding look like magic.
It works when the entire project fits in the model's context window. Perfect for greenfield: one service, modern stack, no history, no hidden invariants.
The disillusionment hits hard when you take those same tools into a large legacy service and they serve up hallucinations on your first request.
Real companies have systems that look like:
- 100k+ lines across multiple services
- 10–15 years of history with mixed paradigms and half-finished migrations
- Tribal knowledge about "that one place you must not touch"
For humans, onboarding onto those systems is orders of magnitude harder than starting greenfield. For AI, it's no different.
A tool that behaves like Stage 3 or 4 on a tiny service may feel like Stage 2 on your legacy monolith until you fix context.
If you take nothing else away from this article:
Codebase complexity is not a side note. It's one of the main difficulty knobs for AI adoption.
Context is how you pay down that difficulty. I dig into this in Context Is Your Constraint.
The Stages
Here's the whole ladder in one view:
- Stage 0 — No AI (1×–10×): Traditional engineering. No AI tools in the loop.
- Stage 1 — Ad-hoc AI (1×–15×): ChatGPT/Claude in the browser. Manual copy-paste of context.
- Stage 2 — Agentic AI (3×–30×): Agentic IDEs and agent terminals[†] that can search and read your repo—without any context engineering toolchain on top.
- Stage 3 — Ad-hoc Context Engineering (5×–50×): Power users hand-crafting context bundles for serious tasks.
- Stage 4 — Systematic Context (Intent Layer) (15×–60×): A shared, token-efficient context layer (e.g. an agents.md hierarchy) sits over your codebase.
- Stage 5 — Agentic Verification (30×–100×): Agents own implementation and verification loops; humans review final results.
- Stage 6 — Multi-Agent Orchestration (60×–300×): Many autonomous agents work in parallel; orchestration handles conflict and coordination.
Stage 7—Continuously Learning Agents—is frontier territory being defined by early adopters. You don't need Stage 7 to see massive value; for most orgs, there are years of runway moving from Stage 2 to 4 or 5.
Now let's go stage by stage.
Stage 0 — No AI
Most serious engineering orgs with modern stacks aren't here anymore, so we won't linger. But for completeness:
What it feels like
Everyone works in traditional editors. No AI tools. All improvements come from hiring, better processes, and better infrastructure.
| Dimension | What This Stage Looks Like |
|---|---|
| AI Tools & Infrastructure | No AI tools available. Engineers use traditional IDEs. |
| Context Engineering | Not applicable — no AI to provide context to. |
| Task Scope & Capabilities | Manual coding for all tasks. Engineers rely on their own knowledge and traditional docs. |
| Engineer Skill Requirements | Traditional software development skills. No AI prompting needed. |
| Quality & Verification | Manual code review, manual testing, traditional QA. |
| Iteration Speed | Baseline. Features take standard timelines based on engineer skill and availability. |
Stage 1 — Ad-hoc AI
This is "we use AI, but mostly as smart autocomplete."
What it feels like
Engineers paste stack traces into browser-based chat UIs, use autocomplete for small helpers, and occasionally have it explain unfamiliar code. It's undeniably helpful, but only on narrow, local tasks.
| Dimension | What This Stage Looks Like |
|---|---|
| AI Tools | Browser-based chat UIs (ChatGPT, Claude) and basic IDE autocomplete plugins. |
| Context Engineering | Manual copy-paste workflow. Engineers copy code snippets and error messages into chat interfaces. |
| Task Scope | Single-file edits, code explanations, isolated tests and docs. |
| Engineer Skills | Basic prompting skills. Wide skill variance. |
| Quality & Verification | Manual review of all AI suggestions. Engineers treat AI output as a starting point requiring significant editing. |
| Iteration Speed | Faster than pure manual for specific tasks, but context translation between browser and editor creates overhead. |
Your Constraint
Agency. You have amazing AI but you're not giving it tools to make changes directly. You're bottlenecked by human I/O going back and forth. The workflow: hit an error, switch to browser, paste stack trace, get response, copy back to editor, repeat. This translation loop limits AI to isolated tasks.
Graduate By
Adopt agentic tools[†] that can search, read, and make changes to your repo directly. This eliminates the translation loop. Agents get their hands dirty in your code instead of you playing messenger.
Final Thoughts
Stage 1 is where manual copy-paste hits its ceiling. Graduating to Stage 2 is straightforward: give engineers tools that make changes directly in code. The interesting problems start after that.
Stage 2 — Agentic AI
This is where most serious teams I talk to live today.
What it feels like
You've wired in agentic tools[1]. Agents can list files, search the repo, read code/configs/docs, and propose multi-file changes. It feels magical on small, clean services—and frustratingly hit-or-miss on large, messy ones.
| Dimension | What This Stage Looks Like |
|---|---|
| AI Tools | Agent harness with agentic context retrieval: Cursor, Augment, Claude Code, or similar. |
| Context Engineering | Agentic context retrieval. Tool automatically searches codebase using terminal tools, embedding tools, and AST relationships. |
| Task Scope | Greenfield: prototype entire apps in hours. Legacy: multi-file changes are hit-or-miss and need careful review. |
| Engineer Skills | Learning to work with agents: Effective prompts, reviewing multi-file changes, understanding agent limitations. Still figuring out what works. |
| Quality & Verification | Manual code review and testing required. Must carefully verify agent changes, especially on legacy systems where context gaps lead to subtle bugs. |
| Iteration Speed | Greenfield: minutes to hours for prototypes. Legacy: slower and inconsistent; review overhead eats into the gains. |
Your Constraint
Context.
The AI has the intelligence and the tools, but it does not see what your best engineers see before touching production[2].
On greenfield, agentic search can wander the repo and land somewhere reasonable. On legacy systems, it's guessing where to look and filling the window with noise.
It cannot see your architectural patterns, enforcement boundaries, or "never do X" rules. It does not know which configs and experiments actually matter, or how Service A's contract quietly constrains Service B. That missing context is why you get:
- Fast prototypes on small, clean repos
- Inconsistent results on large, messy ones
- Subtle bugs when invariants are not visible to the model
- More time spent reviewing than the AI saved
The model's capability is already there. The ceiling you are hitting is context.
Graduate By
Treat context as an explicit engineering skill rather than something the agent figures out on its own.
Pick a few senior engineers and have them hand craft rich context packs for serious tasks: code, configs, docs, and constraints that matter. Use context engineering tools[†] to help assemble bundles.
The real leverage here is teaching your engineers how to think in terms of context engineering. This skill also becomes a prerequisite for building effective agents later.
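To make "context pack" concrete, here's a minimal Python sketch of the assembly step. The section names and the 4-characters-per-token heuristic are illustrative assumptions, not a standard; real packs pull from actual files, diagrams, and docs:

```python
def build_context_pack(sections):
    """Format named context sections (code, configs, constraints) into a
    single prompt-ready bundle, with a rough token estimate attached."""
    parts = []
    for title, body in sections.items():
        parts.append(f"## {title}\n\n{body.strip()}\n")
    bundle = "\n".join(parts)
    # Rough heuristic: ~4 characters per token for English text and code.
    return bundle, len(bundle) // 4
```

In Stage 3 this assembly stays manual and bespoke; the point of Stage 4 is to version exactly this kind of material in the repo so nobody rebuilds it per task.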
Final Thoughts
Stage 2 is where you learn that agentic search alone is just the AI feeling around your system with a blindfold on. Stage 3 is where you start taking that blindfold off by giving it the curated view your best engineers use.
Stage 3 — Ad-hoc Context Engineering
This is the "wizard" phase.
What it feels like
A handful of senior engineers have learned to act as human context routers. They spend 60-90+ minutes before a task assembling the perfect context bundle: the right files, architectural patterns, cross-service contracts, and domain constraints. When the context is right, the agent delivers production-quality work in one shot.
| Dimension | What This Stage Looks Like |
|---|---|
| AI Tools | Agent harness + manual context engineering tools. |
| Context Engineering | Manual context engineering. Engineers craft packages: files, diagrams, dependencies, domain knowledge. |
| Task Scope | Deliver entire features across complex multi-service systems. Context engineering unlocks effectiveness on legacy. |
| Engineer Skills | Must learn context engineering, a specialized skillset with steep learning curve. Complex workflow requiring significant cognitive overhead. |
| Quality & Verification | Manual review with better confidence. Good context means fewer subtle bugs, but verification still required. Reviewers still carry the load. |
| Iteration Speed | Complex features feasible but context preparation adds upfront time. Faster than Stage 2 on legacy, but the preparation tax is real. |
Your Constraint
Skill gap and scalability.
The difference between a novice and expert context engineer is 10× or more. Your wizards are shipping entire features that used to take teams, but:
- Only 2-3 engineers can execute this effectively
- Each context pack takes 30-90 minutes to assemble
- Miss one file or dump too much, and quality degrades
- Every pack is a bespoke experiment that disappears into chat logs
- Average engineers remain stuck at Stage 2
You've proven context is the lever. Now you need to scale it[3].
Graduate By
Build a systematic context layer over your codebase. Capture the hard-won mental models from your best engineers once, version them in the repo, and make them reusable.
Tools: agents.md hierarchies, architectural decision records, explicit invariants.
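One way to treat the context layer as infrastructure is to enforce it in CI. Here's a hedged Python sketch (the per-service-directory layout and the `missing_context_files` helper are assumptions, not a convention your tools require) that flags service directories missing their agents.md:

```python
from pathlib import Path

def missing_context_files(root, service_dirs, filename="agents.md"):
    """Return the service directories under `root` that lack a context
    file, so gaps in the context layer surface as CI failures instead
    of silent agent confusion."""
    root = Path(root)
    return [d for d in service_dirs if not (root / d / filename).exists()]
```

A check like this keeps the context layer from rotting as the codebase evolves, which is the failure mode that quietly drops a team back to Stage 3.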
Final Thoughts
Stage 3 is where the most impressive AI case studies come from, but it's also where "AI works... but only when we put a principal engineer in front of it and treat every serious change as a one-off science project."
Stop treating context as personal craft. Start treating it as infrastructure. That's Stage 4.
Stage 4 — Systematic Context
This is the first big step change for larger organizations.
What it feels like
You've turned the lights on permanently[4]. Your best engineers' mental models are now encoded as a hierarchical context layer over the codebase. Instead of every task requiring manual context assembly, agents automatically start with architectural patterns, invariants, cross-service contracts, and known pitfalls. The context your wizards had to craft by hand in Stage 3 is now infrastructure that everyone gets for free.
| Dimension | What This Stage Looks Like |
|---|---|
| AI Tools | Agent harness + systematic context layer. Manual context engineering is still an incredibly valuable tool. |
| Context Engineering | Systematic and pre-built. Created once through interviews and hierarchical summarization. Maintained as code evolves. |
| Task Scope | Large features, projects, and refactors delivered end-to-end. Bugs triaged rapidly with full system context. |
| Engineer Skills | Just need to use the agent harness. Context engineering is democratized. |
| Quality & Verification | Manual review with higher confidence. Agents make sound decisions and avoid known pitfalls. Engineers still test and iterate. |
| Iteration Speed | Top engineers faster (less overhead). Bottom engineers productive on complex tasks (context guides them). Team velocity increases. |
Your Constraint
Verification and iteration.
Agents now make correct, system-aware changes without manual context wrangling, but you still spend significant time:
- Running tests manually
- Checking logs and metrics
- Iterating on failures
- Verifying edge cases
You've solved "what should the agent read?" The bottleneck is now "who runs the tests and fixes failures?"
Graduate By
Add an agentic verification layer. Let agents run tests, check logs, and iterate on their own failures autonomously. You review the final result, not every iteration.
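A minimal sketch of such a loop, assuming a hypothetical `propose_fix` callback standing in for your implementation agent (the default `pytest -q` command is just an example test runner):

```python
import subprocess

def verification_loop(propose_fix, test_cmd=("pytest", "-q"), max_iters=5):
    """Run the test suite; on failure, hand the combined output to an
    agent callback and try again. Returns True once the suite passes,
    False when the iteration budget runs out."""
    for _ in range(max_iters):
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        propose_fix(result.stdout + result.stderr)  # agent edits code here
    return False
```

The essential design choice is the bounded iteration budget: the agent iterates on its own failures, but you still get a deterministic point at which the loop stops and a human reviews.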
Final Thoughts
Stage 4 raises the floor. Average engineers now get results that were previously only available to your top engineers. Context engineering becomes shared infrastructure instead of personal craft, and AI effectiveness gets democratized across your entire team.
Stage 5 — Agentic Verification
By Stage 5, you've solved the input side of context. Now you start solving the feedback loop.
What it feels like
Agents no longer just propose changes. They run tests, check logs, and iterate until acceptance criteria are met. The implementation agent makes changes using the context layer; the verification agent runs tests and checks for regressions. They iterate for 30-60 minutes. You review a patch with clean test results.
| Dimension | What This Stage Looks Like |
|---|---|
| AI Tools | Agent harness + systematic context layer + verification agents. Implementation and test agents work in tandem to iterate autonomously. |
| Context Engineering | Systematic context layer extended to testing patterns, verification strategies, and quality criteria. Both implementation and verification are context-aware. |
| Task Scope | Agents iterate 30-60 minutes autonomously to achieve goals. Implementation agent makes changes, test agent verifies and checks regressions, iterate until complete. Fully autonomous cycles. |
| Engineer Skills | Focus shifts to planning and specification. Engineers define goals, acceptance criteria, and edge cases. Agents execute and verify. Review final output, not iterations. |
| Quality & Verification | Autonomous implementation and verification loop. Test agent ensures correctness and no regressions. Engineers review completed, tested work. |
| Iteration Speed | 30-60 minute autonomous cycles. Define feature, walk away, return to tested implementation. Eliminates manual test-debug-fix loop. |
Your Constraint
Serial execution.
You work on one task at a time. The agents can iterate autonomously, but you can't parallelize multiple loops simultaneously. If you want to ship five features, you queue them up and wait for each to complete.
The bottleneck shifts from "can an agent make a correct change?" to "how many loops can we run at once, and how do we avoid agents stepping on each other?"
Graduate By
Enable parallel multi-agent orchestration with merge conflict resolution. Spin up multiple agent pairs working simultaneously on different parts without stepping on each other.
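A hedged sketch of the coordination idea: tasks that declare overlapping paths are serialized into waves, while disjoint tasks run in parallel. Here `run_agent_loop` is a stand-in for a Stage 5 implement-and-verify cycle, and the `paths` scoping scheme is an assumption, not a real orchestration API:

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(tasks, run_agent_loop, max_parallel=4):
    """Group tasks into waves so that no two tasks in a wave touch the
    same paths, then run each wave's agent loops in parallel."""
    waves = []
    for task in tasks:
        for wave in waves:
            # A task joins a wave only if its paths are disjoint
            # from every task already in that wave.
            if all(not set(task["paths"]) & set(t["paths"]) for t in wave):
                wave.append(task)
                break
        else:
            waves.append([task])
    results = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for wave in waves:  # waves run serially; tasks within a wave run in parallel
            results.extend(pool.map(run_agent_loop, wave))
    return results
```

Real orchestration layers also handle merge conflicts after the fact; path-scoping up front just reduces how often that machinery is needed.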
Final Thoughts
Stage 5 is where you delegate entire implementation cycles. Define features in the morning, walk away while they run, return to working tested code. Or move to Stage 6 and start running multiple independent cycles in parallel while you wait. The 30-60 minute autonomous cycle eliminates the test-debug-fix loop.
Stage 6 — Multi-Agent Orchestration
This is the current frontier: engineers orchestrating many autonomous loops in parallel.
What it feels like
Engineers implement and test multiple non-trivial features daily. An orchestration layer spins up multiple agent pairs, coordinates which parts of the codebase they touch, and handles merge conflicts. You spend mornings on planning and specs, fire off multiple agent tasks, and spend afternoons reviewing and merging.
| Dimension | What This Stage Looks Like |
|---|---|
| AI Tools | Full orchestration system managing multiple parallel autonomous agents. |
| Context Engineering | Systematic context layer enables planning mode. Engineers use context to design end-to-end features and test plans with agents before implementation. |
| Task Scope | Multiple features delivered daily. Architecture decisions become two-way doors. Explore multiple directions experimentally. |
| Engineer Skills | 95% planning and design. Focus on architecture, feature specs, edge cases. Implementation runs parallel while planning next batch. |
| Quality & Verification | Multiple autonomous agent pairs in parallel, each with implementation and verification loops. Orchestration ensures clean merges between workstreams. |
| Iteration Speed | Several large features per engineer per day. What used to take a sprint now takes an afternoon. Planning quality is the bottleneck, not implementation capacity. |
Your Constraint
Planning and specification quality.
Implementation capacity is no longer a constraint. The accuracy and completeness of your feature specifications is now the bottleneck. Poorly defined edge cases or ambiguous requirements waste agent cycles.
Your role shifts from "can we build it?" to "did we specify the right thing?" Engineering becomes primarily architectural decision-making and planning.
Graduate By
This is the top tier for most organizations. The next frontier is continuously learning agents that improve their own context and skills over time, but you don't need that to see massive value. Focus on improving planning processes, specification quality, and architectural decision-making.
Final Thoughts
You're operating on a fundamentally different level. The constraint is how well you can think through problems, not how fast you can implement them.
Idea people will rule the world.
What I Think Comes Next
Everything in this roadmap is based on patterns that already exist in production today.
The interesting question for the next few years is: What does Stage 7 actually look like?
My current bet is continuously learning agents that improve their planning capabilities over time:
- Learn your patterns: After seeing how you specify features, what edge cases you care about, and how you make architectural tradeoffs, agents start suggesting complete specifications that match your style.
- Compound context: Instead of just reading static context layers, agents update their understanding based on what works and what doesn't in your specific codebase and team.
- Planning partners: The bottleneck shifts from "how fast can I write specs?" to "how well can I communicate what I want to an agent that already knows my constraints and preferences?"
This is frontier territory. Most organizations will see enormous value just moving from Stage 2 → 4 on their critical systems.
This roadmap gives you the language and sequence to keep making progress without getting lost in the hype.
If you want to climb this curve:
- Use this article as the map
- Use the quiz as an alignment tool
- Focus on the stages that remove your current bottleneck
And if you want help moving faster, from training your team in context engineering to installing a context layer over a legacy service, that's exactly what I spend my days working on.
Want help climbing the roadmap?
Whether you're trying to get more out of Cursor/Claude Code today or you're ready to install a context layer over a real legacy service, I can help you move up a stage without burning a year experimenting.