Intent Systems

Context Is Your Constraint

This article explains why AI is bottlenecked by context and what a systematic context layer can do about it.

A lot of engineers I talk to have some version of the same story:

"Copilot is great for tests, boilerplate, even library upgrades. On one team it handled about 80% of a gnarly React migration."

"But if I give it a full ticket, it either gives me something half-baked, or confidently tells me it's done when it hasn't even scratched the surface."

It's tempting to file this under "AI just isn't there yet."
But there's a simpler explanation:

AI is limited by the context you give it, just like engineers are.

If the model doesn't see what a great engineer would look at before touching production, it won't perform like a great engineer.

In this post, I'll walk through:

  1. What context actually is for an AI agent
  2. A thought experiment about context
  3. A three-stage progression of how to give context to AI
  4. How those approaches play out in a real-world example

What Is Context?

Before we talk about how we get context, it's worth being literal about what context actually is.

When you send a request to a model, it sees your message plus the entire context window. The context window is made of:

  • The system prompt
  • Tool definitions
  • All previous user prompts and assistant replies in the conversation
  • Tool calls and results from this conversation
  • Any extra files you inject

All of that is turned into one long blob of text that the model reads and responds to.
The model itself is stateless; what gets built up is the context you've accumulated in your agent or chat window.
Every new turn appends to the same window, so the context is cumulative until you hit the model's limit.
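
A minimal sketch of how that blob accumulates, turn by turn. The message structure below is illustrative (no particular vendor's API), but the key point is real: the model receives the entire list every time.

```python
# Sketch: the "context window" is one growing list of messages that gets
# re-sent in full on every turn. All names here are illustrative.

def build_context(system_prompt, tool_defs, history, new_user_msg):
    """Assemble the full blob the model sees on this turn."""
    context = [{"role": "system", "content": system_prompt}]
    context += [{"role": "system", "content": f"tool: {t}"} for t in tool_defs]
    context += history          # every prior prompt, reply, and tool result
    context.append({"role": "user", "content": new_user_msg})
    return context

history = []
for turn in ["Fix the login bug", "Now add a test for it"]:
    context = build_context("You are a coding agent.",
                            ["grep", "read_file"], history, turn)
    # The model is stateless: it sees only `context`, nothing else.
    reply = f"(model reply to: {turn})"   # placeholder for a real model call
    history += [{"role": "user", "content": turn},
                {"role": "assistant", "content": reply}]
```

Each iteration rebuilds the full context from scratch, which is why the window fills up cumulatively rather than resetting between turns.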

Practically, three constraints drive everything:

  1. Blind spots cause hallucinations.
    If the agent doesn’t see specific context, it fills the gap with generic training priors. You only get the behavior you load.
  2. Everything influences everything.
    Every token included affects every other token, so the signal-to-noise ratio matters.
  3. The window is finite.
    Models have hard context limits beyond which the API will not respond. Performance typically degrades well before the hard limit is hit.

The job, then, is to curate the smallest, highest-signal slice of context for the task at hand.
Striking that balance is context engineering.
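
One way to make "smallest, highest-signal slice" concrete: score candidate snippets for relevance and pack the best ones under a token budget. This is a toy sketch; the ~4-characters-per-token estimate is a rough heuristic (not a real tokenizer), and in practice scores would come from embeddings or hand-curation.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English/code.
    return len(text) // 4

def pack_context(candidates, budget_tokens):
    """Greedily pack the highest-relevance snippets that fit the budget.

    `candidates` is a list of (relevance_score, text) pairs; scoring is
    assumed to happen upstream (embeddings, recency, human curation).
    """
    packed, used = [], 0
    for score, text in sorted(candidates, key=lambda c: -c[0]):
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            packed.append(text)
            used += cost
    return packed, used

snippets = [
    (0.9, "policy-under-18.md: minors must never receive personalized ads"),
    (0.2, "string_helpers.java: pad, trim, camelCase utilities"),
    (0.8, "targeting_rules.java: single enforcement point for ad policy"),
]
picked, used = pack_context(snippets, budget_tokens=30)
```

With a tight budget, the low-relevance helper file gets dropped: curation means choosing what to leave out, not just what to include.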

Context selection for engineering tasks means choosing among:

  • A description of the task at hand -- an entire PRD, or carefully scoped-down tasks?
  • Potentially millions of lines of code across multiple services.
  • Years of architectural decisions and knowledge of common operational challenges.

Typically you'd tightly scope tasks you give to your junior engineers, and rely on your senior engineers to tackle ambiguous tasks -- something they're able to do because of the context they've accumulated over their tenure on the team.

With AI, you have one of the greatest engineering minds ever at your disposal, but you have to explain your code & product from scratch every time.
Which pieces you choose to tell it determines the ceiling on what the AI can do.

Thought Experiment

Imagine you're in a product meeting and someone asks:

"What's the next feature we should build for our app?"

With no further info, most people generate a handful of default ideas:

"New dashboard" ... "Better reporting" ... "Mobile UX improvements" ... "Dark mode?"

Now add two concrete facts:

  1. Revenue + churn reality: 70% of revenue is enterprise, and the #1 churn reason is confusing onboarding.
  2. Week-one value gap: Sales can't prove value in the first week, and support tickets show new admins failing permission setup.

Suddenly, the next features seem obvious:

  • Guided enterprise onboarding that carries admins to first value.
  • A permission template/checklist that makes setup foolproof.

The added context pruned the search space to what actually matters. In a SaaS app, it's churn. In an ad platform, it's compliance.
Nothing about your intelligence changed in those few seconds. You just got better context.

The quality of your answer changed because the context changed.

LLMs behave exactly the same way:

  • Vague context → generic output
  • Rich, relevant context → specific, useful output

Right now, most orgs are asking AI to do serious work with the context equivalent of "What feature should we build?" and then being disappointed by the obvious or off-target answers.

Context Engineering

Agents can't magically see your architecture.
In practice, there are three mechanisms ranging from default to deliberate:

  1. Agentic search (default).
    The agent harness uses terminal tools and semantic search to explore and understand on its own. You hope that it finds what matters and doesn't miss anything.
  2. Manual context engineering.
    Engineers build bespoke context bundles per task—picking the right docs, configs, and code by hand. Quality jumps, but there is a learning curve and the process is time-consuming.
  3. Systematic context layer (deliberate).
    You invest in hierarchically summarizing your codebase and curating critical architectural and product knowledge into a web of AGENTS.md files throughout your codebase.
    Agentic search then delivers results as good as or better than manually curated context.

To make this concrete, let's walk through a real-world example and then see how each approach behaves.

The Task

Let's imagine a Java/Spring monolith for an ad server.

At its core, an ad server needs to pick personalized ads that satisfy advertiser requirements, placement constraints, and policy rules—all in real-time.

The task is:

"Update the system so that users under 18 never see personalized ads, but they can still see non-personalized ads, and we can still track impressions."

Any senior engineer knows this is not a quick change in one file.
To see why, let's break down the architecture. On a real system, this usually touches at least:

  • Policy Layer: The source-of-truth product definition (Docs branch)
  • App Logic: The targeting rules and user signals (Src branch)
  • Enforcement: The serving pipeline (Auth & Utils)
  • Config & Experiments: The guardrails (Conf branch)

Here's a simplified topology map of this architecture, where relevant files (Signal) are distinct from boilerplate (Noise):

The System Map: a simplified topology of our Ad Platform codebase

  • Repo: Ad Platform root
    • Docs: documentation folder
      • Policy: policy-under-18.md
      • Old: deprecated docs (Noise)
    • Src: source code
      • Auth: Auth Service (Signal)
      • Utils: string helpers (Noise)
    • Conf: config files
      • Env: prod/dev envs
      • Logs: log config (Noise)
Now let's look at how each approach handles this.

Agentic Search

"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so."
-Mark Twain

This is the default pattern in agent harnesses today. The agent has tool-based search (list_files, grep, read_file) and semantic search (an embedding index over your files), and it explores your codebase autonomously.
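
To make the tool side concrete, here is a toy sketch of the search tools such a harness exposes. The repo is an in-memory dict and the "exploration" is scripted; in a real harness the model decides each next call, and every result is appended to its context window.

```python
# Toy sketch of agentic-search tools over a mock repo. In a real harness
# these would shell out to the filesystem; all contents are illustrative.
import re

REPO = {
    "src/Auth.java": "class Auth { boolean isMinor(User u) { return u.age < 18; } }",
    "src/Utils.java": "class Utils { /* string helpers */ }",
    "docs/policy-under-18.md": "Minors must never see personalized ads.",
}

def list_files():
    return sorted(REPO)

def grep(pattern):
    # Return every file whose contents match the pattern.
    return [path for path, text in REPO.items() if re.search(pattern, text)]

def read_file(path):
    return REPO[path]

# A scripted exploration standing in for the model's own tool calls:
hits = grep(r"personalized|minor|18")
evidence = {path: read_file(path) for path in hits}
```

Note what the tools can and cannot surface: `grep` finds the files that mention the policy, but nothing in the repo says which file is the enforcement point or what must never override it.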

For our toy example, it would probably succeed: it's small enough that the agent can explore most of the relevant branches, find the logic and policy nodes, and make reasonable changes.

But agentic search performance varies dramatically with codebase size:

Small service (fits in context):

  • The agent can explore thoroughly and find most relevant code
  • Changes are usually correct
  • But it still misses the "why": architectural decisions, historical context, downstream dependencies that aren't obvious from code alone
  • In microservice systems, this becomes painful fast — the agent doesn't see how Service A's implicit contract affects Service B

Medium service (partially fits):

  • You're playing a slot machine: maybe the agent finds the right files, maybe it doesn't
  • Each exploration step and dead end fills the context window with noise that degrades output quality
  • The agent is unlikely to infer architectural patterns unless you tell it explicitly
  • Coverage becomes accidental: whether it touches the Config or Experiments branches is essentially a dice roll based on similarity scores

Large legacy codebase (millions of tokens):

  • Complete exploration is impossible — the agent will miss critical files
  • It has no concept of "what must never happen" or enforcement boundaries
  • Patterns and architecture get broken because the agent treats everything as isolated code
  • The sheer cost of exploration all but guarantees hallucinations
  • Engineers spend more time reviewing than they would have writing it themselves

The core limitation isn't the tools — it's that code doesn't capture intent, history, or the tribal knowledge of "why things are this way."

An agent can see that the Targeting code exists, but not that it's the single enforcement point for an under-18 policy, or that Experiments must never override it.
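
To illustrate what such a single enforcement point might look like, here is a hypothetical sketch (in Python for brevity, though the article's example codebase is Java/Spring; every name is invented for illustration):

```python
# Hypothetical sketch of the enforcement point described above: one choke
# point in ad selection where the under-18 policy is applied, no matter
# what targeting or experiments requested. All names are illustrative.
from dataclasses import dataclass

@dataclass
class AdRequest:
    user_age: int
    personalization_requested: bool

def select_ad_mode(req: AdRequest, experiment_overrides: dict) -> str:
    # Invariant: experiments must never override the minor policy,
    # so the age check runs last and unconditionally.
    mode = "personalized" if req.personalization_requested else "non_personalized"
    mode = experiment_overrides.get("force_mode", mode)
    if req.user_age < 18:
        return "non_personalized"   # minors never get personalized ads
    return mode

# Impression tracking stays enabled either way; only personalization is gated.
```

The invariant lives in the ordering of those few lines, which is exactly the kind of intent an agent cannot recover from a grep hit.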

Manual Context

"Give me six hours to chop down a tree and I will spend the first four sharpening the axe"
-Abraham Lincoln

This is the opposite extreme: your best engineers act as human context routers, hand-curating exactly what the agent sees.

When done well, this unlocks capabilities that blow people away:

  • You can include what matters beyond code: product requirements, architectural decision records, cross-service integration points, deployment constraints
  • You can work across services: Pull in the relevant contracts from Service A, Service B, and Service C, and the agent can reason about the entire flow
  • The agent infers your patterns: Force-feeding it your code helps it understand higher-level architectural patterns and conventions
  • Quality can be stunning: A well-assembled context pack can enable the agent to one-shot entire features that would take you hours to write manually

Spending 60–90 minutes assembling the perfect context bundle is absolutely worth it if the agent then delivers production-quality work in one shot.

But there are real tradeoffs:

  • Massive skill gap: The difference between a novice and expert context engineer is 10x or more. Most engineers don't know how to curate context well.
  • It's time-consuming: Even experts need 30–90 minutes for non-trivial changes.
  • It's brittle: Miss one file, include an outdated doc, or dump too much context, and output quality degrades — sometimes worse than agentic search.
  • It's not reusable: Every context pack is a bespoke experiment that disappears into chat logs. The next engineer starts from scratch.
  • It doesn't scale: You've moved the bottleneck onto your most expensive people.

In other words: "AI works… but only when we put a principal engineer in front of it and treat every serious change as a one-off science project."

Systematic Context

"If I have seen further, it is by standing on the shoulders of giants."
-Isaac Newton

This is where you take the skills from manual context engineering and bake them into reusable infrastructure.

Instead of hoping the model reconstructs your architecture from raw files (agentic search), or relying on senior engineers to hand-wire context every time (manual context), you build a thin, token-efficient context layer over your codebase once, and every engineer and agent benefits from it automatically.

How it works:

You apply manual context engineering principles at the architecture level:

  • Chunk the codebase semantically
    Group files and directories into meaningful "subsystems" (e.g. "ad-serving pipeline", "targeting rules", "user/consent model"), not just whatever happens to share a parent folder.

  • Create condensed reference documents
    Write AGENTS.md files at subsystem boundaries that capture what the code can't: responsibilities, invariants, policies, history, "why things are this way," which files/configs/APIs are hot, and how this chunk interacts with others.

  • Hierarchically summarize and downlink
    Root-level AGENTS.md provides a system overview and links down to mid-level subsystem guides. Mid-level guides link down to leaf-level details where needed. This enables progressive disclosure: agents (and humans) can efficiently explore without upfront manual curation.

  • Respect token budgets
    Each AGENTS.md is sized to fit comfortably in a model's optimal context window, so you get rich signal without blowing the budget.
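
The steps above can be sketched as a small progressive-disclosure walk: the agent starts at the root AGENTS.md and descends only into subsystems whose downlink descriptions look relevant. The file contents, the "-> path description" link syntax, and the keyword relevance test are all assumptions made for illustration.

```python
# Sketch: progressive disclosure over a hierarchy of AGENTS.md files.
# Downlinks are modeled as "-> path description" lines; real files
# might use markdown links and richer relevance scoring.
DOCS = {
    "AGENTS.md": ("System overview.\n"
                  "-> ads/AGENTS.md targeting and policy\n"
                  "-> billing/AGENTS.md invoices"),
    "ads/AGENTS.md": "Targeting rules; under-18 policy enforced in Targeting.java.",
    "billing/AGENTS.md": "Invoice generation and dunning.",
}

def relevant(text, query_terms):
    return any(term in text.lower() for term in query_terms)

def gather_context(root, query_terms):
    """Walk from the root, descending only into relevant subsystems."""
    loaded, frontier = {}, [root]
    while frontier:
        path = frontier.pop()
        text = DOCS[path]
        loaded[path] = text
        for line in text.splitlines():
            if line.startswith("->") and relevant(line, query_terms):
                frontier.append(line.split()[1])
    return loaded

ctx = gather_context("AGENTS.md", ["targeting", "policy", "under-18"])
```

For an under-18 targeting task, the walk loads the root and the ads subsystem guide while never touching billing, which is the token-budget payoff of the hierarchy.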

The payoff:

  • Automatic context for every task: Because agent harnesses automatically read AGENTS.md files, and these files downlink to their children, every request by every engineer gets the benefit of an expert-curated context pack — for free, with no extra effort.
  • Raises the floor: This essentially elevates every agent interaction to the level of a tenured senior/staff engineer who knows where the bodies are buried.
  • Reusable infrastructure: Unlike manual context packs that disappear into chat logs, this becomes living documentation that compounds in value.
  • Progressive disclosure at scale: Agents can efficiently navigate even million-token codebases by following the hierarchy instead of blindly searching.

The tradeoffs:

  • High upfront cost: Building this layer properly across an entire codebase requires significant time investment.
  • Requires expertise: Getting the layer performing optimally requires deep knowledge of both context engineering and your system's architecture.
  • Maintenance overhead: The layer needs to stay in sync with the code as your system evolves.

But once it's in place, every subsequent task reaps the benefits at zero marginal cost.
When you ask:

"Update the system so that users under 18 never see personalized ads…"

The agent doesn't start with a blind grep. It:

  1. Starts from the root document. Immediately gets a bird's eye view of the system
  2. Automatically searches downlinks to high-signal summaries for relevant subsystems
  3. Uses tools and semantic search inside that scoped area
  4. Proposes sweeping changes consistent with intent and invariants

It follows the system map, enforces patterns everywhere it matters, and avoids common pitfalls.

What This Means

These three approaches aren't just different ways of working—they are categorical step-function jumps in capability, corresponding to stages 2, 3, and 4 of the AI Adoption Roadmap.

  • Agentic search (Stage 2) is where most teams are today. It’s great for local edits and small refactors, but it hits a wall on cross-service work.
  • Manual context engineering (Stage 3) is the "power user" phase. The gap in capability here is massive—often a 10x improvement in the breadth, depth, and quality of results. A few senior engineers can get the AI to perform incredible feats by hand-curating high-signal context.
  • A systematic context layer (Stage 4) is where you democratize that 10x advantage. Instead of relying on individual brilliance, you build shared infrastructure that lifts the whole team.

In practice, moving to a systematic context layer means:

  • Everyone gets the "10x" results. You don't need to be a prompt engineering wizard to get great output. The system provides the expert context automatically, so every engineer gets the benefit of your best architect's knowledge on every prompt.
  • It scales your best engineers. Instead of answering the same questions or fixing the same bugs, your senior staff encodes their understanding into the context layer once, and the AI uses it forever.
  • It future-proofs your stack. This layer is a distillation of your specific data and intent. It doesn't just unlock current models; it will make every future model generation more effective. As models get smarter, they will be able to leverage this structured context even more effectively.

Context Is the Ceiling

There is a misconception that we are waiting for "smarter models" to solve our complex engineering problems.

The capability is already there.

Modern models are already capable of incredible reasoning and complex problem-solving. The only thing preventing you from accessing that capability right now is the inability to supply them with the correct context.

When we hand an AI a giant repo and say "fix this," we are effectively blindfolding a genius.

  • Agentic search lets them fumble around and feel the shape of the room.
  • Manual context engineering takes the blindfold off for a specific task.
  • A systematic context layer turns the lights on permanently.

The core belief behind Intent Systems is simple:

If the AI sees what your best engineers see, it can perform like your best engineers. If it doesn't, it won't.

This kind of context-aware behavior will likely be embedded into out-of-the-box tools in the next 12–18 months—we'll likely see IDEs that automatically index AGENTS.md hierarchies or "Context Protocols" that standardize how tools share this map. But the gap between teams who figure this out now and those who wait has never been wider.

There are companies operating at this level right now. They are getting 10x better results from the exact same models you are using, simply because they have solved the context bottleneck.

Getting there happens in two steps:

  1. Teach your engineers to do manual context engineering on purpose, not by accident.
  2. Turn that hard-won knowledge into a shared, token-efficient context layer.

These articles are the playbook to run that journey yourself. If you want to get there faster—through live training or hands-on design of your context layer—that’s exactly what I work on at Intent Systems.

Ready to level up context engineering?

Let's talk about teaching your team deliberate context engineering and turning it into a reusable context layer over the code you already have—no refactors required.