Part 1 — Foundations

What is an agentic harness, and why does it take months to build?

4 min · Updated June 2026

If you’ve been handed an AI coding tool and told to “make it enterprise-grade,” this is where to start. Before you touch a config file, you need a mental model of what you’re actually building — because the answer is not a config file.

Q1.1 — What is an “agentic harness” or “agentic OS,” and why is it more than a config file?

An agentic harness is the engineered environment around a coding agent that determines whether the code it produces is safe, consistent, compliant, and aware of how your organization actually works. The model is the engine; the harness is the car, the road, and the traffic laws.

The trap most teams fall into is thinking the harness is “a good CLAUDE.md file.” It isn’t. The 2026 consensus, captured in Danar Mustafa’s May 2026 essay on running Claude Code at enterprise level, is blunt: successful deployments invest weeks to months of dedicated engineering time building the harness before broad rollout, and then keep investing as the codebase and models evolve. The harness is a platform-team product with a backlog, an owner, and a release cycle.

Think of it as a layered stack, not a file:

  • Context layer: what the agent knows about your project every session (instruction files)
  • Capability layer: what the agent can do and look up (skills, MCP servers, code-intelligence tools)
  • Control layer: specialized personas and deterministic gates (subagents, hooks, event-driven workflows)
  • Governance layer: what the agent is allowed to do and how you prove it afterward (managed policy, enterprise controls, audit pipelines)

Everything in this article is an expansion of those four layers.

Q1.2 — What are the core primitives, and do they exist on both Claude Code and Copilot?

Yes — and that symmetry is the single most important fact for building one reusable harness. There are seven primitives, and they map almost 1:1 across the two vendors.

| Primitive | What it does | Claude Code | GitHub Copilot |
|---|---|---|---|
| Instruction memory | Persistent project context loaded every session | CLAUDE.md (hierarchical, supports @import) | .github/copilot-instructions.md, instructions/*.instructions.md, AGENTS.md |
| Custom agents | Specialized personas with scoped tools/model/prompt | .claude/agents/*.md | .github/agents/*.agent.md |
| Skills | On-demand, progressively-disclosed task playbooks | .claude/skills/<name>/SKILL.md | .github/skills/<name>/SKILL.md (same standard) |
| Plugins | A bundle of skills + commands + agents + hooks + MCP for distribution | .claude-plugin/plugin.json via marketplaces | Org Copilot marketplace synced from a private repo |
| Hooks / events | Deterministic gates on the agent's lifecycle | 21 lifecycle events, 4 handler types | Agent hooks (preview) + Actions + Agentic Workflows |
| Tools / MCP servers | External tool and data access via the Model Context Protocol | claude mcp add (stdio/HTTP/SSE) | mcp-config.json, .vscode/mcp.json, GitHub MCP Registry |
| Settings / policy | Enterprise enforcement that users can't override | managed-settings.json + MDM | GitHub Enterprise AI Controls + Copilot policies |

Because the primitives line up, you can author most assets once in a vendor-neutral form and compile them to both targets. That’s the core thesis of Part 6.
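To make the "author once, compile to both targets" idea concrete, here is a minimal sketch that fans one vendor-neutral asset out to the Claude Code and Copilot paths listed in the primitives table. The function name `compile_asset` and the `TARGET_PATHS` mapping are hypothetical, and writing the same body to both targets is an assumption: it fits skills, which the table marks as sharing one standard, but custom agents may need per-vendor frontmatter that this sketch does not handle.

```python
# Path templates taken from the primitives table. Only skills and custom
# agents are covered, because those rows list concrete per-file paths.
TARGET_PATHS = {
    "skill": {
        "claude": ".claude/skills/{name}/SKILL.md",
        "copilot": ".github/skills/{name}/SKILL.md",
    },
    "agent": {
        "claude": ".claude/agents/{name}.md",
        "copilot": ".github/agents/{name}.agent.md",
    },
}


def compile_asset(kind: str, name: str, body: str) -> dict[str, str]:
    """Return {output_path: file_contents} for every vendor target.

    Assumption: the body is emitted unchanged for both vendors.
    """
    if kind not in TARGET_PATHS:
        raise ValueError(f"unsupported asset kind: {kind!r}")
    return {
        template.format(name=name): body
        for template in TARGET_PATHS[kind].values()
    }


if __name__ == "__main__":
    for path in compile_asset("skill", "db-migration", "# Safe migrations\n"):
        print(path)
```

A real compiler would probably grow a per-vendor transform step for the primitives that do not line up exactly (hooks and policy in particular), but the fan-out shape stays the same.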

Q1.3 — Why did “context engineering” suddenly become the central discipline in 2026?

Because the bottleneck moved. Models stopped being the limiting factor for routine coding work; what you put in front of them became the limiting factor. Two things crystallized this.

First, progressive disclosure (see Q1.4) made it cheap to install dozens of capabilities without drowning the context window — so the question shifted from “can I add this knowledge” to “how do I structure knowledge so the agent finds the right piece at the right moment.”

Second, and counterintuitively, research showed that more context is often worse. The most rigorous 2026 evidence is the ETH Zurich / LogicStar.ai paper “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (Gloaguen et al., arXiv:2602.11988, February 2026). Its finding is uncomfortable: context files tended to reduce task success rates compared to giving the agent no repository context at all, while increasing inference cost by over 20%.

The practical takeaway isn’t “don’t write instruction files.” It’s “write lean instruction files and push detail into skills the agent loads only when relevant.” Context engineering is the discipline of deciding what loads always, what loads on demand, and what never loads unless a specific step needs it.

Q1.4 — What is “progressive disclosure” and why should I care as a harness builder?

Progressive disclosure is the loading model Anthropic formalized with Agent Skills, and it’s now the dominant context-engineering pattern across the industry. It works in three layers:

  1. Discovery: always loaded, tiny. At startup the agent scans the YAML frontmatter of every installed skill, roughly 80 tokens each (a name and a description). With 40 skills installed, that comes to roughly 3,200 tokens in total (40 × 80).
  2. Activation: loaded when relevant. When the agent decides a skill applies, it reads the full SKILL.md body, a median of about 2,000 tokens across Anthropic's official skills.
  3. Reference: loaded only when a step demands it. Supporting files (references/, scripts/, assets/) are read only when the agent actually executes a step that needs them.
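The three tiers above can be sketched as a small lazy loader. This is an illustration of the loading pattern, not Anthropic's implementation: the `Skill` class, `discover`, and `split_frontmatter` are hypothetical names, and the frontmatter parser handles only flat `key: value` lines rather than full YAML.

```python
from dataclasses import dataclass
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into flat 'key: value' frontmatter and the body.

    Hypothetical simplification: no nested YAML, no multi-line values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing marker: treat the whole file as body


@dataclass
class Skill:
    """One installed skill; only name and description are kept up front."""
    name: str
    description: str
    path: Path  # location of this skill's SKILL.md

    def activate(self) -> str:
        """Tier 2: read the full SKILL.md body once the skill is judged relevant."""
        _, body = split_frontmatter(self.path.read_text(encoding="utf-8"))
        return body

    def reference(self, relative: str) -> str:
        """Tier 3: read one supporting file, only when a step asks for it."""
        return (self.path.parent / relative).read_text(encoding="utf-8")


def discover(skills_root: Path) -> list[Skill]:
    """Tier 1: scan every <name>/SKILL.md and retain only its frontmatter."""
    found = []
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        meta, _ = split_frontmatter(skill_file.read_text(encoding="utf-8"))
        found.append(Skill(
            name=meta.get("name", skill_file.parent.name),
            description=meta.get("description", ""),
            path=skill_file,
        ))
    return found
```

Only what `discover` returns would sit in the prompt on every session; `activate` and `reference` are the points where the larger token costs are paid, and only for the skills and steps that need them.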

Why this matters for harness builders: it’s what makes a large harness viable. You can ship a deep library of organizational knowledge without paying for all of it on every prompt.

It also tells you where to put things:

  • Knowledge that's always relevant: the (lean) instruction file
  • Knowledge that's situational: skills
  • Bulky material (long checklists, code templates, scripts): skill reference files

This three-tier routing decision is one you’ll make for every piece of organizational knowledge you encode. Get it right and your harness scales. Get it wrong and you pay for context you don’t need, and the agent ignores what you paid for.
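As a rough illustration, the routing decision can be written down as a two-question function. The names `Tier` and `route` are hypothetical, and the precedence (bulky material goes to reference files even if it is always relevant) is an assumption made for this sketch rather than a rule stated above.

```python
from enum import Enum


class Tier(Enum):
    INSTRUCTION_FILE = "instruction file"
    SKILL = "skill"
    SKILL_REFERENCE = "skill reference file"


def route(always_relevant: bool, bulky: bool) -> Tier:
    """Pick where a piece of organizational knowledge should live."""
    if bulky:
        # Assumption: size wins over relevance, which keeps the
        # always-loaded instruction file lean.
        return Tier.SKILL_REFERENCE
    if always_relevant:
        return Tier.INSTRUCTION_FILE
    return Tier.SKILL  # situational knowledge loads on demand


if __name__ == "__main__":
    print(route(always_relevant=True, bulky=False).value)
    print(route(always_relevant=False, bulky=False).value)
    print(route(always_relevant=False, bulky=True).value)
```

In practice the two booleans are judgment calls, so a team might replace them with its own criteria, but keeping the decision explicit makes it reviewable.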