The Sovereign Harness: Engineering Custom AI Agency
Engineers take ownership of their agent harness by moving away from simply "renting" generalized, ready-made CLI tools such as Claude Code, Codex, or OpenCode, which ultimately limit their "ceiling" [1]. Instead, they practice harness engineering: taking an underlying LLM and combining it with specific tools, context (data and instructions), and environment details to build a highly optimized program for a specific task [2].
How engineers take true ownership
- Domain-specific specialization. Custom harnesses tailored for specific workflows — DevOps, testing, billing, or code verification. This "one tool, many versions" approach creates a specialized competitive moat [1].
- Composing advanced features. Integrating composable skills, secure sandboxing, sub-agent delegation, model fallbacks, and multi-agent orchestration — such as a multi-tiered UI team [1].
- Leveraging orchestration frameworks. Systems like Mario Zechner's Pi agent or frameworks like Gas City implement a provider protocol that swaps different LLMs and agents in and out based on task suitability, cost, and quality [3, 4].
By fully owning and customizing the harness, engineers effectively turn basic AI models into highly tailored "factory workers" that can be plugged directly into autonomous software factory pipelines [4, 5].
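The combination described above (a model, tools, context, and a swappable provider with fallback) can be sketched in a few lines. This is a minimal illustration, not how Pi or Gas City actually implement their provider protocols: `Provider`, `EchoProvider`, and `Harness` are hypothetical names, and `EchoProvider` only echoes the prompt in place of a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Provider(Protocol):
    """Hypothetical provider protocol: anything that can complete a prompt."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in for a real model client; it only echoes the prompt back."""

    name: str
    fail: bool = False  # simulate an outage to exercise the fallback path

    def complete(self, prompt: str) -> str:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return f"[{self.name}] {prompt}"


@dataclass
class Harness:
    """Model + tools + context, assembled for one task domain."""

    providers: list[Provider]  # tried in order, so later entries act as fallbacks
    instructions: str  # context: task-specific instructions
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        prompt = f"{self.instructions}\nTools: {sorted(self.tools)}\nTask: {task}"
        errors = []
        for provider in self.providers:
            try:
                return provider.complete(prompt)
            except RuntimeError as exc:
                errors.append(str(exc))
        raise RuntimeError("all providers failed: " + "; ".join(errors))


# "One tool, many versions": the same class, configured for a DevOps workflow.
devops = Harness(
    providers=[EchoProvider("primary", fail=True), EchoProvider("fallback")],
    instructions="You are a DevOps agent.",
    tools={"read_logs": lambda service: f"logs for {service}"},
)
print(devops.run("Summarize last night's deploy"))
```

Because the harness depends only on the `complete` method, a different model or agent can be swapped in without touching `Harness` itself.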
Dark Factories and the Parabolic Ascent
A "dark factory", also known as an ADW (AI Developer Workflow), is the core mechanism of the Software Factory pillar in agentic engineering [1, 2]. Instead of engineers coding features by hand, a dark factory is a system of AI agents and code designed to systematically produce reproducible, on-spec results [2].
The factory works by templating the entire software development lifecycle, fully automating each stage. In this environment, a plan is simply treated as "a prompt scaled" [1, 2].
The seven automated stages
- Plan. A plan is treated as "a prompt scaled."
- Scout. Survey the codebase and gather the context required.
- Build. Generate the implementation against the spec.
- Validate. Check the output against the intended specification.
- Test. Exercise the result for correctness.
- Review. Inspect for quality and on-spec compliance.
- Release. Ship reproducible, on-spec results.
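The seven stages can be read as a templated pipeline in which each stage is a pluggable function. The sketch below is one possible shape under that assumption: `WorkItem`, `make_stub_stage`, and `run_factory` are hypothetical names, and the stub stages only record a string where a real factory would invoke an agent.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage order taken from the list above.
STAGES = ["plan", "scout", "build", "validate", "test", "review", "release"]


@dataclass
class WorkItem:
    prompt: str
    artifacts: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


# A stage mutates the work item and reports success or failure.
Stage = Callable[[WorkItem], bool]


def make_stub_stage(name: str) -> Stage:
    """Return a placeholder stage; a real one would call an agent."""

    def stage(item: WorkItem) -> bool:
        item.artifacts[name] = f"{name} output for: {item.prompt}"
        return True

    return stage


def run_factory(item: WorkItem, stages: dict[str, Stage]) -> bool:
    """Run the stages in order and stop at the first failure."""
    for name in STAGES:
        ok = stages[name](item)
        item.log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            return False
    return True


stages = {name: make_stub_stage(name) for name in STAGES}
item = WorkItem(prompt="Add CSV export to the reports page")
print(run_factory(item, stages), item.log)
```

Stopping at the first failed stage keeps an off-spec build from reaching release, which is one simple way to keep results reproducible.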
By utilizing these automated workflows rather than relying on human-speed coding, an engineer's output per unit of time goes parabolic [1, 2]. This massive acceleration shifts the engineer's focus from manually coding features to building "the system that builds the system" — and it is the critical stepping stone pointing toward Zero-Touch Engineering, where a prompt can be sent straight to production without a human in the loop [1, 3].
According to the sources, building these dark factories will soon become a requirement at top companies.
On the inevitability of software factories
The Architecture of Adaptability
Extensible software is the third compounding pillar of agentic engineering [1, 2]. Because the landscape of AI models, tools, prompts, and technology is changing at such a rapid pace, the best defense for engineers is to build adaptability directly into the software itself. The guiding principle is that software must be "open to extension, closed to modification" [1, 2].
Achieving a pluggable design
- Apply extensibility everywhere. Build the principle directly into both your internal engineering tools (the agent harness) and your final production products [2].
- Build custom, specialized harnesses. Rather than relying on rented tools that cap your potential, build "one tool, many versions" — swappable harnesses for DevOps, testing, billing, and verifying [3].
- Compose modular features. Make the harness adaptable with composable skills, sub-agent delegation, model fallbacks, and multi-agent orchestration [3].
- Avoid rigid coding structures. Steer clear of "brittle, vibe-coded slop" and "cascading if-statements." Hardcoded, inflexible logic will be crushed as the field moves at agentic speed [1, 2].
When individual components can be swapped out or extended without altering the core system, the software survives the constant change happening at agentic speed. Systems that ignore this principle stay inflexible and will ultimately be "crushed" by rapid technological evolution [1, 2].
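One common way to keep code "open to extension, closed to modification" is a registry that replaces a cascading if-statement. The sketch below is illustrative only: `register`, `dispatch`, and the handler names are hypothetical, and the source does not prescribe this particular pattern.

```python
from typing import Callable

Handler = Callable[[str], str]
_HANDLERS: dict[str, Handler] = {}


def register(kind: str) -> Callable[[Handler], Handler]:
    """Decorator that plugs a handler into the registry under `kind`."""

    def decorator(fn: Handler) -> Handler:
        _HANDLERS[kind] = fn
        return fn

    return decorator


def dispatch(kind: str, payload: str) -> str:
    """Core routing logic; it never changes when a new handler is added."""
    try:
        handler = _HANDLERS[kind]
    except KeyError:
        raise ValueError(f"no handler registered for {kind!r}") from None
    return handler(payload)


@register("testing")
def run_tests(payload: str) -> str:
    return f"testing harness handled: {payload}"


@register("billing")
def run_billing(payload: str) -> str:
    return f"billing harness handled: {payload}"


print(dispatch("billing", "invoice batch"))
```

Adding a DevOps or verification handler means writing one new decorated function; `dispatch` stays untouched.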
The Arbitrage Rule for Autonomous Agents
The strict scaling rule for always-on (AFK) agents: only allow them to run 24/7 after you have proven the token arbitrage loop [1-3]. You must first confirm that the agents are creating actual value rather than just maximizing token usage [1, 2].
Scaling before proving this value capture is a mistake. An estimated 90% of always-on cron agents are "dead useless, just burning cash" [3]. However, once you have secured a successful arbitrage loop — meaning the output value consistently exceeds the token cost — you can safely scale the operation [3].
At that stage, a rising API bill transforms into a productivity KPI — because the cost of the tokens tracks below the value they generate.
When cost becomes a signal of value
Agentic Access & the Token Tax
The "token tax" refers to the tokens that are wasted simply because an AI agent lacks direct, programmatic access to systems [1]. To avoid paying it, you must implement Agentic Access: giving your agents direct reach into your systems by exposing CLI tools, REST APIs, webhooks, and RPC clients [1, 3].
By doing this, agents can programmatically command the systems they need to interact with, rather than wasting tokens trying to navigate without direct access [1].
The non-negotiable guardrail
Granting this access is a requirement for achieving agentic speed, but it must come with strict guardrails [1, 3]. The critical one is to lock down the bash tool so that an agent, even one operating autonomously, can never "nuke" production databases and resources by executing a destructive shell command [1-3].
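As a rough illustration of locking down a bash tool, the check below allows only a short list of executables and rejects anything containing shell metacharacters. The allowlist contents and the function name `is_command_allowed` are assumptions made for this sketch. It is not a complete sandbox; real isolation would also rely on restricted credentials and an isolated execution environment.

```python
import shlex

# Hypothetical allowlist: only these executables may be run by the agent.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "pytest"}

# Characters that enable chaining, redirection, or substitution in a shell.
SHELL_METACHARACTERS = set(";&|<>`$()\n")


def is_command_allowed(command: str) -> bool:
    """Return True only for a single, allowlisted command with no shell tricks."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # for example, an unbalanced quote
        return False
    return bool(argv) and argv[0] in ALLOWED_COMMANDS


for cmd in ["ls -la", "rm -rf /var/lib/postgres", "cat notes.txt; rm -rf /"]:
    print(f"{cmd!r}: {'allowed' if is_command_allowed(cmd) else 'blocked'}")
```

An allowlist is used here rather than a blocklist because a list of forbidden commands is easy to bypass with an alias or an alternative tool.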
The Token Arbitrage Loop
Tokenomics is the economic engine that transforms a software factory from a technical tool into a viable business [1, 2]. It shifts the engineer's role from simply building systems to running those systems as a business [3, 4]. The core idea: anyone can run automated agents to burn tokens — but the true goal is to generate useful tokens [5].
A three-level funnel
- Use more tokens. Also known as "token-maxing." The baseline or "floor": a great starting point, but a terrible place to finish [5].
- Make tokens valuable. Roll token usage into actual products and workflows. Most teams get stuck here, using a lot of tokens but struggling to extract real value [1, 5].
- Capture the revenue. The ultimate goal: the value created by the agents is successfully converted back into real-world revenue [5].
The token arbitrage loop is the mechanism to reach the third level. An engineer essentially buys a token for $1 (the API cost), runs it through their business process or software factory, and sells the resulting output for $2 [1, 5]. This arbitrage acts as what is described as "an infinite cash generating glitch, also known as a business" [5]. Even a very thin margin works — spending $1 on tokens to create $1.10 of value and capturing that $0.10 margin [5].
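The arithmetic of the loop is simple enough to write down directly. The sketch below uses the dollar figures from the text; the function names are hypothetical, and `Decimal` is used so that currency amounts do not pick up floating-point rounding noise.

```python
from decimal import Decimal


def arbitrage_margin(token_cost: Decimal, output_value: Decimal) -> Decimal:
    """Dollars captured per run: value sold minus tokens bought."""
    return output_value - token_cost


def margin_ratio(token_cost: Decimal, output_value: Decimal) -> Decimal:
    """Margin as a fraction of token spend (0.1 means 10%)."""
    if token_cost <= 0:
        raise ValueError("token_cost must be positive")
    return arbitrage_margin(token_cost, output_value) / token_cost


# Buy tokens for $1, sell the output for $2.
print(arbitrage_margin(Decimal("1.00"), Decimal("2.00")))

# The thin-margin case: $1 of tokens creating $1.10 of value.
print(arbitrage_margin(Decimal("1.00"), Decimal("1.10")))
print(margin_ratio(Decimal("1.00"), Decimal("1.10")))
```

A positive margin on every run is what separates the third level of the funnel from plain token-maxing.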
Scaling the loop
Nailing this arbitrage loop is a strict prerequisite for scaling. Only allow agents to run 24/7 (AFK) after you have proven that the arbitrage works; otherwise, like roughly 90% of always-on agents, they are just "burning cash" on useless tasks [6]. Once the loop functions correctly, a rising API bill stops being a cost concern and becomes a productivity KPI, because the cost of the tokens tracks below the value they generate [1, 6].
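The prerequisite can be expressed as a gate that an agent must pass before it is scheduled to run around the clock. This is a sketch under stated assumptions: the `RunRecord` type, the function name, and the thresholds (`min_runs`, `min_win_rate`) are all hypothetical, since the source only says that output value should consistently exceed token cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    token_cost: float  # dollars spent on tokens for one run
    value_captured: float  # dollars of value captured from that run


def loop_is_proven(
    runs: list[RunRecord],
    min_runs: int = 20,  # assumed sample size before trusting the loop
    min_win_rate: float = 0.9,  # assumed share of runs that must be profitable
) -> bool:
    """Return True when value exceeds cost in aggregate and on most runs."""
    if not runs or len(runs) < min_runs:
        return False
    total_cost = sum(r.token_cost for r in runs)
    total_value = sum(r.value_captured for r in runs)
    wins = sum(1 for r in runs if r.value_captured > r.token_cost)
    return total_value > total_cost and wins / len(runs) >= min_win_rate


history = [RunRecord(token_cost=1.00, value_captured=1.10) for _ in range(20)]
print("scale to 24/7" if loop_is_proven(history) else "keep supervising")
```

Checking both the aggregate and the per-run win rate guards against a single lucky run hiding a loop that usually loses money.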
From Vibe Coding to Zero-Touch Engineering
The evolution of software engineering from vibe coding to Zero-Touch Engineering (ZTE) represents a shift from operating AI agents manually to building fully autonomous systems — moving from the "floor" of AI assistance to its ultimate "ceiling" [1, 2]. This progression unfolds across several defined stages.
- Vibe Coding. From around March 2025, with tools like Claude Code, engineers sit in terminals manually prompting multiple agents back and forth. This is the "lowest-hanging fruit" and a terrible place to stop [3, 4].
- The 5 Compounding Pillars. Adopt the mantra "build the system that builds the system" across the five pillars: the agent harness, software factories, extensible software, always-on agents, and agentic access [5, 6, 8-15].
- Tokenomics & Token Arbitrage. The economic engine. Engineers evolve from maximizing token usage to making tokens valuable, and finally to capturing revenue: buy for $1, sell the output for $2 [13, 14, 16, 17].
- Software Factories as Default (End of 2026). Relying on software factories becomes the industry default rather than an edge case. The engineer writes a prompt and reviews near-production output [5, 6, 18, 19].
- Zero-Touch Engineering. The final destination and asymptote. "Super, super advanced": a direct prompt-to-production pipeline with no human in the loop [18, 19].
In ZTE, the senior engineer's value is no longer measured by the code they ship directly, but by their ability to architect the autonomous systems that generate it [8]. Because the underlying ADWs or "dark factories" can autonomously plan, scout, build, validate, test, review, and release reproducible, on-spec results, human oversight in the actual coding and deployment process becomes obsolete [6].
The senior engineer focuses entirely on the five pillars — designing custom harnesses, building extensible software, and managing tokenomics to run the system as a profitable business.
The new mandate