Self-Harness: When AI Agents Rewrite Their Own Operating Rules

Researchers at the Shanghai Artificial Intelligence Laboratory have introduced Self-Harness, a new paradigm in which an LLM-based agent systematically improves its own operating rules. By examining its own execution traces and editing its rules based on what it finds, the system replaces manual guesswork with empirical evidence, improving performance by up to 60% over the baseline harness.

Not every company can or should build its own frontier AI language model. However, the harness controlling the model is something that most enterprises can and should customize for their specific purposes. Yet agent harnesses are still largely tuned through manual, ad hoc debugging. That process relies on intuition rather than systematic feedback loops, which makes it difficult to keep pace with rapidly evolving LLMs.

The Challenge of Harness Engineering

An LLM-based agent’s performance is not determined solely by its underlying base model, but also by its harness: the surrounding system that provides context and enables the model to interact with the environment. A harness includes components like system prompts, tools, memory, verification rules, runtime policies, orchestration logic, and failure-recovery procedures.

This layer is crucial because many common agent failures stem from the harness rather than the model. For example, an agent may report success without checking the model’s response, or it might retry a failed action repeatedly. The harness is also responsible for preventing context rot or overload when the agent’s interaction history grows very large.
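
To make these failure modes concrete, here is a minimal Python sketch of a harness that guards against all three: it reports success only after an explicit check, caps retries, and trims old history. The names Harness, run_task, max_retries, and max_history are hypothetical and chosen for illustration; they are not taken from the Self-Harness paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Harness:
    """Hypothetical container for a few harness components (illustrative only)."""
    system_prompt: str
    max_retries: int = 2      # runtime policy: cap repeated attempts
    max_history: int = 4      # memory policy: keep only the most recent entries
    history: List[str] = field(default_factory=list)

    def remember(self, entry: str) -> None:
        # Drop the oldest entries so the context does not grow without bound.
        self.history.append(entry)
        self.history = self.history[-self.max_history:]


def run_task(
    harness: Harness,
    action: Callable[[int], str],
    verify: Callable[[str], bool],
) -> bool:
    """Run `action` until `verify` accepts its result or the retry budget runs out."""
    for attempt in range(1, harness.max_retries + 2):  # first try plus retries
        result = action(attempt)
        harness.remember(f"attempt {attempt}: {result}")
        if verify(result):  # success is reported only after an explicit check
            return True
    return False  # give up instead of retrying forever


if __name__ == "__main__":
    h = Harness(system_prompt="You are a careful agent.")
    # Toy action that fails on the first attempt and succeeds on the second.
    succeeded = run_task(h, lambda n: "ok" if n == 2 else "error", lambda r: r == "ok")
    print(succeeded, h.history)
```

In a real agent, the action would wrap a tool call and the verifier would be task-specific. The point of the sketch is only that these safeguards live in the harness, not in the model.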

According to Hangfan Zhang, lead author of the Self-Harness paper, the true bottleneck of manual engineering is the lack of a verifiable, empirical feedback loop. “Many edits are made based on intuition, a few observed failures, or ad hoc debugging,” Zhang explained.

How Self-Harness Works

The Self-Harness paradigm enables an LLM-based agent to improve its own harness without relying on human engineers or stronger external models. This continuous self-evolution is driven by a three-stage iterative loop:

Weakness mining: Starting from an initial harness, the agent runs a set of tasks, producing execution traces with verifiable outcomes. The agent categorizes failed traces and tries to detect model-specific failure patterns.
Harness proposal: Based on these failure patterns, the agent uses a “proposer” role to generate a set of diverse yet minimal harness modifications, each tied to a specific failure mechanism to avoid overly general corrections.
Proposal validation: The system evaluates candidate modifications through regression tests. An edit is promoted only if it improves performance without causing measurable degradation on held-out tasks. If multiple candidate modifications pass the regression tests, they are merged into the next version of the harness.
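
The three stages above can be sketched as a single iteration of the loop. This is a simplified illustration, not the authors' implementation. It makes several assumptions: the harness is modeled as a plain dictionary of named rules, each trace carries a failure label, "improves performance" is measured as pass rate on the tasks used for mining, and "no degradation" is checked on a separate held-out set. All function and type names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Rules = Dict[str, str]            # hypothetical harness model: rule name -> rule text
Trace = Tuple[str, bool, str]     # (task id, passed, failure label)
Runner = Callable[[Rules, List[str]], List[Trace]]
Proposer = Callable[[Rules, str], Rules]  # returns a small edit: rules to add or override


def pass_rate(traces: List[Trace]) -> float:
    # Assumes a non-empty list of traces.
    return sum(passed for _, passed, _ in traces) / len(traces)


def mine_weaknesses(traces: List[Trace]) -> List[str]:
    """Stage 1: group failed traces by failure label, most frequent first."""
    counts = Counter(label for _, passed, label in traces if not passed)
    return [label for label, _ in counts.most_common()]


def self_harness_step(
    harness: Rules,
    run: Runner,
    propose: Proposer,
    train_tasks: List[str],
    held_out_tasks: List[str],
) -> Rules:
    """Run one mine -> propose -> validate iteration and return the next harness."""
    traces = run(harness, train_tasks)
    base_train = pass_rate(traces)
    base_held = pass_rate(run(harness, held_out_tasks))

    accepted: Rules = {}
    for pattern in mine_weaknesses(traces):
        # Stage 2: one minimal edit tied to one specific failure pattern.
        edit = propose(harness, pattern)
        candidate = {**harness, **edit}
        # Stage 3: promote only if it helps and causes no held-out regression.
        improves = pass_rate(run(candidate, train_tasks)) > base_train
        no_regression = pass_rate(run(candidate, held_out_tasks)) >= base_held
        if improves and no_regression:
            accepted.update(edit)

    # Edits that passed validation are merged into the next harness version.
    return {**harness, **accepted}
```

One limitation of this sketch is that each edit is validated in isolation, so the merged harness is not re-tested as a whole. A fuller version would likely re-run the regression tests on the merged result before adopting it.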

The results are compelling: across multiple benchmark datasets, Self-Harness improved agent performance by 25-60% relative to the baseline harness, with the largest gains on tasks where the original harness had significant blind spots. Self-improving harnesses can enable development teams to deploy robust custom agents that continually adapt their own execution protocols to overcome model-specific weaknesses.

The research is available on arXiv at arxiv.org/abs/2606.09498.