Notes · Architecture · 18 April 2026 · ~8 min read

The three-agent loop versus NCDSMO raise-the-bar.

The National Cross-Domain Strategy and Management Office wants counter-tested content inspection at the classification boundary. One model cannot counter-test itself. Two models voting yes together cannot either. Here is the architectural shape that clears the bar, and what “architectural separation” technically commits you to.

A single language model, asked to review its own output for classification concerns, is doing the equivalent of a law firm representing both sides of a dispute. It might give you the right answer. It might give you a wrong answer you can’t distinguish from the right answer. The question you can’t resolve from the outside is how you would know. Cross-domain review is a setting where “how you would know” is the entire job.

This is the premise behind the NCDSMO’s raise-the-bar directive. The bar isn’t “an AI checked it.” The bar is counter-tested: the content was examined by a mechanism whose design prevents it from finding what it wants to find. For four decades, cross-domain solutions achieved this with static rule engines and the human eye. Adding an LLM doesn’t clear the bar — it just adds a new surface that has to be counter-tested in the same way the content itself is. The rest of this note is about what architectural shape does that.

Three approaches. Only one passes.

There are three patterns in production right now in federal AI content inspection. It’s worth being explicit about why the first two are architecturally insufficient for raise-the-bar even when they perform well on accuracy benchmarks.

Pattern A · Single model with a safety filter

A model produces the classification call. A rule-based filter or a second pass of the same model checks the output against a sensitivity list. Widely deployed in commercial content moderation; familiar from consumer AI products. Fails counter-tested at the architectural level: the filter is a bolt-on, not an opposing party. If the model has a blind spot, the filter — tuned against the same model’s outputs — usually inherits it. If the filter has a gap, the model cannot detect it. There is no mechanism of disagreement. The one-paragraph summary an inspector would write is: the system agrees with itself.

Pattern B · Ensemble with consensus

Three or five models run the same prompt. Their determinations are combined by majority vote or by confidence-weighted averaging. More robust than Pattern A against random model error. Still fails counter-tested: the ensemble is optimized to produce a confident single answer. What you gain in stochastic noise rejection, you lose in adversarial coverage. An input that nudges the whole cohort in the same direction produces a confident, consensus-wrapped, wrong answer. Worse, the ensemble collapses the reasoning into a single output; the inspector has no way to see the disagreement that would have been diagnostic. Ensembles answer the question “what should I call this?” They do not answer the question “what is the case against this call?”
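The collapse Pattern B performs can be shown in a few lines. This is a minimal sketch, not any vendor's API; the votes, rationales, and field layout are invented for illustration. The point is that the majority vote discards the minority rationale before anyone sees it.

```python
from collections import Counter

# Hypothetical outputs from an ensemble for one input: (call, rationale).
votes = [
    ("RELEASE", "no markings found"),
    ("RELEASE", "content matches an unclassified template"),
    ("HOLD",    "paragraph 3 aggregates two benign facts into a sensitive one"),
]

# Pattern B: majority vote collapses everything into one confident answer.
call, count = Counter(v for v, _ in votes).most_common(1)[0]
confidence = round(count / len(votes), 2)
print(call, confidence)   # RELEASE 0.67

# The diagnostic signal -- the case against the call -- lived in the
# minority rationale, and the vote threw it away.
dissents = [r for v, r in votes if v != call]
print(dissents)
```

The dissenting rationale is recoverable here only because the sketch keeps the raw votes around; a production ensemble that reports just the consensus answer has already destroyed it.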

Pattern C · Adversarial three-agent loop

Two agents are instructed to argue opposing positions, with architecturally enforced separation so that neither can observe the other’s internal state. A third agent, the Arbiter, reasons over the transcript of their exchange and issues a determination with citations. This passes counter-tested — by construction. The two opposing agents are structurally incapable of agreeing with themselves, because they are not one system. The Arbiter is constrained to work from artifacts: each agent’s stated claims, each agent’s cited policy references, and the disputation transcript between them. The inspector’s paragraph now reads: the determination advanced through a contest it could not have advanced through if the parties had colluded.

The difference between Pattern C and the two failures is not one of model quality. All three patterns can use the same underlying models. The difference is that Pattern C treats “no collusion” as an architectural property, not a behavioral assertion.
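The control flow of Pattern C can be sketched in a few lines. Everything here is an assumption for illustration: `run_agent`, `Brief`, and `adjudicate` are invented names, and the placeholder agent bodies stand in for isolated inference sessions. What the sketch shows is the shape: two calls that cannot see each other, and an arbiter input built only from their artifacts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Brief:
    """An agent's output as an artifact: position, claims, citations."""
    position: str
    claims: tuple
    citations: tuple

def run_agent(role: str, content: str) -> Brief:
    # Placeholder for an isolated inference session. Each agent receives
    # only the source content and its own role specification.
    claims = (f"{role} case regarding: {content[:24]}",)
    return Brief(position=role, claims=claims, citations=(f"{role}-policy-ref",))

def adjudicate(content: str) -> dict:
    # The two opposing agents run with no view of each other's state.
    advocate = run_agent("advocate", content)
    defender = run_agent("defender", content)
    # The Arbiter's input is artifacts only: both briefs and their
    # citation chains, never either agent's internal reasoning state.
    return {
        "artifacts": {"advocate": advocate, "defender": defender},
        "citations": list(advocate.citations + defender.citations),
    }

record = adjudicate("Quarterly logistics summary")
print(sorted(record["artifacts"]))   # ['advocate', 'defender']
```

The frozen dataclass is a deliberate touch: once a brief is produced, it is immutable, which is the code-level analog of "the Arbiter works from artifacts."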

What “architectural separation” technically means.

The phrase does real work, and it’s worth saying what it is and what it isn’t. Architectural separation between the Advocate and Defender agents means, at minimum, the following:

1 · No shared process

The two agents run in separate execution contexts — separate containers, separate memory spaces, separate inference sessions. The infrastructure treats them as two distinct workloads. This rules out the failure mode where a single process context leaks state between reasoning passes.

2 · No shared prompt context

The Defender does not see the Advocate’s prompt. The Advocate does not see the Defender’s. Each receives the source content and its own role specification and nothing else. The Arbiter receives their outputs — as artifacts, not as rolling conversation — and reasons against those. No agent has a mental model of what the other agent might be doing beyond knowing that a counterparty exists.

3 · No reflection on the counterparty

Neither the Advocate nor the Defender is instructed to anticipate or respond to what the other will say. Each is asked to make the strongest case for their assigned position, given the content and their role specification. Reflection on the counterparty — “what would the Defender argue against this?” — is explicitly out of scope, because it would let each agent internalize the disputation and produce an output pre-adjusted for the counter-argument. The disputation has to be external, on the artifacts, to be counter-tested.

4 · Two-person rule compatibility

Because the agents are architecturally separate, it is trivial to run them on different models, different infrastructure, different authorities. In a high-assurance posture, you can literally have two different organizations run the two agents — each signing their own brief, each logging to separate ledgers — and only the Arbiter output requires integration. This is the two-person rule, implemented architecturally instead of procedurally.

5 · Artifact-based arbitration

The Arbiter reasons from structured artifacts: the Advocate’s brief, the Defender’s brief, cited policy references, the disputation record. It does not have a privileged view of either agent’s reasoning trace. This constraint means the Arbiter’s determination can be reproduced by any competent reviewer given the same artifacts. It also means the determination carries its own audit trail — the artifacts are preserved, the disputation is preserved, the citation chain is preserved.
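The five properties above converge on a concrete artifact: a determination bundle that carries its own audit trail. A minimal sketch, with invented field names and placeholder contents, might seal the bundle with a content hash so a later inspector can verify the preserved artifacts are the ones that were arbitrated.

```python
import hashlib
import json

def digest(obj) -> str:
    # Canonical JSON (sorted keys) so identical artifacts always hash
    # identically, regardless of insertion order.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

# Illustrative bundle; real briefs, disputation records, and policy
# references would go where the placeholders are.
bundle = {
    "advocate_brief": {"claims": ["..."], "citations": ["policy-ref-A"]},
    "defender_brief": {"claims": ["..."], "citations": ["policy-ref-B"]},
    "disputation":    ["advocate: ...", "defender: ..."],
    "determination":  {"call": "DOWNGRADE", "cited": ["policy-ref-A"]},
}
sealed = {"bundle": bundle, "sha256": digest(bundle)}

# Six months later, an inspector recomputes the digest over the
# preserved artifacts and checks it against the sealed value.
assert digest(sealed["bundle"]) == sealed["sha256"]
```

The hash does not make the determination correct; it makes the record tamper-evident, which is what turns "the artifacts are preserved" from a policy claim into a checkable one.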

Why “adversarial” beats “ensemble.”

There is a seductive argument for ensembles that runs: “If I want robustness, I should make the models agree. Disagreement is a bug.” This is right for some workloads and catastrophically wrong for classification-boundary review. The reason is that the value of the system under adversarial conditions comes from the disagreement, not despite it.

The question a cross-domain reviewer is trying to answer is not “what should I call this?” It is “what is the best case against calling it that?” An ensemble erases the case against. An adversarial loop preserves it. The operational distinction shows up the moment a determination is challenged.

A determination that comes with a recorded counter-argument survives adversarial testing because the counter-argument is itself a piece of evidence. When an OCA or an inspector later asks “why did we release this?” — which is the question that retires careers — the answer is no longer “the model thought it was fine.” The answer is “here is the case for release, here is the case against, here is how the conflict was resolved, here are the citations.” That is a defensible record. An ensemble’s 0.94 confidence score, alone, is not.

The architecture generalizes.

The interesting property of the three-agent loop, once you commit to it, is that the same architecture holds the shape of every high-stakes adjudication we have deployed it against.

Federal healthcare claims: Advocate argues for payment against coverage policy. Defender argues against on coding, medical necessity, or fraud-indicator grounds. Arbiter issues a determination with citation to the relevant coverage policy and clinical documentation. This is what runs today at FedRAMP High under a VA National ATO at nation-scale.

Cross-domain release: Advocate argues for release. Defender argues on sensitivity, source/method, or mosaic-risk grounds. Arbiter issues a downgrade determination with citation to the SCG, CAPCO register, and relevant executive orders. This is what the Active Review demo runs.

Permitting, inspection, reshoot approvals, utility reroute authorization, grant disbursement, regulatory compliance review — every one of these has the same skeleton: an argument for advancing a determination, an argument against, a policy corpus, and a human who has to sign. The architecture holds. The domain changes; the shape doesn’t.

What the AO should ask about any AI cross-domain proposal.

If you are the Authorizing Official reading an AI content-inspection proposal in the next twelve months, there are five questions that will sort Pattern A and Pattern B from Pattern C in one meeting:

  1. Do the inspection components run in separate execution contexts, or one? If one, it is not counter-tested.
  2. Do the opposing components see each other’s prompts or reasoning state? If yes, they can internalize the disputation. Not counter-tested.
  3. Is the final determination produced by a vote, by an average, or by an arbiter that reads artifacts? Only the third is counter-tested.
  4. Can you produce the disputation transcript for an inspector six months after the determination? If not, the determination is not reconstructible.
  5. Are the citations in the determination verifiable against a policy corpus you control? If the citations are generated and ungrounded, they are not citations; they are ornamentation.
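The gate the five questions define is conjunctive: one “no” and the proposal is out. A trivial sketch, with hypothetical key names mapping to the questions above, makes the sorting mechanical.

```python
# Keys are invented shorthand for the five questions; answers are the
# booleans an AO extracts from the proposal in that one meeting.
QUESTIONS = (
    "separate_execution_contexts",   # Q1: components in separate contexts
    "no_shared_prompt_state",        # Q2: opposing components cannot see each other
    "artifact_based_arbiter",        # Q3: arbiter over artifacts, not a vote
    "transcript_reproducible",       # Q4: disputation retrievable later
    "citations_grounded",            # Q5: citations verifiable against a corpus
)

def clears_the_bar(answers: dict) -> bool:
    # Missing answer counts as "no" -- the burden of proof is the vendor's.
    return all(answers.get(q, False) for q in QUESTIONS)

# A Pattern B proposal: ensemble members are separate workloads, but the
# determination is a vote, so it fails at Q3.
pattern_b = {
    "separate_execution_contexts": True,
    "no_shared_prompt_state": True,
    "artifact_based_arbiter": False,
}
print(clears_the_bar(pattern_b))   # False
```

The `answers.get(q, False)` default encodes the posture worth adopting: an unanswered question is a failed question.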

Any proposal that cannot answer all five doesn’t clear NCDSMO’s bar, regardless of how the marketing describes it. Any proposal that can is worth the AO’s time.


Previous notes in this series: Active AI-RMF isn’t RMF with an AI sticker and Zero egress is a measurement, not a marketing line. To see the three-agent loop run a full cross-domain case in five minutes, the fastest way is the Active Review demo.