Case study · bytes moving atoms in federal healthcare

The Claims Genomics Model —
bytes moving atoms, at nation-scale.

In federal healthcare, the “atom” that moves is a payment. A claim clears, or it doesn’t. Dollars flow, or they don’t. An Inspector General reviews, or it doesn’t. CompositeApps’ first production deployment was not a pilot: it was live claims adjudication under a VA National ATO at FedRAMP High, in the most adversarial multi-party environment in U.S. civil government. The same architecture now generalizes to cross-domain review, program integrity, and any domain where bytes have to move atoms defensibly.

Accreditation
FedRAMP High
VA National ATO
Domain
Federal healthcare claims
Multi-party adjudication
Scale
$250B / yr
~200M claims · ~400 health systems
In production since
2025
Composite runtime · architect: Long Nguyen
01 · The problem

Five parties. Conflicting incentives.
Every determination under live inspection.

Federal healthcare claims adjudication is the most adversarial multi-party decision environment we could find inside U.S. civil government. Five parties — patient, provider, payer, regulator, inspector general — have standing to challenge any single determination. Each has a different definition of “correct.” Each is staffed, funded, and empowered to re-litigate calls they disagree with. This was the selection criterion for our first production deployment: we wanted the hardest environment we could find, on purpose, because we wanted the architecture to generalize.

Patient
Beneficiary
Wants coverage, clarity, timely access.
Provider
Clinician / hospital
Wants reimbursement for services rendered.
Payer
VA / health plan
Wants to pay the right claim, once, at the right rate.
Regulator
CMS / policy office
Wants policy-compliant determinations, auditable at scale.
Inspector General
OIG / oversight
Wants fraud surfaced and every call reconstructible.

The baseline we displaced was the industry standard: rule-engine scoring plus human adjudicators plus periodic statistical audit. It had three failure modes. First, the rule engine couldn’t capture the policy’s analytical nuance, so rejections defaulted to “insufficient information” and clinicians got form letters instead of answers. Second, human adjudicators varied by reviewer and by week, which the OIG scored as inconsistency risk. Third, the audit trail was a decision plus a single-line justification, which is not a reconstructible record when a determination is challenged two years later.

02 · The architecture

A three-agent adversarial loop —
briefed into an append-only evidence ledger.

What was deployed is the architecture the rest of the site describes: a model-agnostic runtime running inside the VA’s accreditation boundary, with three agents adjudicating every determination in parallel. For claims, the agents are:

Advocate

Builds the case for paying the claim as submitted. Cites the applicable coverage policy, the clinical documentation, the prior-authorization history, the precedent of analogous claims. Produces a releasability brief: here is why this determination should advance.

Defender

Builds the case against. Cites policy exclusions, coding discrepancies, medical-necessity concerns, duplicate-submission patterns, referral gaps, and the fraud-indicator corpus. Runs in architectural isolation from the Advocate — they cannot observe each other’s internal state.

Arbiter

Weighs the two briefs against the policy hierarchy, issues a Glass-Box determination, and seals the entire record — both arguments, the adjudication, the citations, and a cryptographic anchor — to an append-only evidence ledger. The determination is then presented to a human verifier who either signs or overrides with a logged rationale.
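The shape of that loop can be sketched in a few lines. This is an illustrative Python sketch, not the production runtime; the `Brief`, `Determination`, and `adjudicate` names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Brief:
    position: str    # "advance" or "hold"
    citations: list  # policy and documentation references
    argument: str

@dataclass(frozen=True)
class Determination:
    decision: str
    rationale: str
    advocate_brief: Brief   # both arguments travel with the decision
    defender_brief: Brief

def adjudicate(claim, advocate, defender, arbiter) -> Determination:
    # Advocate and Defender each see only the claim and the policy
    # corpus. They run in isolation and never observe each other.
    brief_for = advocate(claim)
    brief_against = defender(claim)
    # The Arbiter weighs both briefs against the policy hierarchy and
    # emits a determination that carries both arguments with it.
    return arbiter(claim, brief_for, brief_against)
```

The point of the shape is visible even in the sketch: the determination object is not a verdict plus a one-line justification, it is the verdict plus both complete briefs.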

The runtime is model-agnostic. The underlying language and reasoning models are open-weight, hosted inside the VA accreditation boundary, and can be swapped without breaking the evidence chain. Zero bytes of patient data leave the boundary at inference time. Every determination carries its own citation chain, model-identity hash, and verifier signature.
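The append-only property is what lets models be swapped without breaking the evidence chain: each sealed record is anchored to the hash of the record before it, so any later alteration is detectable. A minimal sketch, assuming SHA-256 chaining (the production ledger's actual scheme is not public, and `EvidenceLedger` is a hypothetical name):

```python
import hashlib
import json

class EvidenceLedger:
    """Append-only ledger sketch: each entry's anchor hashes the
    entry together with the previous anchor, so editing any sealed
    record breaks every anchor after it."""

    def __init__(self):
        self._entries = []

    def seal(self, record: dict) -> str:
        # Anchor = SHA-256 over (previous anchor + canonical record).
        prev = self._entries[-1]["anchor"] if self._entries else "GENESIS"
        payload = json.dumps(record, sort_keys=True)
        anchor = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._entries.append({"record": record, "anchor": anchor})
        return anchor

    def verify(self) -> bool:
        # Replay the chain; any tampered record fails to reproduce
        # its stored anchor.
        prev = "GENESIS"
        for entry in self._entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if expected != entry["anchor"]:
                return False
            prev = entry["anchor"]
        return True
```

In this scheme a sealed record can include the model-identity hash and verifier signature as ordinary fields; replacing the underlying model only changes what future records contain, never what past anchors attest to.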

Every determination the system ships carries its reasoning with it. The engine runs at FedRAMP High under a VA National ATO: $250 billion adjudicated per year, roughly 200 million claims across about 400 health systems, in production since 2025. This is the architecture you see operating in the three loops on the homepage, the same engine generalized.

From the CompositeApps origin story

03 · The deployment

Four phases. Signed at each gate.
The same shape we run today.

The activation path we describe in the engagement page — Designate, Install, Calibrate, Cutover — is not a notional template. It is the actual shape of the Claims Genomics deployment, abstracted. In the CuraPatient context, each phase produced an artifact the VA sponsor and the accreditation team signed:

04 · The outcomes

Every determination reconstructible. Zero bytes egressed.

The outcomes we can state without qualification, based on the accreditation boundary itself:

Outcome
Result
How we know
Accreditation posture
FedRAMP High · Under VA National ATO
Accreditation package on file with the VA.
Egress
0 bytes · At inference time
Packet capture filed as accreditation artifact.
Evidence posture
Per-decision · Append-only ledger
Any determination reconstructible end-to-end.
Scale
$250B / yr · ~200M claims · ~400 health systems
In production since 2025.
Operator impact
[TIME_DELTA_TBC] · Review time per case
Baseline manual vs. verifier workflow.
Agreement rate
[AGREEMENT_TBC] · Agent vs. senior reviewer
Weekly calibration reports on file.

The bracketed cells above reflect operational metrics that exist in the VA’s production reporting and that we are in the process of clearing for public citation. When they are released, this page will be updated and the numbers linked to the source of record.

05 · Why it generalizes

The architecture doesn’t care
what’s being adjudicated.

The Claims Genomics deployment is not a healthcare product. It is a composite-layer architecture that happened to be battle-hardened inside healthcare because healthcare is the hardest multi-party adjudication environment we could find in U.S. civil government. The agents — Advocate, Defender, Arbiter — do not encode any domain-specific assumption. They encode a shape: arguments are briefed against a policy corpus, an arbiter adjudicates with citations, a human signs, the evidence is sealed.

That shape is what Cross-Domain Review looks like. It is what utility outage rerouting looks like when the call has to survive inspection. It is what a studio post-production reshoot decision looks like when the call can trigger an insurance claim. Every one of those environments has the same skeleton: multiple parties, conflicting incentives, a determination that has to be defensible, an audit trail that has to be reconstructible. The architecture generalizes not by adding new features, but by pointing the same three agents at a different policy corpus.
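That claim can be made concrete: the loop is a function of its policy corpus and nothing else. A hypothetical Python sketch, where `Rule`, `build_loop`, and both corpora are illustrative stand-ins, not real policy:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    supports: Callable  # case -> bool; argues for advancing
    opposes: Callable   # case -> bool; argues against

def build_loop(policy_corpus):
    """The loop encodes a shape, not a domain: brief for, brief
    against, adjudicate with citations. Only the corpus varies."""
    def arbiter(case):
        pro = [r.name for r in policy_corpus if r.supports(case)]
        con = [r.name for r in policy_corpus if r.opposes(case)]
        return {"decision": "advance" if len(pro) > len(con) else "hold",
                "citations": {"for": pro, "against": con}}
    return arbiter

# Same loop, two corpora: claims adjudication vs. coalition release.
claims = build_loop([Rule("coverage-policy",
                          lambda c: c.get("covered", False),
                          lambda c: c.get("duplicate", False))])
release = build_loop([Rule("classification-guide",
                           lambda c: c.get("cleared", False),
                           lambda c: c.get("sources-exposed", False))])
```

In the sketch, retargeting the architecture is exactly one change: hand `build_loop` a different corpus.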

That is the argument behind every other page on this site. This one is the reason the argument is credible — because it is already running.

Next
Want to see the same architecture run against a different workload? The Active Review demo shows it adjudicating a Five-Eyes coalition release. The engagement page shows what a 90-day activation looks like if you designate a workload.
For a briefing: long.nguyen@compositeapps.net