Activation · 01
Four phases.
Ninety days.
Signed at each gate.

From designation to cutover, in the open.

No open-ended consulting engagement. Every phase has a written exit criterion, a deliverable you can point to, and a signature — yours. If a phase misses, we fix before the next one begins. If a phase exceeds, we bank the time.

Phase 01
A one-page work statement — signed by the authority holder.

You pick the first application. Not a category — a specific workload a specific person is accountable for. Cross-domain review. Claims adjudication. Utility outage reroute. One thread, narrow enough to ship in 90 days.

Owner
Your Authorizing Official or OCA is named in writing. We work through them, not around them.
Scope
Data boundary, decision class, and go/no-go criteria for cutover are specified before anyone touches a terminal.
Success
Three measurable outcomes — performance, agreement, governance — agreed on paper. No moving goalposts. A gate-check sketch follows this list.
Output
A countersigned one-page work statement. That page, alone, moves us to Phase 02.
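
What "no moving goalposts" can look like when the three outcomes are encoded as a machine-checkable gate. A minimal sketch: the metric names and thresholds below are placeholders, and the signed work statement remains the only binding source.

```python
# Hypothetical encoding of the three success criteria as a machine-checkable
# gate. Metric names and thresholds are placeholders; the signed work
# statement is the only binding source.
SUCCESS_CRITERIA = {
    "performance": {"median_review_time_sec": {"max": 300}},
    "agreement":   {"senior_reviewer_agreement_pct": {"min": 95.0}},
    "governance":  {"egress_bytes": {"max": 0}},
}

def gate_passed(measured: dict) -> bool:
    """Cutover gate: every bound in the signed criteria must hold."""
    for bucket, bounds in SUCCESS_CRITERIA.items():
        for metric, bound in bounds.items():
            value = measured[bucket][metric]
            if value < bound.get("min", float("-inf")):
                return False
            if value > bound.get("max", float("inf")):
                return False
    return True
```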

Phase 02
Runtime inside your boundary. Egress verified to zero.

Composite Core deploys on hardware you control — your FedRAMP-authorized cloud, your on-prem cluster, your air-gapped enclave. Open-weight models are loaded; you pick which ones. Policy pack loads: your SCG, your CAPCO dictionary, your OCA caveats, your domain rules.

Sovereign install
Runtime lives where your data lives. Nothing leaves. No SaaS call to us, OpenAI, Anthropic, or anyone else.
Egress test
Packet capture during a representative workload confirms zero bytes exit your perimeter. The test is filed as evidence. A minimal version of the check is sketched after this list.
Model loadout
You choose the open-weight models. We configure them. Swap any model underneath — the workflow and evidence chain stay intact.
Access
Designated reviewers provisioned. Control-plane credentials under your identity provider, not ours.
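
The egress test is the most mechanical gate here, so a minimal sketch follows. It assumes the boundary capture is exported as a pcap and that your internal CIDR blocks are known; the scapy calls and example networks are illustrative, not a tooling commitment.

```python
# Minimal sketch of the egress check. Assumes the boundary capture is
# exported as a pcap and that the enclave's internal CIDR blocks are known.
# The example networks below are placeholders.
import ipaddress
from scapy.all import IP, rdpcap

INTERNAL = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12")]

def is_internal(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL)

def egress_bytes(pcap_path: str) -> int:
    """Count bytes leaving the perimeter: internal source, external destination."""
    total = 0
    for pkt in rdpcap(pcap_path):
        if IP in pkt and is_internal(pkt[IP].src) and not is_internal(pkt[IP].dst):
            total += len(pkt)
    return total

# Zero is the only passing score; the capture itself is filed as evidence.
assert egress_bytes("workload_capture.pcap") == 0, "non-zero egress detected"
```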

Phase 03
Shadow mode against real cases — reviewer stays primary.

The runtime runs every live case in parallel with your human reviewers. Humans still own the decision. The agents file their brief on each case into the evidence ledger. We compare. Where the agents disagree with senior reviewers, we tune — not the reviewer, not the output, but the agent's interpretation of your policy.

Coverage
Every case in the designated workload gets a parallel agent brief. No sampling, no cherry-picking.
Disagreement log
Every agent/reviewer disagreement is logged with structured rationale. This is how calibration actually works — and how it stays defensible. The record shape is sketched after this list.
Weekly cadence
One-hour review with your ops lead and senior reviewers. Calibration adjustments happen in the open.
Exit criterion
Agreement rate against senior reviewers reaches the threshold set in the work statement. If it doesn’t, we stay in Phase 03. No pretending.
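
A sketch of the shape a shadow-mode ledger record can take. The field names are illustrative, not the product's schema; the point is that every case is logged and every disagreement carries structured rationale.

```python
# Hypothetical shape of a shadow-mode ledger record; field names are
# illustrative, not the product's schema.
import json
from dataclasses import asdict, dataclass

@dataclass
class ShadowRecord:
    case_id: str
    agent_determination: str     # the agent's briefed call
    reviewer_determination: str  # the human's call, which remains authoritative
    agent_rationale: str         # structured citation of the policy clauses applied
    policy_version: str          # which SCG / caveat pack the agent interpreted

    @property
    def disagreement(self) -> bool:
        return self.agent_determination != self.reviewer_determination

def append_to_ledger(record: ShadowRecord, ledger_path: str) -> None:
    """Log every case, not a sample; disagreements carry the rationale
    reviewed at the weekly calibration."""
    entry = asdict(record)
    entry["disagreement"] = record.disagreement
    with open(ledger_path, "a") as ledger:
        ledger.write(json.dumps(entry) + "\n")
```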

Phase 04
Reviewer becomes verifier. Authority stays with the human.

Verifier workflow enabled. Agents brief; the human reviewer reads the structured case and signs. Cases that were taking 47 minutes start closing in five. Queue depth falls. Every determination ships with its own citation chain and evidence hash. Ninety-day report filed.
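
One way an evidence hash can bind a determination to its citation chain: a minimal sketch assuming canonical JSON serialization. The production scheme may differ.

```python
# Minimal sketch of an evidence hash over a determination and its ordered
# citation chain, assuming canonical JSON serialization.
import hashlib
import json

def evidence_hash(determination: dict, citations: list[dict]) -> str:
    """Any later edit to the determination or its citations changes the hash."""
    canonical = json.dumps(
        {"determination": determination, "citations": citations},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()
```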

Go-live
You approve cutover against the Phase 01 success criteria. Not us, not a product manager — you.
Metrics dashboard
Live read-out of performance, agreement, and governance metrics. Your senior leadership sees the same numbers we see.
Weekly ops
Standing 30-minute op-tempo call. Anomalies surfaced, override patterns inspected, policy drift discussed.
90-day report
Written brief covering runtime throughput, agreement deltas, override rationale patterns, and recommendation for the next designated application.
What we measure · 02
Three buckets.
All evidence-backed.
No vanity metrics.

The numbers we publish to you.

Every metric below is computed from the evidence ledger — not a dashboard we wrote to make ourselves look good. You can reconstruct them yourself at any time. If you want something added to this list in the Phase 01 work statement, we add it.
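
Reconstructing a metric yourself can be as plain as this: a sketch that recomputes the agreement rate from a line-delimited ledger, reusing the illustrative record shape from the Phase 03 sketch.

```python
# Sketch of recomputing the agreement rate from a line-delimited evidence
# ledger, using the illustrative field names from the Phase 03 sketch.
import json

def agreement_rate(ledger_path: str) -> float:
    """Share of cases where the agent's call matched the senior reviewer's."""
    total = agreed = 0
    with open(ledger_path) as ledger:
        for line in ledger:
            rec = json.loads(line)
            total += 1
            agreed += rec["agent_determination"] == rec["reviewer_determination"]
    return agreed / total if total else 0.0
```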

Bucket 01

Performance

  • Review time per case — median, p90, p99 (sec)
  • Throughput — cases closed per reviewer-hour (cases/hr)
  • Queue depth impact — against 90-day baseline (cases)
  • Time-to-determination SLA — hit rate (%)
  • Reviewer capacity reclaimed — hours freed per week (hr)
Bucket 02

Agreement

  • Agreement rate — agent determination vs. senior reviewer (%)
  • Override rate — verifier changes the call (%)
  • Override rationale clusters — where policy is ambiguous (patterns)
  • Calibration drift — over 30 / 60 / 90 days (Δ%)
  • Confidence vs. ground-truth — calibration curve (plot)
Bucket 03

Governance

  • Egress compliance — continuous packet-level verification (bytes)
  • Model provenance — every inference traced to a weight hash (hash; sketch below)
  • Evidence completeness — decisions with full citation chain (%)
  • Authority chain integrity — OCA → reviewer → verifier signatures present (%)
  • Audit reconstruction — random decisions replayed end-to-end (weekly)
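
For the model-provenance row, a sketch of a weight hash computed at load time. It assumes sharded checkpoint files on disk; the *.safetensors glob and chunk size are illustrative.

```python
# Sketch of a load-time weight hash for model provenance. Assumes sharded
# checkpoint files on disk; the glob pattern and chunk size are illustrative.
import hashlib
from pathlib import Path

def weight_hash(checkpoint_dir: str) -> str:
    """Stable digest over all weight shards, read in sorted path order."""
    digest = hashlib.sha256()
    for shard in sorted(Path(checkpoint_dir).glob("*.safetensors")):
        with shard.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
                digest.update(chunk)
    return digest.hexdigest()
```
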
Bounds · 04
What this isn’t.
Stated plainly.

Negation matters.

Part of being defensible is being explicit about what you are not. If your team is expecting any of the things on this page, we’re not the right counterparty. Say so now and save everyone a cycle.

Not this
Not a SaaS you send queries to. Your data never leaves your boundary — the runtime comes to you, not the other way around.
Not this
Not a proprietary LLM we license you. We don’t build models. We compose the industry’s best under your authority.
Not this
Not a fixed-policy black box. You calibrate against your SCG, your caveats, your reviewers. Disagreement logs are part of the product.
Not this
Not a replacement for the reviewer. The human stays on the decision. The machines brief; the human signs. Every time.
Not this
Not a multi-year engagement to nowhere. Ninety days to cutover or we’re in a Phase 03 conversation you can walk away from.
Not this
Not phone-home telemetry from the runtime. Zero egress is measured, not marketed. Packet captures are filed as evidence.