Notes · Governance · 18 April 2026 · ~7 min read

Active AI-RMF isn’t RMF with an AI sticker.

The federal Risk Management Framework was built for systems that change rarely and slowly. AI systems don’t. Here’s what breaks, what “Active” actually means, and what the cheapest path to compliance looks like when the surface you’re governing mutates underneath you.

A federal program office granted an Authority to Operate in February. Production cutover happened in March. The underlying model was updated in April — new weights, subtly different behavior on edge cases. The Security Assessment Report filed in January no longer describes the system in production. The accreditation package is technically current; it is practically stale. Nobody broke a rule. The rule was built for a different kind of system.

This is the shape of the problem the federal AI governance conversation keeps circling around without naming precisely. The Risk Management Framework — as specified in NIST SP 800-37 and practiced across DoD, IC, and civilian agencies — is a magnificent instrument for governing systems that are essentially static between accreditation cycles. It was designed against a world where the thing being authorized on Friday is the thing running in production on Monday. AI systems violate that assumption by construction.

Three things that change in AI that do not change in traditional IT.

A fire-control system, a payroll database, a claims-adjudication engine — the traditional workload under RMF — has a controlled change surface. Code changes go through change control. Configuration changes are logged. Data schemas evolve slowly. The system your SCA inspected in January is materially the system that’s running in July. Evidence filed in January is still descriptive in July.

AI systems move on three axes that traditional IT does not:

1 · Weights

Every model update — frontier labs push these every few weeks now — is, effectively, a new system. Not a patch. Not a configuration change. The weights are the program. When the weights update, the behavior of the system changes in ways that are not fully knowable from the diff. Traditional change control doesn’t have a vocabulary for this. You can file a Change Request for a weight update, but the review board has no way to characterize the new system except to re-test it.

2 · Training data

Models that retrain against production feedback change every training cycle. The decision boundary moves. Edge-case behavior drifts. Bias surfaces that weren’t in the previous corpus can appear in the next. The system that was accredited against a characterized training set last quarter is not the system in production this quarter — even if the architecture, code, and infrastructure are identical.

3 · Emergent behavior

Capability emerges in large models that was not characterized during accreditation. Sometimes this is capability you want (a new reasoning pattern that generalizes well). Sometimes it’s capability you emphatically do not want (a new pattern of evading constraints you thought you had). The characteristic of emergence is that it is not derivable from the specifications the accreditation package describes.

RMF, as practiced, handles change through periodic reassessment. You file a new SAR. You update the SSP. You go through a continuous-monitoring cadence that samples at a frequency measured in weeks and months. For weapons systems and financial databases, that cadence is well-matched to the rate of change. For AI systems, it is off by two or three orders of magnitude.

What “Active” means, architecturally.

“Active AI-RMF” is not a marketing modifier. It is an engineering commitment: the system generates the evidence of its own compliance every time it makes a decision, and the control plane adjudicates risk continuously rather than at accreditation boundaries. Three architectural properties make it work:

Per-decision evidence

Every inference the runtime serves carries a structured record: inputs, model identity (by weight hash), policy version applied, reasoning trace, confidence, cited controls, and a cryptographic anchor. The record is appended to a ledger the authorizing official can inspect any time without a scheduled engagement. Compliance stops being something you demonstrate annually and starts being something the system emits.
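A minimal sketch of what such a per-decision record and hash-chained ledger could look like. Everything here is illustrative: the field names, the `Ledger` class, and the SHA-256 chaining are assumptions standing in for whatever evidence schema a real runtime would define, not a specification.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class EvidenceRecord:
    """One structured record per inference. Field names are illustrative."""
    inputs_digest: str      # hash of the inference inputs
    model_weight_hash: str  # identifies the exact weights that served the call
    policy_version: str     # policy pack in force at decision time
    reasoning_trace: str    # abbreviated trace for the authorizing official
    confidence: float
    cited_controls: list    # e.g. control IDs the decision maps to
    timestamp: float = field(default_factory=time.time)

class Ledger:
    """Append-only log; each entry is anchored to the previous entry's hash,
    so tampering with any earlier record breaks verification downstream."""
    def __init__(self):
        self.entries = []

    def append(self, record: EvidenceRecord) -> str:
        prev = self.entries[-1]["anchor"] if self.entries else "genesis"
        payload = json.dumps(asdict(record), sort_keys=True)
        anchor = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"record": asdict(record), "prev": prev, "anchor": anchor})
        return anchor

    def verify(self) -> bool:
        """Recompute the chain; any edited record invalidates it."""
        prev = "genesis"
        for e in self.entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if e["prev"] != prev:
                return False
            if hashlib.sha256((prev + payload).encode()).hexdigest() != e["anchor"]:
                return False
            prev = e["anchor"]
        return True
```

The point of the chaining is that an AO sampling the ledger at any time can verify integrity locally, without a scheduled engagement: `verify()` either passes end to end or pinpoints that something was altered.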

Real-time policy adjudication

The control plane sits between the model and the decision being served. Policy — classification guides, release authorities, domain rules, red-team findings — is loaded as data, not as a code change. When policy updates, it takes effect on the next inference, not the next deployment. When the governance officer wants to tighten a threshold or revoke an agent scope, that’s a UI action, logged to the ledger, enforced immediately.
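The policy-as-data idea can be shown in a few lines. This is a toy adjudicator under assumed names (`adjudicate`, `min_confidence`, `revoked_scopes`); the only point it demonstrates is that a rule change is a data update that binds on the next inference, not a code change that waits for a deployment.

```python
# Hypothetical policy store: rules live as data, consulted on every call.
policy = {
    "min_confidence": 0.75,
    "revoked_scopes": set(),
}

def adjudicate(decision: dict, policy: dict) -> tuple[bool, str]:
    """Sits between the model and the caller; returns (allowed, reason)."""
    if decision["scope"] in policy["revoked_scopes"]:
        return False, f"scope {decision['scope']!r} revoked"
    if decision["confidence"] < policy["min_confidence"]:
        return False, "confidence below policy threshold"
    return True, "within policy"

# Passes under the current posture.
allowed, reason = adjudicate({"scope": "release", "confidence": 0.9}, policy)

# A governance officer tightening the threshold is just a data write;
# the very next inference is adjudicated under the new rule.
policy["min_confidence"] = 0.95
allowed_after, _ = adjudicate({"scope": "release", "confidence": 0.9}, policy)
```

In a real control plane the same update would also be appended to the evidence ledger, so the posture change is itself an auditable event.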

Model-agnostic substrate

The runtime treats the model as a replaceable component. Industry pushes a new open-weight model; your team validates; you swap. The evidence chain, the policy pack, the authority graph — none of it breaks. What was accredited is the runtime, not the model. Model changes become a routine control-plane event, not an accreditation-cycle event.
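A sketch of that separation, with hypothetical names throughout: the model is anything satisfying a small interface, and the runtime, which is what holds the evidence and the authorization, survives a swap unchanged.

```python
from typing import Protocol

class Model(Protocol):
    """Anything with identifiable weights and an infer() method will do."""
    weight_hash: str
    def infer(self, prompt: str) -> str: ...

class Runtime:
    """The accredited unit: evidence and policy live here, not in the model."""
    def __init__(self, model: Model):
        self.model = model
        self.evidence = []

    def serve(self, prompt: str) -> str:
        out = self.model.infer(prompt)
        # Every decision records which weights served it.
        self.evidence.append({"weight_hash": self.model.weight_hash, "prompt": prompt})
        return out

    def swap_model(self, new_model: Model) -> None:
        # A routine control-plane event; the evidence chain is untouched.
        self.model = new_model

class StubModel:
    """Stand-in for a validated open-weight model."""
    def __init__(self, weight_hash: str):
        self.weight_hash = weight_hash
    def infer(self, prompt: str) -> str:
        return f"[{self.weight_hash}] {prompt}"
```

After a swap, the ledger simply shows decisions served by old weights and decisions served by new ones, under the same runtime authorization.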

The effect is that RMF becomes enforceable at the cadence AI actually operates at. The SCA doesn’t need to be in the loop for every weight swap; the evidence is there, in the ledger, verifiable. The AO doesn’t need to re-authorize every model push; the authorization is attached to the runtime and the policy posture, not to a snapshot of weights that’s already obsolete.

This is the cheapest way to be compliant.

Federal program offices hear “continuous evidence generation” and reasonably assume it must be more expensive than periodic audits. The opposite is true, and the reason is simple. Under the periodic-audit model, every model change triggers a reassessment scoped against a system you can no longer fully characterize. The assessment takes weeks. During those weeks, you’re either running an unassessed system (risk) or you’re not running at all (capability loss). You pay for the assessment, and you pay for the gap.

Under the active model, the assessment is the production workload. The runtime emits evidence. The AO samples the ledger. The quarterly review validates the posture against the ledger, not against a separately constructed artifact. You pay once, for the infrastructure, and the evidence accretes for free.

The question federal governance has been circling is not "how do we extend RMF to cover AI?" It is: "what architectural commitments make RMF mechanically enforceable against systems that mutate?" The claim of Active AI-RMF is that the three properties above — per-decision evidence, real-time adjudication, and a model-agnostic substrate — are exactly those commitments.

What the G-6, J-6, and civilian CIO shops actually need to decide.

The near-term decision in front of every federal governance office isn’t whether to “adopt AI.” It’s whether the governance posture they’re building out for AI assumes the cadence of a weapon system (periodic) or the cadence of a frontier model (continuous). Those two assumptions lead to fundamentally different architectures. One of them produces evidence the AO can defend. The other produces accreditation packages that describe a system you’re no longer running.

If you’re a G-6 staff officer or a civilian CIO reading this, the test to apply to any AI governance proposal on your desk is narrow: does the architecture generate continuous per-decision evidence under the authority of a designated human, or does it assume a periodic human inspection? If the latter, the framework will collapse on contact with the model-update cadence. If the former, the framework has a chance of actually working.

The rest — what models, what vendors, what use cases — follows from that architectural choice. Not the other way around.


This is the first of a series of notes on the architecture of sovereign AI. If you’d like to see how it runs against a specific workload — cross-domain review, claims adjudication, something else you’re accountable for — the Active Review demo walks through a full case in five minutes, and the engagement page describes the 90-day path from designation to cutover.