Evidence packs beat AI explanations

Programs do not need model explanations. They need evidence packs — the documents, data hits, and rules evaluations that justify an action and can be independently re-checked. Build that, and you satisfy due process and speed up appeals.

The explainability conversation around AI in government has been running for several years, and in our reading it is asking the wrong question. The question is not "how do we get the model to explain its reasoning?" The question is "what does a hearing officer need to overturn or affirm this decision?" The two questions point at different artifacts. The first leads to a research problem. The second leads to a buildable system.

Federal civil-rights and AI guidance has tightened over the past year. HHS/OCR and OMB updates have raised the bar on adverse-decision transparency. Several states updated their notice templates with automation-disclosure requirements in early 2026. The agencies caught flat-footed are the ones that interpreted these requirements as "explain the model" and are now negotiating with vendors who cannot deliver the explanation. The agencies that interpreted them as "produce the evidence pack" are shipping.

This piece is about the second interpretation, what the evidence pack actually contains, and why it solves a problem the explanation does not.

What a hearing officer actually does

In an administrative hearing on a benefits determination, the hearing officer does not evaluate the algorithm. They evaluate whether the determination was correct against the policy. They want, in order:

The policy basis the determination was made under (the specific section of the State Plan, federal regulation, or program manual).
The facts the determination was based on, with their sources.
The rules evaluation that connected the facts to the policy basis to produce the finding.
The procedural steps that were followed (notice was sent on date X, the appeal window opened on date Y, the claimant was advised of their right to counsel under the relevant statute).
The identity of the human reviewer who approved the action, if any.

None of these require an explanation of how a model arrived at its suggestion. They require the evidence trail of the determination itself. The model's role in producing the suggestion is one input to that trail, not the subject of the trail.

This is what an evidence pack is. It is the prosecutor's binder, the defense's binder, and the judge's binder, all assembled before the hearing. When the agency can produce it cleanly, the hearing is short and the determination usually stands. When the agency cannot, the hearing becomes a litigation of the agency's processes, not of the merits of the case — and the agency frequently loses.

What an explanation gives you, and what it doesn't

Model explanations — feature importance scores, SHAP values, attention visualizations, natural-language reasoning traces — are useful for the model developer. They help the engineering team improve the model. They give a sense of which inputs the model is leaning on.

They do not give a hearing officer what they need. A SHAP value attached to "reported income" tells the hearing officer that the model considered income important. The hearing officer already knew income was important — it is in the regulation. What the hearing officer needs to know is what specific income number the model used, where it came from, whether it was the right number, and whether it was applied correctly against the income threshold the regulation specifies for the household size at the date of determination.

The explanation answers the wrong question. It is a confidence-building artifact for the engineer. The evidence pack is the dispositive artifact for the proceeding.

A practical consequence: agencies that have spent procurement cycles on explainable-AI features have, in some cases, ended up paying for features that did not help them at audit. The same engineering investment, redirected at producing the evidence pack, would have closed the actual due-process gap.

The evidence pack: what's in it

The pack is a structured, machine-readable artifact with a specific shape. We have refined this shape across three agency deployments, and the format is converging.

Case header. Case identifier, claimant identifier (with appropriate access controls), program (SNAP, Medicaid, UI, etc.), action type (initial determination, redetermination, sanctions, overpayment), action effect (approval, denial, partial, reduction), date of action, and the policy basis (cited specifically — e.g., "7 CFR § 273.10(d)(1)(i), version effective on date of determination").
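A minimal sketch of the case header as a typed, immutable record. The class names, enum values, and the rule that the cited policy version must already be in effect on the action date are our assumptions for illustration, not a published schema; the identifiers and the policy version date are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ActionType(Enum):
    INITIAL_DETERMINATION = "initial_determination"
    REDETERMINATION = "redetermination"
    SANCTION = "sanction"
    OVERPAYMENT = "overpayment"


class ActionEffect(Enum):
    APPROVAL = "approval"
    DENIAL = "denial"
    PARTIAL = "partial"
    REDUCTION = "reduction"


@dataclass(frozen=True)
class PolicyBasis:
    citation: str        # the specific section, e.g. a CFR paragraph
    version_date: date   # effective date of the version that was applied


@dataclass(frozen=True)
class CaseHeader:
    case_id: str
    claimant_ref: str    # opaque reference, resolved behind access controls
    program: str         # "SNAP", "Medicaid", "UI", ...
    action_type: ActionType
    action_effect: ActionEffect
    action_date: date
    policy_basis: PolicyBasis

    def __post_init__(self) -> None:
        # A policy version that took effect after the action cannot be its basis.
        if self.policy_basis.version_date > self.action_date:
            raise ValueError("policy version was not in effect on the action date")


header = CaseHeader(
    case_id="C-2026-000123",          # hypothetical identifiers
    claimant_ref="claimant:7f3a",
    program="SNAP",
    action_type=ActionType.REDETERMINATION,
    action_effect=ActionEffect.REDUCTION,
    action_date=date(2026, 3, 2),
    policy_basis=PolicyBasis("7 CFR § 273.10(d)(1)(i)", date(2025, 10, 1)),
)
```

Keeping the claimant as an opaque reference lets the header travel with the pack while the personal data stays behind whatever access controls the agency already runs.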

Inputs. Every fact considered by the system, with three properties per fact:

The value at the time of the determination (the actual number, the actual document content, not a pointer to a record that may have been updated).
The source of the value (which integration, which document, which claimant declaration, which prior case entry).
The retrieval time (when the value was pulled into the determination's context).

If the same fact had multiple values (a wage-records number from one integration, a self-reported number from the application), all of them are in the pack with their respective sources. The hearing officer does not have to ask which version was used.
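One way to hold inputs, sketched under assumptions: each fact name maps to a list of values, each value carries its source and retrieval time, and a SHA-256 digest over value plus provenance serves as the content address. The names `FactValue` and `record_fact`, the hashing choice, and the income figures are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FactValue:
    value: object           # copied in as it stood at determination time
    source: str             # integration, document, declaration, or prior case entry
    retrieved_at: datetime  # when the value entered the determination's context

    def content_hash(self) -> str:
        # Digest of value plus provenance: if the upstream record changes later,
        # the pack still shows, and can prove, what was actually used.
        payload = json.dumps(
            {"value": self.value, "source": self.source,
             "retrieved_at": self.retrieved_at.isoformat()},
            sort_keys=True, default=str,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_fact(inputs: dict, name: str, fv: FactValue) -> None:
    # Append, never overwrite: conflicting values from different sources
    # all stay in the pack, each with its own source.
    inputs.setdefault(name, []).append(fv)


t = datetime(2026, 3, 1, 14, 5, tzinfo=timezone.utc)
inputs: dict = {}
record_fact(inputs, "monthly_earned_income",
            FactValue(2140, "state wage-records integration", t))
record_fact(inputs, "monthly_earned_income",
            FactValue(1900, "application self-report", t))
```

Because values are appended rather than replaced, the wage-records figure and the self-reported figure both survive into the pack, and the hearing officer sees both with their sources.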

Rules evaluation. Step by step, which rules were evaluated against which inputs, what the result of each was, and how they composed into the final determination. Not a model trace. A deterministic, replayable evaluation that takes the inputs and the rules and produces the same finding every time.
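A sketch of what "deterministic and replayable" can mean in code: the evaluation is a pure function of the facts and the pinned tables, and it records every step, including how the individual rules compose into the finding. The rule IDs, the two-rule structure, and the limit tables are hypothetical placeholders, not real program thresholds.

```python
from dataclasses import dataclass


@dataclass
class Step:
    rule_id: str
    inputs: dict
    result: object


def evaluate(facts: dict, income_limits: dict, resource_limit: int):
    """Pure function of facts and pinned tables; returns finding plus trace."""
    steps = []
    size = facts["household_size"]
    limit = income_limits[size]
    steps.append(Step("R1 look up income limit", {"household_size": size}, limit))

    income_ok = facts["monthly_income"] <= limit
    steps.append(Step("R2 income within limit",
                      {"monthly_income": facts["monthly_income"], "limit": limit},
                      income_ok))

    resources_ok = facts["resources"] <= resource_limit
    steps.append(Step("R3 resources within limit",
                      {"resources": facts["resources"], "limit": resource_limit},
                      resources_ok))

    # The composition is itself a recorded step, not an implicit conclusion.
    eligible = income_ok and resources_ok
    steps.append(Step("R4 compose R2 and R3",
                      {"R2": income_ok, "R3": resources_ok}, eligible))
    return eligible, steps


# Hypothetical tables; real limits come from the pinned policy version.
LIMITS = {1: 1600, 2: 2200, 3: 2800, 4: 3400}
facts = {"household_size": 3, "monthly_income": 2140, "resources": 1500}
finding, trace = evaluate(facts, LIMITS, resource_limit=3000)
```

Calling `evaluate` again with the same facts and tables returns an identical finding and trace, which is the property a hearing officer relies on when re-checking.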

For agentic systems, the agent's contribution is evidence preparation, not rules evaluation. The agent collects inputs, suggests how rules apply, and produces a draft determination — but the determination itself is reviewed by a human and the rules evaluation is a deterministic process the human can re-run.
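The division of labor can be sketched as follows, assuming a hypothetical `AgentDraft` record and a stand-in rule: the reviewer's tooling re-runs the deterministic evaluation on the facts the agent collected and compares the result with the agent's suggestion. The finding of record is always the re-run, never the suggestion.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentDraft:
    facts: dict              # inputs the agent collected
    suggested_finding: bool  # the agent's proposed outcome
    summary: str             # reviewer convenience only, not evidence


def rerun_for_review(draft: AgentDraft, evaluate: Callable[[dict], bool]) -> dict:
    # The finding of record comes from re-running the deterministic evaluation
    # on the same facts; the agent's suggestion is only compared against it.
    finding = evaluate(draft.facts)
    return {"finding": finding, "agent_agreed": finding == draft.suggested_finding}


def income_test(facts: dict) -> bool:
    # Hypothetical single rule standing in for the pinned rules evaluation.
    return facts["monthly_income"] <= 2800


draft = AgentDraft({"monthly_income": 2950}, suggested_finding=True,
                   summary="Income appears within the limit.")
review = rerun_for_review(draft, income_test)
```

Here the agent's summary reads plausibly, but the re-run disagrees, and the disagreement is flagged for the reviewer instead of flowing into the determination.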

The human in the loop. Which reviewer signed off, what action they took (accept, amend, reject), and if they amended, what changed and why. The reviewer's action is part of the evidence trail and survives any later question about whether the system or the human made the determination.
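A sketch of the reviewer record, assuming amendments are captured as a field-by-field diff between the draft and the final action. The `ReviewerAction` and `diff` names, the validation rule, and the example values and reason text are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "accept"
    AMEND = "amend"
    REJECT = "reject"


def diff(before: dict, after: dict) -> dict:
    # field -> (before, after), only for fields whose value changed
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


@dataclass(frozen=True)
class ReviewerAction:
    reviewer_id: str
    decision: Decision
    acted_at: datetime
    changes: dict = field(default_factory=dict)
    reason: Optional[str] = None

    def __post_init__(self) -> None:
        # An amendment with no recorded change or reason leaves a hole in the trail.
        if self.decision is Decision.AMEND and not (self.changes and self.reason):
            raise ValueError("amendments must record what changed and why")


draft = {"action_effect": "denial", "monthly_benefit": 0}
final = {"action_effect": "reduction", "monthly_benefit": 120}
action = ReviewerAction(
    reviewer_id="rev-0412",  # hypothetical
    decision=Decision.AMEND,
    acted_at=datetime(2026, 3, 2, 10, 30, tzinfo=timezone.utc),
    changes=diff(draft, final),
    reason="Shelter deduction omitted from draft; recalculated.",
)
```

Rejecting an amendment that lacks a diff or a reason is a design choice: it makes "what changed and why" structurally impossible to skip.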

Notice and procedure. Which notice was sent, on what date, with what language, with what appeal rights. The notice itself is in the pack. The dates the appeal window opened and closed are in the pack. The claimant's procedural rights are documented.
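A sketch of a notice record that stores the notice text as sent and derives the appeal window. The window length is passed in because it is set by program rules; the 90 days below is illustrative only, and the assumption that the window opens on the send date and counts calendar days will not hold for every program.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class NoticeRecord:
    notice_id: str
    template_version: str
    sent_on: date
    text: str                 # the notice exactly as sent, appeal-rights language included
    appeal_window_days: int   # set by the program's rules; an input, not a constant

    def appeal_window(self) -> tuple:
        # Assumption: the window opens on the send date and runs in calendar days.
        # Programs that count from receipt or add mailing days need a different rule.
        return self.sent_on, self.sent_on + timedelta(days=self.appeal_window_days)


notice = NoticeRecord("N-88121", "adverse-action-v7", date(2026, 3, 2),
                      "(full notice text as sent)", appeal_window_days=90)
opens, closes = notice.appeal_window()  # 2026-03-02 to 2026-05-31
```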

Version pins. Every model, prompt, dataset, rule version, and policy reference is pinned to the exact version that was active at the time of the determination. The pack can be replayed against the same versions and produce the same finding.
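A sketch of version pinning, assuming each artifact is identified by a string like `name@version` and that an archive of retrievable artifacts exists. `VersionPins`, `unreplayable`, and every identifier below are hypothetical. The second function captures the practical test: a pin only enables replay if the pinned artifact can still be fetched.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VersionPins:
    model: str
    prompt: str
    dataset: str
    rules: str
    policy: str

    def fingerprint(self) -> str:
        # One digest over every pin; changing any single version changes it.
        blob = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()


def unreplayable(pins: VersionPins, archive: set) -> list:
    # A pin only helps if the pinned artifact can still be retrieved.
    return [name for name, version in asdict(pins).items() if version not in archive]


pins = VersionPins(  # hypothetical version identifiers
    model="eligibility-assist@2026-01-15",
    prompt="income-extract-prompt@v12",
    dataset="wage-records-extract@2026-02-28",
    rules="snap-rules@4.3.1",
    policy="7cfr273@2025-10-01",
)
archive = {"eligibility-assist@2026-01-15", "wage-records-extract@2026-02-28",
           "snap-rules@4.3.1", "7cfr273@2025-10-01"}
missing = unreplayable(pins, archive)  # the prompt version was overwritten
```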

Together, these components make the pack the dispositive artifact in a hearing. The hearing officer reads it once and either confirms the determination or identifies the specific defect. Either outcome is faster, more defensible, and more accurate than reconstructing the case after the fact.

Determinism over eloquence

The pack does not contain a narrative explanation of the system's reasoning. It contains a deterministic evaluation that produces the determination from the inputs. The distinction is load-bearing.

A narrative explanation can be persuasive while being wrong. The model could produce a coherent-sounding sentence about how it weighed the claimant's income against the household threshold while having used the wrong threshold for the actual household size — and the narrative would not surface the error. The deterministic evaluation, by contrast, makes the error obvious: the threshold was X, the income was Y, the comparison was wrong.
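To make the contrast concrete, here is a sketch of re-checking one recorded comparison step against a pinned limit table. The table and the recorded step are hypothetical: the step used the two-person limit for a four-person household, which a narrative could gloss over but a structured check cannot.

```python
def check_comparison_step(step: dict, income_limits: dict) -> list:
    """Re-check one recorded comparison against the pinned limit table."""
    problems = []
    expected = income_limits[step["household_size"]]
    if step["limit"] != expected:
        problems.append(
            f"limit {step['limit']} used for household size "
            f"{step['household_size']}; pinned table gives {expected}")
    if (step["income"] <= step["limit"]) != step["result"]:
        problems.append("recorded result does not follow from income and limit")
    return problems


# Hypothetical pinned table and a recorded step that used the size-2 limit
# for a four-person household.
LIMITS = {1: 1600, 2: 2200, 3: 2800, 4: 3400}
recorded = {"household_size": 4, "income": 2500, "limit": 2200, "result": False}
problems = check_comparison_step(recorded, LIMITS)
```

The comparison itself is internally consistent (2500 is above 2200), so only the threshold error is reported. That is exactly the kind of defect a fluent explanation would carry along without comment.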

This means the agent's natural-language outputs are not the artifact under review. The agent produced the evidence pack. The human reviewed it. The pack is the artifact, and the pack survives narrative drift.

In practice, the agent's natural-language summary is a useful caseworker affordance — a fast read for the reviewer — but it is not part of the formal evidence trail. The formal trail is the structured pack, version-pinned and replayable.

Speeding up appeals

There is a second consequence of building the evidence pack as a first-class artifact: appeals get faster.

Most agency appeal cycles are slow because the agency has to reconstruct the case for the hearing. The pack is the reconstruction. With it pre-assembled and attached to the case, the agency arrives at the hearing with the binder already prepared, the hearing officer can read it before the hearing, and the hearing itself is shorter and more focused.

This is operationally meaningful. The states we have worked with that built the pack as a deliverable saw mean time to appeal disposition fall by between 35% and 60% in the first six months — not because the determinations got better, but because the friction of preparing for the hearing collapsed. Claimants get faster answers, hearing officers handle more cases, and the agency's appeal-cycle burden goes down.

The appeal-cycle improvement is the operational case for the work. The due-process posture is the legal case. They point at the same artifact.

Integration with notices

State notice templates updated in early 2026 typically require, for adverse decisions involving automation: a description of the system used, the basis for the determination, the inputs considered, the right to challenge, and a path to contact a human. The evidence pack contains all of these. The notice does not include the full pack, which would be operationally unworkable and would expose sensitive content; instead, the notice references the pack and the agency commits to producing it on request.
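A sketch of a coverage check that confirms each required notice element can be backed by the pack before a notice goes out. The mapping from notice elements to pack sections is our assumption about where each element would come from; section names and contents are hypothetical.

```python
# Hypothetical mapping from notice elements to the pack sections that supply them.
NOTICE_ELEMENT_SOURCES = {
    "description of the system used": ("version_pins",),
    "basis for the determination": ("case_header", "rules_evaluation"),
    "inputs considered": ("inputs",),
    "right to challenge": ("notice",),
    "path to contact a human": ("notice",),
}


def uncovered_notice_elements(pack: dict) -> list:
    # An element is covered only if every section it draws on is present and non-empty.
    return [element for element, sections in NOTICE_ELEMENT_SOURCES.items()
            if not all(pack.get(section) for section in sections)]


pack = {
    "case_header": {"case_id": "C-2026-000123"},
    "inputs": {},  # empty: nothing was recorded
    "rules_evaluation": [{"rule_id": "R1", "result": 2800}],
    "version_pins": {"rules": "snap-rules@4.3.1"},
    "notice": {"appeal_rights": "(as sent)", "contact_phone": "(as sent)"},
}
gaps = uncovered_notice_elements(pack)
```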

The wire-up is small: the notice template includes a case reference and a phone number, the call-center system can pull the pack by case reference, and the claimant or their representative can receive the pack on request without an additional records-request process. The procedure is documented and trained.
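A sketch of the retrieval side: packs keyed by the case reference printed on the notice, released to a verified requester, with every release logged. `PackStore` is an in-memory stand-in, and the identity-verification flag is an assumption about how a call-center flow would gate access to sensitive content.

```python
class PackStore:
    """In-memory stand-in for the system that holds packs, keyed by the
    case reference printed on the notice."""

    def __init__(self) -> None:
        self._packs: dict = {}
        self.release_log: list = []   # who received which pack

    def attach(self, case_ref: str, pack: dict) -> None:
        self._packs[case_ref] = pack

    def release(self, case_ref: str, requester: str, identity_verified: bool) -> dict:
        # Assumption: the requester is verified before release, since the
        # pack holds sensitive content.
        if not identity_verified:
            raise PermissionError("requester identity not verified")
        pack = self._packs[case_ref]  # KeyError if no pack exists for this case
        self.release_log.append((case_ref, requester))
        return pack


store = PackStore()
store.attach("C-2026-000123", {"case_header": {"program": "SNAP"}})
released = store.release("C-2026-000123", requester="authorized representative",
                         identity_verified=True)
```

Because the pack is attached at determination time, release is a lookup rather than a records-request process.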

We have seen agencies announce policy updates without this wire-up and find themselves in the position of having promised a transparency right they cannot operationally deliver. The fix is upstream, in the architecture, not at the notice template.

What to do Monday

Pick one automated action your system took last quarter. Ask the program-integrity team to assemble the evidence pack for it.

If they cannot produce it, list what is missing. The missing items are your engineering backlog. The first item is usually version-pinning — the model, prompt, or rule that produced the action was updated and the original is no longer queryable. Add the version pinning. Then add the content-addressed inputs. Then add the rules evaluation. The pack falls out of the rest.
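The Monday exercise can be sketched as a gap list: compare what the team managed to assemble for one past action against the pack's sections, ordered version pins first, then content-addressed inputs, then rules evaluation. The ordering of the remaining sections and the example attempt are our assumptions.

```python
# Pack sections in the order we would usually build them. Version pins come
# first because once an original is overwritten it cannot be recovered.
PACK_SECTIONS = [
    "version_pins",
    "inputs",             # content-addressed values with source and retrieval time
    "rules_evaluation",
    "case_header",
    "human_review",
    "notice",
]


def backlog(assembled: dict) -> list:
    # Anything missing or empty in the attempted pack becomes a backlog item.
    return [section for section in PACK_SECTIONS if not assembled.get(section)]


attempt = {  # what a team assembled for one past action (hypothetical)
    "case_header": {"case_id": "C-2025-004410"},
    "notice": {"sent_on": "2025-11-03"},
    "human_review": {"reviewer_id": "rev-0098", "decision": "accept"},
    "inputs": {},
}
items = backlog(attempt)
```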

Add one clause to the next solicitation: the system shall produce an evidence pack for every action, accessible to the agency without vendor involvement, replayable to reproduce the finding. The clause turns the pack from a wish into a deliverable.

Where Vardr fits

We design the evidence pack as a deliverable, both for systems being procured and for systems already in production that have a due-process gap. Payton brings the regulatory and hearing-officer knowledge that defines what the pack has to contain to satisfy the bar that actually matters; Amlan brings the agent-workflow engineering that produces the pack deterministically as part of every determination. The Reference Architecture treats the evidence pack as a first-class output of every agentic workflow — not an afterthought assembled in response to an appeal.
