Audit-grade logging is a data-platform problem, not a compliance bolt-on
The audit trail an OIG or court will demand cannot be reconstructed from application logs after the fact. It has to be a first-class, immutable data product, designed before the system ships.
By Kevin Odongo and Payton Jonson · May 26, 2026
We have written before about the audit-trail problem: the pattern of agency AI deployments that fail their first OIG review because they cannot show their work. That piece treated the failure as architectural in the abstract. This one names the specific architectural failure. It is a category error: the artifact every agency owes a reviewer eighteen months later was built as application logging instead of as a data product.
The distinction matters because the fix is different. If you treat audit-grade logging as a logging problem, you reach for log aggregation, retention policies, and tooling like Splunk or the cloud-native equivalent. None of that produces what an OIG audit actually needs. If you treat it as a data product, you reach for schemas, immutability, query interfaces, and lineage — which is what the reviewer needs and what your engineering team can actually deliver.
Multiple OIG reports in the past twelve months have called out reconstruct-the-decision failures in agencies that thought they were compliant. The bar has shifted from "we have logs" to "produce the decision packet." Agencies are now being asked, for specific historical decisions, to produce the packet behind each determination: the inputs, the version of the system, the human in the loop, the policy basis. Most cannot. The work to fix this is not exotic, and it is not optional.
What an audit trail must actually reconstruct
The OIG and the courts are not asking for an audit log. They are asking for the decision packet: a complete record of everything that drove the decision, captured as it stood at the moment the decision was made. For a benefits determination, the packet has to contain:
The decision. The specific finding, with its timestamp, its case ID, and the action it triggered. Not "eligibility check passed," but the structured, field-level finding that produced the action.
The inputs. Every fact the system considered. Not pointers to a database that has since been updated — the values as they were at the moment of the decision. If the claimant's reported income was a particular number at the time, that number is in the packet, not a foreign key to a record that has been modified seven times since.
The model and prompt version. Which specific version of the agent, model, and prompt produced the determination. Pinned, not "the current version." By the time an audit arrives five months later, the prompt may have been revised twice and the model updated three times. The packet has to reflect the system as it was.
The retrieved context. If the agent pulled documents, policy guidance, or prior decisions to inform its reasoning, those specific documents — at the version they were in at the time — are in the packet. A document that was the basis for a denial has to be reproducible exactly, even if the document has since been superseded.
The human in the loop. Which reviewer approved, amended, or rejected the suggestion, with their action and the time. The human is part of the decision; the audit needs to know which human.
The policy basis. The specific statutory, regulatory, or program-manual section the action was grounded in, cited by name and version. Not "per program policy," but per 42 CFR § 435.916(b), in the version in effect on the date of the determination.
This is the packet. It is what an OIG asks to see. It is what a hearing officer asks to see. It is what a federal judge in a class-action discovery order asks to see. The agencies that can produce it on demand survive review. The agencies that cannot are the ones writing remediation plans.
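The six elements can be pinned down as a record type. The sketch below is a minimal Python illustration under stated assumptions: every class and field name (`DecisionPacket`, `HumanAction`, `PolicyCitation`, `missing_elements`) is hypothetical, and a real packet schema would be richer. Note that `frozen=True` only blocks attribute reassignment on the object; it does not make the nested dicts immutable, which is why immutability still has to be enforced by the store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PolicyCitation:
    citation: str        # e.g. "42 CFR § 435.916(b)"
    effective_date: str  # version of the text in force on the decision date


@dataclass(frozen=True)
class HumanAction:
    reviewer_id: str
    action: str          # "approved", "amended", or "rejected"
    acted_at: datetime


@dataclass(frozen=True)  # blocks attribute reassignment; nested dicts stay mutable
class DecisionPacket:
    case_id: str
    decided_at: datetime
    finding: dict                 # structured, field-level finding
    inputs: dict                  # values as of decision time, not foreign keys
    agent_version: str
    model_version: str
    prompt_version: str
    retrieved_context: dict       # document id -> exact text at decision time
    human: Optional[HumanAction]
    policy_basis: tuple           # tuple of PolicyCitation

    def __post_init__(self):
        # Naive timestamps are ambiguous evidence; require an explicit zone.
        if self.decided_at.tzinfo is None:
            raise ValueError("decided_at must be timezone-aware")

    def missing_elements(self) -> list[str]:
        """Name every required packet element that is empty."""
        checks = {
            "decision": bool(self.finding),
            "inputs": bool(self.inputs),
            "model and prompt version": all(
                (self.agent_version, self.model_version, self.prompt_version)
            ),
            # retrieved_context is not checked: a decision may retrieve nothing.
            "human in the loop": self.human is not None,
            "policy basis": bool(self.policy_basis),
        }
        return [name for name, present in checks.items() if not present]
```

A packet that comes back with a non-empty `missing_elements()` list is, in the article's terms, a packet the agency cannot hand to a reviewer.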
Why application logs cannot produce the packet
Application logs were not designed to answer this question. They were designed to help an engineer diagnose a production problem. The properties they have are not the properties the packet requires.
Application logs are typically append-only by convention but not immutable: they can be deleted by retention policies, redacted under privacy review, or lost in a vendor migration. They are usually unstructured, written for human reading rather than machine reconstruction. They are not version-pinned: the message "decision made" does not record which version of the decision-making code produced it. They are not content-addressed: they reference IDs into a mutable database, so when the database changes, the meaning of the log entry changes with it. And they are usually not queryable as evidence: the path from "show me the decision packet for case 1234567 as of 2025-08-04" to an actual reconstruction crosses three or four different systems with no consistent schema.
These properties are not deficiencies of the logging system. They are deliberate trade-offs for the use case logging was built for. The use case the audit requires is a different one, and the trade-offs need to be re-made.
The evidence store as a designed data product
The artifact that produces the packet is a separate data product, designed with five properties from the start:
Append-only, immutable storage. Once a decision-packet record is written, application code cannot modify or delete it. Deletion, if it ever happens (regulatory data minimization, a court order), goes through a separate pathway with its own audit trail. The store lives on object storage with object lock or a similar enforced-immutability feature, or in a write-once database table whose protection is enforced by the database itself rather than by application code. The point is that the engineering team cannot accidentally lose evidence by mishandling a migration.
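One way to get database-level write-once protection, sketched here with SQLite from the Python standard library: triggers abort any `UPDATE` or `DELETE` on the packet table. The table, trigger, and function names are hypothetical. This is a sketch, not a complete control: anyone with rights to drop the triggers can still remove them, so a production store would pair this with privilege separation or storage-level locks such as object lock.

```python
import sqlite3

SCHEMA = """
CREATE TABLE decision_packet (
    packet_id      TEXT PRIMARY KEY,
    schema_version INTEGER NOT NULL,
    body           TEXT NOT NULL
);
CREATE TRIGGER decision_packet_no_update BEFORE UPDATE ON decision_packet
BEGIN
    SELECT RAISE(ABORT, 'decision_packet is append-only');
END;
CREATE TRIGGER decision_packet_no_delete BEFORE DELETE ON decision_packet
BEGIN
    SELECT RAISE(ABORT, 'decision_packet is append-only');
END;
"""


def open_evidence_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Per-connection setting: with recursive triggers on, rows that
    # INSERT OR REPLACE would delete also fire the delete trigger.
    conn.execute("PRAGMA recursive_triggers = ON")
    conn.executescript(SCHEMA)
    return conn


def append_packet(conn: sqlite3.Connection, packet_id: str,
                  schema_version: int, body: str) -> None:
    # Plain INSERT: a duplicate packet_id fails instead of overwriting.
    with conn:
        conn.execute(
            "INSERT INTO decision_packet VALUES (?, ?, ?)",
            (packet_id, schema_version, body),
        )
```

Any attempt to change or remove a written packet surfaces as a `sqlite3.DatabaseError` (an `IntegrityError`, since SQLite reports trigger aborts as constraint failures), and the transaction is rolled back.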
Content-addressed payloads. The inputs to a decision are not stored by reference. They are stored by their content hash, with the actual content stored in the same data product. When the underlying source database is updated, the packet still resolves to the input as it was — because the input is in the packet, not pointed at from the packet.
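Content addressing can be sketched in a few lines: serialize each input canonically, hash it with SHA-256, and store the bytes under that digest. `ContentStore`, `canonical_bytes`, and `content_address` are hypothetical names, and canonical JSON with sorted keys is one assumed serialization choice among several.

```python
import hashlib
import json


def canonical_bytes(value) -> bytes:
    # Sorted keys and fixed separators so equal content always serializes identically.
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def content_address(value) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(value)).hexdigest()


class ContentStore:
    """Hypothetical blob store keyed by digest, standing in for the
    payload area of the evidence store."""

    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, value) -> str:
        digest = content_address(value)
        # Same content, same key: writing it twice is a no-op.
        self._blobs.setdefault(digest, canonical_bytes(value))
        return digest

    def get(self, digest: str):
        return json.loads(self._blobs[digest])
```

Used against a mutable source table, the difference from a log entry is direct: a log line that stores `"claimant_ref": "claimant-88"` resolves to whatever the row says today, while a packet that stores the digest returned by `put` resolves to the bytes captured at decision time, and any later change to those bytes is detectable by rehashing.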
Schema-versioned. The shape of the packet itself can evolve. When it does, the schema version is named in every record, the migration is explicit, and records in the old shape remain queryable under the new schema. An audit query five years in is not blocked by the fact that the packet structure has been refined twice.
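An explicit, chained migration is one way to keep old records readable. In this sketch the field names and both schema changes are hypothetical. Stored records are never rewritten; a reader upgrades a copy one version at a time, so a v1 record from years ago comes back in the current shape, and fields the old schema never captured surface as an explicit `None` rather than disappearing.

```python
CURRENT_SCHEMA = 3


def _v1_to_v2(rec: dict) -> dict:
    # Hypothetical v2 change: one "model" string split into model and prompt versions.
    out = {k: v for k, v in rec.items() if k != "model"}
    out["model_version"] = rec["model"]
    out["prompt_version"] = None  # v1 never captured it; keep the gap visible
    out["schema_version"] = 2
    return out


def _v2_to_v3(rec: dict) -> dict:
    # Hypothetical v3 change: policy_basis became a list of citations.
    out = dict(rec)
    out["policy_basis"] = [rec["policy_basis"]] if rec.get("policy_basis") else []
    out["schema_version"] = 3
    return out


# Each entry upgrades a record from the keyed version to the next one.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def read_as_current(stored: dict) -> dict:
    """Return a current-schema view of a stored record without modifying it."""
    rec = dict(stored)
    version = rec["schema_version"]
    if version > CURRENT_SCHEMA:
        raise ValueError(f"record uses unknown schema version {version}")
    while version < CURRENT_SCHEMA:
        rec = MIGRATIONS[version](rec)
        version = rec["schema_version"]
    return rec
```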
Lineage-aware. Every record in the evidence store points back to the version of every model, prompt, dataset, and policy reference that contributed to it. The pointers themselves are content-addressed. An auditor can resolve from a single packet to the exact version of everything upstream of it.
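Lineage pointers that are themselves content-addressed can be sketched as a manifest mapping each upstream role to a digest of the exact artifact bytes. The roles and the functions `build_lineage`, `packet_digest`, and `verify_lineage` are hypothetical, and what counts as the "model" artifact is an assumption: for a hosted model it might be the provider's model identifier plus configuration rather than weights.

```python
import hashlib
import json
from typing import Callable, Optional


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_lineage(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Map each upstream role ("model", "prompt", ...) to the digest of its exact bytes."""
    return {role: digest(blob) for role, blob in artifacts.items()}


def packet_digest(lineage: dict[str, str], inputs_digest: str) -> str:
    # The packet's own address covers its lineage, so changing any upstream
    # pointer yields a different packet digest.
    body = json.dumps({"lineage": lineage, "inputs": inputs_digest},
                      sort_keys=True, separators=(",", ":"))
    return digest(body.encode("utf-8"))


def verify_lineage(lineage: dict[str, str],
                   resolve: Callable[[str], Optional[bytes]]) -> list[str]:
    """Return the roles whose artifact is missing or no longer matches its digest."""
    bad = []
    for role, expected in sorted(lineage.items()):
        blob = resolve(expected)
        if blob is None or digest(blob) != expected:
            bad.append(role)
    return bad
```

An auditor holding a single packet re-resolves each digest through the artifact registry (`resolve` here) and gets back the list of roles that can no longer be trusted; an empty list means everything upstream is exactly what the decision used.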
Queryable by decision identifier, claimant identifier, and time window. Not "queryable by engineer with access." The query interface is part of the data product and is exposed to the program-integrity team. They produce the packet without needing to file a request with engineering.
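A query interface over the three keys can be as plain as an indexed table. This sketch uses SQLite with hypothetical table, column, and function names. It stores times as UTC ISO-8601 strings in one fixed format so that string order matches time order, and treats the time window as half-open (`start` inclusive, `end` exclusive).

```python
import sqlite3
from typing import Optional


def create_index(conn: sqlite3.Connection) -> None:
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS packet_index (
        packet_id   TEXT PRIMARY KEY,
        decision_id TEXT NOT NULL,
        claimant_id TEXT NOT NULL,
        decided_at  TEXT NOT NULL  -- UTC ISO-8601, one fixed format
    );
    CREATE INDEX IF NOT EXISTS ix_decision ON packet_index(decision_id);
    CREATE INDEX IF NOT EXISTS ix_claimant_time ON packet_index(claimant_id, decided_at);
    CREATE INDEX IF NOT EXISTS ix_time ON packet_index(decided_at);
    """)


def find_packets(conn: sqlite3.Connection,
                 decision_id: Optional[str] = None,
                 claimant_id: Optional[str] = None,
                 start: Optional[str] = None,
                 end: Optional[str] = None) -> list[str]:
    """Return packet IDs matching every filter given, oldest first."""
    clauses, params = [], []
    if decision_id is not None:
        clauses.append("decision_id = ?")
        params.append(decision_id)
    if claimant_id is not None:
        clauses.append("claimant_id = ?")
        params.append(claimant_id)
    if start is not None:
        clauses.append("decided_at >= ?")
        params.append(start)
    if end is not None:
        clauses.append("decided_at < ?")
        params.append(end)
    # Only fixed clause strings are interpolated; every value is a bound parameter.
    where = " AND ".join(clauses) or "1 = 1"
    sql = (f"SELECT packet_id FROM packet_index WHERE {where} "
           "ORDER BY decided_at, packet_id")
    return [row[0] for row in conn.execute(sql, params)]
```

Exposed behind a read-only role, a function like this is what lets the program-integrity team pull packets themselves rather than file an engineering ticket.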
Together, these properties produce a data product that does what the audit asks of it. The cost is not exotic. It is largely a one-time architectural investment plus a small ongoing storage bill. The OIG reports we have read in the past year suggest the cost of not having it is rapidly exceeding the cost of having it.
Reproducing a six-month-old decision
The functional test for whether the evidence store works is straightforward: pick a decision the system made six months ago. Reproduce it.
If you can pull the packet, hand its inputs, retrieved context, and pinned model and prompt versions to a re-run of the agent in a sandboxed environment, and get the same output the system originally produced, you have a working evidence store. If you cannot, you have an audit risk. The exercise is the test, and it is the exercise we run with program-integrity teams as a one-day diagnostic.
The failures break down into a small number of categories:
- The model version is unknown — the system did not record which version was deployed at the time of the decision.
- The prompt is unknown — the prompt was updated in production without versioning.
- The inputs cannot be reconstructed because the source database has been updated.
- The retrieved documents have been deleted, redacted, or modified.
- The decision packet itself has been lost (retention policy aged it out, or the storage layer was migrated).
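The diagnostic can be sketched as a function that walks the packet, reports which of the failure categories above apply, and only then replays the decision. Everything here is hypothetical: the `Stores` registries, the packet keys, and the agent signature. The sketch also assumes the pinned agent is deterministic; a model sampled at nonzero temperature may not reproduce its output exactly and would need a looser comparison.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent signature: (inputs, retrieved_docs, prompt_text) -> finding
AgentFn = Callable[[dict, dict, str], dict]


@dataclass
class Stores:
    packets: dict[str, dict]       # decision_id -> stored packet
    agents: dict[str, AgentFn]     # model_version -> sandboxed agent
    prompts: dict[str, str]        # prompt_version -> prompt text
    documents: dict[str, str]      # document digest -> text at decision time


def diagnose(decision_id: str, s: Stores) -> list[str]:
    """Return the failure categories for one decision; an empty list means it reproduced."""
    packet = s.packets.get(decision_id)
    if packet is None:
        return ["decision packet lost"]
    gaps = []
    agent = s.agents.get(packet.get("model_version"))
    if agent is None:
        gaps.append("model version unknown")
    prompt = s.prompts.get(packet.get("prompt_version"))
    if prompt is None:
        gaps.append("prompt unknown")
    inputs = packet.get("inputs")
    if not inputs:
        # A packet that stored only pointers has no input values to replay.
        gaps.append("inputs not reconstructable")
    docs = {}
    for doc_digest in packet.get("retrieved_docs", []):
        if doc_digest not in s.documents:
            gaps.append("retrieved document missing")
            break
        docs[doc_digest] = s.documents[doc_digest]
    if gaps:
        return gaps
    # Every piece is present: replay in the sandbox and compare with the record.
    replayed = agent(inputs, docs, prompt)
    if replayed != packet["finding"]:
        return ["replay diverged from recorded finding"]
    return []
```

Each string the function returns maps to one backlog item, which is the same gap list the reproduce-a-decision exercise is meant to produce.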
Each failure points at a specific gap in the data product. None of them are unfixable. All of them get more expensive to fix the longer they sit.
Where this connects to procurement
The data product has to be a contractual deliverable. The vendor cannot be the only entity that can produce the packet, because the vendor has incentives the OIG does not share. The clause is short:
The Vendor shall implement and maintain an Evidence Store for the System, consisting of an immutable, content-addressed, schema-versioned record of every System decision sufficient to reproduce that decision on demand. The Evidence Store shall be queryable by the Government, in perpetuity, by decision identifier, claimant identifier, and time window. The Government shall retain unrestricted rights to access, query, export, and migrate the Evidence Store, independent of the Vendor's continued role.
That clause turns the evidence store into a deliverable the agency controls. When the OIG asks for a packet, the program-integrity team produces it, and the vendor is not in the loop. The audit posture is defensible.
What to do Monday
Pick one decision the system made last quarter. Ask the program-integrity team to produce the decision packet — the inputs, the model version, the prompt, the policy basis, the human in the loop.
If they cannot produce it in an hour, you have an audit-risk problem that an OIG can find on a Tuesday and you will not be able to remediate by Friday.
For every gap in the packet, log the gap. The gaps are your engineering backlog. The remediation is not exotic — schema, immutability, content addressing — but each item has to be done deliberately.
Where Vardr fits
We help agencies and enterprises design the evidence store as a first-class data product before launch, and we run the reproduce-a-decision diagnostic on systems that are already in production. Kevin brings the backend and data-platform depth to specify the storage, schema, and query interface — and Payton brings the regulatory knowledge of exactly what an OIG, hearing officer, or court will ask the packet to contain, so the data product is built to the real bar. The Reference Architecture treats the evidence store as the central audit artifact, not as application logs aspiring to be evidence.
If this resonates with a program you're working on, we'd be glad to talk.