The latency budget for benefits decisioning

Eligibility and fraud determinations have a latency budget the same way an ad-bidding system does. Ignoring it produces either claimant harm or unreviewable auto-decisions. The architecture has to make the trade-off explicit per decision type.

Two systems we have worked on, in completely unrelated domains, both run on a clock measured in tens of milliseconds. One serves about ten billion advertising decisions a day. The other serves about half a million eligibility-related decisions a day. The technical patterns are nearly identical. The cultural treatment of the clock is entirely different.

In the ad system, the latency budget is the first thing everyone learns. New engineers can recite it. A regression of two milliseconds is an incident. Decisions arrive on time or they do not happen — there is no third option.

In the benefits system, the latency budget is unwritten. New engineers learn it by getting paged. A regression of two seconds is invisible until it becomes a backlog. Decisions arrive on time, late, or sometimes the next morning, and the system has no architectural concept of which case is acceptable.

This is the gap we keep hitting. State eligibility systems built to handle the 2024 and 2025 modernization wave have not generalized the lesson the ad world learned in 2010. The 2025 and early-2026 disaster-driven surges have made the gap operationally visible. The lesson is borrowable. The work is to borrow it explicitly.

Why a latency budget matters for a benefits decision

A determination that arrives too late is a different kind of failure than a determination that is wrong. It produces a queue. The queue produces operational pressure. The operational pressure produces a workaround. The workaround is what bites the agency at audit.

The workaround takes one of two forms. In the first, the system bypasses review and auto-decides cases under load. That is the unreviewable-auto-decision failure, the one that fails the OIG audit and produces the wrongful-denial headline. In the second, the system queues the cases and processes them late. That is the claimant-harm failure, the one that fails the timeliness requirements under 7 CFR § 273.2(g) for SNAP or the equivalent state-plan timeliness obligations under Medicaid.

There is no good failure mode under spike load. The architecture either picks one in advance, with an engineered degradation plan, or it picks one ad hoc, in production, while people are watching.

The latency budget is the artifact that makes the pick explicit. For each decision type, the system declares: this decision must be made within N milliseconds, or it goes to fallback mode F. The N is real. The F is real. The combination is the system's design under load, not its aspiration.

Borrowing the ad-tech budget framing

The ad-decisioning playbook is straightforward and worth importing wholesale:

The budget is per-decision and visible. Every decision request carries a deadline header. Every subsystem sees the deadline. The deadline is enforced at the protocol level, not as a soft suggestion. A subsystem that cannot meet the deadline returns its fallback within the budget rather than blocking.
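A minimal sketch of that property, assuming an asyncio-based service. The names `Deadline`, `call_with_deadline`, and the two example checks are hypothetical, as is the "needs_review" fallback value. The point it illustrates is that the caller gets either the subsystem's answer or the fallback inside the budget, and never blocks past the deadline.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Deadline:
    """Absolute deadline carried with every decision request (hypothetical type).

    A monotonic timestamp only means something inside one process. Across
    services, the header would more likely carry the remaining budget instead;
    that is an assumption about deployment, not something the text specifies.
    """
    expires_at: float

    @classmethod
    def after_ms(cls, budget_ms: float) -> "Deadline":
        return cls(time.monotonic() + budget_ms / 1000.0)

    def remaining_s(self) -> float:
        return max(0.0, self.expires_at - time.monotonic())


async def call_with_deadline(subsystem, request, deadline, fallback):
    """Await one subsystem call; return `fallback` if the deadline passes first."""
    try:
        return await asyncio.wait_for(
            subsystem(request, deadline), timeout=deadline.remaining_s()
        )
    except asyncio.TimeoutError:
        return fallback


async def fast_identity_check(request, deadline):
    await asyncio.sleep(0.01)  # simulated dependency, well inside the budget
    return "verified"


async def slow_income_check(request, deadline):
    await asyncio.sleep(0.5)  # simulated dependency that overruns the budget
    return "verified"


async def demo():
    request = {"case_id": "A-1"}  # hypothetical request shape
    fast = await call_with_deadline(
        fast_identity_check, request, Deadline.after_ms(200), "needs_review"
    )
    slow = await call_with_deadline(
        slow_income_check, request, Deadline.after_ms(200), "needs_review"
    )
    return fast, slow


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The subsystem receives the same `Deadline` object it is being held to, so it can skip optional work when little budget remains instead of being cut off mid-call.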

The budget is tiered by decision class. Not every decision has the same budget. A real-time eligibility check at the document-upload step might have a 200-millisecond budget. A retroactive overpayment determination might have a 10-second budget. A monthly redetermination batch might have a 12-hour budget. The tier is engineered. The tier matters.

The budget is instrumented to the percentile, not the average. p95 and p99 are what bite under load. The average can look fine while p99 latency spikes from 800ms to 4 seconds, and that spike is the early warning of a queue forming. The dashboards in an ad system are percentile-first. The dashboards in most benefits systems are average-first. Switching is a single afternoon's work and produces visibility into a class of degradation the current dashboards miss entirely.
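To see why the average hides the problem, here is a small sketch with made-up latency samples: 97 requests at 120 ms and 3 stuck at 4 seconds. The `percentile` helper uses the nearest-rank definition, which is one common convention among several; a real dashboard would use whatever its metrics backend provides.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds, not measurements from any real system.
latencies_ms = [120] * 97 + [4000] * 3

mean_ms = statistics.mean(latencies_ms)  # 236.4
p50_ms = percentile(latencies_ms, 50)    # 120
p95_ms = percentile(latencies_ms, 95)    # 120
p99_ms = percentile(latencies_ms, 99)    # 4000
```

With these samples the mean sits near 236 ms and even p95 still reads 120 ms, while p99 has already reached 4 seconds. An average-first dashboard would show nothing unusual.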

The budget has a written fallback. Each decision type names what happens when the budget is missed. The fallback is itself a decision — the right fallback for an eligibility check might be "queue the case for human review with a deadline-tracked SLA," not "auto-approve" or "auto-deny."
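Written down, a budget with its fallback can be as small as a lookup table. The sketch below uses the three example deadlines mentioned above (200 milliseconds, 10 seconds, 12 hours). The decision-class identifiers are our own naming, and the fallbacks for the second and third entries are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    decision_class: str
    deadline_ms: int
    fallback: str  # what happens to the case when the deadline is missed


HOUR_MS = 60 * 60 * 1000

BUDGETS = {
    b.decision_class: b
    for b in (
        LatencyBudget(
            "document_upload_eligibility_check",
            200,
            "queue_for_human_review_with_tracked_sla",
        ),
        # The two fallbacks below are illustrative assumptions only.
        LatencyBudget(
            "retroactive_overpayment_determination",
            10_000,
            "escalate_to_supervisor_queue",
        ),
        LatencyBudget(
            "monthly_redetermination_batch",
            12 * HOUR_MS,
            "rerun_next_window_and_alert",
        ),
    )
}


def outcome(decision_class, elapsed_ms):
    """Return "decide" inside the budget, otherwise the declared fallback.

    A KeyError here means a decision class is running with no declared budget.
    """
    budget = BUDGETS[decision_class]
    if elapsed_ms <= budget.deadline_ms:
        return "decide"
    return budget.fallback
```

For example, `outcome("document_upload_eligibility_check", 450)` returns the review-queue fallback rather than a late automated decision.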

These four properties are within reach of every benefits-system architecture team. They require discipline, not invention.

Decision tiers — synchronous, queued, batch

Every benefits decision falls into one of three tiers, and the tier determines the architectural treatment.

Synchronous. The claimant is on the phone, in the office, or in the application flow. A decision that arrives a minute later is a worse outcome than a decision that arrives in milliseconds, even if the late decision is more accurate. Document-upload validation, identity proofing, prefill suggestions for the caseworker, and check-the-status queries are synchronous. The budget is tens to a few hundred milliseconds.

Near-real-time queued. The claimant has submitted. They expect a determination within hours or days, not seconds. Initial eligibility determinations, document-completeness checks routed to a worker, fraud-flag-triggered manual review queues, and partial-information escalations live here. The budget is single-digit minutes to hours, but each step in the chain has its own budget, and the queue depth is the load-bearing measurement.

Batch. Periodic processes, redeterminations, reconciliation runs, recoupment calculations, and reporting. These have a budget measured in hours to days, run on a known schedule, and have a clear definition of done. Batch is the easiest tier to engineer and the easiest tier to mis-engineer; we have seen synchronous decisions routed to batch processes and discovered six months later in audit.

The architecture decision is which decision goes in which tier. Most benefits-system architectures we have audited have made this assignment implicitly, by accident, in 2018, in a wiki that no longer exists. Re-making the assignment explicitly is the highest-leverage architectural exercise the program office can run.
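One way to make the assignment explicit is to keep it as data and check production against it. This is a sketch: the tier members follow the three tiers described above, the identifiers are our own naming for the decision types listed there, and `unassigned` is a hypothetical helper.

```python
from enum import Enum


class Tier(Enum):
    SYNCHRONOUS = "synchronous"
    QUEUED = "near-real-time queued"
    BATCH = "batch"


# Identifiers are illustrative names for the decision types listed above.
TIER_ASSIGNMENTS = {
    "document_upload_validation": Tier.SYNCHRONOUS,
    "identity_proofing": Tier.SYNCHRONOUS,
    "caseworker_prefill_suggestion": Tier.SYNCHRONOUS,
    "status_query": Tier.SYNCHRONOUS,
    "initial_eligibility_determination": Tier.QUEUED,
    "document_completeness_check": Tier.QUEUED,
    "fraud_flag_manual_review": Tier.QUEUED,
    "partial_information_escalation": Tier.QUEUED,
    "redetermination": Tier.BATCH,
    "reconciliation_run": Tier.BATCH,
    "recoupment_calculation": Tier.BATCH,
    "reporting": Tier.BATCH,
}


def unassigned(decision_types_in_production):
    """Decision types observed in production that nobody has assigned to a tier."""
    return sorted(
        d for d in set(decision_types_in_production) if d not in TIER_ASSIGNMENTS
    )
```

Calling `unassigned(["identity_proofing", "address_change"])` returns `["address_change"]`, which is the kind of implicit assignment the exercise is meant to surface.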

Spike behavior — open enrollment, mass layoffs, disasters

Steady-state load is not the problem. The problem is the curve.

A state SNAP system that handles 4,000 applications per day under normal load is asked to handle 60,000 per day during a major disaster declaration. A state UI system that processes 1,500 claims per day under normal conditions saw 35,000 per day during the 2025 winter freeze events in the South. The curve is the design constraint, not the steady state.
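The arithmetic behind those figures is worth doing once. The sketch below computes the surge multiple for each example and the backlog that accumulates if processing capacity stays flat. The flat-capacity assumption (capacity equal to normal daily volume) is ours, chosen only to show the shape of the curve.

```python
def surge_multiple(normal_per_day, surge_per_day):
    """How many times normal volume the surge represents."""
    return surge_per_day / normal_per_day


def backlog_after(days, normal_per_day, surge_per_day, capacity_per_day=None):
    """Cases still waiting after `days` of sustained surge.

    Assumption: unless told otherwise, capacity equals normal daily volume.
    """
    capacity = normal_per_day if capacity_per_day is None else capacity_per_day
    return max(0, surge_per_day - capacity) * days


snap_multiple = surge_multiple(4_000, 60_000)        # 15.0
ui_multiple = surge_multiple(1_500, 35_000)          # about 23.3
snap_backlog_week = backlog_after(7, 4_000, 60_000)  # 392_000 cases
```

Under that assumption the SNAP example accumulates 56,000 unprocessed applications per day, or 392,000 after one week.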

Three properties of the architecture matter for the curve:

Backpressure that prefers human review over auto-decision. When the synchronous tier overflows its budget, the right fallback is "queue with a tracked SLA," not "make a decision faster." Auto-decisioning to clear a queue is exactly the failure mode the OIG cares about. The architecture has to make the right fallback the easy fallback.
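A sketch of that backpressure rule, under simplifying assumptions: one in-memory queue, a single SLA for every queued case, and an `elapsed_ms` figure supplied by the caller. `ReviewQueue` and `decide_or_queue` are hypothetical names. The only branch that returns an automated result is the one where the budget was met.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ReviewQueue:
    """Human-review queue in which every case carries a tracked due time."""
    sla: timedelta
    items: deque = field(default_factory=deque)

    def enqueue(self, case_id, now):
        due = now + self.sla
        self.items.append((case_id, due))
        return due

    def overdue(self, now):
        return [case_id for case_id, due in self.items if due < now]


def decide_or_queue(case_id, elapsed_ms, budget_ms, automated_result, queue, now):
    """Use the automated result only if it arrived inside the budget.

    A late result is discarded and the case goes to human review; there is
    deliberately no branch that decides faster to clear the queue.
    """
    if elapsed_ms <= budget_ms:
        return ("decided", automated_result)
    return ("queued_for_review", queue.enqueue(case_id, now))


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
queue = ReviewQueue(sla=timedelta(hours=24))  # the 24-hour SLA is a placeholder
on_time = decide_or_queue("C-1", 150, 200, "eligible", queue, now)
late = decide_or_queue("C-2", 900, 200, "eligible", queue, now)
```

Here `on_time` carries the automated result, while `late` carries a due time 24 hours out, and `queue.overdue(...)` gives the operations team the list of cases that have breached that SLA.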

Horizontal scale that does not depend on the vendor. If the only path to scale is a billable change order against the vendor's licensing, the agency does not have horizontal scale; it has a billing event. Surge capacity has to be a property of the deployment architecture, not the procurement.

A pre-engineered degradation plan, with counsel sign-off. What gets turned off, sampled, or delayed under sustained surge has legal implications. The right time to think about that is before the surge. The agencies that handled the 2025 disaster declarations well had a written, pre-approved degradation plan with the relevant deputy attorney general's name on it. The agencies that handled them poorly did the policy work in real time, on the phone, at 11 PM.

These are not exotic engineering moves. They are basic surge-architecture practice from any high-volume real-time system. The reason they are rare in benefits systems is that the procurement that built the system did not name them as requirements.

What to put in the SOW

For the agencies still designing or recompeting their next-generation eligibility platforms, the latency-budget clauses are short and matter:

The Vendor shall define, document, and instrument a per-decision-class Latency Budget for the System, including (a) a synchronous tier with a p99 deadline measured in hundreds of milliseconds for decisions exposed to the claimant; (b) a queued tier with documented SLA budgets per step; (c) a batch tier with a documented cadence; and (d) a written degradation plan, approved by Government Counsel, naming which decisions are queued, sampled, or delayed when budgets are exceeded. The Latency Budget shall be instrumented to the 95th and 99th percentile per decision class and exposed to the Government via dashboard, with alerting on threshold violations.

That clause is the single most consequential operational requirement an eligibility-system solicitation can carry. It will be argued. It is worth arguing.

What to do Monday

Pick your top three decision types. Write a latency budget for each — synchronous, queued, or batch, with the relevant percentile and the deadline.

For each, name the fallback. If the budget is missed, what happens to the case? If you cannot name the fallback in one sentence, you do not have one — and the system is making the choice for you in production every day under load.

Move one dashboard from average to p99 today. Watch what shows up. The cases that have been slow-failing in the long tail for months will surface inside a week.

Where Vardr fits

We import the latency-budget framing into benefits-system architectures explicitly, both as a design exercise during modernization and as a recovery exercise after the first surge incident. The Reference Architecture treats per-decision deadlines and engineered fallbacks as first-class control-plane primitives. Frank's two decades shipping high-volume real-time decisioning systems at SocialFlow make the framing earned rather than borrowed; Tomas's distributed-systems and security-grade-data-systems background turns it into infrastructure that holds the budget under spike load.