The cutover nobody tested — and the leadership change that exposes it
Most benefits-modernization launches do not fail at go-live. They fail the second time a new commissioner walks through the door and the original cutover defenses break.
By Payton Jonson and Tomas Hobza · June 16, 2026
A modernization launch is judged on two days. The first is go-live. The second is the day the leadership that signed the original SOW is no longer in the building.
The first day is what every system integrator plans for. The second is what nobody owns. And the second is where most of the failures we have been called to look at since early 2025 actually occurred.
This piece is about the failure mode between those two days — the one where the system technically went live, performance looked stable for nine months, and then a new commissioner asked a question the original cutover plan was never written to answer.
Why the second turnover is worse than the first
A standard modernization cutover is treated as a Gantt chart. Tasks have owners. Dependencies are mapped. A go/no-go meeting produces a green light. The day after go-live, the integrator celebrates and the program office breathes out.
What that plan does not contain is the institutional memory that produced it. The reason a particular reconciliation step runs every Thursday at 04:00 is in the head of the integrator's lead engineer. The reason a particular policy override was hard-coded to a magic constant is in an email thread between the commissioner's chief of staff and the integrator's project manager. The reason the rollback path was scoped the way it was is in a verbal agreement at the kickoff meeting.
None of that is in the documentation, because the documentation was written for procurement.
The first commissioner-level turnover usually happens within twelve to eighteen months of go-live. The new commissioner asks a question ("Why does the system deny these cases?", "Why did processing time spike last week?", "Why are we still paying the integrator for change requests we already paid for?") and the program staff who were supposed to answer have rotated, taken other roles, or are protecting their predecessor's reputation.
The second turnover, usually two to three years in, is where the wheels come off. The integrator's bench has rolled three times. The agency staff who lived through the original cutover have retired. The hard-coded override is now a production liability nobody can explain. The reconciliation step has been silently failing for six weeks and the alerts are going to an inbox the original owner left two years ago.
The Medicaid unwinding cutovers of 2023 and 2024 turned this failure mode into a recognizable pattern. State eligibility systems that survived their unwinding go-live are now, in mid-2026, hitting their second commissioner turnover. The CMS-required redetermination paths that were tested at launch are now running against undocumented operational drift, and the agencies that did not build for the second turnover are the ones we see in the news.
Why "we documented it" is not the answer
Documentation is a coping mechanism for institutional turnover, not a solution to it. The agencies that lean hardest on documentation are the ones with the worst second-turnover outcomes, for a structural reason: documentation drifts faster than code, and the consequence of stale documentation is worse than the consequence of no documentation. Stale documentation produces decisions made against a fiction.
The integrators who sell hardest on documentation packages know this. The first ninety days post-launch produce a high-quality binder. By month nine the binder is referenced only in audits. By month eighteen it is wrong.
The right unit of institutional memory is not a binder. It is executable, version-controlled, and run on a fixed cadence that cannot be deferred: a test suite, a rehearsal script, and a rollback path that the agency exercises on a schedule, against a non-production copy of the live system, whether anyone has the time for it or not.
Reversibility as a contract clause, not a slide
The cutover plans we see most often treat rollback as a slide near the end of the deck. The slide has bullet points. The bullet points are never executed.
The cutover plans that survive a second turnover treat reversibility as a contract clause with three properties:
Numeric thresholds. Not "if there are significant issues." Specific numbers: error rate above X, processing time p95 above Y, denial-rate delta above Z. The thresholds are committed before launch and they are not allowed to be revised down to make a launch look successful.
Executable rollback paths. Not "we will revert." Actual code that runs in a test environment before contract execution and is rerun on a quarterly cadence against a current snapshot of production. If the rollback breaks because the data shape has drifted, that breakage is a known operational risk surfaced in time to fix, not a discovery at the moment of crisis.
Tested with the actual operating data. The reason most rehearsed rollbacks fail in practice is that the rehearsal was done against the synthetic data the integrator had at kickoff. The data shape eighteen months in is different. The reference snapshot used for rollback rehearsal must be from production, refreshed quarterly, and reconciled against the live system before each rehearsal cycle.
When reversibility lives in the contract instead of the slide, the second commissioner who asks "can we roll this back" gets a real answer instead of a panic.
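To make the first property concrete, here is a minimal sketch of rollback criteria written as numbers rather than adjectives. The class names, field names, and every value are our own placeholders for illustration, not recommended limits and not taken from any particular contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackThresholds:
    """Limits committed before launch. Values used below are placeholders."""
    max_error_rate: float         # fraction of requests that error
    max_p95_seconds: float        # 95th-percentile processing time
    max_denial_rate_delta: float  # absolute change in denial rate vs. baseline


@dataclass(frozen=True)
class ObservedMetrics:
    error_rate: float
    p95_seconds: float
    denial_rate: float
    baseline_denial_rate: float


def rollback_breaches(t: RollbackThresholds, m: ObservedMetrics) -> list[str]:
    """Return the name of every committed threshold the observed metrics exceed."""
    breaches = []
    if m.error_rate > t.max_error_rate:
        breaches.append("error_rate")
    if m.p95_seconds > t.max_p95_seconds:
        breaches.append("p95_seconds")
    if abs(m.denial_rate - m.baseline_denial_rate) > t.max_denial_rate_delta:
        breaches.append("denial_rate_delta")
    return breaches


# Hypothetical numbers, chosen only to show one passing and two failing checks.
thresholds = RollbackThresholds(
    max_error_rate=0.02, max_p95_seconds=30.0, max_denial_rate_delta=0.05
)
observed = ObservedMetrics(
    error_rate=0.01, p95_seconds=42.0, denial_rate=0.21, baseline_denial_rate=0.12
)
print(rollback_breaches(thresholds, observed))  # ['p95_seconds', 'denial_rate_delta']
```

The thresholds object is frozen on purpose: changing a limit means creating a new object in a new commit, which leaves a record of who revised the number and when.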
What "rehearsable at production scale" actually means
A shadow-traffic rehearsal is not the same thing as a load test. A load test confirms the system does not collapse under volume. A shadow-traffic rehearsal confirms the system makes the same decisions against the same real-claimant inputs that the production system is making, in parallel, in real time, for long enough to catch drift.
The technical pieces are not exotic. The reason agencies rarely have them is that they are usually built after launch instead of before:
- A traffic-mirror layer that duplicates inbound requests from the production system to the shadow environment without affecting production latency.
- Deterministic comparison of the two systems' outputs on a per-decision basis, surfaced as a diff dashboard the program office can read.
- An owned data subset that represents the messy edge cases — the manual-review cases, the appeals, the data-quality outliers — not just the clean happy-path mass.
- A scheduled rehearsal cadence (quarterly minimum) that walks the rollback path end-to-end and produces a written report the agency keeps on file.
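The per-decision comparison in that list can be sketched in a few lines. This is an illustration under stated assumptions: the Decision record, its fields, and the case identifiers are hypothetical, and a real comparison would read from the mirror layer's logs rather than from in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    case_id: str
    outcome: str      # e.g. "approve", "deny", "manual_review"
    reason_code: str


def diff_decisions(production: list[Decision], shadow: list[Decision]) -> dict[str, list]:
    """Compare the two systems' decisions case by case."""
    prod = {d.case_id: d for d in production}
    shad = {d.case_id: d for d in shadow}
    return {
        # Same case seen by both systems, but a different outcome or reason.
        "mismatched": [
            (cid, prod[cid], shad[cid])
            for cid in sorted(prod.keys() & shad.keys())
            if prod[cid] != shad[cid]
        ],
        # Cases one system decided and the other never saw.
        "missing_in_shadow": sorted(prod.keys() - shad.keys()),
        "missing_in_production": sorted(shad.keys() - prod.keys()),
    }


# Invented sample decisions for illustration only.
production = [
    Decision("c-001", "approve", "INCOME_OK"),
    Decision("c-002", "deny", "MISSING_DOCS"),
    Decision("c-003", "manual_review", "INCOME_CHANGED"),
]
shadow = [
    Decision("c-001", "approve", "INCOME_OK"),
    Decision("c-002", "approve", "INCOME_OK"),
    Decision("c-004", "deny", "LATE"),
]
report = diff_decisions(production, shadow)
print(len(report["mismatched"]), report["missing_in_shadow"], report["missing_in_production"])
```

Sorting the case identifiers keeps the report stable from run to run, so two rehearsal reports can be compared line by line.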
This is the level of rehearsal that makes a launch reversible nine months later, when the original integrator's lead engineer has moved to a different account and the commissioner who signed the contract is no longer running the agency.
Due process as a load-bearing test case
A cutover plan that does not treat due-process correctness as a first-class test case is a plan that will produce wrongful denials at scale during the first surge after the second turnover, because the people who knew which edge cases mattered are gone.
The cases that matter for due process are not the ones that go through cleanly. They are the ones that get partial documentation, the ones that arrive on the deadline boundary, the ones where the claimant's reported income changed mid-cycle, the ones where the system has to make a determination on incomplete information and produce a notice that says exactly what it considered.
These cases need to be in the test suite, named, and run on every code change. They need to be in the rehearsal suite, against shadow traffic, on the quarterly cadence. And the suite needs to be the agency's, not the integrator's, because the integrator's incentive is to keep the suite small.
When a new commissioner asks why a particular denial happened, the answer should be a replay of the case against the test suite, not an email to the integrator asking for a memo.
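One way to picture a named, replayable due-process case is sketched below. Everything in it is hypothetical: toy_determine stands in for the real eligibility engine with invented rules, and the case names, document names, and dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Case:
    submitted_on: date
    deadline: date
    documents_received: frozenset[str]
    documents_required: frozenset[str]


@dataclass(frozen=True)
class Determination:
    outcome: str
    considered: tuple[str, ...]  # what the notice says was considered


def toy_determine(case: Case) -> Determination:
    """Stand-in for the real eligibility engine; these rules are invented."""
    considered = tuple(sorted(case.documents_received))
    if case.submitted_on > case.deadline:
        return Determination("deny_late", considered)
    if case.documents_required - case.documents_received:
        return Determination("request_more_information", considered)
    return Determination("approve", considered)


REQUIRED = frozenset({"id", "income_statement"})

# Each edge case has a name, an input, and the determination the agency expects.
NAMED_CASES: dict[str, tuple[Case, Determination]] = {
    "arrives_on_deadline_day": (
        Case(date(2026, 3, 31), date(2026, 3, 31), REQUIRED, REQUIRED),
        Determination("approve", ("id", "income_statement")),
    ),
    "partial_documentation": (
        Case(date(2026, 3, 10), date(2026, 3, 31), frozenset({"id"}), REQUIRED),
        Determination("request_more_information", ("id",)),
    ),
}


def replay(
    determine: Callable[[Case], Determination],
    cases: dict[str, tuple[Case, Determination]],
) -> list[str]:
    """Return the names of cases whose determination no longer matches."""
    return [name for name, (case, expected) in cases.items() if determine(case) != expected]


print(replay(toy_determine, NAMED_CASES))  # [] while behavior matches expectations
```

Because the expected determination includes what the notice says was considered, a replay catches a notice that silently stops listing a document as well as a changed outcome.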
What to do Monday
If you signed a modernization contract in 2023 or 2024 and went live in 2024 or 2025, you are sitting on the second-turnover risk now. Three concrete moves close the worst of the gap before the next commissioner walks in:
Write your rollback criteria as numeric thresholds today and put them in writing. If you cannot name the threshold, you do not have a rollback plan; you have a slide.
Identify the three people whose absence would put the cutover at risk. Get one hour of recorded conversation with each on the topics that are not in the documentation. The recording is the institutional memory artifact that survives them.
Schedule a quarterly rehearsal that walks the rollback path end-to-end against a recent production snapshot. The first one will fail. That failure is the gift — it surfaces what the launch never tested.
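A common reason that first rehearsal fails is that the production data shape has moved away from what the rollback path expects. A minimal pre-rehearsal check might look like the sketch below, which assumes both shapes are available as simple column-to-type mappings. The column names and types are invented for illustration.

```python
def shape_drift(expected: dict[str, str], snapshot: dict[str, str]) -> dict[str, list]:
    """Compare the columns the rollback path expects with a production snapshot."""
    return {
        # Columns the rollback path needs that the snapshot no longer has.
        "missing": sorted(expected.keys() - snapshot.keys()),
        # Columns added since the rollback path was written.
        "unexpected": sorted(snapshot.keys() - expected.keys()),
        # Columns present in both whose declared type differs.
        "type_changed": sorted(
            (col, expected[col], snapshot[col])
            for col in expected.keys() & snapshot.keys()
            if expected[col] != snapshot[col]
        ),
    }


# Hypothetical shapes: what the rollback script was written against vs. today.
expected_shape = {"case_id": "text", "household_income": "numeric", "review_date": "date"}
snapshot_shape = {"case_id": "text", "household_income": "text", "appeal_status": "text"}

drift = shape_drift(expected_shape, snapshot_shape)
print(drift)
```

Any non-empty entry is a finding for the rehearsal report, surfaced before the rollback path is run rather than partway through it.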
Where Vardr fits
We are usually called after the second turnover, not before. The work at that point is recovery rather than prevention, and it is more expensive than it needed to be. The Modernization Readiness Assessment lets a program office identify the second-turnover risks in three weeks — before the commissioner walks in, while the people who built the system are still in the building, and while the rollback path can still be made real.
If this resonates with a program you're working on, we'd be glad to talk.