Programmable 2026 Presentation
Replaying a Week of Production in One Hour on AWS with Ghost Staging
This talk presents a practical, AWS-native pattern for “ghost staging”: an isolated environment that replays a week of real production inputs in roughly an hour. The goal is to expose defects that do not show up in unit tests, short canaries, or synthetic load, such as month-end calculations, cron collisions, queue backlogs, daylight saving quirks, and cross-service timing effects.
We cover the end-to-end blueprint. At the edge, CloudFront and API Gateway journaling capture request payloads, headers, and trace context into S3. Data stores are snapshotted with DynamoDB point-in-time recovery and read-only RDS snapshots. A replay engine built with Lambda and SQS paces events in causal order, while Step Functions advances a virtual clock so systems believe time is passing quickly and deterministically. Side effects are quarantined by separating reads from commands, routing commands into stubs guarded by an outbox, and enforcing idempotency keys. AppConfig switches services into virtual-time and traffic modes. CloudWatch and QuickSight compare ghost metrics and outputs against production baselines and highlight meaningful diffs.
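To make the time compression concrete, here is a minimal sketch of the virtual-clock arithmetic, assuming a fixed speedup of 168x (one week of 168 hours replayed in one hour). The names `VirtualClock`, `replay_schedule`, and the event fields `ts` and `seq` are hypothetical and are not taken from the talk; sorting by recorded timestamp and a per-source sequence number is only a stand-in for whatever causal ordering the real replay engine uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class VirtualClock:
    """Maps wall-clock seconds since replay start to virtual (recorded) time."""
    virtual_start: datetime  # recorded timestamp where the replay window begins
    speedup: float           # virtual seconds that pass per wall-clock second

    def virtual_at(self, wall_elapsed_s: float) -> datetime:
        # Virtual time advances `speedup` times faster than wall time.
        return self.virtual_start + timedelta(seconds=wall_elapsed_s * self.speedup)

    def wall_offset_for(self, recorded: datetime) -> float:
        # Wall-clock seconds after replay start at which this event should fire.
        return (recorded - self.virtual_start).total_seconds() / self.speedup


def replay_schedule(events, clock):
    """Return (wall_offset_seconds, event) pairs in replay order.

    Assumption: ordering by (ts, seq) approximates causal order.
    """
    ordered = sorted(events, key=lambda e: (e["ts"], e["seq"]))
    return [(clock.wall_offset_for(e["ts"]), e) for e in ordered]


start = datetime(2026, 1, 5, tzinfo=timezone.utc)  # arbitrary example week
clock = VirtualClock(virtual_start=start, speedup=168.0)
events = [
    {"ts": start + timedelta(days=1), "seq": 2, "path": "/checkout"},
    {"ts": start + timedelta(days=1), "seq": 1, "path": "/cart"},
    {"ts": start, "seq": 1, "path": "/login"},
]
schedule = replay_schedule(events, clock)
```

A fixed multiplier keeps the mapping deterministic: the same journal always yields the same schedule, so two ghost runs over the same inputs can be compared directly.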
Attendees learn how to choose what to journal, how to bound non-determinism, how to design diff checks for prices, taxes, and rankings, and how to wire alerts on mismatch rates rather than error codes. We also cover integration into CI and pre-prod gates, cost controls for large replays, and a cutover checklist that uses evidence from ghost runs. The result is a repeatable method you can graft onto an existing stack in days to de-risk high-stakes rewrites of billing, pricing, search, or recommendation systems.
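As an illustration of alerting on mismatch rates rather than error codes, the sketch below compares paired production and ghost outputs and reports the fraction that disagree. Everything here is an assumption made for the example: the function names, the one-cent price tolerance, and the 0.1% alert threshold are hypothetical and would need tuning per domain.

```python
from decimal import Decimal


def price_matches(prod, ghost, tolerance=Decimal("0.01")):
    """Treat two prices as equal if they differ by at most `tolerance`."""
    return abs(Decimal(prod) - Decimal(ghost)) <= tolerance


def ranking_matches(prod, ghost, top_k=3):
    """Treat two rankings as equal if their first `top_k` items agree in order."""
    return list(prod[:top_k]) == list(ghost[:top_k])


def mismatch_rate(pairs, matches):
    """Fraction of (production, ghost) pairs for which `matches` is False."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    mismatches = sum(1 for prod, ghost in pairs if not matches(prod, ghost))
    return mismatches / len(pairs)


def should_alert(rate, threshold=0.001):
    """Alert when the mismatch rate exceeds the (assumed) threshold."""
    return rate > threshold


price_pairs = [
    ("10.00", "10.00"),
    ("10.00", "10.01"),  # within the one-cent tolerance
    ("10.00", "10.05"),  # outside the tolerance: counted as a mismatch
    ("9.99", "9.99"),
]
rate = mismatch_rate(price_pairs, price_matches)
```

Parsing prices as `Decimal` from strings avoids binary floating-point rounding, which would otherwise produce spurious mismatches at the tolerance boundary.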