Field notes · March 2026 · 6 min read

Bali at four in the morning.

Notes from a server migration we didn't sleep through.

The migration was scheduled for 4 AM Bali time. That's mid-afternoon in our biggest client's market, which is exactly when we wanted to be awake — they were closed for business, but their guests were checking in.

The plan was 90 minutes of downtime. The actual downtime was four hours and eleven minutes. We were on the phone with a sleepless front desk manager in Singapore for most of it.

Two things broke that we had not predicted. One was a DNS propagation delay that hit some Asia-Pacific resolvers harder than we'd planned for. The other was a dependency on a third-party PMS API that had silent rate limits we'd never bumped up against in testing.
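One general guard against an undocumented rate limit is to pace calls on the client side, so that test traffic and migration traffic stay under a budget you choose yourself. The following is a minimal sketch of a token bucket, not what we ran that night. The `TokenBucket` class, its `rate` and `capacity` parameters, and the example budget of 2 calls per second are all hypothetical; the PMS vendor's real limit is not known here.

```python
import time


class TokenBucket:
    """Client-side pacing: at most `rate` calls per second, bursts up to `capacity`.

    Hypothetical helper for illustration; the numbers are assumptions,
    not the third-party API's actual limits.
    """

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate                # tokens added per second (assumed budget)
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable so the pacing can be tested
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until one token is available, spend it, and return seconds waited."""
        waited = 0.0
        self._refill()
        while self.tokens < 1.0:
            # Sleep just long enough for the missing fraction of a token to accrue.
            delay = (1.0 - self.tokens) / self.rate
            self.sleep(delay)
            waited += delay
            self._refill()
        self.tokens -= 1.0
        return waited
```

Calling `acquire()` before each outbound request keeps throughput under the assumed budget. Replaying production-like volume through a limiter like this in staging is one way a hidden ceiling might show up before migration night rather than during it.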

What we'd do differently

We'd warm the cache for 48 hours instead of 12. We'd run the migration against a staging environment that mirrors the production traffic profile, not just the production schema. We'd have a phone tree for the front desk teams written down on actual paper, because somehow the WhatsApp group fell apart at hour two.

None of these are exotic lessons. They're the lessons every migration teaches every engineering team eventually. The point of writing them down is that next time it's our turn to be awake at 4 AM, we read this first.