In the course of our business, we see many data center and application migrations, as well as high-severity incidents. One piece of advice we always share with our clients is to plan for staff rotation. As you might expect, some listen and others do not. Here’s why it’s important.
Migrations often happen overnight, when the business sleeps or operates at a lower activity level. Organizations without satisfactory disaster recovery plans often incur an outage to do a migration. People are resilient for only so many hours, and then they crash.
In a migration, everyone often wants to be at the starting line, and the adrenaline keeps them engaged. If shifts are not “forced,” there is often nobody left with “gas in their tank” to troubleshoot issues. People simply have to disengage to stay fresh.
We saw this at a large customer where the team had persevered, declared success, and then dragged themselves home. An issue came up, and the on-call engineer was unwilling to make changes because he didn’t understand what had been changed (a change management issue). NOBODY involved was responding to calls. As it turned out, the group’s manager lived in my town, and I got to knock on his door at 10:00 AM on a Sunday morning. His wife wasn’t happy (he had been up all night), but she did get him up. He resolved the issue, but a few months later he resigned and went to work for a different company.
In this case, the team was not structured to handle a multi-day issue, and the response was poor.
In another case, a new virus definition in a client’s antivirus system flagged the operating system as malicious and quarantined it. The client had a policy of deleting quarantined files, so, at the speed of automation, thousands of operating systems were deleted.
The senior manager quickly determined that this would require a sustained 24/7 response, and teams were “nominated” to cover 12-hour shifts. We were asked to help on a sustained basis, providing process oversight and helping to run crisp shift turnovers.
To the senior manager’s credit, this approach allowed a sustained response while systems were recovered from (gasp!) tape.
Large IT shops often run multiple shifts, so a sustained technical response is more organic. Smaller shops tend to have 24x7 operational coverage but may lack a detailed technical response capability.
When planning for or reacting to major events, think in terms of how to rotate your staff over a sustained period.