Preparing for Disaster
Posted Jul 5, 2012
Last Friday night, a derecho storm swept through the DC area, having already left a swath of destruction across Ohio and West Virginia earlier in the day. Millions of people - including many Phase2 employees, as well as the main Phase2 offices in Alexandria - were left without power. The disruptions caused by the thunderheads even extended into the cloud, taking down Amazon's AWS facility in northern Virginia and shutting down Netflix, Pinterest, and Instagram, as well as thousands of other sites - including a site that I had just logged into to test and close some tickets.
How do you prepare for this? It's not something you really work into a project plan. "Sprint 3 - Build content import scripts; Set up taxonomy management; Clean out freezer and sweep up glass after storm" What you can do, though, is work resilience into everything you do - after all, disasters come in all shapes and sizes.
Work redundancy into your workplan
The more people know about the full lifecycle of the project, the easier it is for them to jump in when they are suddenly required to cover for their colleagues. It sounds obvious, but there are competing pressures (known to most of us as 'time' and 'money') to bring staff in only when needed, and to have them focus only on the issues they are specifically tasked with. This can result in situations where staff enter a project knowing little about the backstory and the processes that informed the decisions that dictated the architecture they are working within. One can argue that carrying this knowledge through the project is the job of the analyst and the project manager, but what happens if your analyst is, for example, incapacitated by one (or, god forbid, two) sick babies? Or the project manager rides his bike off the road?
Something we've been doing lately is bringing the members of the full team (as envisioned in the project plan) into some of the earlier discovery and planning sessions. Even if they are remote and only on a conference line (and wrapping up their current project in the background), they are able to hear firsthand from the client as they describe their business practices and goals, so when they build a related feature a month later they better understand the ultimate reasons behind it. If disaster strikes and they have to decide which features to prioritize without input from the analyst (who wrote the ticket and worked out the implementation with the client), they can reach out to the client having already worked with them before, or can refer to the prior meetings to make an educated guess about the appropriate path.
This is of course a balancing act - you can't have too much redundancy, or the client will balk at paying the bills for developers not developing. Overall, though, bringing developers into the process earlier has (in our experience) introduced efficiencies later on that made it worthwhile.
Make everyone remote (even if they are always in the office)
Many of our developers are remote - our headquarters are in the DC area, but to do the kind of work we like to do we need to pull from a much wider pool of talent. We've had to adapt our working processes to make our teams efficient, and this has resulted in a culture of communication that places an emphasis on pushing information and processes into the cloud. The overall result is that many of our local team members are able to work as efficiently away from the office as they do within it. I live in DC, and having worked at home full time in previous jobs I know that I need the structure of an office to be at my best. Yet many of my hours are billed from home, and this is possible because we have adapted our practice to support flexible work schedules. And when our office was shut down by the derecho, work continued, even if many of us had to relocate to Starbucks because our homes were without internet and power as well.
Build a buffer into the schedule
Always make sure you have a buffer. Leave an extra sprint or two at the end for dealing with the things you can't foresee - there will ALWAYS be things that need doing at the end of the project. If everything comes in on schedule and nothing has blown up along the way, then congratulations are in order, and you can free the developers to work on other projects. Chances are, though, that you'll be happy you left time to tighten the screws and clean up the integrations.
Always assume that the worst could happen
Don't overdo it. Don't get paralyzed worrying about what could happen, and don't waste time coming up with contingency plans for every possible catastrophe. There's a difference between 'the worst is going to happen' and 'the worst could happen'. The thing about disasters is that (hurricanes notwithstanding) you don't often see them coming. Not that it interrupted anything, but we even had an earthquake last year in Virginia. Who'd expect that? Something always happens, though - maybe not a natural disaster, but a third-party integration could go tragically wrong, or a team lead might need to take leave suddenly. Building redundancy and resiliency into your everyday practice doesn't prevent this, but it can help turn a disaster into something that gets overcome as a matter of course.
Solutions Architect Joshua Lieb has a deep knowledge of web development, content management, and programming thanks to his more than 15 years of web development experience in the government, publishing, and NGO sectors. His proven ...