Diesel Generators, Amazon and Cloud Computing
By Jim Lundy
There have been numerous accounts of what went wrong in Amazon’s Virginia Cloud Data Center last week. Diesel Generators, always important, are emerging as a vital part of the Cloud Computing story.
It is often easy to blame anyone but yourself when things go wrong during a power outage. This is certainly true for PaaS providers that want to convince users to shift to using their Cloud platform. We are not here to pass judgement on Amazon or others, but rather to discuss the issues that arose and how people can avoid them. The topic is power: what happens when the main power grid goes offline and you need backup power, such as diesel generators.
First, while Amazon was not down for that long, many of their customers who run SaaS applications were. The situation, which to Amazon’s credit they described in detail, was avoidable.
This is why Cloud data center Disaster Backup and Recovery systems need to be tested and re-tested. This isn’t the first time this year or in the last two years that there have been problems at this particular Amazon facility.
Any IT Pro who runs a data center knows that staying online is critical. Being able to bring Diesel generators online quickly (with UPS power in the interim) is basic and fundamental. Planning and preparing for worst case scenarios is also critical, and this is where Amazon seems to have had problems.
While Amazon is recommending that enterprises set up their configurations so that their sites are mirrored via other Amazon Data Centers, the issue is that this Data Center could have stayed online.
From this situation, there are lessons learned for enterprises who are evaluating PaaS Cloud providers.
1. Basic Infrastructure. Does the data center have the right equipment, including basics such as circuit breakers strong enough to handle the power surges that can occur, particularly during electrical storms? How often are the base components tested? What maintenance is done on the generators, including their cooling systems (e.g. fans)? This level of evaluation crosses over from IT expertise into commercial building and power operations. It includes uptime and reliability for diesel generators.
2. Backup Power Compliance Testing. How long can the data center run on backup power before it fails? How well do the backup systems work when they have to switch back and forth during power outages and voltage spikes? What operational certifications and checks are done on a regular basis to verify backup system readiness?
3. Operations Staff. How well trained are the staff to deal with these issues? This is similar to flying a plane. Running a highly automated data center often means it is a hands-off operation, but when things go wrong, trained and competent staff need to step in.
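To make the backup power questions above concrete, here is a minimal Python sketch of the arithmetic an evaluator might ask a provider to walk through. Everything in it is an assumption for illustration: the BackupPower class, its field names, and the sample figures are hypothetical and do not describe Amazon's or any real facility.

```python
from dataclasses import dataclass


@dataclass
class BackupPower:
    """Hypothetical description of one site's backup power chain."""
    ups_bridge_seconds: float       # how long UPS batteries can carry the load
    generator_start_seconds: float  # time for generators to start and accept load
    fuel_liters: float              # usable diesel stored on site
    burn_liters_per_hour: float     # diesel consumption at the expected load

    def ups_covers_generator_start(self) -> bool:
        # The UPS must outlast the generator start-up, or the site goes dark.
        return self.ups_bridge_seconds > self.generator_start_seconds

    def generator_runtime_hours(self) -> float:
        # Simple estimate: stored fuel divided by the hourly burn rate.
        if self.burn_liters_per_hour <= 0:
            raise ValueError("burn rate must be positive")
        return self.fuel_liters / self.burn_liters_per_hour


# Illustrative numbers only, not taken from any real data center.
site = BackupPower(
    ups_bridge_seconds=600,
    generator_start_seconds=60,
    fuel_liters=20000,
    burn_liters_per_hour=400,
)
print(site.ups_covers_generator_start())
print(site.generator_runtime_hours())
```

A calculation like this only covers the paper design. It says nothing about whether the generators, breakers and transfer switches actually behave as specified, which is why regular live testing still matters.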
What this all means is that we are still in the early days of Cloud. Enterprises that need to have the same kind of uptime that they have designed into their current data centers will have to examine PaaS providers with a higher level of scrutiny. Operational sandbox testing for worst case scenarios is critical. One thing is clear: highly reliable Cloud PaaS Services that have highly reliable online backup (including diesel generators) will not be cheap.