Monday, December 3, 2012

Emergency Preparedness & IT Part 4: Spares & Warranties

When a major event hits, warranties will not do much for you.  They are good to have during your typical daily operations, covering part failures as needed with as little as one-day turnaround, but when things go really bad you can't count on them.  As was the case with Hurricane / "Superstorm" Sandy, transportation was severely impacted and nothing was getting moved into or out of the affected region.  You need to "self warranty" as much as you can, and as much as is reasonable.
"Self warrantying" comes down to playing the odds and comparing the costs involved.  Basically, risk management.  Every piece of equipment you own will fail.  It's a given.  Even if it's as simple as a fan failing in it, something about it will break.  This is where the hard part comes in: you need to weigh the odds of the device failing against the cost of keeping a spare on hand.  And to complicate matters, if you have more than one of that device, you also have to decide how many spares to keep.  I can't tell you what to do for your network in a blog post, but I'll try to give you some things to think about.

First of all, the more of one device you have, the higher the chances of one of them going bad.  For example, if 1 out of every 5 servers is predicted to die, and you only have 1 server, you have a 20% chance of having one that dies.  But if you have 5 servers, the chance that at least one of them dies climbs to roughly 67% (1 - 0.8^5), and on average you should expect to lose one.  This is still rough math, since it assumes the servers fail independently of each other, and I'm sure a statistician will yell at me, but that's OK.  The point stands: the more of something you have, the more spares you have to keep on hand for it.
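
For anyone who wants to run the numbers for their own gear, here is a small Python sketch of the "at least one of them dies" calculation.  It assumes every device fails independently with the same probability, which is a simplification (devices sharing a rack, a power feed, or a flood tend to fail together).  The function names and the 95% confidence default are my own choices for illustration, not a standard.

```python
from math import comb


def chance_of_any_failure(devices: int, p_fail: float) -> float:
    """Probability that at least one of `devices` fails (independent failures assumed)."""
    return 1 - (1 - p_fail) ** devices


def spares_needed(devices: int, p_fail: float, confidence: float = 0.95) -> int:
    """Smallest spare count that covers every failure with the given confidence.

    Walks the binomial distribution: adds up P(exactly k failures) until the
    running total reaches the requested confidence level.
    """
    cumulative = 0.0
    for k in range(devices + 1):
        cumulative += comb(devices, k) * p_fail**k * (1 - p_fail) ** (devices - k)
        if cumulative >= confidence:
            return k
    return devices


if __name__ == "__main__":
    print(round(chance_of_any_failure(1, 0.2), 3))  # one server
    print(round(chance_of_any_failure(5, 0.2), 3))  # five servers
    print(spares_needed(5, 0.2))                    # spares for 95% coverage
```

With the 1-in-5 failure rate used above, five servers give about a 67% chance of at least one failure, and under these assumptions three spares would cover you 95% of the time.  Treat the output as a starting point for the conversation, not a purchasing rule.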

Along the same lines, if you have only one of a certain device, you need to decide if it's worth having a spare on hand.  For example, if you have one cordless phone for your server room, buying a second to keep in the closet "just in case" won't break the bank.  But if you have one Cisco 7613 router, keeping a spare in the closet isn't exactly budget friendly.
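
One rough way to frame that decision is to compare the price of the spare against the expected cost of waiting for a replacement.  The sketch below does exactly that and nothing more.  Every number in it (prices, failure odds, days of downtime, cost per day) is made up for illustration, and the function name is hypothetical; plug in your own estimates.

```python
def expected_loss_without_spare(p_fail: float, days_down: float, cost_per_day: float) -> float:
    """Expected cost of having no spare: chance of failure times the cost of the outage."""
    return p_fail * days_down * cost_per_day


# Hypothetical cordless phone: cheap spare, modest pain while you wait for a new one.
phone_spare_cost = 40
phone_loss = expected_loss_without_spare(p_fail=0.2, days_down=5, cost_per_day=100)

# Hypothetical big router: very expensive spare, low chance of total failure.
router_spare_cost = 100_000
router_loss = expected_loss_without_spare(p_fail=0.05, days_down=5, cost_per_day=20_000)

print("phone spare worth it:", phone_loss > phone_spare_cost)
print("router spare worth it:", router_loss > router_spare_cost)
```

With these invented numbers the phone spare pays for itself and the router spare does not.  Note how sensitive the answer is to `days_down`: in a regional disaster where nothing can be shipped in for weeks, the same router math can flip.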

This all applies to "non-major-emergency" situations too.  You could have a server die any day of the week.  Having spares on hand lets you get back up and running a lot faster than waiting for a warranty replacement.  This means having spare servers as well as spare parts handy.  Hard drives are a great example.  Lots of places don't even keep spare hard drives on hand for their servers.  This practice just tempts Murphy too much and it's begging him to show up and ruin your day.  For the cost of a hard drive, it's worth keeping one (or many) on hand.  Keeping spares limits your exposure to bad situations and can limit the chances of a "warning" turning into a "critical" problem.

Storage of your spares is also an important consideration.  Usually storage space in corporate environments is tight and carefully allocated.  An easy solution: storage units.  They are just about everywhere and reasonably priced.  You don't really even need a climate controlled unit (unless you're located in Death Valley or Southern Texas) since the storage temperature limits of IT equipment generally have a much wider tolerance than the running temperature limits.  Just make sure your storage location is close by, is in a safe location with less chance of flooding/earthquake/etc. than your data center, and has access hours that will work for you.  Lastly, don't forget to make sure your business's insurance covers what's in the storage unit, just in case.

Once you're back up and running on your spares, you can deal with getting your warranty replacements swapped in, if that's possible.  If you were in some of the affected areas of Superstorm Sandy, the Dell warranty won't exactly cover "power supply death by drowning".  That makes having those spares even more important.

Again, I hope this is nothing new for anyone.  But if some of these ideas are new for you, then I'm glad I could enlighten you and help you consider ways to harden your network and protect it from a major event.

Part 5 is going to be about backups.  Stay tuned!

