IMPERFECT SYSTEMS AND WHY IT'S OKAY
It may seem silly to think of corporate database systems in terms of cosmological philosophy, but the same principles apply. All systems are bound by entropy and are thus ultimately destined to fail at some point. If someone tells you they built a perfect system, especially in an environment as volatile as IT, they are definitely lying and you should walk away. Fast.
All businesses need to be prepared for such technological failures and to come to terms with the fact that there is nothing they can do about them, except to find and invest in a trustworthy software house with an experienced IT team who will be there for them when anything or, indeed, everything else fails.
Otherwise, be prepared to pay the cost of failure in lost customers and lost data, resulting in an embarrassingly bad public image.
They thought the system should have worked; the technology was there to prevent it, but ultimately it did not.
Cost of failure
One doesn’t have to go far to see the scale of trouble when one of the “too-big-to-fail” CDN providers goes down. Cloudflare sustained a catastrophic blow when it went down last week and took a massive number of its clients with it.
Millions of customers nearly had a heart attack when they got a “502 Bad Gateway” error page. The cause, ironically enough, was a routine security test gone wrong: it deployed a new rule globally that made all the Cloudflare server CPUs peak at 100% load and shut themselves down, together with a significant part of the internet.
The safety net that should have predicted and caught such an outcome didn’t work.
Why? Cloudflare themselves still don’t know; they continue to investigate the situation and have so far only apologized for the error.
They thought the system should’ve worked; the technology was there to prevent it, but it didn’t. It wasn’t a perfect system, because there is no such thing. We can only speculate about the damage done to Cloudflare’s finances and public image.
What can a company do about it? Not using corporate databases or any technology whatsoever is a ridiculous notion that is not worth addressing. The answer is to hire professionals. Find, research, and employ people who have experience in risk analysis, prediction, and minimization, and who are trained and ready for the majority of the possible scenarios in which things take a hard left turn.
A helping hand
There is really not much one can do about systemic failures of technology, but having an experienced IT team like Tentacle Solutions is priceless: a team that has handled crises like that before and knows how to minimize the damage, back everything up, quickly restore the data, and get it all up and running again.
It takes years of practice and training to be ready for a situation like the one with Cloudflare, and it takes a lot of nerve and character to withstand the stress of dealing with technology issues with confidence and professionalism.