RIM has finally released its official statement on why the entire North American BlackBerry network went down.
The outage was triggered by “the introduction of a new, non-critical system routine” designed to optimize the cache, or temporary memory, on the computer servers that run the BlackBerry network.
RIM said “the pre-testing of the system routine proved to be insufficient.”
The failed upgrade apparently set off a domino effect of glitches, which the company referred to as “a compounding series of interaction errors between the system’s operational database and cache.”
The Canadian company said a “failover process” to switch to a backup system “did not fully perform to RIM’s expectations.”
What astounded me during the whole outage was that RIM was silent. There was no communication about what had happened or why. Even the RIM homepage had nothing on it, even after the network had been down for 8 hours!
Obviously I’d expect them to prioritise getting the system up and running, but I was surprised by the lack of information being shared.
This was my favorite of all the comments I got on my blog (from Greg Lowe):
So, doesn’t the average IT Professional feel comfortable knowing that a server/infrastructure that they have no control over can negatively impact their users in such a major way?
At least with Direct Push, we can fire the bonehead that tripped on the power cord (or installed the latest patch without testing).
You can read more about RIM’s statement HERE