So why did the BlackBerry service go down?

RIM has finally released its official statement on why the entire North American BlackBerry network went down.

The outage was triggered by “the introduction of a new, non-critical system routine” designed to optimize the cache, or temporary memory, on the computer servers that run the BlackBerry network.

RIM said “the pre-testing of the system routine proved to be insufficient.”

The failed upgrade apparently set off a domino effect of glitches, which the company referred to as “a compounding series of interaction errors between the system’s operational database and cache.”

The Canadian company said a “failover process” to switch to a backup system “did not fully perform to RIM’s expectations.”


What astounded me during the whole outage was that RIM was silent… there was no communication about what had happened or why… even the RIM homepage had nothing on it… even after the service had been down for 8 hours!

Obviously I’d expect them to prioritise getting the system up and running, but I was surprised by the lack of information being shared.

This was my favorite of all the comments I got on my blog… (from Greg Lowe)


So, doesn’t the average IT Professional feel comfortable knowing that a server/infrastructure that they have no control over can negatively impact their users in such a major way?

At least with Direct Push, we can fire the bonehead that tripped on the power cord (or installed the latest patch without testing)

You can read more about RIM’s statement HERE

Comments (9)
  1. irblinx says:

    I think you’re being a bit unkind Jason, how were they supposed to email their statement out :o)

  2. Kris Kumar says:

    Good one, irblinx!

    Jason, I was also surprised that RIM doesn’t have anything on their home page. No press release. And during the outage no message comforting the users.

    The weirdest thing: I was watching the business channels, and the investors/analysts/so-called-tech-experts were saying that this (8hr+) glitch will fade away soon, but if it happens again then it will be a problem. According to them, people trust RIM. They weren’t a bit worried about RIM’s silence. In fact, the stock market shows that it is not worried.

  3. Abdul Aziz says:

    Slashdot was running the same story, and in fact I commented the same thing: Exchange rocks coz of ActiveSync Direct Push.

  4. Fergus says:

    I’m a bit tired of this constant childish niggling, to be honest.

    Usually your posts are interesting and informative, but this is just a bit unnecessary. You MS folks get p1ssed off when people slag off your products (with good reason most of the time) but then you do the same.

    Stones and greenhouses Jason, stones and greenhouses. 🙂

  5. Paul Mah says:

    There is a lot of fanatical (and blind) support for the BlackBerry.  I guess it’s a good thing for RIM where marketing is concerned, but it really irks me when I see a biased, non-rational defense of the BlackBerry.

    To quote an example of one of these comments that I have read in just the last two days (and one of the people making them is a respected analyst, no less):

    It goes something along the lines of: "Well, even though the BB was down, I’m sure some of your corporate mail systems suffer more down time, right?  So see, let’s just forget the RIM outage and move on…". What this statement ignores is that for most BlackBerry users, the BlackBerry IS your corporate e-mail AND the BB network combined. Hence if your corporate e-mail up-time sucks, then your BB definitely sucks, and sucks more too!

  6. J Letendre says:

    Having to support both BES/BB and MSExch2003 for mobility, I have insight into both solutions’ strengths and weaknesses.

    For email / internal application nothing beats BB .. yet.  Yeah, MS Direct Push is almost as good as BB, but at the end of the day it’s not .. the devices have god-awful battery life, Direct Push seems to lock up often, and that’s fine .. I’ll live with that, but to me .. being the admin of this solution .. having only two security features (wipe, password policy) just doesn’t cut it.  I want total control over the device as well as REPORTING on the devices, and MS mobility doesn’t provide that. I have that control with BES, as well as VPN access built in with MDS, encryption, and the ability to control every function of the BB my end users use (or can’t).  So the “MS solution is cheaper” chat can apply if you’re a small shop, but on an Enterprise level you have a whole lot of factors to consider beyond .. hey, we can save $ on the CAL and separate infra.

    Exchange 2007 looks to finally provide more features and policies, but with a large clustered environment like mine, management is not likely to approve another multi-million dollar upgrade for new mobility features when BES/BB provide them today.

    Time will tell.

  7. Well the title is a little far fetched but as the response to Mr Mobile’s (aka Jason Langridge) post

  8. The One Eyed Man says:

    So…. Has anybody asked, or has RIM offered, to validate that queued e-mail was not compromised during this outage?  NO ONE had access to it during the outage, right?  It WASN’T STORED SOMEWHERE, on a hard drive in cleartext, RIGHT?  The "non-critical system routine" didn’t DISCLOSE anything externally, nor foul up the cache such that information was disclosed to the wrong user, RIGHT?  Integrity checks have been run against the database, and comparisons against database backups have been performed to ensure that Personal Data was not disclosed, changed, or deleted, RIGHT??

    My final thought:  Is this the straw that may break the camel’s back, forcing the government and military to reconsider their asinine critical dependence on Crack-Berry?

    My other final thought:  WHERE IS OPEN-SOURCE???  J2ME Midlet + (OpenSSL + Apache / Tomcat + e-mail client) == Open-source BES.  To avoid the NTP lawsuit, USE A FULL-TIME TCP connection, like Microsoft.  We have anti-virus for phones that run in a Midlet, therefore device policies might be possible??

  9. The One Eyed Man says:

    In response to RIM’s statement:  "The issue is just how do you tell people what it is when it is e-mail that people are counting on, and that very communications path is down"

    My answer is this:  

    1.  DETERMINE AN ETA, even if it’s close but wrong.  PEOPLE WANT AN ETA.

    2.  Put a notice on your website.  People will go there first.

    3.  Don’t forget…. these devices are not just slaves of the "Blackberry network", they are devices that sit on a cellular network as well.  Work with the cell network providers on a notification solution (maybe an SMS text broadcast?).

    4.  Don’t tell me in 12 hours somebody can’t set up a temp server to blast out a quick e-mail to all the devices — SHUT DOWN the regular queue servers and delivery servers if you have to:  It sounds like they were not working anyway.

    ALL of this leads me to the conclusion that RIM is very consistent in one respect, and I like neither the device nor the service for this reason:  RIM’s pervasive culture appears to be that NOBODY is able to (or is willing to) think outside the box.
