Hi everyone,
We got to the office this morning to find a flurry of emails that hit us more effectively than 6 shots of espresso… “ActiveInbox has been down”, “Did you have an outage?”, “I can’t log in”.
We’ve been doing this a long time, and I immediately had some theories… The number 1 suspect was that we’re close to our annual security audit with Google, and I thought that might have triggered a permissions problem. The 2nd thought was that Chrome might have updated and changed some operational detail that created waves for us. Only in a distant 3rd – almost as an afterthought – did I consider it might be at our end. (This isn’t arrogance. It’s just we’ve made no server updates in the last few weeks so it would be very unexpected.)
So it was something of a shock when I logged into our hosting metrics and found this:

I’m sure you’ll spot, with the bare minimum of sleuthing, that “big red blotches” aren’t what we like to see.
It transpires that our provider (Heroku) had a global outage about an hour after we went home last night. Their status page indicates it took them most of the night to fully resolve it, but the good news is we’re back now.
If you’ve followed our handful of outages over the last decade, you might wonder why we’ve stuck with Heroku. I’ll be honest, it’s not a comfortable decision: we’d love to find a provider that could be our forever home. The problem is no one else has the same combination of “we take care of most of it for you” and serious credentials (they’re owned by Salesforce and powered by Amazon).
We’re sticking with them a while longer, but I’ll be seeking answers on why they were so slow to notify our alerting systems.
This was written by Andy Mitchell