Forum outage - Sunday 15th December

Hi all, 

As some of you may have noticed, we had a period of downtime across the forum and a few other services on Sunday 15th December. 

At around 03:15 on Sunday morning, the server performed a routine restart following the automatic installation of some updates. This restart was not successful and, despite many determined attempts, the server was unable to boot into its normal operating system. The suspicion is that one of the installed updates conflicted with something else on the server. As the server is remote, debugging options are incredibly limited, which makes issues like ‘not booting’ near impossible to track down or resolve.

At around 15:00 on Sunday, the decision was made to provision a new server environment. As you can imagine, this is a significant undertaking, especially when unplanned and at short notice.

By around 21:00, key services like the membership back end were available again, but there were sustained issues with the forum. These appeared to be caused by an underlying incompatibility between the new database environment and the forum database.

The decision was made to provision another new server environment. At around 00:30, the forum was returned to normal operation along with the membership back end. 

 

As the original server had shut down gracefully and the recovery console allowed access to the databases and file stores, there was no data loss as a result of this outage.

 

As you can probably imagine, this is bordering on a worst-case scenario for the club, but our processes to handle it went smoothly. Whilst I would certainly have preferred my weekend to have gone uninterrupted (not least since I am in the middle of refurbishing a set of brake callipers!), we should all be happy that such a major issue has been resolved so swiftly. In the 11+ years that I have worked with the club’s IT, this is by far the most significant outage that we have faced.

 

Hopefully that is the end of any major disruption but, as you can appreciate, this is a substantial change in our infrastructure, so there may still be issues to iron out.

 

 

I appreciate your patience whilst things weren’t working.

Thanks for the full and detailed report on what happened.

After a stressful and frustrating day like that, most people would have just crashed out and left it until the morning.

Nice one Ramsey.

 

I feel your pain.

I’ve a server migration to perform in the new year, with about 10 databases etc.

Thanks for your hard work getting us up and running again.
Hope you have a Merry Christmas and New Year without any problems.

I didn’t understand a word of what Ramsey wrote - world’s no. 1 techno-thickie, me!

I did notice the web site was not working this last weekend, but knowing that a new version was being developed, didn’t worry too much, and just assumed it was a hiccup with an experimental dry-run.

I do feel for anyone having problems with computers / servers etc - I know how frustrated I get when my PC throws a wobbler.  

Well done for getting it all going again so quickly.

I’m with Chris on this one. The detailed explanation of the issues with the server went totally over my head. I have to ask for the help of my daughters for any computer issues. Thanks to all involved for making this forum work.