Hi all,
As some of you may have noticed, we had a period of downtime across the forum and a few other services on Sunday 15th December.
At around 03:15 on Sunday morning, the server performed a routine restart following the automatic installation of some updates. This restart was not successful, and despite many determined attempts, the server was unable to boot into its normal operating system. The suspicion is that one of the installed updates conflicted with something else on the server. As the server is remote, debugging options are incredibly limited, which makes issues like ‘not booting’ nearly impossible to track down or resolve.
At around 15:00 on Sunday, the decision was made to provision a new server environment. As you can imagine, this is a significant undertaking, especially when unplanned and at short notice.
By around 21:00, key services like the membership back end were available again, but there were sustained issues with the forum. This appeared to be caused by an underlying incompatibility between the new database environment and the forum database.
The decision was made to provision another new server environment. At around 00:30, the forum was returned to normal operation along with the membership back end.
As the original server had shut down gracefully and the recovery console allowed access to the databases and file stores, there was no data loss as a result of this outage.
As you can probably imagine, this is bordering on a worst-case scenario for the club, but our processes to handle it went smoothly. Whilst I would certainly have preferred my weekend to have gone uninterrupted (not least since I am in the middle of refurbishing a set of brake callipers!), we should all be happy that such a major issue has been resolved so swiftly. In the 11+ years that I have worked with the club’s IT, this is by far the most significant issue/outage that we have faced.
Hopefully that is the end of any major disruption but, as you can appreciate, this is a substantial change to our infrastructure, so there may still be issues to iron out.
I appreciate your patience whilst things weren’t working.