OFFICIAL DOWNTIME THREAD

Anything you find suspicious, things that crash your client, things that crash the server, anything that doesn't work as it should.
Orionesss
Apprentice Scribe
Posts: 14
Joined: Fri Oct 16, 2020 12:39 am

Re: OFFICIAL DOWNTIME THREAD

Post by Orionesss »

+Colibri wrote:Ok, this will take a while, probably offline for an hour or two. Although i can just restart it right now, it will probably lock up like this later in the evening, or when I'm already sleeping. I see what the problem is, just need to fully research what exactly happened and how to fix the code then.
GL with this. If we can help with some research, let us know :nod:
Xavian
Legendary Scribe
Posts: 485
Joined: Sun Dec 25, 2011 4:09 pm

Re: OFFICIAL DOWNTIME THREAD

Post by Xavian »

+Colibri wrote:Ok, this will take a while, probably offline for an hour or two. Although i can just restart it right now, it will probably lock up like this later in the evening, or when I'm already sleeping. I see what the problem is, just need to fully research what exactly happened and how to fix the code then.
[Image: Fine.jpg]
arrow
Legendary Scribe
Posts: 328
Joined: Sun Mar 06, 2011 1:40 pm

Re: OFFICIAL DOWNTIME THREAD

Post by arrow »

Image
Always remember to pillage before you burn!-----------Unknown
+Colibri
Administrator
Posts: 3963
Joined: Sat Feb 25, 2006 4:08 pm
Location: static void Main

Re: OFFICIAL DOWNTIME THREAD

Post by +Colibri »

Server coming back online. It crashed at 21:35, so there's a 35-minute revert, sorry about that. It just came back online at 0:00, so that means a 2 hour and 25 minute downtime.
I fixed 2 parts of the code that caused this, but that's just on the surface... I have 2 more upgrades planned to prevent such deadlocks in the future (or at least, so that the shard keeps working at some half-capacity, for the world to be saved and regularly restarted without a revert). I hope there are no other similar bugs that would crash the server when I'm not here to fix it, leaving us with a 6-8 hour downtime.

Here's what happened.

The culprits: Pariah and Banethorn, but they were framed by ThreadId 1 and ThreadId 270. All orchestrated by +Colibri, but there's no evidence :p

This is just an example of a deadlock. The image below is a very good metaphor for what happened.
Our server uses a lot of multi-threaded code to keep the lag down. For example, when you do a search with [mystuff, that would normally cause a noticeable lag spike if used by a player with a lot of stuff. But it's multi-threaded, so while everyone is attacking monsters, the search algorithm just works on the data from the sidelines.
However, most things cannot just run concurrently; imagine a bunch of blind people running around, each with a spear aimed ahead of them. For example, one thread might want to sum up a list of 100 numbers, and while it's doing that, another thread removes one number. When the first thread goes to read the last of the 100 numbers, it's no longer there, because the list is now just 99 numbers long, and that causes a crash. But a crash is actually the good outcome; a worse scenario is silent data corruption, where you don't know where it's coming from. That's why computers use semaphores: they signal who can currently work on a piece of data, so that only one thread at a time does. No data corruption, no problem. The problem only comes when, in a very complex system, all the lights turn red at once (each thread holds something another one is waiting for), and we get a deadlock.
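To make the 100-numbers example concrete, here is a minimal sketch in Python. It is not the shard's actual code: the function name, the list, and the lock are all made up for illustration. One thread sums the list while another removes the last number, and a lock guarantees that only one of them touches the list at a time.

```python
import threading

def sum_while_removing():
    """Hypothetical demo: one thread sums a list while another shrinks it."""
    numbers = list(range(1, 101))   # the 100 numbers: 1..100
    lock = threading.Lock()         # whoever holds this may touch `numbers`
    totals = []

    def reader():
        # Hold the lock for the whole loop so the list cannot shrink mid-read.
        with lock:
            total = 0
            for i in range(len(numbers)):
                total += numbers[i]
        totals.append(total)

    def writer():
        # The removal waits until the reader is done (or runs entirely before it).
        with lock:
            numbers.pop()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals[0], len(numbers)

if __name__ == "__main__":
    print(sum_while_removing())
```

Depending on which thread gets the lock first, the sum covers either all 100 numbers or the remaining 99, but never a half-updated list. Without the lock, the reader could compute the length, lose the last element mid-loop, and fail with an index error.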

This happened twice in the past, always when I'm on vacation :( I remember one time in ... July, probably 2017. Then again August 1st 2020, last summer. These things are almost impossible to catch in testing; they only show up when the shard is under a load of varied activity (everyone doing a lot of different things at once). Well, there are coding practices that prevent such deadlocks, but they make things much harder to code. This is just a game server, not the software that runs the electrical grid.
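One commonly used practice of that kind is lock ordering: every thread takes the locks it needs in the same global order, so two threads can never each hold one lock while waiting for the other. The sketch below is only an illustration in Python, not what the shard does; the lock names and helper functions are hypothetical, and sorting by `id` is just one simple way to get a consistent order.

```python
import threading

# Hypothetical locks guarding two pieces of shared data.
account_lock = threading.Lock()
achievement_lock = threading.Lock()

def acquire_in_order(*locks):
    """Acquire locks in one global order (by id), whatever order the caller asked for."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    """Release in reverse order of acquisition."""
    for lock in reversed(ordered):
        lock.release()

def worker(first, second, log, name, rounds=1000):
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            pass  # work that needs both pieces of data would go here
        finally:
            release_all(held)
    log.append(name)

def run_both():
    log = []
    # The two threads ask for the locks in opposite order, the classic deadlock recipe.
    a = threading.Thread(target=worker, args=(account_lock, achievement_lock, log, "thread A"))
    b = threading.Thread(target=worker, args=(achievement_lock, account_lock, log, "thread B"))
    a.start()
    b.start()
    a.join(timeout=10)
    b.join(timeout=10)
    return sorted(log)

if __name__ == "__main__":
    print(run_both())
```

If each worker instead acquired `first` and then `second` directly, thread A could hold the account lock while thread B holds the achievement lock, and both would wait forever. The cost is the one described above: every piece of code that takes more than one lock has to go through the same ordering discipline.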

[Image: deadlock.png]
+Colibri, Administrator of UO Excelsior Shard

Don't know what the purpose of your life is? Well then make something up! ;)
(Old Colibrian proverb)
Banethorn
Apprentice Scribe
Posts: 12
Joined: Mon Dec 07, 2020 7:01 pm

Re: OFFICIAL DOWNTIME THREAD

Post by Banethorn »

I had just pulled up [ach when it locked, guess I have so many achievements.....
Pariah
Legendary Scribe
Posts: 429
Joined: Wed Sep 05, 2012 10:31 pm

Re: OFFICIAL DOWNTIME THREAD

Post by Pariah »

You know what's funny, every time there is downtime (planned or not, I'm never paying attention in chat), I always think it was something I did. I used to think it was just my huge ego, but I was right all along.../s :)
+Colibri
Administrator
Posts: 3963
Joined: Sat Feb 25, 2006 4:08 pm
Location: static void Main

Re: OFFICIAL DOWNTIME THREAD

Post by +Colibri »

Banethorn, yeah you just opened your [ach :)
Pariah, you did something that updated some info on your account, who knows what that was.
Wil
Legendary Scribe
Posts: 1128
Joined: Mon Dec 30, 2013 1:19 pm
Location: Seattle, WA, USA

Re: OFFICIAL DOWNTIME THREAD

Post by Wil »

+Colibri wrote:Image
[Image: traffic.jpg]
Pariah
Legendary Scribe
Posts: 429
Joined: Wed Sep 05, 2012 10:31 pm

Re: OFFICIAL DOWNTIME THREAD

Post by Pariah »

+Colibri wrote:Banethorn, yeah you just opened your [ach :)
Pariah, you did something that updated some info on your account, who knows what that was.
I do! I moved a GOC stone into a container in my house preceding the crash. Not sure if that helps with debugging, but being account bound, there could be something to that.
+Colibri
Administrator
Posts: 3963
Joined: Sat Feb 25, 2006 4:08 pm
Location: static void Main

Re: OFFICIAL DOWNTIME THREAD

Post by +Colibri »

The server crashed at 14:04 shard time. Fortunately this was just 4 minutes after the save, so there's virtually no loss of progress (4-minute revert).

Everything on the software side seems normal, so this was likely a hardware issue, or possibly a loss of power... I'll investigate further. For the next 48 hours, the server will be on a 15-minute save interval, just in case.
Wil
Legendary Scribe
Posts: 1128
Joined: Mon Dec 30, 2013 1:19 pm
Location: Seattle, WA, USA

Server go boom?

Post by Wil »

ping shard.uoex.net
PING shard.uoex.net (51.222.105.87) 56(84) bytes of data.
^C
--- shard.uoex.net ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 190ms
+Colibri
Administrator
Posts: 3963
Joined: Sat Feb 25, 2006 4:08 pm
Location: static void Main

Re: Server go boom?

Post by +Colibri »

Hmm, I just thought my router or internet was down, but it looks like it's the server. Working on it...
Sethra Lavode
Grandmaster Scribe
Posts: 92
Joined: Mon Sep 16, 2013 12:26 pm

Re: Server go boom?

Post by Sethra Lavode »

It seems so.... :cry:
Starfish & coffee, maple syrup & jam. Butterscotch clouds & a tangerine, side order of ham.
+Colibri
Administrator
Posts: 3963
Joined: Sat Feb 25, 2006 4:08 pm
Location: static void Main

Re: OFFICIAL DOWNTIME THREAD

Post by +Colibri »

It's back online. This time we weren't so lucky with the crash timing: the server crashed at 18:53 shard time, so there was a revert of almost one hour.

The server will be on a 15-minute save interval through the weekend, and I'm working with our hosting provider to find out what the problem was. The crash on May 26th might well have been a random occurrence, and a crash once or twice a year is still manageable... but this was just a week later, so something's definitely fishy.
ButteryBiscuits
Elder Scribe
Posts: 112
Joined: Tue Apr 30, 2019 9:32 am

Re: OFFICIAL DOWNTIME THREAD

Post by ButteryBiscuits »

But it's up now! Go +C!!
BB
---------------------------
ButteryBiscuits
in game name ButteryBiscuits
https://en.wikipedia.org/wiki/Mermaid_of_Warsaw