+Colibri wrote: Ok, this will take a while, probably offline for an hour or two. Although I can just restart it right now, it will probably lock up like this later in the evening, or when I'm already sleeping. I see what the problem is, just need to fully research what exactly happened and how to fix the code then.
GL with this. If we can help with some research, let us know.
OFFICIAL DOWNTIME THREAD
Re: OFFICIAL DOWNTIME THREAD
Always remember to pillage before you burn!-----------Unknown
Re: OFFICIAL DOWNTIME THREAD
Server coming back online. It crashed at 21:35, so there's a 35-minute revert, sorry about that. It came back online at 0:00, so that means 2 hours and 25 minutes of downtime.
I fixed 2 parts of the code that caused this, but that's just on the surface... I have 2 more upgrades planned to prevent such deadlocks in the future (or at least to keep the shard working at some half-capacity, so that the world can be saved and restarted normally without a revert). I hope there are no other similar bugs that would crash the server when I'm not here to fix it, leaving us with a 6-8 hour downtime.
Here's what happened.
The culprits: Pariah and Banethorn, but they were framed by ThreadId 1 and ThreadId 270. All orchestrated by +Colibri, but there's no evidence :p
This is just an example of a deadlock. The image below is a very good metaphor for what happened.
Our server uses a lot of multi-threaded code to keep the lag down. For example, when you do a search with [mystuff, that would normally cause a noticeable lag spike if used by a player with a lot of stuff. But it's multi-threaded, so while everyone is attacking monsters, the search algorithm just works on the data from the sidelines.
However, most things cannot simply run concurrently. Imagine a bunch of blind people running around, each holding a spear aimed ahead of them. For example, one thread might want to sum up a list of 100 numbers, and while it's doing that, another thread removes one number. When the first thread tries to read the last of the 100 numbers, it's no longer there, because the list is now just 99 numbers long, and that causes a crash. But a crash is actually the good outcome; the worse scenario is silent data corruption, where you don't know where it's coming from. That's why computers use semaphores: they signal who can currently work on a piece of data, so that only one thread at a time does it. No data corruption, no problem. The problem arises when, in a very complex system, all the lights turn red at once, and we get a deadlock.
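To make the list example concrete, here is a minimal sketch in Python (used only for illustration; the shard's actual code is not shown in this thread). The class `SharedNumbers` and its method names are hypothetical. One thread sums a list of 100 numbers while another removes the last one, and a single lock plays the role of the traffic light so that only one of them touches the list at a time.

```python
import threading


class SharedNumbers:
    """A list of numbers guarded by one lock (hypothetical helper)."""

    def __init__(self, values):
        self._values = list(values)
        self._lock = threading.Lock()

    def total(self):
        # Hold the lock for the whole pass so the list cannot shrink mid-sum.
        with self._lock:
            result = 0
            for i in range(len(self._values)):
                result += self._values[i]
            return result

    def remove_last(self):
        # The remover has to wait for the same lock before changing the list.
        with self._lock:
            if self._values:
                return self._values.pop()
            return None


numbers = SharedNumbers(range(1, 101))  # the 100 numbers: 1, 2, ..., 100

totals = []
summer = threading.Thread(target=lambda: totals.append(numbers.total()))
remover = threading.Thread(target=numbers.remove_last)
summer.start()
remover.start()
summer.join()
remover.join()
```

Depending on which thread gets the lock first, the summing thread sees either all 100 numbers or the remaining 99, but never a list that changes length halfway through the loop.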
This has happened twice in the past, always when I'm on vacation. I remember one time in ... July, probably 2017. Then again on August 1st 2020, last summer. These things are almost impossible to catch in testing; they only show up when the shard is under a load of varied activity (everyone doing a lot of different things at once). There are coding practices that prevent such deadlocks, but they make things much harder to code. This is just a game server, not the software that runs the electrical grid.
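One well-known practice of that kind is to always take locks in the same fixed order. The sketch below is in Python for illustration only and is not the fix that was applied to the shard. The two locks, the `LOCK_ORDER` table and the helper `with_both` are all hypothetical names. Two threads each need both locks but ask for them in opposite orders, which is the classic recipe for a deadlock; the helper sorts the locks first, so neither thread can end up holding one lock while waiting forever for the other.

```python
import threading

account_lock = threading.Lock()       # hypothetical: guards account info
achievements_lock = threading.Lock()  # hypothetical: guards achievement data

# One fixed, global order for the locks. Every thread must follow it.
LOCK_ORDER = {id(account_lock): 0, id(achievements_lock): 1}


def with_both(first, second, action):
    """Take both locks in the global order, however the caller listed them."""
    ordered = sorted((first, second), key=lambda lock: LOCK_ORDER[id(lock)])
    with ordered[0]:
        with ordered[1]:
            action()


log = []


def open_achievements():
    # This caller asks for the achievements lock first, then the account lock.
    for _ in range(1000):
        with_both(achievements_lock, account_lock, lambda: log.append("ach"))


def update_account():
    # This caller asks for the same two locks in the opposite order.
    for _ in range(1000):
        with_both(account_lock, achievements_lock, lambda: log.append("acct"))


threads = [
    threading.Thread(target=open_achievements),
    threading.Thread(target=update_account),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=10)

# True when both threads ran to completion instead of waiting on each other.
finished = not any(t.is_alive() for t in threads)
```

The cost is the one mentioned above: every piece of code that needs more than one lock has to know about the global order and respect it, which gets harder as the number of locks grows.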
+Colibri, Administrator of UO Excelsior Shard
Don't know what the purpose of your life is? Well then make something up!
(Old Colibrian proverb)
Re: OFFICIAL DOWNTIME THREAD
I had just pulled up [ach when it locked, guess I have so many achievements.....
Re: OFFICIAL DOWNTIME THREAD
You know what's funny, every time there is downtime (planned or not, I'm never paying attention in chat), I always think it was something I did. I used to think it was just my huge ego, but I was right all along.../s
Re: OFFICIAL DOWNTIME THREAD
Banethorn, yeah you just opened your [ach
Pariah, you did something that updated some info on your account, who knows what that was.
+Colibri, Administrator of UO Excelsior Shard
Don't know what the purpose of your life is? Well then make something up!
(Old Colibrian proverb)
- Wil
- Legendary Scribe
- Posts: 1128
- Joined: Mon Dec 30, 2013 1:19 pm
- Location: Seattle, WA, USA
Re: OFFICIAL DOWNTIME THREAD
+Colibri wrote:
Re: OFFICIAL DOWNTIME THREAD
+Colibri wrote: Banethorn, yeah you just opened your [ach
Pariah, you did something that updated some info on your account, who knows what that was.
I do! I moved a GOC stone into a container in my house preceding the crash. Not sure if that helps with debugging, but being account bound, there could be something to that.
Re: OFFICIAL DOWNTIME THREAD
The server crashed at 14:04 shard time. Fortunately this was just 4 minutes after the save, so there's virtually no loss of progression (4-minute revert).
Everything on the software side seems normal, so this was likely a hardware issue, or possibly a loss of power... I'll investigate further. For the next 48 hours, the server will be on a 15-minute save interval, just in case.
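A quick way to see what the shorter save interval buys: the revert is simply the time since the last save. The sketch below assumes saves happen at fixed intervals aligned to the top of the hour, which is only inferred from the reverts reported in this thread (14:04 gave 4 minutes, 21:35 gave 35 minutes) and is not confirmed anywhere. The function name `revert_minutes` is hypothetical.

```python
def revert_minutes(crash_hhmm, save_interval_minutes):
    """Minutes of progress lost for a crash at `crash_hhmm` (HH:MM).

    Assumes saves run every `save_interval_minutes`, aligned to the top of
    the hour. That schedule is an assumption, not something the thread states.
    """
    hours, minutes = map(int, crash_hhmm.split(":"))
    minutes_since_midnight = hours * 60 + minutes
    return minutes_since_midnight % save_interval_minutes


print(revert_minutes("14:04", 60))  # 4
print(revert_minutes("21:35", 60))  # 35
print(revert_minutes("21:35", 15))  # 5
```

Under that assumption, a 15-minute interval caps the revert at under 15 minutes, wherever in the hour the crash lands.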
+Colibri, Administrator of UO Excelsior Shard
Don't know what the purpose of your life is? Well then make something up!
(Old Colibrian proverb)
- Wil
- Legendary Scribe
- Posts: 1128
- Joined: Mon Dec 30, 2013 1:19 pm
- Location: Seattle, WA, USA
Server go boom?
ping shard.uoex.net
PING shard.uoex.net (51.222.105.87) 56(84) bytes of data.
^C
--- shard.uoex.net ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 190ms
Re: Server go boom?
Hmm, I thought my router or internet was down, but it looks like it's the server. Working on it...
+Colibri, Administrator of UO Excelsior Shard
Don't know what the purpose of your life is? Well then make something up!
(Old Colibrian proverb)
- Sethra Lavode
- Grandmaster Scribe
- Posts: 92
- Joined: Mon Sep 16, 2013 12:26 pm
Re: Server go boom?
It seems so....
Starfish & coffee, maple syrup & jam. Butterscotch clouds & a tangerine, side order of ham.
Re: OFFICIAL DOWNTIME THREAD
It's back online. This time we weren't so lucky with the crash timing: the server crashed at 18:53 shard time, so there was a revert of almost one hour.
The server will be on a 15-minute save interval through the weekend, and I'm working with our hosting provider to find out what the problem was. The crash on May 26th might well have been a random occurrence, and a crash once or twice a year is still manageable... but this was just a week later, so something's definitely fishy.
+Colibri, Administrator of UO Excelsior Shard
Don't know what the purpose of your life is? Well then make something up!
(Old Colibrian proverb)
- ButteryBiscuits
- Elder Scribe
- Posts: 112
- Joined: Tue Apr 30, 2019 9:32 am
Re: OFFICIAL DOWNTIME THREAD
But it's up now! Go +C!!
BB
---------------------------
ButteryBiscuits
in game name ButteryBiscuits
https://en.wikipedia.org/wiki/Mermaid_of_Warsaw