It seems like the problem has gone away. It's a bit frustrating, since there was no clear answer and nothing in particular that I did to fix it; maybe our main server just needed some downtime and a couple of restarts.
Timeline of events:
- Saturday, September 28th at 22:15 shard time, the server was disconnected. I was still able to access the IPMI management console.
- 23:30-ish shard time, the technician took over. The cable got unplugged or something along those lines, and he plugged it back in. There was a bit of back-and-forth with the tech: although they get alerts when a server is down, I was connected to the management console, so the tech thought the server was offline because I was doing maintenance on it. Half an hour later there was a similar mix-up again, but because of something else.
- We were back online about 2 and a half hours later (Sunday at 0:52 shard time).
But after this, there was a brief connection degradation about every 12 minutes. Some people got disconnected, but not everyone... and there seemed to be no particular rule about who would get disconnected. It seemed that it wouldn't go away, so I planned to move the game server to an auxiliary/secondary server. Although that server was set up with everything ready to go, it hadn't been used in probably 3 years, so it was a bit of extra work to get it ready. I think it would be good to have something like a fire drill a few times a year, to make sure it's all ready to go in case of an emergency.
On Tuesday morning around 5am, the spikes stopped for about an hour and a half.
On Thursday, October 3rd at 18:20, we moved to the auxiliary server and I took the main one down for some diagnostics (running a bare-minimum Linux environment just to run some commands and see if the problem was still there). However, the lag spikes were gone. Even after going back into Windows, with everything running, there were no lag spikes. So at 22:00, we moved back to the main server (very fast, with just 15 minutes of downtime), and the lag spikes are no longer there.
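For anyone who wants to check for this kind of pattern themselves, here is a minimal Python sketch of how periodic spikes could be picked out of a latency log. It assumes you already have (timestamp in seconds, latency in ms) samples from somewhere, such as a ping log. The function names, the 3x-median threshold, and the sample data are all made up for illustration.

```python
from statistics import median

def find_spikes(samples, factor=3.0):
    """Return timestamps whose latency exceeds factor * median latency."""
    baseline = median(latency for _, latency in samples)
    return [ts for ts, latency in samples if latency > factor * baseline]

def typical_interval(spike_times):
    """Median gap in seconds between consecutive spikes, or None if fewer than 2."""
    gaps = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return median(gaps) if gaps else None

# Made-up data: one sample per minute for an hour, 40 ms baseline,
# with a 400 ms spike every 12 minutes.
samples = [(m * 60, 400.0 if m % 12 == 0 else 40.0) for m in range(60)]
spikes = find_spikes(samples)
print(typical_interval(spikes) / 60, "minutes between spikes")
```

If the median gap comes out steady (here 12 minutes), that would point to something running on a timer rather than random load. A spike that lasts longer than one sample would need to be merged into a single event first, which this sketch doesn't do.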
So there's no telling what exactly happened, only theories:
- Since these lag spikes started happening just after Saturday's disconnect (perhaps a loose cable), there could be some hardware/firmware/software issue that caused some loop or data overflow, going off every 12 minutes, overwhelming the connection, and causing those disconnects. Taking the server offline for a few hours and doing a couple of restarts might have cleared that up.
- A DoS attack, although it would have to be one that targets some vulnerability of the RunUO server or one of our services, in a way that wouldn't be detected by the anti-DDoS system in the datacenter. I think this is unlikely, though.
- Similar to a DoS attack, but someone running some very intense script that would overwhelm the game server... again, I don't think this is likely, but it could be a kind of accidental DoS attack, with someone just having their script set with too short a delay between actions. Though no amount of requests to the game server could cause a gigabit connection to saturate.
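To put a rough number on that last point, here is a back-of-the-envelope calculation in Python. The 100-byte average packet size is purely an assumption for illustration, not a measured value, and the calculation ignores protocol overhead.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # gigabit link
ASSUMED_PACKET_BYTES = 100            # assumption, not a measured value

def packets_per_second_to_saturate(link_bps, packet_bytes):
    """Packets per second needed to fill the link, ignoring protocol overhead."""
    return link_bps / (packet_bytes * 8)

pps = packets_per_second_to_saturate(LINK_BITS_PER_SECOND, ASSUMED_PACKET_BYTES)
print(f"{pps:,.0f} packets per second")
```

Under that assumption it would take over a million packets per second to fill the link, which seems far beyond what a single misconfigured client script would plausibly send.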
It has now been about 12 hours and it seems that the connection is stable.

[Attachment: 2024-10-04 uogateway online chart.png]
Point A: moving from main server to auxiliary.
Point B: moving back to main.
The lag spikes stopped right after point A.
@Wil - it's a pretty good host (OVH). I just looked back: in late August 2021, there was a hardware issue. They replaced the motherboard at the time, but it took some effort to get them to look at it seriously. Then again, they have a huge number of servers and automated monitoring, and there were also times when something was fixed in a very short time with no hassle.
@Lach - not sure what you wanted to say with your post, but it doesn't seem to be helping. If the lag spikes were still happening, I'd be trying to figure it out, or probably just move us to a new server. If the lag spikes continued on another server, then I'd just keep digging. But no amount of money can make the process any faster. (Well, hiring an IT professional might, but since all these systems and software are completely custom, and there's a lot of trust involved, it would be hard to get someone very fast.)