
Prevent Server Outages

Posted: Mon Oct 13, 2008 2:30 pm
by bedub1
Twill wrote:Update #3:

We're probably going to have to go back through some backups, dig out just the off-topics and the lost usergroups, and restore them individually.

This probably won't happen until Tuesday (it's Thanksgiving weekend here in Canada and so our families seem to want to spend time with us for some odd and unknown reason), so hang tight until then. Off-topics and your clan forums will be coming back as soon as we can get them there.

We've heard conflicting things from the data center - we've been variously told it's the memory controller, then the hard drive something or other and then finally "sorry if the chiller failure caused you any problems"...which is what took out the database 3 months ago.

I'll post more when I know more.

Twill

Make sure the RAID controller has a battery backup for its write cache. RAID controller cards typically have 128-512 MB of RAM onboard and need a battery to keep the data in that RAM if something goes wrong. If the system fails while the database is in the middle of a write, the battery preserves the cached data so it gets written to the drives when the system comes back up. If the battery is dead or missing, the cached data is lost when the system comes back online, and that can cause serious, random database corruption.

EDIT: This is a little battery inside the server, screwed to the RAID controller card...it's not a UPS that powers the entire server.
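To make the failure mode concrete, here is a toy Python sketch of a write-back cache with and without a working battery. It is only an illustration under simplified assumptions: `ToyRaidController` and `simulate` are made-up names, not any real controller's API, and a real card works at the block level with far more going on.

```python
class ToyRaidController:
    """Toy model of a write-back cache; not a real controller API."""

    def __init__(self, has_battery):
        self.has_battery = has_battery
        self.cache = {}  # blocks acknowledged to the OS but not yet on disk
        self.disk = {}   # blocks that actually reached the drives

    def write(self, block, data):
        # Write-back caching: the write is acknowledged once it is in cache RAM.
        self.cache[block] = data

    def flush(self):
        # Normal operation: cached blocks are eventually written to the drives.
        self.disk.update(self.cache)
        self.cache.clear()

    def crash_and_restart(self):
        # Without a healthy battery, cache RAM loses its contents in the outage.
        if not self.has_battery:
            self.cache.clear()
        # On restart, whatever survived in the cache is flushed to the drives.
        self.flush()


def simulate(has_battery):
    ctrl = ToyRaidController(has_battery)
    ctrl.write(0, "row A")
    ctrl.flush()
    ctrl.write(1, "row B")  # still in cache when the system locks up
    ctrl.crash_and_restart()
    return ctrl.disk


print(simulate(has_battery=True))
print(simulate(has_battery=False))
```

With the battery, both writes end up on the drives. Without it, the database was told "row B" was written, but it never reached the disks, which is the kind of silent loss that leads to corruption.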

Re: Prevent Server Outages

Posted: Mon Oct 13, 2008 8:40 pm
by blakebowling
You do realize that the RAID failure was just an excuse? Their cooling systems keep going offline (Rackspace's, not CC's).

Re: Prevent Server Outages

Posted: Mon Oct 13, 2008 10:23 pm
by hecter
Perhaps they should upgrade to some sort of fancy liquid cooling system? Ya... And put Kool Aid in it! That'll keep the temperatures down, fosho!

Re: Prevent Server Outages

Posted: Mon Oct 13, 2008 10:25 pm
by hwhrhett
hecter wrote:Perhaps they should upgrade to some sort of fancy liquid cooling system? Ya... And put Kool Aid in it! That'll keep the temperatures down, fosho!

or some nice Hawaiian Punch, that'll make it faster, right?

Re: Prevent Server Outages

Posted: Mon Oct 13, 2008 11:36 pm
by bedub1
blakebowling wrote:You do realize that the RAID failure was just an excuse, their cooling systems keep going offline (Rackspace's, not CC's)

Yes, and a cooling failure causes CPUs to lock up mid-stream...and thus the entire system, and then the shit in the RAID card's RAM gets dumped instead of being written to the drives.

Re: Prevent Server Outages

Posted: Tue Oct 14, 2008 12:34 am
by blakebowling
bedub1 wrote:
blakebowling wrote:You do realize that the RAID failure was just an excuse, their cooling systems keep going offline (Rackspace's, not CC's)

Yes, and a cooling failure causes CPUs to lock up mid-stream...and thus the entire system, and then the shit in the RAID card's RAM gets dumped instead of being written to the drives.

My vote is to leave Rackspace altogether and move the servers elsewhere.