Server Downtime and Data Loss

Archival storage for Announcements. Peruse old Announcements here!

Moderator: Community Team

agentcom
Posts: 3994
Joined: Tue Nov 09, 2010 8:50 pm

Re: Server Downtime and Data Loss

Post by agentcom »

morleyjoe wrote:Glad to see it's all running fine now. To those who are complaining or are pissed off, would you have preferred to find CC did not do backups at all? Having had to replace my share of data on crashed or dead computers, I think it is amazing to see that they were able to get this backup in place and running so quickly. It could have been far worse. Congrats to the team for their hard work is in order.


I wouldn't go so far as to say that it should even be in the realm of possibility that CC didn't have any backup, so I'm not going to give the admin props for quite that much. But I will second you on the direction, if not the magnitude, of your sentiment. I think that a 24-hour rollback for a situation that hasn't happened in 3 or 4 years is pretty impressive. I'm surprised and impressed that such a thing was so well prepared for (although I don't know if it was just luck that the last backup was only 24 hours prior).

I can't believe all the griping in this thread. The single best post so far has been this one:

drunkmonkey wrote:The random outcome of my rolls was lost at a random point, and the new random results are different! It's an outrage!


But that doesn't stop people from having atrociously bad ideas:

CHECK-M8 wrote:All games in progress need to be deleted. That is the only fair way to do it.


:o

Wow. How about thinking next time before you post, okay? Can you imagine the outrage if the games were completely deleted? TOs and their clan counterparts would probably start finding and stabbing people. Not to mention the thousands of users who would lose entire games rather than just a turn or so. Unbelievable that you would actually suggest this.

TheProwler, I was very interested in your post, though.


I have been very interested to read the posts by folks that work in similar industries and their takes on the matter. You seem to be somewhat in the minority here, chalking it up to a complete failure rather than something that can be learned from and improved as we go forward. Nonetheless, I appreciated the informative post from that viewpoint. Makes me slightly reconsider my kudos to the admin. Although, I still think I come down generally supportive and impressed by their handling of this.

Finally, I think the biggest losers here are the forum posters. A lot of those guys are running tourneys, may have taken games out of their Watch This Game screen, are running clan wars, are posting long, informative forum posts, etc. That type of stuff is more of a bitch to redo than just having to take a few turns over again. I hope if the admin have to make a choice that they will put emphasis on keeping a live backup of the forum in the future. (errr ... the "biggest losers" are maybe the people who are groaning about the loss of "their" dice that they "should have" got a second time, but I meant biggest losers in a different sense.)
Nucker
Posts: 215
Joined: Mon Oct 15, 2012 2:27 pm
Gender: Male

Re: Server Downtime and Data Loss

Post by Nucker »

Well done, guys. Our obsession is up and running again. How nice it is to know in advance what evil intent some or other player had for you.
GoranZ
Posts: 2923
Joined: Sat Aug 22, 2009 3:14 pm

Re: Server Downtime and Data Loss

Post by GoranZ »

TheProwler wrote:
bigWham wrote:In the late evening of Oct 3 (CC Time) one of our core system data tables suffered data loss and could not be recovered.


I find this interesting...

Surely you have (lots of) storage redundancy...you should be able to recover from hardware failure without reverting to a backup.

Was it bad code? Did you implement a change that wasn't properly tested? Is your system documentation lacking and your developer(s) getting overwhelmed?

I'm just curious. Downtime is something that might happen when a disaster occurs. But having to go to a backup? Shit, somebody fucked up badly.


bigWham wrote:The only efficient solution was to roll back our entire database to the most recent backup, which happened to be approximately 24 hours before.


I think the word that is screaming at me in that sentence is "efficient".

Because I've designed a number of systems with 100+ tables...and if one of the "core" tables somehow "suffered data loss", I would expect to be able to recover the vital information from those tables based on their child and parent tables' data, and other related tables. Whatever table was lost, you should be able to re-build it with data from other tables.

I know there might be some information loss like exact time of turns, but that wouldn't be a big deal. You could look at the physical order of the rows and estimate the time of turns. Obviously I have to speak in general terms because I don't know shit about your design or what table was lost. But you can go to a backup for everything up to the last backup, and then "fix" the data for the time since the last backup.


I guess without going on and on, I think you chose the word "efficient" because you know that there was a better solution with respect to recovering all the turns, but you were either too fuckin' lazy to do the work to take the site down and fix the problem properly, or because you don't understand the data well enough to fix the problem in an acceptable amount of time.


All these pats on the back that people are giving you shouldn't fool you; reverting to a backup is called "Failing".


I presume an update was run on the production DB (by mistake) instead of the testing one... and the update had a faulty "where" clause. I mean, that's the most common mistake to make.

And yes, reverting to a backup is called Failing for those that understand how the IT industry works :D

Well, hopefully bugs from now on will not be as common as they have been in the last few months.
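
To show the kind of mistake I'm guessing at (and it is only a guess, nobody has confirmed the cause), here is a minimal sketch using Python's built-in sqlite3. The `games` table, its columns, and the `guarded_update` helper are all made up for illustration and have nothing to do with CC's real schema. The idea is simply to run the update in a transaction and roll it back if it touches more rows than expected.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Run an UPDATE and keep it only if it changed the expected number of rows.

    Hypothetical helper: relies on sqlite3's default behaviour of opening a
    transaction implicitly before an UPDATE, so rollback() can undo it.
    """
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()  # unexpected blast radius: undo everything
        return False
    conn.commit()
    return True


# Hypothetical table standing in for a "core" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game_id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [(1, "active"), (2, "active"), (3, "active")],
)
conn.commit()

# Faulty WHERE: the column is compared with itself, so every row matches.
bad_ok = guarded_update(
    conn,
    "UPDATE games SET state = 'finished' WHERE game_id = game_id",
    (),
    expected_rows=1,
)

# Intended WHERE: exactly one game is updated.
good_ok = guarded_update(
    conn,
    "UPDATE games SET state = 'finished' WHERE game_id = ?",
    (2,),
    expected_rows=1,
)

states = [row[0] for row in conn.execute("SELECT state FROM games ORDER BY game_id")]
```

The faulty statement is rejected and rolled back, so only game 2 ends up changed. A check like this doesn't replace testing on a non-production database, but it limits the damage when someone runs the wrong statement against the wrong one.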
Even a little kid knows whats the name of my country... http://youtu.be/XFxjy7f9RpY

Interested in clans? Check out the Fallen!
Nucker
Posts: 215
Joined: Mon Oct 15, 2012 2:27 pm
Gender: Male

Re: Server Downtime and Data Loss

Post by Nucker »

Interesting how so many players complain about lost position and what is unfair. It is a symptom of a ME, ME, ME world. Clearly in 10 games these things will balance on the whole.

It is supposed to be strategy, and playing the same hand again differently will be as much a part of strategy as anything else.

What does become evident is the role the luck of the dice plays in the outcome of these strategy positions.

But primarily the outrage is a positive sign that CC is healthy and in full cry.
elbitjusticiero
Posts: 69
Joined: Tue Jun 21, 2011 8:19 pm
Gender: Male

Re: Server Downtime and Data Loss

Post by elbitjusticiero »

TheProwler wrote:I don't know shit about your design or what table was lost.

This is the important part.
drunkmonkey
Posts: 1704
Joined: Thu May 14, 2009 4:00 pm

Re: Server Downtime and Data Loss

Post by drunkmonkey »

elbitjusticiero wrote:
TheProwler wrote:I don't know shit about your design or what table was lost.

This is the important part.

Not really. It just means he can't walk them through step-by-step on how to fix it. He was still spot on about the failure.
Tzentsu
Posts: 85
Joined: Wed Nov 14, 2012 3:48 pm

Re: Server Downtime and Data Loss

Post by Tzentsu »

Well done!! Glad to see you are back and if I only lost 1 day, that too is a bonus.

Tzen
Dukasaur
Community Team
Posts: 28215
Joined: Sat Nov 20, 2010 4:49 pm
Location: Beautiful Niagara

Re: Server Downtime and Data Loss

Post by Dukasaur »

drunkmonkey wrote:The random outcome of my rolls was lost at a random point, and the new random results are different! It's an outrage!

Bravo!

=D> =D> =D>
“‎Life is a shipwreck, but we must not forget to sing in the lifeboats.”
― Voltaire
gendotte
Posts: 29
Joined: Tue Oct 12, 2010 4:58 pm
Gender: Male

Re: Server Downtime and Data Loss

Post by gendotte »

How do I see the original post?
Dukasaur
Community Team
Posts: 28215
Joined: Sat Nov 20, 2010 4:49 pm
Location: Beautiful Niagara

Re: Server Downtime and Data Loss

Post by Dukasaur »

gendotte wrote:How do I see the original post?

Both at the bottom and at the top of the page are links that look like this:
Image
Those are all the pages in the thread. Click on "1" and you should be there.
agentcom
Posts: 3994
Joined: Tue Nov 09, 2010 8:50 pm

Re: Server Downtime and Data Loss

Post by agentcom »

gendotte wrote:How do I see the original post?


Yeah, I think the announcement atop the CC pages is linking to unread. Needs to be fixed to OP.
garyshirley
Posts: 2
Joined: Tue Aug 04, 2009 11:08 am
Gender: Male
Location: Great Britain.

Re: Server Downtime and Data Loss

Post by garyshirley »

hi :D
SteveHereNow
Posts: 12
Joined: Fri Apr 26, 2013 12:34 am
Gender: Male
Location: Benque, Belize, Central America

Re: Server Downtime and Data Loss

Post by SteveHereNow »

These colored pixels are harmless until regarded as important.
misher
Posts: 101
Joined: Thu Jan 25, 2007 7:44 pm
Gender: Male
Location: Vancouver, BC

Re: Server Downtime and Data Loss

Post by misher »

I feel that in the end this is more like a hobby community than a multi-million-dollar gaming empire....so a 24-hour rollback and efficient/effective recovery is surprising in itself! Good job! I've been playing this since 2007 and this has never happened that I can remember, so it's nice to see there are backups in place.

I would request that next time it say something like 24 hour rollback! in bold on the front page so I don't think I've somehow time-travelled to yesterday and check the date.....no wonder it felt like I'm repeating my turns.
bdb
Posts: 186
Joined: Tue Oct 12, 2010 9:32 pm
Location: skitown USA

Re: Server Downtime and Data Loss

Post by bdb »

Gee, if I had it to do over again..


Hey Wait... I DO :lol:
The truth shall make ye fret -- Terry Pratchett
jrc1028
Posts: 1
Joined: Fri Jan 04, 2013 7:55 pm
Gender: Male
Location: Ohio

Re: Server Downtime and Data Loss

Post by jrc1028 »

You guys are always on top of these mishaps, and they do happen from time to time. Thank you for your quick work on this and thank you for the credit.
cairnswk
Posts: 11510
Joined: Sat Feb 03, 2007 8:32 pm
Gender: Male
Location: Australia

Re: Server Downtime and Data Loss

Post by cairnswk »

Nice work gents. =D>
* Pearl Harbour * Waterloo * Forbidden City * Jamaica * Pot Mosbi
bigWham
Webmaster
Posts: 2921
Joined: Mon Aug 26, 2013 12:08 pm

Re: Server Downtime and Data Loss

Post by bigWham »

TheProwler wrote:
bigWham wrote:In the late evening of Oct 3 (CC Time) one of our core system data tables suffered data loss and could not be recovered.


I find this interesting...

Surely you have (lots of) storage redundancy...you should be able to recover from hardware failure without reverting to a backup.

Was it bad code? Did you implement a change that wasn't properly tested? Is your system documentation lacking and your developer(s) getting overwhelmed?

I'm just curious. Downtime is something that might happen when a disaster occurs. But having to go to a backup? Shit, somebody fucked up badly.


bigWham wrote:The only efficient solution was to roll back our entire database to the most recent backup, which happened to be approximately 24 hours before.


I think the word that is screaming at me in that sentence is "efficient".

Because I've designed a number of systems with 100+ tables...and if one of the "core" tables somehow "suffered data loss", I would expect to be able to recover the vital information from those tables based on their child and parent tables' data, and other related tables. Whatever table was lost, you should be able to re-build it with data from other tables.

I know there might be some information loss like exact time of turns, but that wouldn't be a big deal. You could look at the physical order of the rows and estimate the time of turns. Obviously I have to speak in general terms because I don't know shit about your design or what table was lost. But you can go to a backup for everything up to the last backup, and then "fix" the data for the time since the last backup.


I guess without going on and on, I think you chose the word "efficient" because you know that there was a better solution with respect to recovering all the turns, but you were either too fuckin' lazy to do the work to take the site down and fix the problem properly, or because you don't understand the data well enough to fix the problem in an acceptable amount of time.


All these pats on the back that people are giving you shouldn't fool you; reverting to a backup is called "Failing".


I for one certainly do not ask for pats on the back in this situation, and I agree that reverting to a backup is, if not failure, certainly not success in any way.

Unwinding the data after losing parts of it may well have been possible, but unfortunately it would have been very complex, likely quite time consuming and may have left us with ongoing data inconsistency issues for an indefinite time. Since we worked on it all night, and continue to work on it, I don't feel that laziness was the issue - we just wanted to get CC back running reliably for our users in as quick a time as possible, and with the minimum ongoing disruption. Rolling back was time efficient, reliable and safe. So we made the executive decision that it was better to suffer a shorter setback of a known nature, and then proceed with a system state we could be confident in.

Rebuilding a complex system of interdependent data tables after arbitrary data loss is no easy task. CC does not currently have any master facility that can automatically rebuild everything after suffering losses of an arbitrary nature.... even if that were always possible. No previous owner created such a thing, and in my 6 weeks or so on the job, creating such a system was not exactly my #1 priority. Some tools of this nature may make sense, however our main focus will be enhancing the backup and recovery processes. I will report back to the community on the steps we have taken in the coming weeks.
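
For anyone wondering what "rebuilding a table from related tables" could look like in the simplest possible case, here is a rough sketch in Python with the built-in sqlite3 module. The schema (`games`, `turns`) and the `rebuild_games_from_turns` function are entirely hypothetical and do not reflect CC's actual design. It also shows the limit of the approach: anything not derivable from the child rows (here `map_name`) stays missing, which is one way a rebuild can leave inconsistent data behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical parent table whose rows were lost (it starts out empty).
    CREATE TABLE games (
        game_id INTEGER PRIMARY KEY,
        map_name TEXT,
        round INTEGER,
        last_turn_at TEXT
    );
    -- Hypothetical child table that survived intact.
    CREATE TABLE turns (
        turn_id INTEGER PRIMARY KEY,
        game_id INTEGER,
        round INTEGER,
        taken_at TEXT
    );
    INSERT INTO turns (game_id, round, taken_at) VALUES
        (1, 1, '2013-10-03 10:00'),
        (1, 2, '2013-10-03 18:30'),
        (2, 1, '2013-10-03 12:15');
    """
)


def rebuild_games_from_turns(conn):
    """Re-create missing parent rows from what the child rows still record."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            """
            INSERT INTO games (game_id, map_name, round, last_turn_at)
            SELECT game_id, NULL, MAX(round), MAX(taken_at)
            FROM turns
            WHERE game_id NOT IN (SELECT game_id FROM games)
            GROUP BY game_id
            """
        )


rebuild_games_from_turns(conn)
rebuilt = conn.execute(
    "SELECT game_id, map_name, round, last_turn_at FROM games ORDER BY game_id"
).fetchall()
```

Both games come back with their latest round and turn time, but `map_name` is NULL in every rebuilt row. With two tables this is trivial; with many interdependent tables and arbitrary loss, each derived column needs its own rule and its own verification.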
ruleroftheworld1
Posts: 88
Joined: Wed Aug 24, 2011 12:27 pm
Gender: Male
Location: THE OMEGA PANTHEON

Re: Server Downtime and Data Loss

Post by ruleroftheworld1 »

Thank you guys. Clearly you did everything possible and have our thanks.
Shino Tenshi
Posts: 166
Joined: Sat Sep 01, 2007 1:35 pm
Location: nostalgically reading the chat in game#14480932

Re: Server Downtime and Data Loss

Post by Shino Tenshi »

bigWham wrote:We apologize for the inconvenience and will be crediting Premium Members with a 2-day extension to their Membership, and Freemiums with 4 free speed games in recognition.


I was only able to play 3 speed games :(
RyanHo
Posts: 1
Joined: Thu Aug 15, 2013 10:43 pm

Re: Server Downtime and Data Loss

Post by RyanHo »

LOOOOOOUD. NOISES.
BroncoJordy
Posts: 6
Joined: Sat Dec 04, 2010 11:46 pm

Re: Server Downtime and Data Loss

Post by BroncoJordy »

I am still waiting to receive my 95 points from 2 weeks ago when games ended early and credited wrong players...OR AT LEAST A RESPONSE TO THE HELP TICKET I OPENED
Last edited by BroncoJordy on Sat Oct 05, 2013 12:20 am, edited 1 time in total.
Guderian09
Posts: 392
Joined: Fri Sep 26, 2008 12:20 pm
Location: Tibet

Re: Server Downtime and Data Loss

Post by Guderian09 »

Are PRISM and the NSA excluded from the occurrence?..
Slaylark
Posts: 176
Joined: Tue Mar 03, 2009 11:09 pm
Gender: Male
Location: New York

Re: Server Downtime and Data Loss

Post by Slaylark »

MULLIGAN!!!! WOOOOOT! :D
DB4Christ
Posts: 4
Joined: Thu Aug 02, 2012 5:05 pm
Gender: Male
Location: Arizona

Re: Server Downtime and Data Loss

Post by DB4Christ »

Nice Work Team...and Thx!