WestWind wrote:Conclusions
1. There is no significant difference between the expected and observed results of my 3v2 battles. However, there is only a 10-20% chance that the differences can be attributed to random chance; this is not enough to be significant, but it is interesting.
2. There IS a significant difference between the expected and observed results of my 3v1 battles. Furthermore, the difference is great enough that there is a 99.9% chance that something OTHER THAN RANDOM CHANCE is affecting the results of my battles.
Your sample size is too small: Granted, it's not 100,000 rolls, but it is more than enough to run a chi-square test on, and chi-square tests take sample size into account. The usual lower limit for the test's accuracy is when any expected count falls below 5.
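A minimal sketch of the chi-square goodness-of-fit test being described, for anyone who wants to run it on their own rolls. The counts below are hypothetical placeholders, not WestWind's actual battle data:

```python
# Chi-square goodness-of-fit test on die-face counts (hypothetical data).
from scipy.stats import chisquare

observed = [310, 355, 290, 345, 330, 370]  # hypothetical counts of faces 1-6
total = sum(observed)
expected = [total / 6] * 6                 # fair dice: each face equally likely

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")

# Rule of thumb mentioned above: every expected count should be at least 5.
assert all(e >= 5 for e in expected)
```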
Metsfanmax wrote:Incorrect. This is not what p-values indicate - they don't give the probability that the null hypothesis is correct.
Metsfanmax wrote:Sure, but the test is only as meaningful as the conclusion you draw from it. I choose not to assign any real meaning to your small sample. Neither should anyone else. If you actually know statistics, then you should know that a sample of this size cannot be used to refute the null hypothesis.
If you got similar results on a set of 12,500 assaults, then something might be up (because there are 50,000 numbers per list, and each 3v1 uses 4 of those numbers). I don't believe the results on a set of less than 2,000 assaults matter.
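One way to see why a small sample proves little: even perfectly fair dice will "fail" a chi-square test at p < 0.05 about one run in twenty. A quick simulation sketch, assuming for illustration that each roll is an independent fair six-sided die (which is the null hypothesis, not a claim about CC's actual engine):

```python
# Simulate many 2,000-roll samples of fair dice and count how often the
# chi-square test flags them as "significant" at p < 0.05.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
n_rolls, n_trials, alpha = 2000, 10_000, 0.05

false_alarms = 0
for _ in range(n_trials):
    rolls = rng.integers(1, 7, size=n_rolls)      # fair six-sided dice
    counts = np.bincount(rolls, minlength=7)[1:]  # counts of faces 1..6
    _, p = chisquare(counts)                      # default: uniform expected
    false_alarms += p < alpha

print(f"fair dice flagged as biased: {false_alarms / n_trials:.1%}")  # ~5%
```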
maasman wrote:The only thing I see is that you need more 3v1 rolls, but otherwise nice analysis. Come back when you have a ton more rolls and see if the numbers change significantly. It would be interesting to see if things even out or stay this skewed.
WestWind wrote:Sue me for using the most easily understood and accepted meaning of the value. Let's go with the technical interpretation and say that the p-value of my 3v1 data means that in another sample with the exact same sample mean, there is a 0.1% chance that we will observe the same magnitude of difference between the observed and expected results. At this level of significance it's splitting hairs, but if that makes you happy, so be it.
Alright, what about the thousands of other studies, both statistical and scientific, that use sample sizes much smaller than 2,000? Do you choose not to "assign any real meaning" to those either, even though they form the basis of the scientific and mathematical community? What sample size would you consider significant? Should a random sample of 2,000 from a random set of numbers be random, or do you allow for them to not be random because they do not include the entire list?
Also, my last 50 or so games beg to differ with the claim that the results of a set of 2,000 assaults don't matter. Let's not get too carried away with theorizing and forget about the very real effect this "randomness" has on the game.
WestWind wrote: there is a 0.1% chance that we will observe the same magnitude of difference between the observed and expected results. At this level of significance it's splitting hairs, but if that makes you happy, so be it.
WestWind wrote: Should a random sample of 2,000 from a random set of numbers be random, or do you allow for them to not be random because they do not include the entire list?
WestWind wrote:Thanks. I'm planning on keeping up with this and seeing where it goes. My guess is that they will eventually even out, but at this point it's showing what a huge effect this set of rolls has on the outcome of a series of ~50 games.
maasman wrote:I think the real question is, when will all 50,000 numbers be 6's? If it's random, it has to happen someday...
Metsfanmax wrote:
It may be easily understood, but it's still wrong. The "technical interpretation" you gave here is also wrong. A p-value of 0.001 does not mean there's a 99.9% chance that if you repeated the sample, you would get a different result.
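A toy calculation makes the distinction concrete. All of the numbers below are assumptions chosen for illustration; the point is that P(data this extreme | fair dice) = 0.001 does not translate into P(rigged dice | data) = 0.999:

```python
# Hypothetical Bayes calculation: a "significant" result at p < 0.001 does
# not mean the dice are 99.9% likely to be rigged.
prior_rigged = 1 / 1000  # assumed: 1 player in 1,000 actually has rigged dice
power        = 0.90      # assumed: P(test flags the dice | rigged)
alpha        = 0.001     # P(test flags the dice | fair) = the cutoff used

p_flag = power * prior_rigged + alpha * (1 - prior_rigged)
posterior_rigged = power * prior_rigged / p_flag
print(f"P(rigged | flagged) = {posterior_rigged:.2f}")  # ~0.47, not 0.999
```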
It is true that many scientists use P < 0.05 as an arbitrary line to determine statistical significance, but most statisticians do not. A p-value is literally just a number calculated from a formula. Choosing what number counts as "strong" evidence against the null hypothesis is completely arbitrary. Most statisticians would be loath to say that P < 0.05 provides strong evidence. They would probably give a number more like 0.001, but even then they recognize that the test itself is not hard proof one way or the other - it's just evidence.
Those studies are limited by the results they have. They would prefer to have larger samples, but they don't. They have to use the information they have, so they're making the best guess they can. But as evidenced by the number of drugs that are publicly released with disastrous side effects, it is not an exact science, precisely because the samples are so small. On CC, we have plenty of dice rolls to get data from. Choosing 2,000 assaults, when there are millions of assaults processed each month, just doesn't cut it.
I didn't say the results don't matter, I just said it's not statistically significant. Randomness includes streaks. Anyone who believes otherwise is deluding themselves.
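To put a number on "randomness includes streaks", here is a small simulation sketch. It models each battle as an independent 50/50 win/loss, which is a simplification chosen for illustration, not CC's actual combat odds:

```python
# Longest losing streak produced by a perfectly fair sequence of battles.
import numpy as np

rng = np.random.default_rng(1)
battles = rng.integers(0, 2, size=2000)  # 1 = win, 0 = loss, fair coin model

longest = run = 0
for b in battles:
    run = run + 1 if b == 0 else 0       # extend or reset the losing run
    longest = max(longest, run)

print(f"longest losing streak in 2,000 fair battles: {longest}")  # ~10-11
```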
WestWind wrote:
That's pretty much the interpretation that every course, book, and article has given me, so I'm not sure where we're differing on our understanding of it. Maybe your background with it is more in statistics and mine is more in science. Also, P < 0.05 has been accepted, tested, and disputed for a long time in the scientific community, so it's far from "arbitrary". Thankfully my P < 0.001, so it still falls in your statisticians' category of strong evidence. Also, I never intended this to prove anything more than that my experience with the CC dice has been pretty far from the expected experience in regards to luck.
Like I said, I'm coming from a scientific point of view, so 2000 assaults is a hefty amount of data. Many times in science we're dealing with sample sizes of less than 500, and most problems occur when people start to accept sample sizes of less than 100. Honestly, I would love to see the results if more people showed their dice results. I would be fine if someone could show me some real honest data supporting the fact that the CC dice are totally random, rather than just the shadowy theories and explanations that are thrown around.
When one "bad streak" consists of an entire player's rolls for 40-50 games, we might want to at least examine the cause rather than pooh-pooh it away.
Once again, I would love to see some hard evidence that this system is working and stats like mine aren't just an anomaly. Maybe I'm just that one unlucky player in 1000, but until I see evidence otherwise I'm going to remain skeptical.
the.killing.44 wrote:Fundamentally false.
Metsfanmax wrote:The probability of that, of course, is (1/6)^(50,000). There is no point in even writing that number down. Even (1/6)^(1,000) is only about 7 * 10^-779. If you could roll a set of one thousand dice every second, it would take you far longer than the age of the known universe to expect all sixes. Specifically, if the age of the Universe is X, where X = 13.7 billion years, the expected wait is roughly 3 * 10^760 times X.
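The arithmetic above can be checked with logarithms, since the numbers involved are far too large for ordinary floats:

```python
# Expected wait for 1,000 simultaneous sixes, rolling one set per second.
import math

log10_p = -1000 * math.log10(6)          # log10 of (1/6)^1000, about -778.2
seconds_per_universe = 13.7e9 * 3.156e7  # age of the Universe in seconds, ~4.3e17

# Expected waiting time is 1/p seconds; convert to ages of the Universe.
log10_wait = -log10_p - math.log10(seconds_per_universe)
print(f"about 10^{log10_wait:.1f} ages of the Universe")  # ~10^760.5
```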
the.killing.44 wrote:And the certainty of that having to happen is 0.
Metsfanmax wrote:If you are confused about what p-values indicate, read this wonderful article on the subject: http://www.ncbi.nlm.nih.gov/pmc/article ... ool=pubmed
It's a fairly obvious thing to observe: since the p-value really is just a statistic calculated from a formula, where we agree to draw the line between "significant" and "non-significant" is indeed arbitrary. The article points out that statisticians have developed more objective ways of doing these tests. Remember, the "scientific community" is very diverse in how well it understands math ;P I would never trust a biologist to tell me what p-value is significant, because the only thing they know is what they were told was significant. I suppose there are biologists who are also statisticians, but I still believe they are wrong if they automatically call any p-value below 0.05 significant.
As I said, the reason the scientific community uses samples of such a small size is that it's all they have. This is why the field of error analysis is so important in many fields of science - these people know that their results are significantly uncertain because they don't have a large sample size, and so they want to quantify just how uncertain their results are. At any rate, it is clearly the case that the uncertainties involved mean that most scientific "results" are not certainties at all - they're just our best guesses.
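The standard way to quantify that uncertainty: the error in an estimated rate shrinks like 1/sqrt(n). A sketch, assuming a simple binomial model with a true 50% win rate (both assumptions for illustration only):

```python
# 95% margin of error on an estimated win rate at various sample sizes.
import math

for n in (100, 500, 2000, 12500):
    se = math.sqrt(0.5 * 0.5 / n)        # standard error of a proportion
    print(f"n = {n:>6}: win rate known to within +/- {1.96 * se:.1%}")
```

At n = 100 the margin is nearly +/-10%; at 12,500 it drops below +/-1%, which is the numerical version of the point about sample size.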
If you want data that show that the CC dice are random, I suggest you visit http://www.random.org/statistics/.
When you're making an argument based on math, even if I think it's bad math, I can at least respect your effort. But this statement here, not so much. I will simply repeat what I said: streaks are inherent in randomness. To make them go away would be to rig the dice, and from your first post, it sounds like you'd prefer it if the dice were actually random...
WestWind wrote:.....Honestly, I would love to see the results if more people showed their dice results. I would be fine if someone could show me some real honest data supporting the fact that the CC dice are totally random, rather than just the shadowy theories and explanations that are thrown around.
...
Once again, I would love to see some hard evidence that this system is working and stats like mine aren't just an anomaly. Maybe I'm just that one unlucky player in 1000, but until I see evidence otherwise I'm going to remain skeptical.