WestWind wrote:Conclusions
1. There is no significant difference between the expected and observed results of my 3v2 battles. However, there is only a 10-20% chance that the differences can be attributed to random chance; this is not enough to be significant, but it is interesting.
2. There IS a significant difference between the expected and observed results of my 3v1 battles. Furthermore, the difference is great enough that there is a 99.9% chance that something OTHER THAN RANDOM CHANCE is affecting the results of my battles.
Your sample size is too small: Granted, it's not 100,000 rolls, but it is more than enough to run a chi-square test on, and chi-square tests take sample size into account. The usual lower limit for the test's accuracy is when any expected count falls below 5.
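A minimal sketch of the chi-square goodness-of-fit test being described, for anyone who wants to run it on their own rolls. The counts below are hypothetical placeholders, not WestWind's actual battle data:

```python
# Chi-square goodness-of-fit test on die-face counts (hypothetical data).
from scipy.stats import chisquare

observed = [310, 355, 290, 345, 330, 370]  # hypothetical counts of faces 1-6
total = sum(observed)
expected = [total / 6] * 6                 # fair dice: each face equally likely

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")

# Rule of thumb mentioned above: every expected count should be at least 5.
assert all(e >= 5 for e in expected)
```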
Metsfanmax wrote:Incorrect. This is not what p-values indicate - they don't give the probability that the null hypothesis is correct.
Metsfanmax wrote:Sure, but the test is only as meaningful as the conclusion you draw from it. I choose not to assign any real meaning to your small sample. Neither should anyone else. If you actually know statistics, then you should know that a sample of this size cannot be used to refute the null hypothesis.
If you got similar results on a set of 12,500 assaults, then something might be up (because there are 50,000 numbers per list, and each 3v1 uses 4 of those numbers). I don't believe the results on a set of less than 2,000 assaults matter.
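One way to see why a small sample proves little: even perfectly fair dice will "fail" a chi-square test at p < 0.05 about one run in twenty. A quick simulation sketch, assuming for illustration that each roll is an independent fair six-sided die (which is the null hypothesis, not a claim about CC's actual engine):

```python
# Simulate many 2,000-roll samples of fair dice and count how often the
# chi-square test flags them as "significant" at p < 0.05.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
n_rolls, n_trials, alpha = 2000, 10_000, 0.05

false_alarms = 0
for _ in range(n_trials):
    rolls = rng.integers(1, 7, size=n_rolls)      # fair six-sided dice
    counts = np.bincount(rolls, minlength=7)[1:]  # counts of faces 1..6
    _, p = chisquare(counts)                      # default: uniform expected
    false_alarms += p < alpha

print(f"fair dice flagged as biased: {false_alarms / n_trials:.1%}")  # ~5%
```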
maasman wrote:The only thing I see is that you need more 3v1 rolls, but otherwise nice analysis. Come back when you have a ton more rolls and see if the numbers change significantly. It would be interesting to see if things even out or stay this skewed.
WestWind wrote:Sue me for using the most easily understood and accepted meaning of the value. Let's go with the technical interpretation and say that the p-value of my 3v1 data means that in another sample with the exact same sample mean, there is a 0.1% chance that we will observe the same magnitude of difference between the observed and expected results. At this level of significance it's splitting hairs, but if that makes you happy, so be it.
Alright, what about the thousands of other studies, both statistical and scientific, that use sample sizes much smaller than 2,000? Do you choose not to "assign any real meaning" to those either, even though they form the basis of the scientific and mathematical community? What sample size would you consider significant? Should a random sample of 2,000 from a random set of numbers be random, or do you allow for them to not be random because they do not include the entire list?
Also, my last 50 or so games beg to differ with the claim that the results of a set of 2,000 assaults don't matter. Let's not get too carried away with theorizing and forget about the very real effect this "randomness" has on the game.
WestWind wrote: there is a 0.1% chance that we will observe the same magnitude of difference between the observed and expected results. At this level of significance it's splitting hairs, but if that makes you happy, so be it.
WestWind wrote: Should a random sample of 2,000 from a random set of numbers be random, or do you allow for them to not be random because they do not include the entire list?
WestWind wrote:Thanks. I'm planning on keeping up with this and seeing where it goes. My guess is that they will eventually even out, but at this point it's showing what a huge effect this set of rolls has on the outcome of a series of ~50 games.
maasman wrote:I think the real question is, when will all 50,000 numbers be 6's? If it's random, it has to happen someday...
Metsfanmax wrote:
It may be easily understood, but it's still wrong. The "technical interpretation" you gave here is also wrong. A p-value of 0.001 does not mean there's a 99.9% chance that if you repeated the sample, you would get a different result.
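A toy calculation makes the distinction concrete. All of the numbers below are assumptions chosen for illustration; the point is that P(data this extreme | fair dice) = 0.001 does not translate into P(rigged dice | data) = 0.999:

```python
# Hypothetical Bayes calculation: a "significant" result at p < 0.001 does
# not mean the dice are 99.9% likely to be rigged.
prior_rigged = 1 / 1000  # assumed: 1 player in 1,000 actually has rigged dice
power        = 0.90      # assumed: P(test flags the dice | rigged)
alpha        = 0.001     # P(test flags the dice | fair) = the cutoff used

p_flag = power * prior_rigged + alpha * (1 - prior_rigged)
posterior_rigged = power * prior_rigged / p_flag
print(f"P(rigged | flagged) = {posterior_rigged:.2f}")  # ~0.47, not 0.999
```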
It is true that many scientists use P < 0.05 as an arbitrary line to determine statistical significance, but most statisticians do not. A p-value is literally just a number calculated from a formula. Choosing what number counts as "strong" evidence against the null hypothesis is completely arbitrary. Most statisticians would be loath to say that P < 0.05 provides strong evidence. They would probably give a number more like 0.001, but even then they recognize that the test itself is not hard proof one way or the other - it's just evidence.
Those studies are limited by the results they have. They would prefer to have larger samples, but they don't. They have to use the information they have, so they're making the best guess they can. But as evidenced by the number of drugs that are publicly released with disastrous side effects, it is not an exact science, precisely because the samples are so small. On CC, we have plenty of dice rolls to get data from. Choosing 2,000 assaults, when there are millions of assaults processed each month, just doesn't cut it.
I didn't say the results don't matter, I just said it's not statistically significant. Randomness includes streaks. Anyone who believes otherwise is deluding themselves.
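To put a number on "randomness includes streaks", here is a small simulation sketch. It models each battle as an independent 50/50 win/loss, which is a simplification chosen for illustration, not CC's actual combat odds:

```python
# Longest losing streak produced by a perfectly fair sequence of battles.
import numpy as np

rng = np.random.default_rng(1)
battles = rng.integers(0, 2, size=2000)  # 1 = win, 0 = loss, fair coin model

longest = run = 0
for b in battles:
    run = run + 1 if b == 0 else 0       # extend or reset the losing run
    longest = max(longest, run)

print(f"longest losing streak in 2,000 fair battles: {longest}")  # ~10-11
```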
WestWind wrote:
That's pretty much the interpretation that every course, book, and article has given me, so I'm not sure where we're differing on our understanding of it. Maybe your background with it is more in statistics and mine is more in science. Also, P < 0.05 has been accepted, tested, and disputed for a long time in the scientific community, so it's far from "arbitrary". Thankfully my P < 0.001, so it still falls in your statisticians' category of strong evidence. Also, I never intended this to prove anything more than that my experience with the CC dice has been pretty far from the expected experience in regards to luck.
Like I said, I'm coming from a scientific point of view, so 2000 assaults is a hefty amount of data. Many times in science we're dealing with sample sizes of less than 500, and most problems occur when people start to accept sample sizes of less than 100. Honestly, I would love to see the results if more people showed their dice results. I would be fine if someone could show me some real honest data supporting the fact that the CC dice are totally random, rather than just the shadowy theories and explanations that are thrown around.
When one "bad streak" consists of an entire player's rolls for 40-50 games, we might want to at least examine the cause rather than pooh-pooh it away.
Once again, I would love to see some hard evidence that this system is working and stats like mine aren't just an anomaly. Maybe I'm just that one unlucky player in 1000, but until I see evidence otherwise I'm going to remain skeptical.
the.killing.44 wrote:Fundamentally false.
Metsfanmax wrote:The probability of that, of course, is (1/6)^(50,000). There is no point in even writing that number down. Even (1/6)^(1,000) is only about 7 * 10^-779. If you could roll a set of one thousand dice every second, it would take you far longer than the age of the known universe to expect all sixes. Specifically, if the age of the Universe is X, where X = 13.7 billion years, the expected wait is roughly 3 * 10^760 times X.
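The arithmetic above can be checked with logarithms, since the numbers involved are far too large for ordinary floats:

```python
# Expected wait for 1,000 simultaneous sixes, rolling one set per second.
import math

log10_p = -1000 * math.log10(6)          # log10 of (1/6)^1000, about -778.2
seconds_per_universe = 13.7e9 * 3.156e7  # age of the Universe in seconds, ~4.3e17

# Expected waiting time is 1/p seconds; convert to ages of the Universe.
log10_wait = -log10_p - math.log10(seconds_per_universe)
print(f"about 10^{log10_wait:.1f} ages of the Universe")  # ~10^760.5
```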
the.killing.44 wrote:And the certainty of that having to happen is 0.
Metsfanmax wrote:If you are confused about what p-values indicate, read this wonderful article on the subject: http://www.ncbi.nlm.nih.gov/pmc/article ... ool=pubmed
It's a fairly obvious thing to observe: since the p-value really is just a statistic calculated from a formula, where we agree to draw the line between "significant" and "non-significant" is indeed arbitrary. The article points out that statisticians have developed more objective ways of doing these tests. Remember, the "scientific community" is very diverse in how well it understands math ;P I would never trust a biologist to tell me what p-value is significant, because the only thing they know is what they were told was significant. I suppose there are biologists who are also statisticians, but I still believe they are wrong if they automatically call any p-value below 0.05 significant.
As I said, the reason the scientific community uses samples of such a small size is that it's all they have. This is why the field of error analysis is so important in many fields of science - these people know that their results are significantly uncertain because they don't have a large sample size, and so they want to quantify just how uncertain their results are. At any rate, it is clearly the case that the uncertainties involved mean that most scientific "results" are not certainties at all - they're just our best guesses.
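The standard way to quantify that uncertainty: the error in an estimated rate shrinks like 1/sqrt(n). A sketch, assuming a simple binomial model with a true 50% win rate (both assumptions for illustration only):

```python
# 95% margin of error on an estimated win rate at various sample sizes.
import math

for n in (100, 500, 2000, 12500):
    se = math.sqrt(0.5 * 0.5 / n)        # standard error of a proportion
    print(f"n = {n:>6}: win rate known to within +/- {1.96 * se:.1%}")
```

At n = 100 the margin is nearly +/-10%; at 12,500 it drops below +/-1%, which is the numerical version of the point about sample size.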
If you want data that show that the CC dice are random, I suggest you visit http://www.random.org/statistics/.
When you're making an argument based on math, even if I think it's bad math, I can at least respect your effort. But this statement here, not so much. I will simply repeat what I said: streaks are inherent in randomness. To make them go away would be to rig the dice, and from your first post, it sounds like you'd prefer it if the dice were actually random...
WestWind wrote:.....Honestly, I would love to see the results if more people showed their dice results. I would be fine if someone could show me some real honest data supporting the fact that the CC dice are totally random, rather than just the shadowy theories and explanations that are thrown around.
...
Once again, I would love to see some hard evidence that this system is working and stats like mine aren't just an anomaly. Maybe I'm just that one unlucky player in 1000, but until I see evidence otherwise I'm going to remain skeptical.