How random is VASSAL's dice-roller?
Stephen P. Berry has done some statistical analysis on a sample of dice rolls generated by the VASSAL engine. The short answer is that the results are indistinguishable from a random sample as far as any honest player would be able to tell. A maniacally motivated person might be able to guess the generator's seeding algorithm and therefore be able to make statistical predictions on future dice rolls based on the history of past dice rolls. However, I can recommend some much easier methods of cheating if that's what you're interested in. Here is the analysis:
Distribution of rolls From a test set of just over 35,000 2d6 rolls from the dicebot. The distribution looks pretty good:
DR Count Expected
2 525 487.4
3 943 974.9
4 1484 1462.3
5 1864 1949.8
6 2418 2437.2
7 2934 2924.7
8 2454 2437.2
9 1948 1949.8
10 1490 1462.3
11 1005 974.9
12 483 487.4
-------
Total 17548
Doing a simple chi-square hypothesis test, we get a chi-square of 9.82, which tells us that we do not reject the null hypothesis that the data conforms to the random distribution at a significance level of 0.001. The numbers work out even better if you just look at 1d6 instead of 2d6. So ...this tells us that the PRNG is spitting out numbers that are pretty uniform on {1, 2, 3, 4, 5, 6}.
Correlation between rolls To test correlation between rolls, a count is kept for each pair {d[n - 1], d[n]} for n from 2 to the number of total dr. The average is figured as ((1/6)^2) * (n - 1), and a chi-square computed. The results for the 85450 1d6 I have data on are fine:
d[n] d[n-1] Count
1 1 2402
1 2 2334
1 3 2387
1 4 2306
1 5 2361
1 6 2428
2 1 2378
2 2 2334
2 3 2427
2 4 2343
2 5 2381
2 6 2352
3 1 2432
3 2 2325
3 3 2329
3 4 2347
3 5 2404
3 6 2429
4 1 2309
4 2 2390
4 3 2337
4 4 2295
4 5 2350
4 6 2301
5 1 2347
5 2 2475
5 3 2390
5 4 2302
5 5 2547
5 6 2424
6 1 2351
6 2 2357
6 3 2396
6 4 2388
6 5 2442
6 6 2349
Expected value for each count: 2373.58333333333
Chi-square: 43.0407260471158
For df=35: p=0.01 p=0.005 p=0.001
49.80 57.34 66.62
...so there's no evidence there that the distribution of pairs of successive dr are not independent.
Another test of correlations is popularly called the coupon collector's test. For the current audience, instead of coupons we might think of baseball cards or any other widget that comes in sets of which you want to 'collect all n', for some sufficiently marketable value of n.
The idea is that we walk through our data, starting with an empty set, then add the elements of our data to the set, one by one, until we have acomplete set of one of each type (not counting duplicates). So for my dataset of 85450 1d6, we start with a set of zero 1s, zero 2s, and so on. Each time we see a new dr, we add it to our set. We keep track of how long it takes us to get a complete set of the numbers in [1, 6], then re-start from an empty set (selling the completed set on eBay, complete with photocopies of a couple other dr) and do it again.
After counting up the length between complete sets (i.e., the number of dr we had look through before we got one of each possible dr) we can do a trust ol' chi-square on the resulting data, where the probability for each length will be:
d!
p(r) = ----- * S((r - 1), (d - 1)) , d <= r < t
d^r
d!
p(t) = 1 - ----------- * S((t - 1), d)
d^(t - 1)
...where: r is the length in question; d is the number of items in a complete set; S(n,m) is the Stirling number of the second kind (which is the number of ways you can partition a set of n elements into m disjoint, nonempty subsets; and t the upper bound of the lengths we're going to consider (so if we were interested in lengths from 1 to 10, t would be 11. we'll choose t such that the expected value for the highest r will be around 1. If this doesn't make sense, just peek down at the data and it'll probably be more clear).
We multiply these values for p(r) with the total number of lengths we measure to obtain an expected value for each length, then do our chi-square.
The data once again looks good:
Length Count Expected Value
6 85 89.75309
7 223 224.38272
8 372 349.03978
9 420 436.29973
10 475 481.38403
11 477 490.83719
12 490 474.63947
13 477 442.26326
14 401 401.22929
15 377 356.91274
16 293 312.85229
17 247 271.18876
18 235 233.07425
19 185 199.00003
20 174 169.03892
21 162 143.01511
22 104 120.61831
23 99 101.47771
24 75 85.20793
25 68 71.43616
26 50 59.81688
27 57 50.03876
28 54 41.82664
29 24 34.94069
30 39 29.17404
31 23 24.34958
32 24 20.31657
33 21 16.94732
34 13 14.13400
35 11 11.78582
36 10 9.82651
37 11 8.19208
38 7 6.82895
39 9 5.69227
40 5 4.74455
41 1 3.95445
42 5 3.29581
43 3 2.74680
44 2 2.28920
45 1 1.90779
46 1 1.58991
47 0 1.32499
48 1 1.10419
49 0 0.92019
=50 5 4.60124
Chi-square: 41.73981
For df=43: p=0.01 p=0.005 p=0.001
59.30 67.46 77.42
So...once again we aren't forced to reject the null hypothesis; there's no evidence that the dice are biased in a statistically significant way.
There's a more general test for this sort of thing, called a 'birthday spacings' test (e.g., in Marsaglia's DIEHARD).
Imagine you have an arbitrary calendar of d days. Select b birthdays out of that calendar, and list them in nondescending order. Make a list of the distances between the birthdays, then sort them into nondescending order. Go through the list of distances, and count the number of times a distance is the same as the one that preceeded it (i.e., for distances s[0, 1, ...], count the times s[i] = s[i - 1]). Call this number n. Assuming that the birthdays were selected at random, then n should be Poisson distributed with a lambda of (b^3)/(4d). According to Marsaglia, experimentation suggests that you need d >= 2^18.
What are we doing here? We're just trying to come up with a general model that lets us figure out what an expected distribution for the differences between dr is. The birthday spacings test gives us such a model. Now we need to find a way to convert our dr into birthdays in a calendar (with lots and lots of days...specifically, at least 2^18 days).
Okay. If we tried just mapping our die rolls directly to birthdays, we'd end up with a calendar with at most 6 days. That's somewhat less than our desired 2^18. So we figure ln(2^18)/ln(6) = 6.96. This tells us that 2^18 is roughly 6^6.96. What does this mean? Well, we're trying to figure out how many dr we need to string together to make a problem space roughly the same size as our desired calendar. Put in somewhat different terms, we can imagine a binary number as a bunch of coin tosses (each coin has two states---heads or tails---just like each bit has two states---0 or 1). What we're figuring out is more or less how many die rolls (things with six states) have as many possible combinations as 18 coin tosses (things with two states).
So if we do 18 coin tosses, there are 2^18 or 262144 possible results. By doing the ln(2^18)/ln(6) = 6.96, we're discovering that in order to get this many possible results out of die rolls, we need to roll about 6.96 dice. Call it 7 dr, for a total of 6^7 = 279936 possible results---just over what we need.
This also tells us that when we're looking at the dice to generate birthdays in our calendar with 279936 days, we'll need to roll 7 dice at a time. We'll also have to subtract one from each dr to insure we can generate numbers from 0 to 279935. So if we look at seven dr {1, 1, 1, 1, 1, 1, 2} we'd read that as (6^6 * 0) + (6^5 * 0) + (6^4 * 0) + (6^3 * 0) + (6^2 * 0) + (6^1 * 0) + (6^0 * 1) = 1. And the dr {6, 6, 6, 6, 6, 6, 6} would be (6^6 * 5) + (6^5 * 5) + (6^4 * 5) + (6^3 * 5) + (6^2 * 5) + (6^1 * 5) + (6^0 * 5) = 279935.
Okay. This all defined, we can grab dr, seven at a time, and convert them into an integer in [0, 279935]. We can then walk through our data starting from the first dr and compute the number of recurrent birthday distances (our value n above in the original description of the test). We'll choose a number of birthdays per test that will keep our lambda managable. With 130 birthdays per test, it works out to (130^3) / (4 * 6^7) or about 1.96 . We plug this into the Poisson equation:
(e^(-l))(l^x)
P(x) = ---------------
x!
...where: l is our lambda and x is the number of collisions we're interested in {0, 1, 2, 3...}. We'll actually decide on a maximum value for x based on the data (the number of collisions will always be fairly small, and the expected number will depend on the number of samples we take). We'll also use the expected number and the observed number to do a chi-square test to get a p value for the test.
So much for the theory. Propping the poncho lifter back on my F2 key, I generated another 50354 dr which, along with my original data total up to 85450 dr.
I knocked together some code to run through multiple tests on these data, grabbing 910 dr at a go. That's 130 birthdays per test (as determined above), and 7 dr per birthday. So we'll get a total of (85450 / 910) = 93.9 collision counts. Ignoring the partial data set, we'll go with 93 values for n.
Reading through the data, before doing anything else, a chi-square value for the birthdays is determined---just to insure that our data aren't 'failing' before we go further. Then the rest of the mechanics of the birthday spacings test are run through.
So much for the code. After a little number crunching, we learn:
Duplicate Observed Expected
Spacings Number Number
-------- ------ ------
0 12 13.073
1 34 25.650
2 16 25.163
3 19 16.457
4 10 8.072
5 2 3.168
6 0 1.036
7 0 0.290
8 0 0.071
9 0 0.016
Chi-square: 8.83976326165167
This is pretty darn good. We're certainly not lead to reject H0 at any meaningful significance level. It's also worth noting that the birthday spacings test is often failed by PRNGs[3], so this is heartening for the of the underlying PRNG for use in the dicebot.
So this is a pretty strong indicator that we don't have statistically significant correlations between arbitrary numbers output by the dice bot.