How random is VASSAL's dice-roller?

Stephen P. Berry has done some statistical analysis on a sample of dice rolls generated by the VASSAL engine. The short answer is that the results are indistinguishable from a random sample as far as any honest player would be able to tell. A maniacally motivated person might be able to guess the generator's seeding algorithm and therefore be able to make statistical predictions on future dice rolls based on the history of past dice rolls. However, I can recommend some much easier methods of cheating if that's what you're interested in. Here is the analysis:

Distribution of rolls From a test set of just over 35,000 2d6 rolls from the dicebot. The distribution looks pretty good:

DR Count Expected

     2  525   487.4
     3  943   974.9
     4 1484  1462.3
     5 1864  1949.8
     6 2418  2437.2
     7 2934  2924.7
     8 2454  2437.2
     9 1948  1949.8
    10 1490  1462.3
    11 1005   974.9
    12  483   487.4
            -------
  Total        17548

Doing a simple chi-square hypothesis test, we get a chi-square of 9.82, which tells us that we do not reject the null hypothesis that the data conforms to the random distribution at a significance level of 0.001. The numbers work out even better if you just look at 1d6 instead of 2d6. So ...this tells us that the PRNG is spitting out numbers that are pretty uniform on {1, 2, 3, 4, 5, 6}.

Correlation between rolls To test correlation between rolls, a count is kept for each pair {d[n - 1], d[n]} for n from 2 to the number of total dr.  The average is figured as ((1/6)^2) * (n - 1), and a chi-square computed.  The results for the 85450 1d6 I have data on are fine:

  d[n] d[n-1] Count

  1 1 2402
  1 2 2334
  1 3 2387
  1 4 2306
  1 5 2361
  1 6 2428
  2 1 2378
  2 2 2334
  2 3 2427
  2 4 2343
  2 5 2381
  2 6 2352
  3 1 2432
  3 2 2325
  3 3 2329
  3 4 2347
  3 5 2404
  3 6 2429
  4 1 2309
  4 2 2390
  4 3 2337
  4 4 2295
  4 5 2350
  4 6 2301
  5 1 2347
  5 2 2475
  5 3 2390
  5 4 2302
  5 5 2547
  5 6 2424
  6 1 2351
  6 2 2357
  6 3 2396
  6 4 2388
  6 5 2442
  6 6 2349

Expected value for each count:  2373.58333333333
Chi-square:  43.0407260471158
For df=35:  p=0.01 p=0.005 p=0.001
             49.80 57.34 66.62

...so there's no evidence there that the distribution of pairs of successive dr are not independent.

Another test of correlations is popularly called the coupon collector's test. For the current audience, instead of coupons we might think of baseball cards or any other widget that comes in sets of which you want to 'collect all n', for some sufficiently marketable value of n.

The idea is that we walk through our data, starting with an empty set, then add the elements of our data to the set, one by one, until we have acomplete set of one of each type (not counting duplicates). So for my dataset of 85450 1d6, we start with a set of zero 1s, zero 2s, and so on. Each time we see a new dr, we add it to our set. We keep track of how long it takes us to get a complete set of the numbers in [1, 6], then re-start from an empty set (selling the completed set on eBay, complete with photocopies of a couple other dr) and do it again.

After counting up the length between complete sets (i.e., the number of dr we had look through before we got one of each possible dr) we can do a trust ol' chi-square on the resulting data, where the probability for each length will be:

                   d!  p(r) = ----- * S((r - 1), (d - 1)) , d <= r < t
         d^r

                d!
 p(t) = 1 - ----------- * S((t - 1), d)
            d^(t - 1)

...where: r is the length in question; d is the number of items in a complete set; S(n,m) is the Stirling number of the second kind (which is the number of ways you can partition a set of n elements into m disjoint, nonempty subsets; and t the upper bound of the lengths we're going to consider (so if we were interested in lengths from 1 to 10, t would be 11. we'll choose t such that the expected value for the highest r will be around 1. If this doesn't make sense, just peek down at the data and it'll probably be more clear).

We multiply these values for p(r) with the total number of lengths we measure to obtain an expected value for each length, then do our chi-square.

The data once again looks good:

Length Count Expected Value

      6  85   89.75309
      7 223  224.38272
      8 372  349.03978
      9 420  436.29973
     10 475  481.38403
     11 477  490.83719
     12 490  474.63947
     13 477  442.26326
     14 401  401.22929
     15 377  356.91274
     16 293  312.85229
     17 247  271.18876
     18 235  233.07425
     19 185  199.00003
     20 174  169.03892
     21 162  143.01511
     22 104  120.61831
     23  99  101.47771
     24  75   85.20793
     25  68   71.43616
     26  50   59.81688
     27  57   50.03876
     28  54   41.82664
     29  24   34.94069
     30  39   29.17404
     31  23   24.34958
     32  24   20.31657
     33  21   16.94732
     34  13   14.13400
     35  11   11.78582
     36  10    9.82651
     37  11    8.19208
     38   7    6.82895
     39   9    5.69227
     40   5    4.74455
     41   1    3.95445
     42   5    3.29581
     43   3    2.74680
     44   2    2.28920
     45   1    1.90779
     46   1    1.58991
     47   0    1.32499
     48   1    1.10419
     49   0    0.92019
    =50   5    4.60124
 
Chi-square:    41.73981
For df=43:  p=0.01 p=0.005 p=0.001
              59.30 67.46 77.42

So...once again we aren't forced to reject the null hypothesis; there's no evidence that the dice are biased in a statistically significant way.

There's a more general test for this sort of thing, called a 'birthday spacings' test (e.g., in Marsaglia's DIEHARD).

Imagine you have an arbitrary calendar of d days. Select b birthdays out of that calendar, and list them in nondescending order. Make a list of the distances between the birthdays, then sort them into nondescending order. Go through the list of distances, and count the number of times a distance is the same as the one that preceeded it (i.e., for distances s[0, 1, ...], count the times s[i] = s[i - 1]). Call this number n. Assuming that the birthdays were selected at random, then n should be Poisson distributed with a lambda of (b^3)/(4d). According to Marsaglia, experimentation suggests that you need d >= 2^18.

What are we doing here? We're just trying to come up with a general model that lets us figure out what an expected distribution for the differences between dr is. The birthday spacings test gives us such a model. Now we need to find a way to convert our dr into birthdays in a calendar (with lots and lots of days...specifically, at least 2^18 days).

Okay. If we tried just mapping our die rolls directly to birthdays, we'd end up with a calendar with at most 6 days. That's somewhat less than our desired 2^18. So we figure ln(2^18)/ln(6) = 6.96. This tells us that 2^18 is roughly 6^6.96. What does this mean? Well, we're trying to figure out how many dr we need to string together to make a problem space roughly the same size as our desired calendar. Put in somewhat different terms, we can imagine a binary number as a bunch of coin tosses (each coin has two states---heads or tails---just like each bit has two states---0 or 1). What we're figuring out is more or less how many die rolls (things with six states) have as many possible combinations as 18 coin tosses (things with two states).

So if we do 18 coin tosses, there are 2^18 or 262144 possible results. By doing the ln(2^18)/ln(6) = 6.96, we're discovering that in order to get this many possible results out of die rolls, we need to roll about 6.96 dice. Call it 7 dr, for a total of 6^7 = 279936 possible results---just over what we need.

This also tells us that when we're looking at the dice to generate birthdays in our calendar with 279936 days, we'll need to roll 7 dice at a time. We'll also have to subtract one from each dr to insure we can generate numbers from 0 to 279935. So if we look at seven dr {1, 1, 1, 1, 1, 1, 2} we'd read that as (6^6 * 0) + (6^5 * 0) + (6^4 * 0) + (6^3 * 0) + (6^2 * 0) + (6^1 * 0) + (6^0 * 1) = 1. And the dr {6, 6, 6, 6, 6, 6, 6} would be (6^6 * 5) + (6^5 * 5) + (6^4 * 5) + (6^3 * 5) + (6^2 * 5) + (6^1 * 5) + (6^0 * 5) = 279935.

Okay. This all defined, we can grab dr, seven at a time, and convert them into an integer in [0, 279935]. We can then walk through our data starting from the first dr and compute the number of recurrent birthday distances (our value n above in the original description of the test). We'll choose a number of birthdays per test that will keep our lambda managable. With 130 birthdays per test, it works out to (130^3) / (4 * 6^7) or about 1.96 . We plug this into the Poisson equation:

   (e^(-l))(l^x)
  P(x) = ---------------
              x!

...where: l is our lambda and x is the number of collisions we're interested in {0, 1, 2, 3...}. We'll actually decide on a maximum value for x based on the data (the number of collisions will always be fairly small, and the expected number will depend on the number of samples we take). We'll also use the expected number and the observed number to do a chi-square test to get a p value for the test.

So much for the theory. Propping the poncho lifter back on my F2 key, I generated another 50354 dr which, along with my original data total up to 85450 dr.

I knocked together some code to run through multiple tests on these data, grabbing 910 dr at a go. That's 130 birthdays per test (as determined above), and 7 dr per birthday. So we'll get a total of (85450 / 910) = 93.9 collision counts. Ignoring the partial data set, we'll go with 93 values for n.

Reading through the data, before doing anything else, a chi-square value for the birthdays is determined---just to insure that our data aren't 'failing' before we go further. Then the rest of the mechanics of the birthday spacings test are run through.

So much for the code. After a little number crunching, we learn:

 Duplicate       Observed       Expected
 Spacings        Number         Number
 --------        ------         ------
 0                12         13.073
 1                34         25.650
 2                16         25.163
 3                19         16.457
 4                10          8.072
 5                 2          3.168
 6                 0          1.036
 7                 0          0.290
 8                 0          0.071
 9                 0          0.016

Chi-square: 8.83976326165167

This is pretty darn good. We're certainly not lead to reject H0 at any meaningful significance level. It's also worth noting that the birthday spacings test is often failed by PRNGs[3], so this is heartening for the suitability of the underlying PRNG for use in the dicebot.

So this is a pretty strong indicator that we don't have statistically significant correlations between arbitrary numbers output by the dice bot.