Board Game Arena, Catan, virtual dice and randomness - a short trial

Romain672
Posts: 1014
Joined: 05 April 2016, 13:53

Re: Board Game Arena, Catan, virtual dice and randomness - a short trial

Post by Romain672 »

Mmmm okay.
That looks much more convincing, even on the specific point I checked.
So of my 10 random trials, 8 were far from what you found, and 2 were one below (I got two 19s).
Still, none of them reached 20 or more.


Now for your second point about the weird results you got, I added a second page to it.
I used this formula: "=INT(1/RAND())*20", which should give random numbers outside of the 95% interval.

If I focus on your 5 most unlikely results, they are 230177>471>452>380>340.
In 20 tries, here is the result with the highest 5th value: 1340>400>300>140>140.
Here is the result with the second highest 5th value: 2960>340>340>160>120.
Here is the one with the highest top value: 63240>2960>200>80>80.
So you beat all those... Again.
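
For anyone who wants to repeat this outside a spreadsheet, here is a minimal Python sketch of the same kind of trial. It assumes each try draws 20 values (one per outlier found in the 20 games) and keeps the five largest; the sample size, the seed and the function names are assumptions for illustration, not taken from the spreadsheet.

```python
import random

def inverse_p_sample(rng):
    """Python analogue of the spreadsheet formula =INT(1/RAND())*20."""
    u = 1.0 - rng.random()  # lies in (0, 1], so the division is always safe
    return int(1.0 / u) * 20

def top_five(rng, n_values=20):
    """Draw n_values samples and return the five largest, in descending order."""
    values = [inverse_p_sample(rng) for _ in range(n_values)]
    return sorted(values, reverse=True)[:5]

rng = random.Random(2022)  # arbitrary seed, only for reproducibility
tries = [top_five(rng) for _ in range(20)]

# The try whose 5th-largest value is highest, as in the comparison above.
best_fifth = max(tries, key=lambda t: t[4])
print(">".join(str(v) for v in best_fifth))
```

Every value is a multiple of 20 and at least 20, and the chance of reaching 20*m or more is 1/m, which is why very large values show up now and then.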

Note those series are independent. The only thing which could prevent me from multiplying them (which would make your result the most unlikely of 10*20, so 200+ games, in my simulations) is if muntzer could have taken different ways of looking at his set.

But yeah, that looks weird. I suppose there will be follow-ups by other people, but it would be nice for this experiment to be repeated.
siverure
Posts: 21
Joined: 24 January 2021, 16:01

Post by siverure »

I'm not sure about everything here, but my understanding is that you examined individual numbers over 20 games to see if anything was exceptionally above or below the expected rolls. It occurs to me that rolling one number an above average amount of times and another number a below average amount of times in the same game aren't independent events. I'm uncertain if you've accounted for this at all, especially given that the result is a little less than double what you expected. If you flip a coin 10 times and count 10 heads and 0 tails you only had one unusual set of coinflips, not two. I'm unsure how much this applies when you have eleven results but the same principle should have some effect.

The later post stating that similar expected numbers had massively disproportionate actual rolls in games seems to me like a flaw of a game involving randomness, not a flaw in the randomness. Again, for one number to be rolled an above average amount, another must have been rolled a below average amount.
euklid314
Posts: 292
Joined: 06 April 2020, 22:56

Post by euklid314 »

As far as I can see, all of muntzer's percentages are calculated correctly and everything is explained very clearly. Respect! I will try to achieve the same with my words, but I am not sure if I will succeed.

First, I want to comment on the fact that many of the examined events were outside their 95%-confidence interval. In a separate thread I will perhaps later discuss the most striking outlier on its own (28 nines out of 102 rolls in a single game) and its consequences.
muntzer wrote: 01 November 2022, 12:20 I was surprised by these results - there are lots more outcomes outside our confidence intervals than I would have expected in this case. From 20 trials, I would have expected to see values outside those confidence intervals about once per number rolled; in other words, about 11 in total. What I found instead was 20 of those outliers.
As you and Romain both found: In each of the 20 games 11 numbers were considered, thus there were a total of 220 "events" that could either lie within their 95%-interval or lie outside it. Of these 220 events only 11 should be expected to lie outside the 95%-interval, but in muntzer's data there were 20 "outliers" instead of the expected 11. This was surprising for muntzer. But is this really that unheard-of, if we consider fair dice?

Please note that we can use the binomial distribution again. We have 220 "experiments", which are "independent" (I will come to that later) of each other. Each experiment will "succeed" with 95% and will "fail" with 5%. To make it clear what I mean by the 220 "experiments": A single one of these experiments would be counting the number of a specific roll, say 10s, in a specific game, say game 7. This single experiment would "succeed" if the number of 10s in game 7 was between 4 and 14, and it would "fail" if the number of 10s was below 4 or above 14.

So we have n=220 experiments and the fail percentage is p=0.05.
The expected value is E=n*p=220*0.05=11, as noted above. But of course we don't expect to get 11 "failures" every time that we play 20 games. The number of failures has a standard deviation and confidence intervals of its own.

The probability that we get 20 or more failures is 100% - BINOM.DIST(19, 220, 0.05, TRUE) = 0.8%. The probability that we get 3 or fewer failures is BINOM.DIST(3, 220, 0.05, TRUE) = 0.4% - this event of 3 or fewer outliers would be similarly surprising to getting 20 or more outliers.

So there is a probability of 1.2% that in a series of 20 Catan games (with approx. 100 rolls each) there are either 3 or fewer, or 20 or more, numbers outside their expected 95%-interval.

My conclusion: muntzer found an unlikely event in his 20 games which has a probability of only 1.2%. Quite an unlikely event, but not shocking at all.
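
The two tail probabilities above can be re-checked without a spreadsheet. The following is a minimal Python sketch using only the standard library; `binom_cdf` is a hypothetical helper written for this illustration and plays the role of BINOM.DIST(k, n, p, TRUE). Like the calculation above, it treats the 220 events as independent.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 220, 0.05          # 20 games x 11 numbers, 5% chance to fall outside
expected = n * p          # expected number of outliers

upper = 1.0 - binom_cdf(19, n, p)   # P(20 or more outliers)
lower = binom_cdf(3, n, p)          # P(3 or fewer outliers)

print(f"expected outliers: {expected:.1f}")
print(f"P(X >= 20) = {upper:.2%}")
print(f"P(X <= 3)  = {lower:.2%}")
print(f"both tails = {upper + lower:.2%}")
```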

But please note one further aspect: The 220 "experiments" which I defined above are NOT independent of each other. If one number is an outlier *above* its expected 95%-interval, then this significantly(!) increases the probability that some other number will be *below* its 95%-interval. Thus one outlier will induce other outliers. You have either no outliers in one game or several of them in one game. Thus, getting 20 events outside the 95%-interval instead of 11 events is probably much more likely than the calculation above shows.

This clustering is what muntzer noted in one of his threads:
muntzer wrote: 01 November 2022, 12:28 To make things worse, the combined counts for different sets of hexes were often ridiculously skewed, i.e.

Game #3: 3s came up once; 2s and 10s = 22 rolls
So what you noticed here, muntzer, is not making matters worse; it is normal. If 3s come up only once (a downward outlier), then getting some other numbers as upward outliers is perfectly normal and to be expected.
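
The negative dependence between the counts of two numbers can be quantified. As a sketch, assuming a game with a fixed number of rolls of two fair dice, the per-number counts follow a multinomial distribution, and the correlation between two counts has a closed form that does not depend on the number of rolls. The function names below are made up for this illustration.

```python
from math import sqrt

def p_sum(s):
    """Probability that two fair six-sided dice add up to s (2..12)."""
    return (6 - abs(s - 7)) / 36

def count_correlation(a, b):
    """Correlation between the counts of sums a and b (a != b) in one game.

    For multinomial counts, Cov(N_a, N_b) = -n * p_a * p_b, which leads to
    corr = -sqrt(p_a * p_b / ((1 - p_a) * (1 - p_b))), independent of n.
    """
    pa, pb = p_sum(a), p_sum(b)
    return -sqrt(pa * pb / ((1 - pa) * (1 - pb)))

for a, b in [(8, 9), (6, 8), (3, 10)]:
    print(f"corr(count of {a}s, count of {b}s) = {count_correlation(a, b):.3f}")
```

The correlation is always negative, which matches the clustering argument, but this only gives a sense of its size for a single pair of numbers.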

Edit: While I was writing my lengthy comment, siverure rightly mentioned one of the important aspects that I also covered, independently of him. He makes a perfectly valid point!
Jellby
Posts: 1350
Joined: 31 December 2013, 12:22

Post by Jellby »

How many possible "unlikely events" are there? What is the probability of some "unlikely event" happening? As I indirectly mentioned about another game with dice rolls, the probability of any specific sequence of rolls is very small, and yet, it's 100% guaranteed that every game will contain some specific sequence of rolls (well, maybe not the games abandoned before the first roll).

You are not allowed to analyse the results afterwards and identify unlikely events, there will (almost) always be some unlikely event, and if there isn't... hey, that's an unlikely event! You have to define what's the unlikely event you're searching for beforehand, compute the odds, and verify after.

That said, I'm pretty sure the RNG used in BGA is flawed. All deterministic RNGs are flawed (and I presume the one used is deterministic). The question is whether that leads to any identifiable pattern or bias when applied to dice rolls, card shuffles, etc., and there I'm also pretty sure the answer is no. But I'm no authority.
SwHawk
Posts: 133
Joined: 23 August 2015, 16:45

Post by SwHawk »

What always amazes me in this kind of thread, is that you pretty much all compare results from dice throws that were done in Excel, while we know for a fact that dice throws in BGA don't come from Excel. The underlying architecture is based on PHP. Has anyone bothered to check whether the RNG algorithm in Excel matches the RNGs in PHP? In PHP there are at least 3 functions that can generate pseudo random numbers. But you're using Excel's as a comparison as if it were the best RNG available... I'm no specialist about Excel or RNGs but the main purpose of Excel is not to generate random numbers so I wouldn't be surprised that it is an average RNG... EDIT: I've checked, Excel uses the Mersenne Twister (MT) algorithm to generate pseudo random numbers, which is an average RNG, but not suited for cryptographic operations... Actually, PHP also offers an implementation of the MT algorithm, but then again, is it the one BGA uses?

Also, yes, there I've said it, pseudo random. But hey, that's the max computers and humans can do. So yes they are inherently flawed. The question is, as Jellby mentioned, whether this inherent flaw has a significant impact on the games played in BGA... To know that, you need to know which of the 3 methods that PHP makes available to generate pseudo random numbers is the one BGA uses. One simple way to do it is to compare the outputs of those 3 methods with the output of BGA... With all the strictness necessary... Once that is done, to see if it interferes with fairness on games you should compare with real dice throws (which are supposed to give you random numbers)...

As I said I'm no specialist, but I've always been told that real dice throws aren't exactly random... My guess is that computer RNGs are better than the real deal, but hey... I may be mistaken...

But if you want to prove that BGA's RNG isn't fair to players, then the least you can do is simulate the dice throws with the same algorithm. Otherwise, if you're using Excel, you're just checking whether Excel's RNG is fair or not... Not BGA's.
Last edited by SwHawk on 01 November 2022, 21:19, edited 1 time in total.
euklid314
Posts: 292
Joined: 06 April 2020, 22:56

Post by euklid314 »

Jellby wrote: 01 November 2022, 18:47 How many possible "unlikely events" are there? What is the probability of some "unlikely event" happening? As I indirectly mentioned about another game with dice rolls, the probability of any specific sequence of rolls is very small, and yet, it's 100% guaranteed that every game will contain some specific sequence of rolls (well, maybe not the games abandoned before the first roll).

You are not allowed to analyse the results afterwards and identify unlikely events, there will (almost) always be some unlikely event, and if there isn't... hey, that's an unlikely event! You have to define what's the unlikely event you're searching for beforehand, compute the odds, and verify after.

That said, I'm pretty sure the RNG used in BGA is flawed. All deterministic RNGs are flawed (and I presume the one used is deterministic). The question is whether that leads to any identifiable pattern or bias when applied to dice rolls, card shuffles, etc., and there I'm also pretty sure the answer is no. But I'm no authority.
You are correct about the "unlikely events", Jellby, but this time muntzer did take care of your concerns. He defined beforehand what he wanted to look at (no reason to doubt his words) and he did analyze with some decent understanding of probabilities. He found something that worried him - more than it should have. :-)

As for your concerns about a "flawed" RNG: I am absolutely convinced that the RNG is far fairer than any ordinary physical dice you and I have at home, and that it works just as well as the best precision dice you can buy. The pseudo-random numbers that are generated here are "random" enough for any purpose that this site needs them for.
muntzer
Posts: 22
Joined: 13 September 2021, 18:12

Post by muntzer »

SwHawk wrote: 01 November 2022, 21:09 What always amazes me in this kind of thread, is that you pretty much all compare results from dice throws that were done in Excel, while we know for a fact that dice throws in BGA don't come from Excel.
What always amazes me are the people who debunk analysis without understanding the method used in the analysis.

I didn't use Excel to simulate dice rolls. I used Excel to calculate the exact expectations of dice rolls in a game of Catan. I could have done all those calculations using pen and paper, but hey. Computers are faster.
SwHawk
Posts: 133
Joined: 23 August 2015, 16:45

Post by SwHawk »

And yet, you miss the point entirely: The real concern isn't whether the RNG is flawed, because we know it is up to a certain standard... The question is, whether it is actually more or less flawed than real dice throws, for a real game of Catan. If the BGA implementation is actually fairer than a real game of Catan, is it a problem or not ?

Also I'm not trying to debunk your results... Which appear sound for now. I'm just analyzing the method.
euklid314
Posts: 292
Joined: 06 April 2020, 22:56

Post by euklid314 »

SwHawk wrote: 01 November 2022, 21:09 What always amazes me in this kind of thread, is that you pretty much all compare results from dice throws that were done in Excel, while we know for a fact that dice throws in BGA don't come from Excel. The underlying architecture is based on PHP. Has anyone bothered to check whether the RNG algorithm in Excel matches the RNGs in PHP? In PHP there are at least 3 functions that can generate pseudo random numbers. But you're using Excel's as a comparison as if it were the best RNG available... I'm no specialist about Excel or RNGs but the main purpose of Excel is not to generate random numbers so I wouldn't be surprised that it is an average RNG... EDIT: I've checked, Excel uses the Mersenne Twister (MT) algorithm to generate pseudo random numbers, which is an average RNG, but not suited for cryptographic operations... Actually, PHP also offers an implementation of the MT algorithm, but then again, is it the one BGA uses?

Also, yes, there I've said it, pseudo random. But hey, that's the max computers and humans can do. So yes they are inherently flawed. The question is, as Jellby mentioned, whether this inherent flaw has a significant impact on the games played in BGA... To know that, you need to know which of the 3 methods that PHP makes available to generate pseudo random numbers is the one BGA uses. One simple way to do it is to compare the outputs of those 3 methods with the output of BGA... With all the strictness necessary... Once that is done, to see if it interferes with fairness on games you should compare with real dice throws (which are supposed to give you random numbers)...

As I said I'm no specialist, but I've always been told that real dice throws aren't exactly random... My guess is that computer RNGs are better than the real deal, but hey... I may be mistaken...

But if you want to prove that BGA's RNG isn't fair to players, then the least you can do is simulate the dice throws with the same algorithm. Otherwise, if you're using Excel, you're just checking whether Excel's RNG is fair or not... Not BGA's.
Only Romain did a simulation, on a problem that is rather hard to calculate. In this case it is good that he used a different RNG than the one that BGA uses, because he wanted to get a feeling for what is "to be expected". That is no rocket science, and every average RNG will give you an estimate of probabilities that are difficult to calculate by hand. Since he uses a different RNG for his Excel simulation than BGA uses, one can argue that the BGA RNG is fine, since it shows the same behaviour that is expected of RNGs. Note that nobody needs RNGs suited for cryptographic operations here...

Let's come back to the example of how many number counts should lie outside their 95%-confidence intervals within 20 games. If all these 220 number counts were independent one would expect 11 "outliers", i.e. on average one in approx. every 2 games. But since these number counts are not independent, it is hugely difficult to compute how many outliers one has to expect in reality. It could still be 11, but perhaps it is 15 in reality?

Let's look at one example. Remember that for a 100-roll game we have to expect
9s: between 6 and 17 rolls per game (with 94.5% confidence)
8s: between 8 and 21 rolls per game (with 95.7% confidence)
But we know that on average every second game one of the 11 possible numbers will be rolled outside its 95% interval.
Let's take the example of 21 9s in 100 rolls, which happened to muntzer. This is outside the 95%-interval, but an outlier of this kind is something that will happen every second game. But then there are only 79 rolls left for the other numbers. The probability that 8s will occur between 8 and 21 times in this same game is now only 87.3%.

Summary: As soon as the unlikely (but not so rare) event of 21 9s happens in a game, the chance that the count of 8s will fall outside its range of 8-21 rolls per game surges drastically from 4.3% to 12.7%. The same is true for all the other numbers as well. Thus, the 1-in-2-games event of too many 9s will trigger other "unlikely" events with high certainty. Thus these unlikely (but not so rare) events will cluster and will happen more frequently than expected.
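
The conditional probability in this example can be recomputed with a short Python sketch (standard library only; `binom_range` is a hypothetical helper). One detail is worth checking: the 87.3% figure seems to correspond to keeping the probability of an 8 at 5/36 for the remaining 79 rolls. Once it is known that those 79 rolls are not 9s, the probability of an 8 on each of them is renormalised to (5/36)/(32/36) = 5/32. The sketch prints both variants next to the unconditional value so they can be compared.

```python
from math import comb

def binom_range(lo, hi, n, p):
    """P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

p8, p9 = 5 / 36, 4 / 36   # chance of rolling an 8 or a 9 with two fair dice

# No information about the 9s: 100 rolls, 8s between 8 and 21.
unconditional = binom_range(8, 21, 100, p8)

# 79 remaining rolls, probability of an 8 left unchanged at 5/36.
same_p = binom_range(8, 21, 79, p8)

# 79 remaining rolls, each known not to be a 9: probability becomes 5/32.
renormalised = binom_range(8, 21, 79, p8 / (1 - p9))

print(f"unconditional:          {unconditional:.1%}")
print(f"79 rolls, p = 5/36:     {same_p:.1%}")
print(f"79 rolls, p = 5/32:     {renormalised:.1%}")
```

With the renormalised probability the shift is smaller than with the unchanged one, but it goes in the same direction: many 9s make an unusual count of 8s more likely.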

There is no way anybody can calculate these probabilities, which are dependent on each other, by hand for the general case. It is only safe to say that 11 "outliers" within 20 games is a lower bound for the expected number of outliers. An Excel simulation might give a very precise solution to this problem though.
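
Such a simulation does not have to be done in Excel. Below is a minimal Python sketch under stated assumptions: every game has exactly 100 rolls, the dice are fair, and the 95% intervals are computed with an equal-tail rule (at most 2.5% in each tail), which may differ slightly from the intervals used earlier in this thread. All names and the seed are made up for the illustration.

```python
import random
from math import comb

ROLLS_PER_GAME = 100     # assumption: fixed game length
GAMES_PER_SERIES = 20
SUMS = range(2, 13)

def p_sum(s):
    """Probability that two fair six-sided dice add up to s."""
    return (6 - abs(s - 7)) / 36

def central_interval(n, p, level=0.95):
    """Counts (lo, hi) leaving at most (1 - level) / 2 in each tail."""
    tail = (1 - level) / 2
    cdf, lo, hi = 0.0, None, n
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p)**(n - k)
        if lo is None and cdf > tail:
            lo = k
        if cdf >= 1 - tail:
            hi = k
            break
    return lo, hi

INTERVALS = {s: central_interval(ROLLS_PER_GAME, p_sum(s)) for s in SUMS}

def outliers_in_game(rng):
    """Play one game and count the numbers that fall outside their interval."""
    counts = dict.fromkeys(SUMS, 0)
    for _ in range(ROLLS_PER_GAME):
        counts[rng.randint(1, 6) + rng.randint(1, 6)] += 1
    return sum(not (INTERVALS[s][0] <= counts[s] <= INTERVALS[s][1]) for s in SUMS)

def simulate(n_series, seed=0):
    """Return (mean outliers per 20 games, share of series with 20 or more)."""
    rng = random.Random(seed)
    totals = [sum(outliers_in_game(rng) for _ in range(GAMES_PER_SERIES))
              for _ in range(n_series)]
    return sum(totals) / n_series, sum(t >= 20 for t in totals) / n_series

mean, share = simulate(200)
print(f"mean outliers per 20 games: {mean:.2f}")
print(f"share of series with 20 or more outliers: {share:.2%}")
```

One thing to keep in mind when reading the output: by linearity of expectation, the average number of outliers does not depend on whether the counts are correlated. What the dependence changes is the spread, so the share of series with 20 or more outliers is the more informative number. Increase `n_series` for a steadier estimate.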
muntzer
Posts: 22
Joined: 13 September 2021, 18:12

Post by muntzer »

SwHawk wrote: 01 November 2022, 21:28 And yet, you miss the point entirely: The real concern isn't whether the RNG is flawed, because we know it is up to a certain standard... The question is, whether it is actually more or less flawed than real dice throws, for a real game of Catan. If the BGA implementation is actually fairer than a real game of Catan, is it a problem or not ?
So - why did you criticise me for using Excel?
Otherwise, if you're using Excel, you're just checking whether Excel's RNG is fair or not... Not BGA's.