I had an interesting discussion with a group of students this afternoon which highlighted how important it is to understand conditional probability and how code structure can produce unexpected results.

The students’ code contained three lists and a random number generator. The random number generator was meant to randomly chose which list to take something from with equal probability (that is 1/3 chance of selecting from each of the lists). However, somehow one of their lists was being selected far more than the others.

Their code looked similar to the following:

`if (random.nextInt(3) == 0) {`

// select from list 1

} else if (random.nextInt(3) == 1) {

// select from list 2

} else {

// select from list 3

}

Can you spot the bug? Why would this not generate equal probabilities of selecting from each list?

I wrote some of my own code to test this code structure almost 100 million times:

`int count0 = 0;`

int count1 = 0;

int count2 = 0;

```
```

`// loop almost 100 million times`

for (int i = 0; i < 99999999; i++) {

if (r.nextInt(3) == 0) {

count0++;

} else if (r.nextInt(3) == 1) {

count1++;

} else {

count2++;

}

}

Which generated the results:

`Count0 = 33333271`

Count1 = 22221788

Count2 = 44444940

Which demonstrates this code doesn’t generate equal probabilities.

The problem is the second random number generation:

`if (r.nextInt(3) == 0) {`

count0++; // this has 1/3 probability (0.333333333333333)

**} else if (r.nextInt(3) == 1) { // this else will run 2/3 of the time**

// and resolve true 1/3 of the times it runs

count1++; // 1/3 * 2/3 this line will run... i.e. 2/9 probability (0.22222222222)

} else {

count2++; // 2/3 * 2/3 this line will run... i.e. 4/9 (0.4444444444444)

}

Restructuring the code to only generate the random number once and storing this in a temporary variable makes a huge difference:

**int rInt = r.nextInt(3); // only generate a random number once.**

if (rInt == 0) {

count0++; // this will run 1/3 of the time.

} else if (rInt == 1) {

count1++; // this will run 1/3 of the time.

} else {

count2++; // this will run 1/3 of the time.

}

And the results of this are:

`Count0 = 33333907`

Count1 = 33332586

Count2 = 33333506

Which looks much better.

(Note: alternatively you could make the second random number generation be out of 2 rather than 3 but this would be slower as the random number generator will take a few CPU cycles to calculate.)