Algorithms In Context #8: Shuffling

How random number generation affects your algorithm

Can Bayar
6 min readFeb 26, 2022
Photo by Jarosław Kwoczała on Unsplash

Shuffling itself is not a very complex subject. Randomness on the other hand is a whole another thing… Unlike the previous entries in the series, our real purpose is not to find the best solution but to understand how the underlying mechanism works this time around.

This one actually has a funny story. I have been working on a card game for a while now and every now and then, we receive some racist comments on how certain groups of people receive better hands. As the developer of the game, I know that we make absolutely no distinction between any two players at any point during the game. Still, last night it got me thinking: Could there be a problem with the random number generation?

Most of you probably haven’t really thought about how do computers generate random numbers before. Luckily, I had some experience with random numbers in the past. Let me walk you through my research on shuffling algorithms and random number generators.

Note that, unlike my previous posts I will use Java instead of C++ since the aforementioned game usesCollections.shuffle()method from the standard library.

Random Number Generation

Random number generation is a very hard problem in computer science. By random number generation, we mean generating truly random numbers, a sequence of statistically independent numbers. Since the algorithms are deterministic (at least for the computers we use today, there are different problem sets that are out of the topic at the moment), it is not possible to generate truly random numbers without the help of embedded devices.

The random number generators we use are actually called pseudorandom number generators (PRNG). The numbers they generate look random but they are actually predetermined and mathematically calculated. Each value in the sequence are determined by an initial seed and as long as you know the seed, you can calculate every number in the sequence.

Consequently, you should be careful when using random number generators. PRNGs usually should suffice for your daily development activities. However, if you are developing something mission-critical like a cryptography algorithm, you may want to consider using something more sophisticated.

If you are interested in the mathematical background of pseudorandom number generation, you can read about linear congruential method.

Testing Randomness

In this example, we are using a standard deck of 52 cards. The cards are numbered from 1 to 52 and stored in a list. At the start of each game, we will randomly shuffle the deck and give a constant number of cards to each player one by one from the top of the deck (which is unimportant for our purposes).

I have written a simple test code to check how uniformly the cards are distributed in the deck. We will shuffle the deck 10000 times and find the average value for each element of the shuffled deck. Since the cards are numbered from 1 to 52, we expect the average to be close to 26,5 for each element.

public static final int DECK_SIZE = 52;
public static final int TEST_COUNT = 10000;
public void shuffleTest() {

double[] totals = new double[DECK_SIZE];
for (int i = 0; i < TEST_COUNT; i++) {
List<Integer> cards = shuffleDeck();
for (int j = 0; j < cards.size(); j++) {
int value = cards.get(j);
totals[j] += value;
}
}

double min = DECK_SIZE;
double max = 0;
for (double total : totals) {
double average = total / TEST_COUNT;
if (average < min) {
min = average;
}

if (average > max) {
max = average;
}
}

System.out.println("Minimum: " + min);
System.out.println("Maximum: " + max);
System.out.println("Mean: " + (max - min));
}

Note that, an actual randomness test requires more quantitative measures, like how uniformly distributed the successive numbers are. For example, a set of cards may not be as random as a single card is. Or the least significant bits of the generated values might be less random than the most significant bits. I wanted to keep it as simple as possible so this one should suffice for our needs at the moment.

An example outcome forCollections.shuffle() is given below. Both the minimum and maximum average values are very close to 26,5 which is good. As the number of test cases increases, it shall get closer to 26,5 and vice versa.

Minimum: 26.243
Maximum: 26.8309
Mean: 0.5879000000000012

Test results for standard library method call seems satisfying enough. Let’s pause this conversation for a while and first try to develop our own shuffling algorithm. We will come back to here in a bit. Before moving on, think a little while on how would you design your own shuffling algorithm.

Naive Shuffle

The first algorithm most developers think of would be to choose two random elements and swap them up to a given number of times. This algorithm is very easy to implement.

public List<Integer> naiveShuffle() {

List<Integer> deck = new ArrayList<>(cards);
for (int i = 0; i < DECK_SIZE; i++) {
int index1 = random.nextInt(DECK_SIZE);
int index2 = random.nextInt(DECK_SIZE);
swap(index1, index2);
}

return deck;
}

It is all fun and games until we see the test results…

Minimum: 22.9138
Maximum: 29.8975
Mean: 6.9837000000000025

Why is this happening? Because in this approach different permutations have different probabilities. The mathematical explanation can be read here. By now you should be used to not having the best result on the first try anyway. Moving on…

Fisher-Yates Shuffle

This one also is a very simple algorithm, maybe even more simple than the naive approach. We will loop through the list from the last element to the first. Then we will pick a random index between 0 and the index of the current element in the loop, and we swap them.

public List<Integer> naiveShuffle() {   List<Integer> deck = new ArrayList<>(cards);
for (int i = DECK_SIZE - 1; i > 0; i--) {
int index = random.nextInt(i + 1);
swap(index, i);
}
return deck;
}

Let’s see the results once more:

Minimum: 26.1994
Maximum: 26.8658
Mean: 0.6663999999999994

Voila! The algorithm is simple, yet effective. If you inspect the codes Java for Collections.shuffle() method in the standard library, you would see that Java also uses the Fisher-Yates algorithm, anyway. Read its documentation and you will see it says

All permutations occur with approximately equal likelihood.

It also says

If it were a perfect source of randomly chosen bits, then the algorithm would choose permutations with perfect uniformity.

Unfortunately, it is not really easy to find a perfect source. Now we can pick it up from where we left off…

Everything up to this point seems quite alright, don’t you think? We have learned that we are unable to achieve true randomness, but we still got satisfactory results for algorithm. You may even wonder why I dedicated a section for such a simple algorithm, if you have read my previous posts. Here comes the pilot twist…

What you are unable to see is that you cannot generate all the possible permutations of the 52 cards using default PRNGs. Depending on the language you use, the internal state of PRNG generally uses 32 or 64 bits. Java uses 64 bits, meaning that it can only produce 2⁶⁴ different seeds and 2⁶⁴ seed means 2⁶⁴ permutations. A 52-card deck on the other hand has 52! ≈ 2²²⁵ permutations, a number much larger than 2⁶⁴ that the PRNG can provide. In fact, it is not possible for a generator using less than 225 bits to generate all possible permutations for our deck.

This means that there are significantly fewer possibilities for the next card the player will pick. Even worse than that, it can be possible to guess the entire deck if you know the seed or maybe even some of the cards in succession since PRNG generated values are calculated according to a mathematical formula in reality.

Depending on your application, using the default PRNG may or may not be acceptable. It is not always easy to make a custom solution so you may prefer to use the default implementation knowing its side effects. The lesson here is that without proper knowledge of how the underlying algorithm works, you would be completely unaware of the situation here and move on.

And that is all I wanted to say in this post. We did not actually come up with a solution this time, like I said at the beginning, our motivation was to understand what’s happening behind the curtain. By the way, I am not planning to do anything about my card game. More stuff coming very soon so keep up!

--

--

Can Bayar

Senior Software Engineer, Rock Vocalist & Guitarist