Algorithms In Context #7: Decision Trees & Alpha-Beta Pruning

How your AI opponent made decisions before machine learning

Can Bayar
Towards Data Science


Photo by Pietro Jeng on Unsplash

In 1997, IBM’s supercomputer Deep Blue defeated the world champion Garry Kasparov in a chess match. An iconic moment in AI history, indeed! In this section, we will see that writing artificial intelligence algorithms is not nearly as interesting as Hollywood movies make it look.

Decision Trees

Think about any two-player board game: chess, backgammon, whatever you prefer... You and your rival take turns, and each of you is trying to reach that definitive state in which you crush your opponent. At any point, you have a limited set of moves, and at each step your aim is to make the best possible move. But this is how humans play the game; how about a machine?

They use something called a decision tree. We start with a root node, which symbolizes the current state of the game. Every possible move we can make at that point becomes a child of that node. Then for each child, there is a new set of possible moves for the opponent. The tree branches out until it covers every possible state of the game, and the game ends when it reaches a leaf node.
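
To make this concrete, here is a minimal sketch of what such a tree node might look like in C++. The names (GameState, Node, isLeaf, getChildren) are placeholders I am assuming for illustration; the actual representation depends entirely on your game.

#include <vector>

// A hypothetical game-tree node. GameState stands in for whatever structure
// describes your board: piece positions, whose turn it is, and so on.
struct GameState { /* board representation goes here */ };

struct Node {
    GameState state;              // the game position this node represents
    std::vector<Node*> children;  // one child per legal move from this state

    bool isLeaf() const { return children.empty(); }  // no moves left: the game is over
    const std::vector<Node*>& getChildren() const { return children; }
};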

The very first artificial intelligence algorithms were based on making a brute-force search on the decision trees. The search algorithm tries to reach any leaf node that makes the machine win and makes decisions so that it can reach one of these winning nodes. We shall now see one of these algorithms in action.

Minimax Algorithm

Abstract yourself from the game now and assume we have assigned a score to every possible outcome of the game. The scores are assigned to the leaf nodes of the tree. A positive score indicates that the machine wins and a negative score indicates that you win. So, the goal of the AI is to maximize the score and yours is to minimize it. In the tree, green arrows indicate the AI's turns (maximizer) and red arrows indicate yours (minimizer).

The minimax algorithm is very simple: it is a modified version of depth-first search. The AI (green) will always choose the move with the maximum possible outcome, assuming that its opponent (red) will play optimally and choose the minimum possible outcome every time. It is wise to assume that your opponent plays optimally so that you are prepared for the worst. Now take a moment and follow the tree from the bottom to the top to see the moves each player made.
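
For a quick sense of how the values propagate, suppose (hypothetically) the maximizer has two moves at the root: under the first, the minimizer's best reply forces a score of 3; under the second, the best it can force is 5. Each minimizer node takes the minimum of its children and each maximizer node takes the maximum of its children, so the root receives max(3, 5) = 5 and the AI plays the second move.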

You can see that this algorithm pretty much searches all possible scenarios by brute force. If we assume that b is the branching factor and d is the depth of the decision tree, the algorithm runs in O(bᵈ) time, which is exponential.

If you are implementing a Tic-Tac-Toe game, this may not be that bad. After all, there are 9 possible moves on the first turn, 8 on the next, then 7, and so on, which makes up to 9! scenarios in total. However, if you were making a chess game, the number of possibilities would grow by an insane amount! It would take millions of years for any computer to calculate all the possibilities.
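
To put a rough number on that: chess has an average branching factor of roughly 35, so looking ahead just 10 plies (5 full moves) already means on the order of 35¹⁰ ≈ 2.8 × 10¹⁵ positions, and the full game tree is vastly larger still.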

Do I need to say there must be a better way?

Alpha-Beta Pruning

Alpha-beta pruning is the strategy of eliminating the branches that will not contribute to the solution. I will explain this with an example. The red lines in the tree below mark the current state of our search. The maximizer (AI) has chosen 9 and 5, which are the maximum reachable values in the corresponding subtrees. At this point, the minimizer holds the value 5, which is the smaller of 9 and 5.

There is still one branch left to search, and the first value the depth-first search sees there is 6. Now we know that whatever the maximizer chooses in that subtree will be at least 6. But we also know that the minimizer already holds 5, which is smaller than 6. At this point, we no longer need to check the remaining children (1 and 7), because we know for sure that the minimizer will select 5.

The reverse can also happen: if the minimizer's subtree can only produce values smaller than what the maximizer has already secured, we don't need to search the rest of that subtree either. In the tree below, the maximizer has already secured 5 at the root node. Since -2 is smaller than 5 and anything the minimizer chooses will be at most -2, we no longer need to search the rest of that subtree.
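
This is where the names alpha and beta come from: alpha is the best (highest) score the maximizer can already guarantee on the current path, and beta is the best (lowest) score the minimizer can already guarantee. Whenever alpha ≥ beta at a node, the players above would never let the game reach that node, so its remaining children can safely be skipped. You will see exactly this check in the code further down.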

So go ahead, apply this strategy to your entire search and prune the c**p out of the decision tree. There is one last subtree you can prune on the very right side; I'm leaving it to you to work out why it can be skipped. The final result of the alpha-beta pruning algorithm shall be this:

We pruned the tree quite a bit. In the best case, alpha-beta pruning reduces the number of nodes searched to roughly the square root of what plain minimax examines, that is O(b^(d/2)) instead of O(bᵈ), which lets you search about twice as deep in the same amount of time. It may also provide no improvement at all, depending on how unlucky you are with the order in which the moves are visited.

Depth-Limited Search

Even though alpha-beta pruning provides a great amount of performance improvement, searching the entire set of possible scenarios can still be overkill. We can employ smarter strategies that avoid searching the entire tree and still get very good results.

One such strategy is depth-limited search, and it is exactly what it sounds like. Instead of searching the entire tree, you search it down to a predefined depth. For example, you can search the next 5 moves in chess. But in order to do that, you need a deterministic way to score the current state of the game, since you no longer know who wins when you reach the end of your search. For that, we will use an evaluation function.

Note: There are other alternatives to depth-limited search, like iterative deepening, but I preferred not to include them in order to keep the story short.

How deep is your search? Get it? That was a joke…

Evaluation Functions

So, you decided to limit your search to the next 5 moves you and your opponent will make in the game. Then you realized that most of the time the game still continues after 5 moves, and you are now stuck in this intermediate state. How do you feed the numbers into the minimax algorithm? What you need is an evaluation function.

An evaluation function is a way to deterministically score the current state of the game. If you are playing chess, for example, an evaluation function can be the numeric difference between the number of chess pieces you and your opponent have. The bigger the difference is, the better chance you have.

A better evaluation function can use a weighted calculation in which each piece has a weight depending on how important it is. This will probably produce better results than simply counting them. And an even better one may use the locations on the chessboard in addition to their weights.
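
As a minimal sketch of that idea, here is what a weighted material count might look like in C++. The Board type, the piece list, and the weights are assumptions made purely for illustration; a real chess evaluation would be far more elaborate.

#include <map>
#include <vector>

enum Piece { PAWN, KNIGHT, BISHOP, ROOK, QUEEN };

// Hypothetical board representation: just the pieces each side still has.
struct Board {
    std::vector<Piece> aiPieces;
    std::vector<Piece> opponentPieces;
    // ... squares, whose turn it is, etc.
};

// Textbook-style material weights; these are an assumption, not the only choice.
const std::map<Piece, int> WEIGHT = {
    {PAWN, 1}, {KNIGHT, 3}, {BISHOP, 3}, {ROOK, 5}, {QUEEN, 9}
};

// Positive scores favor the AI (maximizer), negative scores favor the opponent.
int evaluate(const Board& board) {
    int score = 0;
    for (Piece p : board.aiPieces)       score += WEIGHT.at(p);
    for (Piece p : board.opponentPieces) score -= WEIGHT.at(p);
    return score;
}

This sketch is deterministic and cheap to compute, which matters for the two requirements listed below.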

How you define your evaluation function is entirely up to you. Beware, though: this will be the most critical part of your algorithm. How well your evaluation function scores the state of the game will greatly affect the success of your algorithm. There are two other things you should be careful about when writing an evaluation function:

  1. The function should be deterministic: Given the same state, it should always produce the same result.
  2. The function should be fast: You will be making a lot of calls to your evaluation function. If it is slow, your AI will respond slowly.

Here is a pseudocode sort of thing for the depth-limited minimax algorithm:

// Assumes Node, DEPTH_LIMIT and evaluate() are defined elsewhere.
#include <algorithm>  // std::max, std::min
#include <climits>    // INT_MIN, INT_MAX

int minimax(Node* current, int depth, bool isMaximizer) {
    // Stop at the depth limit or at a terminal position and score it.
    if (depth == DEPTH_LIMIT || current->isLeaf()) {
        return evaluate(current);
    }
    if (isMaximizer) {
        // The AI's turn: pick the child with the highest score.
        int val = INT_MIN;
        for (Node* child : current->getChildren()) {
            val = std::max(val, minimax(child, depth + 1, false));
        }
        return val;
    } else {
        // The opponent's turn: assume it picks the lowest score.
        int val = INT_MAX;
        for (Node* child : current->getChildren()) {
            val = std::min(val, minimax(child, depth + 1, true));
        }
        return val;
    }
}
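
The function above only returns the best achievable score. To actually pick a move, a hypothetical top-level driver could score every child of the current position and keep the best one, something like:

// Returns the child (move) with the highest minimax score for the AI.
Node* chooseBestMove(Node* root) {
    Node* bestMove = nullptr;
    int bestVal = INT_MIN;
    for (Node* child : root->getChildren()) {
        int val = minimax(child, 1, /*isMaximizer=*/false);  // the opponent moves next
        if (val > bestVal) {
            bestVal = val;
            bestMove = child;
        }
    }
    return bestMove;
}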

Adding alpha-beta pruning, we get something like this:

int minimax(Node* current, int depth, bool isMaximizer, int alpha, int beta) {
    if (depth == DEPTH_LIMIT || current->isLeaf()) {
        return evaluate(current);
    }
    if (isMaximizer) {
        int val = INT_MIN;
        for (Node* child : current->getChildren()) {
            val = std::max(val, minimax(child, depth + 1, false, alpha, beta));
            alpha = std::max(alpha, val);  // best score the maximizer can guarantee so far
            if (alpha >= beta) {
                break;  // the minimizer will never let the game reach this branch
            }
        }
        return val;
    } else {
        int val = INT_MAX;
        for (Node* child : current->getChildren()) {
            val = std::min(val, minimax(child, depth + 1, true, alpha, beta));
            beta = std::min(beta, val);  // best score the minimizer can guarantee so far
            if (beta <= alpha) {
                break;  // the maximizer will never let the game reach this branch
            }
        }
        return val;
    }
}
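
One detail that is easy to miss: the very first call should start with the widest possible window, alpha at the lowest possible value and beta at the highest (here INT_MIN and INT_MAX), and the search tightens these bounds as it goes. A hypothetical top-level call would look like:

int best = minimax(root, 0, /*isMaximizer=*/true, INT_MIN, INT_MAX);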

There is one more improvement I would like to suggest. In your search, it is highly likely that the algorithm will come across the same states over and over again. So instead of running the evaluation function multiple times, you can use a lookup table to store the results. Hell, you can even precalculate all possible results. You can read my previous story on Dynamic Programming if you want to learn about this strategy.
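
As a minimal sketch of that caching idea, assuming each state can be reduced to some hashable key (the hashState function below is a hypothetical placeholder; chess engines typically use a Zobrist hash for this), the lookup could be as simple as:

#include <unordered_map>

// Hypothetical: turn a game state into a hashable key.
long long hashState(const Node* node);

std::unordered_map<long long, int> evalCache;  // state key -> cached score

int cachedEvaluate(Node* node) {
    long long key = hashState(node);
    auto it = evalCache.find(key);
    if (it != evalCache.end()) {
        return it->second;       // we have seen this state before: reuse the score
    }
    int score = evaluate(node);  // expensive call, now done only once per state
    evalCache[key] = score;
    return score;
}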

You know those cool, good-looking scientists in Hollywood movies, walking through a long hall while giving a charismatic speech about how they made a scientific breakthrough and explaining it to a police chief in a few sentences? Yeah, that's not real. In reality, there are misshapen body figures with lifeless eyes having communication problems with each other. Except me, of course…
