Preface
Introduction
The Analysis Function
The Game Tree
The MiniMax Algorithm
The NegaMax Algorithm
Alpha-Beta Pruning and Branching
The Horizon Effect
References
Preface
A number of people have asked me about the algorithms I used in my chess and reversi programs.
What you see here has been put together in an attempt to answer these questions. Please
contact me if you still have questions
after reading this!
Please remember that the methods presented here serve only as an introduction, and are not the only methods. On the contrary, the alpha-beta pruning described here can be improved, and a number of other algorithms also exist.
Introduction
I will be assuming that the game is a board game, and that the two players are blue and red, with blue moving first.
The general approach to writing a program to play such a game is to look ahead a few moves,
and analyze different possibilities. Thus we have two major concerns;
devising an analysis function, and searching through different positions.
The Analysis Function
The analysis function is a heuristic; in actuality it is very difficult to find a precise value.
However, that doesn't stop us from trying! A good analysis function takes into effect many different aspects
of the board position, such as the number of pieces, control of the board, control of certain key squares on the board,
and also mobility. Mobility indicates how freely each player can move his/her pieces, and how many moves she/he has.
Much of the work in writing a game program involves writing a good analysis function.
Usually, after you implement a good search procedure, the best way to improve your program's gameplay
is to improve your analysis function! This stage is so important that often complicated methods,
like genetic algorithms, are used to tune the analysis function.
However, I recommend that you not spend too much time on analysis until you write the actual search procedure.
The Game Tree
Therefore, we need a way to "look forward" a few moves. This is accomplished by searching in
a game tree, in which the board positions are organized.
In a game tree,
every node corresponds to a board position. The children of each node N are the
different positions that result from a single move in N, made by the player whose turn
it is at N.
The root of the tree is the starting position, or the position from which we want to find a move.
The leaves (nodes which have no children) in the game tree correspond to terminating positions;
positions in which the game has ended.
A portion of the game
tree for "Tic-tac-toe" is shown in Figure 1. Note that each level in the tree pertains to the moves
made by one particular player.
The MiniMax Algorithm
Each node that is not a leaf corresponds to a decision made by the player whose turn it is at the
node's level. The player making a move at the node must decide which of that node's children it
should choose as the next board position. It is clear that if the player is blue, he should choose
the child with the largest value. Likewise, red should choose the position which minimizes her node's
value. In this manner, we can determine the value for each node, and in doing so we find the
best move possible from each node. Figure 2 illustrates this method.
For a small game like Tic-tac-toe, we can build and search the entire tree, all the way down to the
actual leaves. However, this is clearly impossible for a more complicated game
like reversi or chess. Here is where we finally use the analysis function. Since we can't
search the entire tree, we only consider the first few levels. In this manner we pretend that the
nodes below a certain level are leaves, and determine their value using the analysis function.
We can implement the procedure recursively, using two functions, one for red and one for blue.
Where MaxDepth is a constant defining the maximum depth.
Each of these functions is initially called with depth=0.
The value returned is the value of the node corresponding to board b. Now, in order to find the best
move from any position, we just need to choose the child with the best (minimum or maximum
depending on color) value.
The NegaMax Algorithm
One way to implement negamax is shown below.
Note also that we multiply the analysis value of leaves by sign[color] in line 5.
We need to be careful not to leave out things like that, or make mistakes about them.
A common mistake in negamax is to get the positive and negative mixed up. It is so easy to make
a mistake and end up with a program that always finds the worst possible move for each player. Even
worse, you might end up with a program that always finds the best move for one color, but the worse
move for the other color.
Alpha-Beta Pruning and Branching
Thus we need a way to speed up our search procedure. How? Consider how a human would play a game.
She would say, "OK, if I move my queen there, it would be captured immediately by my opponent's
knight. So I won't move my queen there." But what would our program do in the same situation?
It would consider moving
the queen to where it could easily be captured, and then waste time search through thousands of
nodes which result from moving pawns on the other side of the board! Eventually it would find
the move in which the knight captures the queen, but it didn't need to take so long.
Alpha-Beta pruning uses the above ideas to drastically reduce the number of nodes
we search. Pruning is the elimination of nodes found to be unnecessary to process while
searching.
Consider the tree in Figure 4. Note that for simplicity, this tree has not been
"negamaxed".
Let's follow our algorithm, assuming that nodes are processed from left to right. While trying to
find the value of A, we go down to B and determine its value to be 6, the minimum of 6 and 9.
Then control is passed to C.
From there we compute E to be 4, the maximum of 4 and -2. Since C is a minimizer, we now know
that the value of C is less than or equal to 4. Therefore, C will not be chosen by A as the maximum,
since A already has the child B with value 6, which will be greater than any value C may later have.
Hence, searching F and G is useless, and we can immediately stop processing (prune) C. Control then passes
to D.
The same thing can happen if A is a minimizer. In general, we can define lower and upper bounds
for the value of a node, called alpha and beta, respectively.
One of these bounds (depending on the player's color) does not change within the node, and if the
node's value crosses this bound, we can prune the node. The other bound, however, changes
as the node's value is updated.
This can be implemented using minimax as below.
Note that we can write alpha>=beta instead of alpha>beta, which
slightly increases our program's speed. These functions are intially called with depth=0,
alpha=-∞, and
beta=∞.
The negamax version can be written as below.
Alpha-Beta pruning, used properly, will usually allow us to search at least 2-3 extra levels.
However, pruning alone is not enough! Let's look back for a moment at Figure 4. Why was C pruned?
C was pruned because B, which was searched before C, had a value better than any value C could
possibly have. We knew that C will not be chosen, because we'd already found a better move in A.
Thus, in order to maximize the number of nodes that get pruned, we try to check the moves that
"look better" first. That is, we process the children of a node in the order of how good we expect
the move to be. This is called branching.
Here, we use another heuristic to try to compare moves. Some of the things we could
use are:
The best way to find which method to use is to experiment! See which works best for your
program. After we find the move values, we can sort them in decreasing order, and process
them in that order. However, a heap data structure may prove to be more efficient than sorting,
because when pruning takes place, we leave the function and thus don't need to have the rest
of the moves sorted.
Another method sometimes used to increase speed is to divide the number returned by the
analysis function by a constant value. (For example divide it by 10.) This reduces the range
of values that the nodes in the game tree can have, and can therefore result in more pruning.
However, this makes the analysis less accurate, so it should be used with care.
The Horizon Effect
The fact is, the problem is still there, though it is less drastic, because we often have time
to fix things before they get too bad.
The problem is called the horizon effect, and happens when we
use a constant maximum depth. In this situation, there is always a "horizon" over which we cannot
see. At every move, the horizon moves back a bit, revealing something that we couldn't see before.
How can we prevent being surprised by something bad suddenly appearing after the horizon moves back?
Usually such events occur when there is a node at our bottom level (at the horizon), in which
a strong move exists, but is not considered since we are at our maximum depth. In order to avoid
this situation, we allow our program to go beyond the maximum depth under certain circumstances, when
the position is "hot". For example, good chess programs usually process capturing do a great
depth. That is, we can extend the maximum depth for a node in which the capture of a piece other
than a pawn exists, or if the king is in check.
The games we will be talking about are two-player, zero sum games.
Zero sum games are games such as chess, checkers, and reversi;
in which the total amount of "payoff" is constant. In other words,
any game situation is as good for player A as it is bad for player B, and vice versa.
It also generally helps if the game is finite!
A finite game is a game in which every possible combination of moves leads to an end of the game.
That is, it is impossible for the game to go on forever!
The analysis function A(p) assigns a number to the board position p.
Higher values of A(p) indicate that the position
favors blue, while lower values mean that red is winning. Hence, for a position p in which
blue has won, A(p) is ∞, while A(p) is
-∞ if red has won.
A draw position can be given the value 0.
The first algorithm that comes to mind is to always make the move which yields the highest analysis
value. However, this will not work very well in practice, because as mentioned before, the analysis
is only a guess as to how good the position is. Consider for example, an analysis function for the
game of chess which uses only the material value (sum of piece values) for each player. A "greedy"
algorithm which always chooses the move that maximizes the analysis might easily capture a pawn with
its queen, ignoring the fact that the queen will be captured in the next move!
The minimax algorithm is used to find accurate values for the board positions. We assume
that each player always plays his/her best move in any given position. With this assumption, two observations
lead to the minimax algorithm: We have accurate analyses for leaves, and the value of a node can be
determined accurately from its chilren's values. The value for leaves is easily determined, because
in these positions the game has ended.
int BlueValue(Board b, int depth) {
if ((GameOver(b) or depth>MaxDepth)
return Analysis(b)
int max = -infinity
for each legal move m in board b {
copy b to c
make move m in board c
int x = RedValue(c, depth+1)
if (x>max) max = x
}
return max
}
int RedValue(Board b, int depth) {
if ((GameOver(b) or depth>MaxDepth)
return Analysis(b)
int min = infinity
for each legal move m in board b {
copy b to c
make move m in board c
int x = BlueValue(c, depth+1)
if (x<min) min = x
}
return min
}
NegaMax isn't really a different algorithm; it's just a more elegant implementation. The problem
with the above algorithm is that we have two different functions that are essentially doing the
exact same thing. Negamax merges these two into one, by always considering the children of the
node N, from N's point of view. That is, we multiply values in alternate rows by -1, so that
both players become maximizers. This concept is illustrated in Figure 3, which essentially depicts
the same tree as Figure 2. The resulting values are as they would be if we were to evaluate
node values using the minimax algorithm above, and then multiply blue rows by -1. Note that it is
not necessary to do this for the root of the tree.
1 const int sign[2]={1,-1} //0 is blue, 1 is red
2
3 int NegaMax(Board b, int depth, int color) {
4 if (GameOver(b) or depth>MaxDepth)
5 return sign[color]*Analysis(b)
6 int max = -infinity
7 for each legal move m in board b {
8 copy b to c
9 make move m in board c
10 int x= - NegaMax(c, depth+1, 1-color) //Note the "-" before "NegaMax"
11 if (x>max) max = x
12 }
13 return max
14 }
The function NegaMax returns the value of the node b, as seen by the player indicated by
color.
That is, the better the position is for color, the greater the value that it returns.
Note that this is exactly the opposite of what we see in Figure 3. (In Figure 3, nodes were
considered from their parent's point of view.) However, this is fixed in line 10, where the value of the child
is reversed (multiplied by -1). In this manner, node values are processed as in Figure 3, with the exception of the root.
Reversing the root, as mentioned before, is not important. In fact, it can be better if we don't reverse it. This way,
the true value (the value it would have had in minimax) of a node b with color c, is returned by NegaMax(b,0,c).
So now we have a search algorithm. How fast do you think it will be? How many levels can you
search in a reasonable time? Considering the fact that most games can easily have more than 10-20
possible moves from every position, the number of nodes we have to search is (dramatically) exponential
in the number of levels. In a game like chess, where we can sometimes have over 50 or even 100
possible moves in a position, this problem is even more drastic. Using what we've covered so
far, we can only search through 3-4 levels within a reasonable time.
int BlueValue(Board b, int depth, int alpha, int beta) {
if ((GameOver(b) or depth>MaxDepth)
return Analysis(b)
int max = -infinity
for each legal move m in board b {
copy b to c
make move m in board c
int x = RedValue(c, depth+1, alpha, beta)
if (x>max) max = x
if (x>alpha) alpha = x
if (alpha>=beta) return alpha
}
return max
}
int RedValue(Board b, int depth, int alpha, int beta) {
if ((GameOver(b) or depth>MaxDepth)
return Analysis(b)
int min = infinity
for each legal move m in board b {
copy b to c
make move m in board c
int x = BlueValue(c, depth+1, alpha, beta)
if (x<min) min = x
if (x<beta) beta = x
if (alpha>=beta) return beta
}
return min
}
const int sign[2]={1,-1} //0 is blue, 1 is red
int NegaMax(Board b, int depth, int color, int alpha, int beta) {
if ((GameOver(b) or depth>MaxDepth)
return sign[color]*Analysis(b)
int max = -infinity
for each legal move m in board b {
copy b to c
make move m in board c
int x= - NegaMax(c, depth+1, 1-color, -beta, -alpha)
if (x>max) max = x
if (x>alpha) alpha = x
if (alpha>=beta) return alpha
}
return max
}
Note that alpha and beta are replaced
with -beta and -alpha, respectively, when NegaMax is called
within itself.
Remember why we decided to use "look-ahead" with game trees in the first place? The problem
was that we couldn't detect dangerous situations, arising from us doing something bad that
the analysis function couldn't immediately detect. So why shouldn't the same problem still exist
when we're looking several moves ahead? We're still using the analysis function to guess the value
of most nodes.