Posted by Konstantin, 03.09.2012

    For many people, the ability to learn and adapt seems like something unique, extremely complicated and mysterious. Indeed, these are the abilities we almost exclusively associate with high levels of intelligence and knowledge. This is, however, an illusion. Although adaptive behaviour might indeed look complex, it is not necessarily driven by "intelligent" mechanisms. One of the best illustrations of this is a fully fledged self-learning machine made from plain matchboxes.

    A Tic-Tac-Toe machine by James Bridle

    The idea for such a machine was first introduced in 1960 by Donald Michie, who devised a simple self-learning algorithm for Tic-Tac-Toe (reminiscent of what is now known as reinforcement learning). Due to the lack of appropriate computing power, he implemented it "in hardware" using 300 or so matchboxes.

    The idea of the machine is simple. There is a matchbox corresponding to each game position in which the "computer" has to make a move. The matchbox contains colored beads, each color corresponding to a particular move. The decision is made by picking a random bead from the matchbox. Initially (when the machine is "untrained"), there is an equal number of beads of each color, so the machine plays uniformly at random. After each game, however, the machine is "punished" by removing beads corresponding to losing moves, or "rewarded" by adding beads corresponding to winning moves. Thus, after several games, the machine adapts its strategy towards a winning one.
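
    To make the mechanics concrete, here is a minimal sketch of a single matchbox in Python. The color names, the one-bead reward and punishment amounts, and the function names are my own illustrative assumptions, not Michie's exact scheme.

```python
import random

# One "matchbox": bead color -> bead count, with one color per possible move.
# An untrained machine starts with an equal number of beads of each color.
box = {"red": 2, "green": 2, "blue": 2}

def draw_bead(box):
    """Pick a random bead; the more beads of a color, the likelier that move."""
    colors = [c for c, n in box.items() if n > 0]
    if not colors:
        return None
    weights = [box[c] for c in colors]
    return random.choices(colors, weights=weights)[0]

def reward(box, color, amount=1):
    """After a won game: add beads for the move that was played from this box."""
    box[color] += amount

def punish(box, color, amount=1):
    """After a lost game: remove beads for the move that was played from this box."""
    box[color] = max(0, box[color] - amount)
```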

    The idea was popularized by Martin Gardner in one of his Scientific American articles (later collected in the book "The Unexpected Hanging and Other Mathematical Diversions"). Gardner invented a simple game of "Hexapawn" and derived a matchbox machine for it, which required as few as 19 matchboxes. In the same article, however, he also suggested creating a matchbox machine for "mini-checkers" - checkers played on a 4x4 board. Ever since I saw that article some 20 or so years ago, I had been thinking of making one. This summer, while teaching a machine learning course at a summer school in Kiev, I finally did, so it could both fulfil my age-old desire and serve as a teaching aid. You can now make one too, if you are interested.

    The Mini-checkers Machine

    The rules of mini-checkers are exactly like those of usual checkers, with three modifications:

    • The game is played on a 4x4 board. White moves first; the machine plays black.
    • Whenever both players get a King, the game immediately ends in a draw.
    • The King must always move to the furthest possible position in the chosen direction.

    To make the machine, you first have to buy and empty 24 matchboxes. Next, print out the 24 game positions and stick them onto the boxes. On each box, draw all of black's possible moves as arrows, using a different color for each move. Finally, for each colored arrow, add 2 beads of the same color into the matchbox. That's it: your machine is ready to play.
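
    The initial state of the machine is just as easy to write down in code. The sketch below assumes a hypothetical table MOVES_PER_POSITION listing, for each of the 24 positions, the arrow colors drawn on its box; the table and the position labels are placeholders of mine.

```python
# Hypothetical table: position label -> the arrow colors (moves) drawn on that box.
MOVES_PER_POSITION = {
    "position_01": ["red", "green"],
    "position_02": ["red", "blue", "yellow"],
    # ... one entry for each of the 24 positions
}

# Each box starts with 2 beads per colored arrow.
machine = {
    position: {color: 2 for color in colors}
    for position, colors in MOVES_PER_POSITION.items()
}
```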

    The Mini-checkers machine

    The game proceeds as already described: whenever the machine (the black player) has to make a decision (i.e. whenever it has to move and there is more than one possibility), find the matchbox with the current position depicted on it, shake it, and pick a random bead. This bead is the machine's decision. If the corresponding matchbox is empty, the machine forfeits. Keep the matchboxes corresponding to the moves that were made open until the end of the game.
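
    In code, a single decision of the machine might look like the sketch below. It continues the dictionary representation from the previous snippets; the pick_move name and the history list are my own conventions.

```python
import random

def pick_move(machine, position, history):
    """Draw a random bead from the box for `position`; return None if the machine forfeits."""
    box = machine[position]
    colors = [c for c, n in box.items() if n > 0]
    if not colors:
        return None  # the matchbox is empty: the machine forfeits
    weights = [box[c] for c in colors]
    color = random.choices(colors, weights=weights)[0]
    history.append((position, color))  # "keep the box open" until the game is over
    return color
```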

    Once the game is over, the machine is "taught" (see the sketch after the list):

    • If the machine won, do nothing.
    • If the game was a draw, remove one bead corresponding to the machine's last move from the last matchbox, unless it was the last bead of that color in the box.
    • If the machine lost, remove all the beads corresponding to the machine's last move from the last matchbox.
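
    These three rules translate directly into a short update function; as before, the outcome labels and the teach name are just my notation.

```python
def teach(machine, history, outcome):
    """Adjust the last used matchbox after a game.

    `history` is the list of (position, color) pairs recorded during play;
    `outcome` is "win", "draw" or "loss" from the machine's point of view.
    """
    if outcome == "win" or not history:
        return  # a win changes nothing
    last_position, last_color = history[-1]
    box = machine[last_position]
    if outcome == "draw":
        # Remove one bead of the last move's color, unless it is the last one left.
        if box[last_color] > 1:
            box[last_color] -= 1
    elif outcome == "loss":
        # Remove all beads of the last move's color from the last matchbox.
        box[last_color] = 0
```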

    It takes about 30 games for the machine to learn to play well enough. Of course, a human would figure out the strategy much sooner, but it's fun nonetheless.

    Playing with the machine will immediately lead you towards two important questions:

    • How efficient is the suggested learning procedure? Can it be improved and generalized?
    • How do you make a matchbox machine for a more complex game without having to manage thousands of matchboxes?

    As far as I know, contemporary machine learning has only partial answers to both of them.
