I have permission from BoardGameArena to use an AI model, known as AlphaBoop, that I trained specifically for the game of Boop - https://boardgamearena.com/player?id=95583038
Naturally, a human must play behind the scenes to make the moves for it.
I named it AlphaBoop after AlphaGo - the legendary model that defeated the world's greatest Go player. In actuality, though, AlphaBoop is modeled on AlphaZero - the algorithm that surpassed AlphaGo.
Unlike AlphaGo, AlphaZero simply plays against itself, with zero exposure to human games or other computer play. That's why it's called AlphaZero: it starts from zero, learning from a blank canvas.
Likewise, AlphaBoop has had no exposure to human data or other AI agents. I trained it entirely on my laptop's NVIDIA GPU - training took a few days.
Naturally, I pitted AlphaBoop against Ai Ai, an existing AI bot for Boop, and AlphaBoop easily crushed it very early on in its training. Ai Ai was defeated in about 21 turns - which is rather quick for a game of Boop.
To be clear, I am not an expert in AI. I'm just some nobody with mediocre programming skills who's a big fan of AlphaZero. In fact, this is the first reinforcement learning algorithm I have ever dealt with in my life, and I picked Boop to be my first game to tackle.
Kaepo, an actual AI researcher, had said in his post:
"I wonder if AlphaBoop, which uses Deep Reinforcement Learning, has mastered Boop."
Unlike Kaepo's research, my goal was very much to develop an AI algorithm that plays Boop as strongly as possible. I'm too dumb to create novel algorithms, but I knew that Deep Reinforcement Learning would easily master the game.
The beautiful thing about AlphaZero is that I can, in theory, apply it to all sorts of abstract-strategy board games found on BoardGameArena.
How does AlphaBoop work?
AlphaBoop uses a deep neural network built from convolutional layers and residual blocks. If that means nothing to you, let me put it this way.
Imagine a black box. As input, we shove a representation of the board into this magical black box. In our case, Boop is played on a 6x6 board, so imagine a 6x6 matrix like the one below, where 0 = empty, 1 = grey kitten, -1 = orange kitten, 2 = grey cat, -2 = orange cat:
[[ 1, 0, 1, 0, 0, 0],
[ 0, 0, -2, 0, 0, 2],
[ 0, 0, -2, 0, 0, 0],
[ 0, 0, 1, 0, 0, 0],
[ 1, 0, 1, 0,-1, 0],
[ 0, 0, 0, 0, 0, 0]]
However, it's not that simple. If all I give you is the 6x6 board, can you tell me how many kittens and cats each player has off the board? If the board shows 2 grey kittens and 2 orange kittens, you have no idea how many cats each player has in their pool just by looking at the board itself.
Therefore, the input to this black box isn't simply a 6x6 board. I stacked a bunch of 2D arrays on top of it - most of them representing the number of kittens and cats remaining in each player's pool. As far as I was concerned, if two boards are identical but the players' pools differ, those are two completely different board states.
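To make that concrete, here's a minimal sketch of how such an input stack could be built. The exact plane layout - one board plane plus one constant-valued plane per pool count - is my guess at a reasonable encoding, not necessarily AlphaBoop's exact one:

import numpy as np

def encode_state(board, grey_kittens, grey_cats, orange_kittens, orange_cats):
    # Plane 0: the 6x6 piece layout shown above.
    planes = [board.astype(np.float32)]
    # One constant plane per off-board pool count, scaled by the 8-piece max,
    # so two identical boards with different pools become different inputs.
    for count in (grey_kittens, grey_cats, orange_kittens, orange_cats):
        planes.append(np.full((6, 6), count / 8.0, dtype=np.float32))
    return np.stack(planes)  # shape: (5, 6, 6)

board = np.array([[ 1, 0, 1, 0, 0, 0],
                  [ 0, 0,-2, 0, 0, 2],
                  [ 0, 0,-2, 0, 0, 0],
                  [ 0, 0, 1, 0, 0, 0],
                  [ 1, 0, 1, 0,-1, 0],
                  [ 0, 0, 0, 0, 0, 0]])
x = encode_state(board, grey_kittens=3, grey_cats=1, orange_kittens=5, orange_cats=0)
print(x.shape)  # (5, 6, 6)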
So you feed this magical black box the board state, and it will spit out two things:
Policy - a probability distribution over what a good move is. You can imagine that there are 72 possible actions in Boop. The first 36 actions represent the 36 spots on the board where you can play a kitten, and the next 36 represent the 36 spots where you can play a cat. With the way I designed it, those actions can also represent taking a kitten off the board to promote it when you have 8 felines on the board, or choosing which three-in-a-row to take off when you have more than one. (There's a toy sketch of this indexing right after the second output below.)
Value - which player is winning this game? For example, this output could look like [0.00254151, 0.9974585], which you can read as the first player having a ~0.3% chance of winning and the second player a ~99.7% chance.
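As a toy illustration of how these two outputs could be read (the kittens-then-cats split matches what I described above; the row-major square ordering is just an assumption for the example):

import numpy as np

def decode_action(a):
    # First 36 indices = kitten placements, next 36 = cat placements,
    # with squares numbered row by row (an assumed ordering).
    piece = "kitten" if a < 36 else "cat"
    square = a % 36
    return piece, square // 6, square % 6

policy = np.random.dirichlet(np.ones(72))  # stand-in for the policy head's output
value = np.array([0.00254151, 0.9974585])  # the value example from above

print(decode_action(int(np.argmax(policy))))      # e.g. ('cat', 2, 4)
print(f"P(second player wins) = {value[1]:.1%}")  # 99.7%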
Fun Fact: At the very start, AlphaBoop thinks the first player has a 60% chance of winning the game. I also trained an AlphaZero algorithm on the young players' variant, and it thought the first player has a 90% chance of winning.
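For the curious, here's roughly what the inside of that black box could look like in PyTorch - a minimal AlphaZero-style sketch (conv trunk, residual blocks, separate policy and value heads). The channel and block counts are made up for illustration, not AlphaBoop's actual configuration:

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # A standard residual block: two conv layers plus a skip connection.
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)  # the "skip" that makes it residual

class BoopNet(nn.Module):
    def __init__(self, in_planes=5, ch=64, blocks=6):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, ch, 3, padding=1), nn.ReLU(),
            *[ResBlock(ch) for _ in range(blocks)])
        self.policy = nn.Sequential(nn.Flatten(), nn.Linear(ch * 36, 72))
        self.value = nn.Sequential(nn.Flatten(), nn.Linear(ch * 36, 2))

    def forward(self, x):
        h = self.trunk(x)  # x: (batch, 5, 6, 6) stacks like the one above
        return torch.softmax(self.policy(h), -1), torch.softmax(self.value(h), -1)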
This neural network, as a standalone, could already be a formidable foe, but that's not all. The true terror of AlphaZero is the synergy between two algorithms: Deep Neural Network + MCTS (Monte Carlo Tree Search).
The neural network, having been trained on countless games of Boop through self-play, acts as an expert guide for the tree search, telling it which moves are worth exploring.
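Concretely, each step of the tree search picks a child node using something like the PUCT rule from the AlphaZero paper, where the network's policy supplies the prior P for each move. This is a sketch with assumed node fields (N visits, W total value, P prior), not AlphaBoop's actual code:

import math

def select_child(node, c_puct=1.5):
    # AlphaZero-style PUCT: trade off a child's average value Q (exploitation)
    # against the network's prior P, discounted by visit count (exploration).
    parent_visits = sum(child.N for child in node.children)  # stand-in for the parent's visit count
    def puct(child):
        q = child.W / child.N if child.N else 0.0
        u = c_puct * child.P * math.sqrt(parent_visits) / (1 + child.N)
        return q + u
    return max(node.children, key=puct)

The search spends its simulations on the moves the network already believes in, and the improved visit counts from the search become the training targets for the next round of self-play.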
Well, at any rate, if you guys want to reenact the historic match of AlphaGo versus Lee Sedol (AI vs Humans), but with Boop, you can go challenge AlphaBoop.