I have permission from BoardGameArena to use an AI model, known as AlphaBoop, that I trained specifically for the game of Boop - https://boardgamearena.com/player?id=95583038
Naturally, a human must play behind the scenes to make the moves for it.
I named it AlphaBoop after AlphaGo - the legendary model that defeated the world's greatest Go player. In actuality, though, AlphaBoop is modeled on AlphaZero - the algorithm that surpassed AlphaGo.
Unlike AlphaGo, AlphaZero simply plays against itself, with zero exposure to human games or other computer play. That's why it's called AlphaZero: it starts from zero, learning from a blank canvas.
Likewise, AlphaBoop has had no exposure to human data or other AI agents. I trained it entirely on my laptop's NVIDIA GPU - training took a few days.
Naturally, I pitted AlphaBoop against Ai Ai, an existing AI bot for Boop, and AlphaBoop easily crushed it very early on in its training. Ai Ai was defeated in about 21 turns - which is rather quick for a game of Boop.
To be clear, I am not an expert in AI. I'm just some nobody with mediocre programming skills who's a big fan of AlphaZero. In fact, this is the first reinforcement learning algorithm I have ever dealt with in my life, and I picked Boop to be my first game to tackle.
Kaepo, an actual AI researcher, had said in his post:
"I wonder if AlphaBoop, which uses Deep Reinforcement Learning, has mastered Boop."
Unlike Kaepo's research, my goal was very much to develop an AI algorithm that plays Boop as strongly as possible. I'm too dumb to create novel algorithms, but I knew that Deep Reinforcement Learning would easily master the game.
The beautiful thing about AlphaZero is that I can, in theory, apply it to all sorts of abstract-strategy board games found on BoardGameArena.
How does AlphaBoop work?
AlphaBoop uses a deep neural network built from convolutional layers and residual blocks. If that means nothing to you, let me put it this way.
Imagine a black box. As input, we shove a representation of the board into this magical black box. In our case, Boop is played on a 6x6 board, so imagine a 6x6 matrix like the one below, where 0 = empty, 1 = grey kitten, -1 = orange kitten, 2 = grey cat, -2 = orange cat:
[[ 1, 0, 1, 0, 0, 0],
[ 0, 0, -2, 0, 0, 2],
[ 0, 0, -2, 0, 0, 0],
[ 0, 0, 1, 0, 0, 0],
[ 1, 0, 1, 0,-1, 0],
[ 0, 0, 0, 0, 0, 0]]
However, it's not that simple. If all I give you is the 6x6 board, can you tell me how many kittens and cats each player has off the board? If the board shows 2 grey kittens and 2 orange kittens, you have no idea how many cats each player has in their pool just by looking at the board itself.
Therefore, the input to this black box isn't simply a 6x6 board. I stacked a bunch of 2D arrays on top of it - most of them representing the number of kittens and cats remaining in each player's pool. As far as I was concerned, if two boards are identical but the players' pools differ, those are two completely different board states.
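To make that concrete, here's a minimal sketch of how such an input stack could be built. The exact plane layout - one board plane plus one constant-valued plane per pool count - is my guess at a reasonable encoding, not necessarily AlphaBoop's exact one:

import numpy as np

def encode_state(board, grey_kittens, grey_cats, orange_kittens, orange_cats):
    # Plane 0: the 6x6 piece layout shown above.
    planes = [board.astype(np.float32)]
    # One constant plane per off-board pool count, scaled by the 8-piece max,
    # so two identical boards with different pools become different inputs.
    for count in (grey_kittens, grey_cats, orange_kittens, orange_cats):
        planes.append(np.full((6, 6), count / 8.0, dtype=np.float32))
    return np.stack(planes)  # shape: (5, 6, 6)

board = np.array([[ 1, 0, 1, 0, 0, 0],
                  [ 0, 0,-2, 0, 0, 2],
                  [ 0, 0,-2, 0, 0, 0],
                  [ 0, 0, 1, 0, 0, 0],
                  [ 1, 0, 1, 0,-1, 0],
                  [ 0, 0, 0, 0, 0, 0]])
x = encode_state(board, grey_kittens=3, grey_cats=1, orange_kittens=5, orange_cats=0)
print(x.shape)  # (5, 6, 6)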
So you feed this magical black box the board state, and it will spit out two things:
Policy - a probability distribution over what a good move is. You can imagine that there are 72 possible actions in Boop. The first 36 actions represent the 36 spots on the board where you can play a kitten, and the next 36 represent the 36 spots where you can play a cat. With the way I designed it, those actions can also represent taking a kitten off the board to promote it when you have 8 felines on the board, or choosing which three-in-a-row to take off when you have more than one. (There's a toy sketch of this indexing right after the second output below.)
Value - which player is winning this game? For example, this output could look like [0.00254151, 0.9974585], which you can read as the first player having a ~0.3% chance of winning and the second player a ~99.7% chance.
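As a toy illustration of how these two outputs could be read (the kittens-then-cats split matches what I described above; the row-major square ordering is just an assumption for the example):

import numpy as np

def decode_action(a):
    # First 36 indices = kitten placements, next 36 = cat placements,
    # with squares numbered row by row (an assumed ordering).
    piece = "kitten" if a < 36 else "cat"
    square = a % 36
    return piece, square // 6, square % 6

policy = np.random.dirichlet(np.ones(72))  # stand-in for the policy head's output
value = np.array([0.00254151, 0.9974585])  # the value example from above

print(decode_action(int(np.argmax(policy))))      # e.g. ('cat', 2, 4)
print(f"P(second player wins) = {value[1]:.1%}")  # 99.7%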
Fun Fact: At the very start, AlphaBoop thinks the first player has a 60% chance of winning the game. I also trained an AlphaZero algorithm on the young players' variant, and it thought the first player has a 90% chance of winning.
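For the curious, here's roughly what the inside of that black box could look like in PyTorch - a minimal AlphaZero-style sketch (conv trunk, residual blocks, separate policy and value heads). The channel and block counts are made up for illustration, not AlphaBoop's actual configuration:

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # A standard residual block: two conv layers plus a skip connection.
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)  # the "skip" that makes it residual

class BoopNet(nn.Module):
    def __init__(self, in_planes=5, ch=64, blocks=6):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, ch, 3, padding=1), nn.ReLU(),
            *[ResBlock(ch) for _ in range(blocks)])
        self.policy = nn.Sequential(nn.Flatten(), nn.Linear(ch * 36, 72))
        self.value = nn.Sequential(nn.Flatten(), nn.Linear(ch * 36, 2))

    def forward(self, x):
        h = self.trunk(x)  # x: (batch, 5, 6, 6) stacks like the one above
        return torch.softmax(self.policy(h), -1), torch.softmax(self.value(h), -1)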
This neural network, as a standalone, could already be a formidable foe, but that's not all. The true terror of AlphaZero is the synergy between two algorithms: Deep Neural Network + MCTS (Monte Carlo Tree Search).
The neural network, having been trained on countless games of Boop through self-play, acts as an expert guide for the tree search, telling it which moves are worth exploring.
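Concretely, each step of the tree search picks a child node using something like the PUCT rule from the AlphaZero paper, where the network's policy supplies the prior P for each move. This is a sketch with assumed node fields (N visits, W total value, P prior), not AlphaBoop's actual code:

import math

def select_child(node, c_puct=1.5):
    # AlphaZero-style PUCT: trade off a child's average value Q (exploitation)
    # against the network's prior P, discounted by visit count (exploration).
    parent_visits = sum(child.N for child in node.children)  # stand-in for the parent's visit count
    def puct(child):
        q = child.W / child.N if child.N else 0.0
        u = c_puct * child.P * math.sqrt(parent_visits) / (1 + child.N)
        return q + u
    return max(node.children, key=puct)

The search spends its simulations on the moves the network already believes in, and the improved visit counts from the search become the training targets for the next round of self-play.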
Well, at any rate, if you guys want to reenact the historic match of AlphaGo versus Lee Sedol (AI vs Humans), but with Boop, you can go challenge AlphaBoop.