generals io resnet
play against the model here
generals.io is a fun & fast-paced game that involves strategically marching troops around a game board with the objective of finding and capturing your opponent's "general" whilst defending & trying to hide your own. this is a write-up of how i went about training a neural network to play the game.
(the game itself is a bit of an info-hazard/surprisingly addictive because the rules are relatively straightforward and there is not much of a learning curve to get started. i've watched, one by one, as friends around me have become consumed, so proceed with discretion.)
sourcing the data
any good ML model will be hungry for lots of data, and lucky for us, every game that is played on the generals.io website is recorded as a "replay".
i was able to find a dataset of ~200k game replays in the generals.io discord. once you unzip the download, you'll see many .gioreplay files. each file is a json that corresponds to one recorded game & has fields that describe the initial layout of the board, such as its width and height and where the cities, mountains, and generals are. then, the "moves" of every player are recorded and stored in a list of dictionaries that describe which player moved from which square to which other square at a specific time step of the game.
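to make that structure concrete, here is a minimal sketch of pulling the move list out of one parsed replay. the field names (mapWidth, moves, index, start, end, turn) and the row-major flat tile indexing are assumptions for illustration; check them against an actual .gioreplay file before relying on them.

```python
import json


def tile_to_row_col(tile_index, width):
    """Convert a flat tile index to (row, col), assuming row-major order."""
    return divmod(tile_index, width)


def extract_moves(replay):
    """Return one small dict per recorded move.

    `replay` is the dict you get from json.load() on a replay file.
    All key names used here are assumed, not confirmed.
    """
    width = replay["mapWidth"]
    return [
        {
            "player": move["index"],
            "turn": move["turn"],
            "src": tile_to_row_col(move["start"], width),
            "dst": tile_to_row_col(move["end"], width),
        }
        for move in replay["moves"]
    ]


# toy replay standing in for the contents of a real file
toy = json.loads(
    '{"mapWidth": 4, "mapHeight": 3,'
    ' "moves": [{"index": 1, "start": 5, "end": 6, "turn": 2}]}'
)
print(extract_moves(toy))
```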
if you want to curate your own dataset, you can also scrape game replays from any user's profile page, like https://generals.io/profiles/jinglinl. from there, you can click on a past game and look at the network requests. you'll see a .gior (generalsio replay) request which contains an lz-string compressed + encoded json blob that you can decode + decompress to recover the original JSON.
dataset
before writing any neural net code, i spent a good deal of time wrangling and understanding the data i was going to be working with.
first, i thought about what the input and output to the neural net should be. i thought about how i personally played the game and reasoned that the input should be a board state and the output should be the action the model takes given that board state.
input tensor
replay_to_tensor.py converts raw .gioreplay files into pytorch tensors.
output tensor
generals.io maps have a maximum size of 32x32 tiles. at every move, you need to choose a tile, then choose whether to go up, down, left, or right, and then choose whether or not to split your troops. this gives us (32x32) x (4) x (2) = 8192 possible moves.
therefore the output of the model is a single integer from 0 to 8191, each corresponding to a (cell, direction, split) combination.
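one way to pack a (cell, direction, split) combination into a single integer and back is sketched below. the ordering of the fields inside the index and the direction order are assumptions; any consistent layout works as long as encoding and decoding agree.

```python
BOARD_SIZE = 32      # maximum board width/height
NUM_DIRECTIONS = 4   # up, down, left, right
NUM_SPLIT = 2        # move all troops, or split them

# assumed direction order, for illustration only
DIRECTIONS = ("up", "down", "left", "right")

NUM_ACTIONS = BOARD_SIZE * BOARD_SIZE * NUM_DIRECTIONS * NUM_SPLIT


def encode_action(row, col, direction, split):
    """Pack (row, col, direction, split) into one integer in [0, NUM_ACTIONS)."""
    cell = row * BOARD_SIZE + col
    return (cell * NUM_DIRECTIONS + direction) * NUM_SPLIT + int(split)


def decode_action(index):
    """Inverse of encode_action."""
    rest, split = divmod(index, NUM_SPLIT)
    cell, direction = divmod(rest, NUM_DIRECTIONS)
    row, col = divmod(cell, BOARD_SIZE)
    return row, col, direction, bool(split)


print(NUM_ACTIONS)
print(decode_action(encode_action(3, 7, DIRECTIONS.index("left"), True)))
```

boards smaller than 32x32 only use a subset of these indices, so the remaining cells would need to be masked or padded somewhere in the pipeline.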
visualizing and interrogating the data
unfortunately for me, when i look at a tensor of numbers, i don't glean much information about what is going on.
i spent a lot of time inspecting the data with various visualizer tools i built. it was during this time that i caught many bugs in my game engine.
- for example, i would have the visualizer show me a certain board state of the game and the action that was paired with it. i noticed that about 15% of these pairs contained actions that were completely invalid, like trying to move from an enemy's square, a fogged square, or a mountain. the cause was a bug in my game engine.
evals, how do i know the model is learning and playing well
does the model's predicted move originate from a tile the player actually owns?
- this eval is a sanity check that the model is learning to move from a tile that it owns and not an enemy tile, a fogged tile, a mountain, or a city tile.
- normally in the game, you can choose to make a null move (take no action) at any given step. it is almost always advantageous to move, but humans can be feeble minded and might take a second to evaluate the state of the board, or maybe they've run out of troops in their current march but were too slow to move their cursor to another square to start a new march. i knew that my NN would not have such limitations, so i wanted to purposefully train a model with a bias for action and made it impossible to output a no-action move. each .gioreplay file conveniently only logs moves where the player actually took an action, so null moves simply shouldn't appear in the dataset. but as training progressed, i noticed that while the model would trend towards making 100% valid moves, every once in a while it would choose to make a null move, where the source and destination were the same tile. this led me to suspect that the dataset did contain some games with null moves, possibly from bots that were playing on the site and sending moves like {start: N, end: N} via an API. this suspicion was correct, and i modified replay_to_tensor.py to filter out any null moves.
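the two checks above can be sketched roughly like this. the owner grid representation (owner_grid[row][col] holds a player index, or -1 for anything the player does not own) and the action index layout are assumptions made for the example, not the actual code in replay_to_tensor.py.

```python
BOARD_SIZE = 32
ACTIONS_PER_CELL = 4 * 2  # 4 directions x 2 split options


def source_tile(action_index):
    """Recover (row, col) of the tile a move starts from.

    Assumes index = ((row * 32 + col) * 4 + direction) * 2 + split.
    """
    cell = action_index // ACTIONS_PER_CELL
    return divmod(cell, BOARD_SIZE)


def owned_source_rate(samples, player):
    """Fraction of actions whose source tile is owned by `player`.

    `samples` is an iterable of (owner_grid, action_index) pairs.
    """
    total = 0
    valid = 0
    for owner_grid, action_index in samples:
        row, col = source_tile(action_index)
        total += 1
        if owner_grid[row][col] == player:
            valid += 1
    return valid / total if total else 0.0


def drop_null_moves(moves):
    """Remove moves whose start and end tile are the same."""
    return [m for m in moves if m["start"] != m["end"]]
```

running owned_source_rate on the dataset's own (state, action) pairs, not just on model predictions, doubles as a check on the game engine: anything below 100% there points at a data or engine bug.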
game engine
because each .gioreplay file recounts a whole game, i needed to create a game engine that would simulate the rules of the game & create (board state, action that followed) tensor pairs for each player of the game.
train/val/test split
replays_prod is split 85/10/5 into data-train/, data-val/, and data-test/ using symlinks, with a fixed random.seed() so the split is reproducible. is having both a data-val/ and a data-test/ controversial? i'm not sure, but i felt very data abundant, so i decided to have both on the off chance i started overfitting my model to my validation set.
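a minimal sketch of a seeded, symlink-based split is below. the function name, the integer percentages, and the use of a local random.Random(seed) instead of the global random.seed() are choices made for this example, not necessarily how the original script does it.

```python
import random
from pathlib import Path


def split_replays(src_dir, out_root, seed=0, train_pct=85, val_pct=10):
    """Symlink replays into data-train/, data-val/, data-test/ under out_root.

    Whatever is left after train and val goes to test.
    Returns the number of files placed in each split.
    """
    # sort first so the shuffle only depends on the seed, not on OS ordering
    files = sorted(Path(src_dir).glob("*.gioreplay"))
    random.Random(seed).shuffle(files)

    n_train = len(files) * train_pct // 100
    n_val = len(files) * val_pct // 100
    splits = {
        "data-train": files[:n_train],
        "data-val": files[n_train:n_train + n_val],
        "data-test": files[n_train + n_val:],
    }

    for name, members in splits.items():
        split_dir = Path(out_root) / name
        split_dir.mkdir(parents=True, exist_ok=True)
        for replay in members:
            link = split_dir / replay.name
            if not link.is_symlink():
                link.symlink_to(replay.resolve())

    return {name: len(members) for name, members in splits.items()}
```

symlinks keep the split cheap: no replay is copied, and re-running with the same seed reproduces the same assignment.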
dataloader
model architecture(s)
V1: MLP
starting off, i wanted to avoid introducing unnecessary complexity into my project especially as i was still setting up other parts of my training pipeline like the dataset/dataloader, training loop, and evals.
V2: ResNet!
V3: RL (IN PROGRESS)
one shortfall of behavioral cloning is that there is currently no way to distinguish training examples of "good moves" from training examples of "bad moves". all (board_state, action_tensor) pairs are treated as equally worth mimicking, even though within games there are winners/losers and good players/bad players. even when you train only on moves from top players, they too can make absolute blunders.
there is also a second common issue with BC on top player data called distribution shift (ross and bagnell). the model is trained on states the demonstrator visited, but at inference it visits the states its own policy leads to. these can diverge from the training data, and then the model has no idea what to do when it gets to those states.
Decision Transformers
training loop
inference loop
evals
self play
red teaming
renting gpus
interesting quirks about the model & experiments i ran to address them
- does the thing where it marches back and forth from square a to square b, over and over again
- it's really bad at defending its king tower when the king tower is in "check"
- i realized that as a human, i play more defensively once i know my opponent has 'seen' my king tower. that is, the knowledge that my opponent is now aware of my king tower location makes me play the game more defensively. i might start aggregating my troops and marching them back near the king tower to prepare for an attack.
- i wanted to convey this information to my model, so i added a new channel opponent_seen, which is a map that represents the tiles of the board that the opponent has seen at some point of the game.
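a rough sketch of how such an opponent_seen map could be accumulated turn by turn is below. it assumes a player sees every tile they own plus the 8 tiles surrounding each owned tile, and it uses a plain owner grid (owner_grid[row][col] is a player index, -1 otherwise); both are assumptions for illustration, not the actual engine code.

```python
def visible_tiles(owner_grid, player):
    """Boolean grid of tiles currently visible to `player`.

    Assumed vision rule: owned tiles plus their 8 neighbours.
    """
    height = len(owner_grid)
    width = len(owner_grid[0])
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if owner_grid[r][c] != player:
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        seen[rr][cc] = True
    return seen


def update_opponent_seen(opponent_seen, owner_grid, opponent):
    """OR the opponent's current vision into the running map.

    Once a tile has been seen it stays marked for the rest of the game.
    """
    now = visible_tiles(owner_grid, opponent)
    return [
        [prev or cur for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(opponent_seen, now)
    ]
```

calling update_opponent_seen once per simulated turn yields a map that only ever grows, which matches "has seen at some point of the game". checking whether the player's own general tile is marked in that map gives the "my opponent knows where i am" signal directly.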
karpathy blog post
i found https://karpathy.github.io/2019/04/25/recipe/ very useful