AI Learning Excitebike with XGBoost

Following the incredible rise of machine learning applications in recent years, people have made artificial intelligence (AI) learn many tasks. For example, it has learned to drive cars, translate languages, understand pictures, filter emails, etc. This development now even extends to video games. A notable effort came from DeepMind, with a Convolutional Neural Network algorithm that can play a handful of Atari 2600 games. This is not an isolated case: many examples of AI playing games can be found on the Internet, such as Super Mario World, Pac-Man, and Super Mario Kart.

Recently, I stumbled upon this paper, which outlines a method for making an AI learn to play video games using a Gradient Boosting algorithm. Being more familiar with Gradient Boosting techniques than with Neural Networks, I decided to give the method a try.

I chose to make the AI learn to play the NES game Excitebike. This follows up on a blog post I wrote some years ago, where I sampled random trials of a single jump. Now, the scope is much bigger than a single jump: I want the AI to learn how to navigate through the entire race!

Excitebike_coverCool2.jpg

This blog post will first present the environment of this experiment. I will then briefly introduce the techniques that made it work, without going into detail, preferring to point to other resources instead. Next, it will show what the AI learned during its journey, with examples of its improvements. Finally, I will talk about what could be improved.

1. Experiment Context

Emulation

Normally, the game is played on a NES console. For this project, though, we need a very flexible framework, so the game is instead played in the Bizhawk emulator, which incorporates an interface to run programs written in the Lua language.

Normally, to play the game, a player presses buttons on the NES controller. Since the experiment is run from an emulator, the AI instead evaluates which input maximizes the expected future outcome and presses the buttons programmatically, directly from the computer.
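As a concrete illustration, here is a minimal Lua sketch of how inputs can be fed to the game from BizHawk's Lua API (`joypad.set` and `emu.frameadvance`). The helper function and the exact button combination are my own illustration, not the project's actual code.

```lua
-- Minimal sketch: hold a set of buttons for a given number of frames
-- using BizHawk's Lua API.
local function press(buttons, frames)
  for i = 1, frames do
    joypad.set(buttons, 1)   -- controller 1
    emu.frameadvance()       -- let the emulator run one frame
  end
end

-- Example: accelerate while leaning forward for 4 frames.
press({A = true, Right = true}, 4)
```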

How-to.png

Figure 1: How to play Excitebike from the official manual.

Figure 1, taken from the official Excitebike manual, shows how the controller is used. To make the bike go forward, a player has to press an accelerator button: either the A or the B button. The A button is the normal acceleration, while the B button is the turbo. For the sake of simplicity, I made the AI always press A, so it doesn't have to learn to manage the turbo and can instead concentrate its learning on overcoming the obstacles.

To avoid the obstacles and make the best of the jumps, the control pad has to be used. The horizontal axis adjusts the angle of the bike, effectively lowering or raising its nose, while the vertical axis makes the bike change lanes.
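Combined with the always-pressed A button, this gives the AI a small discrete action space: 3 horizontal choices × 3 vertical choices = 9 possible inputs. Below is a minimal Lua sketch enumerating it; the table layout is my own illustration.

```lua
-- Sketch of the action space: A is always held, and the AI only
-- chooses the d-pad combination (3 horizontal x 3 vertical = 9 actions).
local horizontal = {"none", "Left", "Right"}  -- bike angle
local vertical   = {"none", "Up", "Down"}     -- lane changes

local actions = {}
for _, h in ipairs(horizontal) do
  for _, v in ipairs(vertical) do
    local a = {A = true}                -- accelerator always pressed
    if h ~= "none" then a[h] = true end
    if v ~= "none" then a[v] = true end
    actions[#actions + 1] = a
  end
end
-- `actions` now holds the 9 possible inputs the AI can choose from.
```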

Game Settings

I chose to run the experiment on the first level of Excitebike (map from vgmaps.com), as it's the easiest of the races. Animation 1 shows an accelerated example (×2 speed) of a run where the directional pad was input randomly. I also chose the "game A" mode, meaning that there are no computer-controlled rivals on the screen. Obviously, these choices were made to simplify the task a little for the AI.

ExcitebikeRandom2.gif

Animation 1: Example run where the directional pad is input randomly.

2. Technical Approach Taken

Algorithm

As mentioned previously, I used this paper's technique to build an AI that learns how to play the game. The paper introduces two novel ideas: a gradient-boosting style learner and an exploration strategy inspired by the principles of state abstraction. Only the first, the gradient-boosting style learner, was implemented in my Excitebike AI; the second could nonetheless be interesting to explore in a future blog post.
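In broad strokes, the learner treats this as Q-learning: after each action, the regression label for the next boosted tree is the observed reward plus the discounted best predicted value of the next state. Here is a hedged Lua sketch of that target computation; the discount factor `gamma` is a value I picked for illustration, `Q` stands for the current model's scoring function, and `actions` is the table from the earlier sketch.

```lua
-- Sketch of the Q-learning target used as the regression label for
-- the next boosted tree. gamma is an illustrative discount factor.
local gamma = 0.95

local function td_target(reward, next_state, Q)
  local best = -math.huge
  for _, a in ipairs(actions) do
    best = math.max(best, Q(next_state, a))  -- best predicted value
  end
  return reward + gamma * best
end
```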

An improvement to the algorithm incorporated a fail-safe method: every 5 simulations, it assesses whether the model has improved. If not, it discards that block of 5 simulations, effectively rolling the model back 5 simulations. In addition, double Q-learning was used.
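The fail-safe could look something like the following Lua sketch. Here `save_model` and `restore_model` are hypothetical helpers, and the improvement test (comparing the block's average race time against the best block so far) is my interpretation of the check, not the exact criterion used.

```lua
-- Sketch of the fail-safe: every 5 simulations, keep the new trees
-- only if the block's average race time did not get worse.
local BLOCK = 5
local best_avg = math.huge
local times = {}

-- Hypothetical helpers: checkpoint and roll back the boosted trees.
local function save_model() --[[ keep current trees as checkpoint ]] end
local function restore_model() --[[ reload the last checkpoint ]] end

local function end_of_race(frames_taken)
  times[#times + 1] = frames_taken
  if #times == BLOCK then
    local sum = 0
    for _, t in ipairs(times) do sum = sum + t end
    if sum / BLOCK <= best_avg then
      best_avg = sum / BLOCK
      save_model()      -- keep this block's trees
    else
      restore_model()   -- discard the last 5 simulations' trees
    end
    times = {}
  end
end
```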

Target Variable and Selected Features

To improve, the algorithm needs a goal: completing the race as fast as possible. To evaluate how fast the bike goes, we compute the number of pixels traveled by the bike during a period of 4 frames (on the NES, there are 60 frames per second); this is the target variable the model learns to predict.
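A minimal Lua sketch of that measurement, assuming BizHawk's `memory.readbyte`, could look like this. The address below is a placeholder (the real ones are listed in Table 1), and the wrap-around handling assumes an 8-bit position counter.

```lua
-- Sketch of the target variable: pixels traveled over 4 frames.
local PLAYER_X = 0x0050          -- hypothetical RAM address

local function pixels_traveled()
  local x0 = memory.readbyte(PLAYER_X)
  for i = 1, 4 do emu.frameadvance() end
  local x1 = memory.readbyte(PLAYER_X)
  return (x1 - x0) % 256         -- handle the 8-bit counter wrapping
end
```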

Traditional game-playing AI techniques use the image as input. Because that requires considerable computing power, we're going to use memory values as input instead. I selected some features from the RAM (working memory) to use in the model; Table 1 shows the feature names and their addresses.
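Reading those features at each decision step could look like this hedged Lua sketch; the addresses and feature names below are placeholders standing in for the real entries of Table 1.

```lua
-- Sketch of reading the selected features from RAM each step.
local FEATURES = {
  speed    = 0x0090,  -- hypothetical address
  angle    = 0x0091,  -- hypothetical address
  lane     = 0x0092,  -- hypothetical address
  airborne = 0x0093,  -- hypothetical address
}

local function read_state()
  local state = {}
  for name, addr in pairs(FEATURES) do
    state[name] = memory.readbyte(addr)
  end
  return state
end
```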

Features.png

Table 1: Memory addresses selected as features in the modeling.

Mechanics

The Bizhawk emulator, with its Lua interpreter, runs all the simulations. After each finished race, the Lua program exports the following data to a file (a minimal sketch of this step follows the list):

  • which buttons were pressed,
  • the value of the selected features,
  • the target variable.
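For example, the export could be as simple as appending one CSV row per decision point, as in this sketch; the row layout and file path handling are my own illustration.

```lua
-- Sketch of the export step: append one CSV row per decision point
-- (buttons pressed, feature values, target) after a finished race.
local function export_run(rows, path)
  local f = assert(io.open(path, "a"))
  for _, row in ipairs(rows) do
    f:write(table.concat(row, ","), "\n")
  end
  f:close()
end
```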

After each finished race, R, with the XGBoost package, reads the file and fits an additional regression tree to the gradient boosting model. It then exports the model's results back to Lua as a scoring function. With this updated function, Lua can test which buttons to press to obtain the best outcome (see Figure 2).
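On the Lua side, choosing the input then amounts to scoring every possible action with the updated function and pressing the best one. In this sketch, `Q`, `read_state`, and `actions` are the hedged helpers from the earlier sketches.

```lua
-- Sketch of action selection: score each of the 9 possible inputs
-- with the exported model and return the highest-valued one.
local function best_action(Q)
  local state = read_state()
  local best, best_value = actions[1], -math.huge
  for _, a in ipairs(actions) do
    local v = Q(state, a)
    if v > best_value then best, best_value = a, v end
  end
  return best
end
```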

Schema.png

Figure 2: How Lua and R learn to play Excitebike.

3. Results

Baseline

To better understand the results and assess whether the AI improved, we can check whether it's better than random. Figure 3 shows the distribution of the time taken to finish the race for a sample of 1000 runs where the control pad was input randomly. As seen in the figure, it takes on average 5408 frames to finish the race; at 60 frames per second, that's around 90 seconds. Animation 1, presented in an earlier section, is one example of these runs. This sample constitutes the baseline that the AI has to beat.

Baseline Results.png

Figure 3: Boxplot of the number of frames it took to finish the race for a sample of 1000 runs where the controller was randomly input.

Learning

Time Results.png

Figure 4: Number of frames it took for the AI to complete the race, by iteration of the algorithm. Each green dot shows the result of a run. The green line is a smoothed average of these results. The orange dashed line is the average of the baseline random model, with the paler band containing 90% of the baseline random model's results.

The algorithm was trained for several hours, during which it completed 910 runs. Figure 4 shows the number of frames it took to finish the race for each of the 910 iterations. At first, the results were disappointing, as the time to complete the race got higher and higher. It became much worse than the baseline model (orange line), requiring almost 4000 additional frames on average (more than a minute longer) to complete the race.

The problem is that in Excitebike, if the bike's nose is lowered during a jump, it goes faster than if the bike is flat. The AI learned this very quickly. Unfortunately, if the angle is too extreme when the bike lands, the player crashes. Animation 2 shows one of the worst runs, where the model is reckless and crashes the bike repeatedly.

ExcitebikeXgbBad.gif

Animation 2: Early example run where the AI had learned to lower the bike’s nose to move faster in a jump, but hadn’t learned to be careful about crashes.

Fortunately, after some hours, the AI started to improve. A turning point came around iteration 750, when the model finally became better than random. By that point, the AI had learned the optimal angle for both speed and stability. Animation 3 shows the resulting run of the final model, in which the bike did not crash a single time.

ExcitebikeXgb

Animation 3: Final run piloted by the AI trained with XGBoost.

Maybe it could have improved even further with additional iterations. Unfortunately, the algorithm gets more and more burdensome: a run that initially took 10 seconds to simulate takes several minutes by around 1000 simulations. This is a limitation of the gradient boosting algorithm, which gets more complex at each iteration, since the program adds an additional tree each time a race is completed.
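To see why, consider how a prediction is made: every tree in the ensemble is walked from root to leaf, and the ensemble gains one tree per race, so a single prediction becomes a loop over roughly 1000 trees after 1000 races. The Lua sketch below illustrates this; the node encoding is an assumption for illustration, not the exact format exported by R.

```lua
-- Sketch of ensemble scoring: each tree is either a leaf
-- ({leaf = value}) or a split ({feature, threshold, left, right}).
local function predict(trees, state)
  local sum = 0
  for _, tree in ipairs(trees) do     -- cost grows with every race
    local node = tree
    while node.leaf == nil do
      if state[node.feature] < node.threshold then
        node = node.left
      else
        node = node.right
      end
    end
    sum = sum + node.leaf
  end
  return sum
end
```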

4. Conclusion

Today, we saw how gradient boosting can be used to train a model to play the game Excitebike on the NES. The results were initially bad but got much better after several hundred iterations. Unfortunately, at some point the model becomes too demanding for a computer to run.

This was my first attempt at reinforcement learning in video games. Future work could try to optimize the computation of the trees; right now, they are evaluated with a simple loop in Lua. Furthermore, an algorithm could replace the worst trees, keeping the number of trees artificially low. Finally, XGBoost's parameters could be tuned, as they were mostly left at their defaults.

5. Appendix

Repository of all code and files.
