Flappy Bird AI Final Project

Boris Talesnik, Neria Saada, Issack Rosenberg
The Hebrew University of Jerusalem, Jerusalem, Israel

Abstract — In this document we describe our AI final project and discuss our solution to the Flappy Bird game, using the Q-learning algorithm with different state definitions.

INTRODUCTION

Flappy Bird is a 2013 mobile game, developed by Vietnam-based developer Dong Nguyen and published by GEARS Studios, a small independent game developer also based in Vietnam. The game is a side-scroller in which the player controls a bird, attempting to fly between rows of green pipes without hitting them. The objective is to direct a flying bird, named Faby, who moves continuously to the right, between sets of Mario-like pipes. If the player touches the pipes, they lose. Faby briefly flaps upward each time the player taps the screen; if the screen is not tapped, Faby falls because of gravity. Each pair of pipes he navigates between earns the player a single point.


THE GOAL

Using learning algorithms, we want to allow an AI player to go through a training phase with different definitions of states (these differences are discussed in more detail later) and learn the Flappy Bird world. After the training phase we want the player to keep playing successfully for as long as we want without a single hit. We can examine the learning success using different parameters such as the number of training runs, the learning rate, the discount factor, etc.

THE APPROACH

We chose learning over search methods because Flappy Bird is a continuous and dynamic game, the environment is not predefined, and we wanted to create an agent that can go through a first phase of learning and then play the game fluently. The Q-learning algorithm is the best fit for our problem because it builds a policy and can then handle the states it will encounter during the game (a minimal sketch of the update rule is shown after the list below).

In order to make the learning effective we needed to define the state space of the problem. We found 3 elements that we want to include in the state space:
❖ Vertical distance from the lower pipe.
❖ Horizontal distance from the lower pipe.
❖ Bird's velocity.
We assumed that these 3 elements are the most important for describing the current state of the bird.
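For reference, here is a minimal sketch of a tabular Q-learning agent with epsilon-greedy action selection, in the spirit of what we describe; the class, method names and default values are illustrative assumptions, not the actual API of our Agent.py.

    import random
    from collections import defaultdict

    ACTIONS = ("flap", "do_nothing")

    class QLearningSketch(object):
        def __init__(self, alpha=0.7, gamma=0.9, epsilon=0.0001):
            self.alpha = alpha            # learning rate: weight of the new estimate
            self.gamma = gamma            # discount factor on future reward
            self.epsilon = epsilon        # exploration rate, kept near 0 (see Results)
            self.q = defaultdict(float)   # Q[(state, action)] -> value, defaults to 0

        def choose_action(self, state):
            # Epsilon-greedy: with a small probability explore, otherwise exploit.
            if random.random() < self.epsilon:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            # Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))
            best_next = max(self.q[(next_state, a)] for a in ACTIONS)
            old_value = self.q[(state, action)]
            self.q[(state, action)] = (1 - self.alpha) * old_value + \
                self.alpha * (reward + self.gamma * best_next)

The update is a weighted average of the old Q-value and the new estimate, which is exactly where the learning rate (Alpha) and discount factor (Gamma) parameters described later come in.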

Different approaches to state definitions

Definition: GR – grid resolution; the parameter we used to quantize the game grid. We used GR values of 1, 2 and 4. We discuss it further in the results part.

Naïve state space
The elements of the state vector have the following ranges:
❖ Vertical: ~ 0 – 500
❖ Horizontal: ~ 0 – 300
❖ Velocity: ~ -9 – 9
(units are pixels, and pixels per frame for the velocity)
It is easy to see that the size of the state space is above 10^6, which means that using all of the state data would take too much time to learn, so we decided to try different approaches to quantizing the space.

Optimized state spaces
In order to decrease the number of states we decided to use 4 different approaches:
1. Distance only – using only the vertical and horizontal distances.
2. Horizontal relativity (before or between the pipes) and vertical distance.
3. Similar to approach 2, but with the bird's velocity.
4. Distances and a boolean indicator for positive/non-positive velocity.
All of the distances were tested with different GR values, and in order to decrease the size of the state space even more we defined max and min distances from the pipes; anything outside this range is counted as the upper/lower bound of the distance. After all these reductions, we got a new state-space size for each approach (listed in correspondence to the approaches above; a sketch of the quantization follows the formulas):

1. $\frac{\text{verticalDistance} \cdot \text{horizontalDistance}}{\text{GR}^2}$
2. $\frac{2 \cdot \text{verticalDistance}}{\text{GR}}$
3. $\frac{2 \cdot 20 \cdot \text{verticalDistance}}{\text{GR}}$
4. $\frac{2 \cdot \text{verticalDistance} \cdot \text{horizontalDistance}}{\text{GR}^2}$
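To make the quantization concrete, here is a minimal sketch of how a state could be built from the raw game values under the 4 approaches above; the function name, the clamping bounds and the "before the pipe" convention are illustrative assumptions, not the exact State classes defined in Agent.py.

    def quantize_state(vertical, horizontal, velocity, gr=4, approach=1,
                       max_vertical=500, max_horizontal=300):
        # Clamp the distances so anything outside the range maps to the bounds.
        vertical = max(0, min(vertical, max_vertical))
        horizontal = max(0, min(horizontal, max_horizontal))
        v_bin = vertical // gr              # vertical distance bucket
        h_bin = horizontal // gr            # horizontal distance bucket
        before_pipe = horizontal > 0        # horizontal relativity flag

        if approach == 1:                   # distance only
            return (v_bin, h_bin)
        if approach == 2:                   # horizontal relativity + vertical distance
            return (before_pipe, v_bin)
        if approach == 3:                   # approach 2 plus the full velocity (~ -9..9)
            return (before_pipe, v_bin, velocity)
        # approach 4: distances plus a boolean positive-velocity indicator
        return (v_bin, h_bin, velocity > 0)

The quantized tuple is what the agent would use as the key of its Q-table, so a larger GR directly shrinks the number of distinct keys.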

We decided to try all 4 of these approaches and find which one of them gives the best learning.

Rewards
❖ Staying alive between states: 1
❖ Dying: -1000
❖ Scoring +1: 1000
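A minimal sketch of this reward scheme, assuming the game engine reports per-frame "died" and "scored" flags (illustrative names):

    def frame_reward(died, scored):
        if died:
            return -1000   # the bird hit a pipe
        if scored:
            return 1000    # the bird passed a pair of pipes (+1 point)
        return 1           # staying alive between states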

HOW TO USE THE CODE

The platform we used is Python 2.7. We tested it on a 32-bit Python. We found an open-source Flappy Bird game and defined a new game engine, which we used to develop the learning agents.

Files
Game.py: Defines an abstract game class.
FlappyGame.py: Defines the Flappy Bird game.
Player.py: Defines the player classes used in the game.
AgentFlappyGame.py: Extends FlappyGame to use an agent.
Agent.py: Defines various agent classes and state classes.
Main.py: A script for manually setting the game parameters.
TestScript.py: Used for running the flappy game with an agent, with various combinations of parameters.

Running the game
Make sure the library pyGame is installed (pip install PyGame, or from here).
Make sure the library easygui is installed (pip install easygui, or from here).
To run the game with a single set of parameters, run python Main.py; it will open the following window:

Figure 2: The window to enter parameters

Parameters and ranges

Training runs: the number of episodes until the agent stops learning. An episode ends when the bird hits a pipe.
Exploration rate (Epsilon): the probability of choosing a random action. 0-1.
Learning rate (Alpha): the weight of a new Q-value. 0-1.
Discount factor (Gamma): the factor that multiplies the accumulated reward. 0-1.
Pipe vertical gap: the distance between the lower and upper pipes. 130-160.
FPS: the more frames, the faster the game runs. Any natural number.
Agent: True to use a learning agent, False for a human player.
GR: as defined in the approach section. 1, 2, 4.
DataType: the type of state-space definition to use:
❖ 1: Naïve approach.
❖ 2: Distance only.
❖ 3: Horizontal relativity and vertical distance.
❖ 4: Horizontal relativity and vertical distance with bird velocity.
❖ 5: Horizontal relativity and vertical distance with binary bird velocity.
An illustrative parameter set is sketched at the end of this section.

Agent Mode
If the game is running in agent mode, it starts without the GUI so the learning phase runs (much) faster. To change the GUI mode in-game there is a set of keys to use (not the ones on the NUMPAD):
❖ 0: no GUI mode - speed is limited only by the processing power.
❖ 1: 30 FPS
❖ 2: 60 FPS
❖ 3: 120 FPS
❖ 4: 240 FPS

Human Mode
The only key needed is the space key, used for jumping and for starting a game.

Output
During the learning phase the user gets a message every 50 episodes. The message includes the high score and the average score of the last 50 episodes.
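For reference, here is an illustrative parameter set within the ranges above, leaning on the configuration that performed best in the Results section (distance-only state space, GR = 4, exploration near 0). The key names are descriptive placeholders rather than the exact field names in Main.py, and the alpha, gamma and episode values are assumptions within the listed ranges.

    params = {
        "training_runs": 500,        # episodes before the agent stops learning
        "exploration_rate": 0.0001,  # epsilon; larger values hurt learning badly
        "learning_rate": 0.7,        # alpha, in 0-1
        "discount_factor": 0.9,      # gamma, in 0-1
        "pipe_vertical_gap": 140,    # in 130-160
        "fps": 60,                   # any natural number
        "agent": True,               # learning agent rather than a human player
        "gr": 4,                     # grid resolution
        "data_type": 2,              # distance-only state space
    }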


RESULTS

The number of possible parameter combinations for running the game is practically unlimited. Hence we ran a script that switched between different parameters and state spaces every 3 minutes. We chose to present 3 runs for each state space, to show the relation between the parameters and the performance. The graphs show the probability of the bird crashing into a pipe.

Naïve state space
As seen on the graph, learning is happening, but it is slow due to the size of the state space. It is clear that the GR parameter has a very large influence on the speed of learning: the higher the GR, the faster the learning. Although not quite visible here, the discount rate parameter also has an effect (higher is faster), but its effect is minor compared to the GR parameter.

Distance only
With this state space we can see that the learning is much faster (after only 300 episodes the loss probability is already below 0.6). We can also see the influence of the grid size on the learning speed: with grid size 4, we got an almost zero loss probability.

Horizontal relativity
Here we can see an even bigger improvement: the loss probability gets close to zero in less than 100 episodes. And again, the fastest learning comes with the bigger grid size.

Horizontal relativity and velocity
Here we can see that adding the velocity variable actually slows the learning down, which is expected because the state space is larger. On the other hand, the learning seems to be more monotone and consistent than with the approach without velocity.

Distance and binary velocity

The results actually show that even though the state space is smaller than in the previous approach, the learning is not faster (and is even slower in some runs). This may be because learning with the complete set of velocities produces a better learning procedure.

CONCLUSIONS

Q-Learning works
The results clearly show that it is possible for an agent with a Q-learning mechanism to learn how to play the game and actually get good results, far better than the average person.

State space size
We found a clear connection between the size of the state space and the time it takes the agent to learn. This is expected given the nature of the Q-learning algorithm. It is also the most influential parameter (except for the exploration rate; more on that below). Another thing we saw is that although learning is slower on the larger state spaces, it is more consistent and more monotone. This is because when the resolution of the grid is higher (lower GR factor), the states are more distinguishable; in a low-resolution grid a good state and a bad state might fall into the same bin, because the actual resolution of the game does not change with the GR factor - it is always the full resolution of the game.

Best space
The best results were obtained with the smallest state space (approach #2 - distance only, with grid size 4). After only 100 episodes we got a loss probability close to zero, which means the agent can play for hours without losing.

Exploration
During the tests we saw that the exploration parameter has to be near 0. This is because each frame is a turn, and there are many (really, a lot of) frames per game. An exploration factor above 0.0001 made the game impossible to learn, at least within the time frame we gave each run.

References and additional links
❖ Flappy Bird - Python (GitHub)
❖ Q-Learning - Wikipedia
❖ Unit 7A Slides - MDP
❖ Project 4: Reinforcement Learning
