Reinforcement learning is a branch of artificial intelligence in which a computer learns by interacting directly with a problem. This project uses reinforcement learning to learn to play the game Doodle Jump on an old iPhone. A camera captures 30 images per second of the iPhone's screen. These images are analyzed, and a simple robot then manipulates the iPhone by rotating it and pressing buttons on the screen.
Hardware

The "smart" part of this project is an Nvidia Jetson running the Robot Operating System (ROS) and TensorFlow. The robot includes:
- A stepper motor used to rotate the iPhone.
- A servo motor and a solenoid used to press buttons on the iPhone screen.
- A Ximea camera used to capture pictures of the iPhone screen and feed them via USB3 to the Jetson.
- An EIBot board that drives the stepper and the servos (see the sketch after this list).
- A power supply in a retro-looking box that provides 12 volts to the Jetson, 19 volts to the EIBot board, and 5 volts to a USB hub.
- A keyboard, mouse, and display used to run Linux and control the system.
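The EIBot board accepts short ASCII commands over a USB serial link. The snippet below is a minimal sketch of driving it from Python, assuming the standard EiBotBoard (EBB) command set; the serial port name, step counts, and servo commands are placeholders rather than the values the project actually uses.

```python
# Minimal sketch of commanding the EIBot board over USB serial.
# Assumes the documented EiBotBoard (EBB) command set; the port name and
# command arguments are illustrative placeholders.
import serial

ebb = serial.Serial("/dev/ttyACM0", 9600, timeout=1)  # hypothetical port

def send(cmd):
    """EBB commands are ASCII strings terminated by a carriage return."""
    ebb.write((cmd + "\r").encode("ascii"))
    return ebb.readline()  # the board answers each command, e.g. with "OK"

send("SM,500,200")  # rotate the iPhone: move the stepper 200 steps over 500 ms
send("SP,0")        # swing the servo arm down to press a button
send("SP,1")        # lift the servo arm back up
```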
ROS is a great way to build a project like this. Using ROS let me construct the system as a set of independent processes that communicate through a standard message-passing system. The project includes both C++ and Python processes; the message compiler generates stubs in both languages, so passing messages around is easy. The main ROS modules include (a minimal node sketch follows the list):
- A camera driver that receives images from the camera.
- A simple neural net that reads the score from the screen.
- A bigger, TensorFlow-based neural net that analyzes the pictures received from the camera. This is the net that is trained by interactions with the robot.
- An archive process responsible for saving actions and screen images in an SQLite database for later training.
- An EIBot board driver.
- A screen driver.
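To show how these modules fit together, here is a minimal sketch of one node in the pipeline: it subscribes to camera frames and publishes an action for the robot driver. The topic names and the use of std_msgs/String for actions are assumptions made for illustration; the project presumably uses its own compiled message types.

```python
# Sketch of a ROS policy node: camera frames in, robot actions out.
# Topic names and message types here are illustrative assumptions.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String

def on_frame(img_msg):
    # In the real system, the TensorFlow net would pick the action from
    # the frame; here we publish a fixed placeholder.
    action_pub.publish(String(data="tilt_left"))

rospy.init_node("doodle_policy")
action_pub = rospy.Publisher("/robot/action", String, queue_size=1)
rospy.Subscriber("/camera/image_raw", Image, on_frame)
rospy.spin()  # process callbacks until shutdown
```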
Reinforcement learning is one of the most active branches of AI. This system uses a variant called imitation learning. I played hundreds of games of Doodle Jump using the robot to manipulate the iPhone, and the system archived the images and the actions I took. These images and actions became the training set for the neural net. It took many thousands of training images before the system started to do anything sensible. Currently the system can play a rudimentary game: it no longer wildly presses buttons or shakes the screen at random. As part of the learning, I correct bad decisions in real time as well as I can. These games and corrections, along with a random selection of the last 20K archived images, are used as training data after each run; a sketch of that training step follows.
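Here is a rough sketch of what that training step could look like, assuming the archive is an SQLite table frames(image, action) holding 84x84 grayscale frames and integer action labels, and using a small Keras CNN. The schema, image size, and network shape are guesses for illustration, not the project's actual code.

```python
# Imitation-learning sketch: fit a small CNN to (screen image, action) pairs.
# The SQLite schema, frame size, and network are illustrative assumptions.
import sqlite3
import numpy as np
import tensorflow as tf

conn = sqlite3.connect("games.db")  # hypothetical archive written by the robot
rows = conn.execute(
    "SELECT image, action FROM frames ORDER BY rowid DESC LIMIT 20000"
).fetchall()  # replay buffer: roughly the last 20K frames

# Assume frames are stored as raw 84x84 grayscale bytes.
x = np.stack([np.frombuffer(img, np.uint8).reshape(84, 84, 1) for img, _ in rows])
x = x.astype("float32") / 255.0
y = np.array([action for _, action in rows])  # e.g. 0=left, 1=right, 2=press

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 8, strides=4, activation="relu",
                           input_shape=(84, 84, 1)),
    tf.keras.layers.Conv2D(32, 4, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(3),  # one logit per action
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x, y, batch_size=32, epochs=3)
```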
The process is much slower than I expected. I hope to improve things by adopting a more sophisticated learning model, and eventually to turn the system loose to learn on its own.