The AI Doom’s Day Has Arrived
This neural network doesn't play Doom. This neural network is Doom.
Training an artificial neural network to play a video game is nothing new, as many very successful approaches have been worked out over the years. But training a neural network to be a video game is something different entirely, and it raises many questions about intellectual property rights and what exactly constitutes a copy of a video game. Researchers at Google and Tel Aviv University have recently opened this Pandora’s box with the release of what they call GameNGen, which clones the logic and visuals of a video game into a neural model.
The team started with the goal of cloning Doom because, to a hacker, no other pursuit could be quite so noble. To make that happen, they first built an AI pipeline that leverages the image generation algorithm Stable Diffusion to predict the next video frame that one would expect to see when given a series of past frames combined with user inputs. By seeding the algorithm with an initial sequence of images, the game can be started. After that, the predicted images themselves start to slide back into the sequence of past frames, which allows the action to keep moving. And as the user provides inputs, like moving or firing a weapon, the predicted images change to represent the expected actions.
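The feedback loop described above can be sketched in a few lines of Python. This is only an illustration of the sliding-window idea, not the researchers' code: `run_autoregressive`, `toy_predictor`, and `context_len` are hypothetical names, frames are represented as small tuples of numbers rather than images, and the toy predictor stands in for the Stable Diffusion-based model.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Frame = Tuple[int, ...]  # stand-in for an image; a real frame would be a pixel array


def run_autoregressive(
    seed_frames: List[Frame],
    actions: List[int],
    predict_next: Callable[[List[Frame], List[int]], Frame],
    context_len: int,
) -> List[Frame]:
    """Generate one frame per action, feeding each prediction back into the context."""
    # The context window starts out holding only the seed frames.
    frames: Deque[Frame] = deque(seed_frames[-context_len:], maxlen=context_len)
    past_actions: Deque[int] = deque(maxlen=context_len)
    generated: List[Frame] = []
    for action in actions:
        past_actions.append(action)
        next_frame = predict_next(list(frames), list(past_actions))
        # The predicted frame slides into the window; the oldest frame drops out.
        frames.append(next_frame)
        generated.append(next_frame)
    return generated


def toy_predictor(frames: List[Frame], actions: List[int]) -> Frame:
    """Hypothetical stand-in for the model: shift the last frame by the latest action."""
    return tuple(pixel + actions[-1] for pixel in frames[-1])


seed = [(0, 0), (1, 1)]
out = run_autoregressive(seed, [1, 0, 2], toy_predictor, context_len=2)
print(out)
```

After the first few steps the context window contains only generated frames, which is why the quality of each prediction matters so much for everything that follows.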
It took a huge amount of data to train this pipeline, so the researchers developed another machine learning algorithm to help them collect it. This algorithm continuously played Doom and captured long sequences of screenshots, each paired with the inputs that were issued to the game.
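As a rough sketch of what this kind of data collection looks like, the snippet below records each frame together with the input that was issued and the frame that followed. Everything here is an assumption made for illustration: `ToyEnv` is a hypothetical one-dimensional stand-in for the game, the policy picks inputs at random rather than being a trained agent, and the record layout is not taken from the researchers' work.

```python
import random
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for the game: the 'frame' is just the player's x position."""

    def reset(self) -> int:
        self.x = 0
        return self.x

    def step(self, action: int) -> int:
        self.x += action  # action is -1 (left), 0 (stay), or +1 (right)
        return self.x


def collect_episode(
    env: ToyEnv, policy: Callable[[int], int], steps: int
) -> List[Tuple[int, int, int]]:
    """Play for a fixed number of steps, logging (frame, action, next_frame) records."""
    frame = env.reset()
    records: List[Tuple[int, int, int]] = []
    for _ in range(steps):
        action = policy(frame)
        next_frame = env.step(action)
        records.append((frame, action, next_frame))
        frame = next_frame
    return records


rng = random.Random(0)  # fixed seed so the run is repeatable
episode = collect_episode(ToyEnv(), lambda frame: rng.choice([-1, 0, 1]), steps=5)
for record in episode:
    print(record)
```

Each record gives a training example of the form "given this frame and this input, this is what came next", which is the relationship the frame predictor has to learn.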
In a perfect world, that might have been enough to crack the problem. But it is not a perfect world. The team ran into issues with small errors being introduced into the predicted images. As these flawed frames moved back into the sequence of past frames, each new prediction was built on top of them, and the errors compounded over time. That process continued until everything was a jumbled mess.
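A toy numerical example shows why feeding predictions back in is so much harder than predicting from clean inputs. It is a deliberately simplified sketch: the "frame" is a single number, `true_step` and `model_step` are hypothetical functions, and the model's mistake is assumed to be a small constant `eps` per step. Real image errors behave in more complicated ways, but the drift has the same cause.

```python
from typing import List, Tuple


def true_step(x: float) -> float:
    """The real game: the state advances by exactly 1 each step."""
    return x + 1.0


def model_step(x: float, eps: float = 0.01) -> float:
    """An imperfect model: each prediction is off by a small amount eps."""
    return true_step(x) + eps


def rollout_errors(steps: int, eps: float = 0.01) -> Tuple[List[float], List[float]]:
    truth = 0.0
    fed_back = 0.0  # state of the model that consumes its own predictions
    clean_input_err: List[float] = []
    fed_back_err: List[float] = []
    for _ in range(steps):
        next_truth = true_step(truth)
        # Predicting from the true previous state: the error never exceeds eps.
        clean_input_err.append(abs(model_step(truth, eps) - next_truth))
        # Predicting from the model's own previous output: the error builds up.
        fed_back = model_step(fed_back, eps)
        fed_back_err.append(abs(fed_back - next_truth))
        truth = next_truth
    return clean_input_err, fed_back_err


clean, drifted = rollout_errors(steps=100)
print(f"error with clean inputs after 100 steps: {clean[-1]:.3f}")
print(f"error when fed its own output:           {drifted[-1]:.3f}")
```

In this sketch the error only grows linearly, reaching 100 times the per-step error after 100 steps. A model whose mistakes also make its next prediction worse would degrade faster still.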
To correct for this problem, another neural network was developed. This one was trained to understand what a good frame from the game looks like. It was used to fix up any issues and produce a clean image that could then be displayed and used to keep the action rolling along without a glitch (well, mostly, anyway).
The result is quite impressive. Game screens largely look correct, and the algorithm learned not only the basic logic of the game, but also more complex things like the color of a key that is needed to open a particular door. It is not perfect, however. Odd visual artifacts are sometimes visible, and some on-screen elements may inexplicably become blurry at times. The level design also leaves something to be desired. The actual level designs have been replaced by whatever the algorithm dreams up, and sometimes they do not make a whole lot of sense or may be unwinnable.
The longer the game goes on, the more likely issues are to pop up. But if you want to play for just a couple of minutes, the 20 FPS experience could make you feel like you are playing a real version of Doom from an alternate universe.