Google DeepMind just entered the 90s. Fresh off its recent successes, DeepMind’s latest artificial intelligence can navigate a 3D maze reminiscent of the 1993 shooter game Doom.
Unlike most game-playing AIs, the system has no access to the game’s internal code. Instead it plays just as a human would, by looking at the screen and deciding how to proceed. This ability to navigate a 3D space by “sight” could be useful for AIs operating in the real world.
The work builds on earlier research from DeepMind, in which the team trained an AI to play 49 different video games from the Atari 2600, a games console popular in the 1980s. The software wasn’t told the rules of the games, and instead had to watch the screen to come up with its own strategies to get a high score. It was able to beat a top human player in 23 of the games.
That AI relied on a technique called reinforcement learning, which rewards the system for taking actions that improve its score, combined with a deep neural network that analyses the game screen and learns its patterns. It was also able to look back into its memory and study past scenarios, a technique called experience replay.
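The idea behind experience replay can be sketched in a few lines: store each interaction with the game in a buffer, then learn from random samples of past scenarios rather than only the most recent one. This is a minimal illustrative sketch, not DeepMind’s actual implementation; the class and parameter names are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Toy experience-replay buffer: stores past transitions and
    serves random mini-batches to study later. Capacity and the
    (state, action, reward, next_state) layout are illustrative."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old memories drop off

    def add(self, state, action, reward, next_state):
        # Every real interaction with the game is recorded.
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Look back into memory and revisit a random batch of scenarios.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=100)
for step in range(50):
    buf.add(state=step, action=step % 4, reward=1.0, next_state=step + 1)
batch = buf.sample(8)  # a random mini-batch of stored experience
```

The memory and compute cost the paper complains about is visible here: every real interaction is stored and then reprocessed, potentially many times.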
But experience replay has drawbacks that make it hard to scale up to more advanced problems. “It uses more memory and more computation per real interaction,” writes the DeepMind team in its latest paper. So the team has come up with a technique called asynchronous reinforcement learning, which sees multiple versions of an AI tackling a problem in parallel and comparing their experiences.
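The parallel set-up can be sketched with ordinary threads: several workers each explore their own copy of a problem and apply updates to one shared set of parameters, with no replay buffer at all. The toy two-armed environment and the update rule below are illustrative assumptions, not the algorithm from the paper.

```python
import threading
import random

# Shared parameters that all workers update asynchronously:
# an estimated value for each of two possible actions.
shared_weights = [0.0, 0.0]
lock = threading.Lock()

def worker(seed, steps=1000, lr=0.01):
    """One of several parallel learners, each with its own
    exploration sequence (its own random seed)."""
    rng = random.Random(seed)
    for _ in range(steps):
        action = rng.randrange(2)             # explore independently
        reward = 1.0 if action == 1 else 0.0  # toy task: arm 1 pays off
        with lock:
            # Nudge the shared estimate toward the observed reward.
            shared_weights[action] += lr * (reward - shared_weights[action])

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The workers' combined experience leaves the shared estimate
# favouring the rewarding action.
```

Because each worker learns from its own fresh interactions, nothing needs to be stored and replayed, which is where the savings in memory and computation come from.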
This approach requires much less computational might. While the previous system required eight days of training on high-end GPUs to play Atari games, the new AI achieved better performance in just four days on more modest CPUs. With Atari well and truly beaten, the team moved on to other games. In a simple 3D racing game (see video below) it achieved 90 per cent of a human tester’s score.
The AI’s greatest challenge came from a 3D maze game called Labyrinth, a test bed for DeepMind’s tech that resembles Doom without the shooting (see video at top). The system is rewarded for finding apples and portals, the latter of which teleport it elsewhere in the maze, and has to score as high as possible in 60 seconds.
“This task is much more challenging than [the driving game] because the agent is faced with a new maze in each episode and must learn a general strategy for exploring mazes,” write the team. It succeeded, learning a “reasonable strategy for exploring random 3D mazes using only a visual input”.