MuZero achieves superhuman performance in chess, Go, shogi and Atari – without knowing the rules – using a learned model.

Nature-20 (info)

DeepMind’s AlphaFold solves the protein folding problem.

(info) Nature-20 (info)

AlphaStar achieves grandmaster level in the game of StarCraft II. Matches were played using a pro-approved interface, on the full game without any restrictions.

Nature-19 (info) (first results)

AlphaZero learns chess, shogi and Go by self-play, without human knowledge, to defeat existing world champion programs.

Science-18 (info) arXiv-17 (older)

Max Jaderberg’s For The Win agent learns by self-play, directly from raw pixels, to play Quake III Arena: Capture the Flag at human level.

Science-19 (info) arXiv-18 (older)

Greg Wayne’s Merlin combines memory and reinforcement learning to solve tasks in DeepMind Lab, directly from raw pixels.


AlphaGo Zero becomes the world’s strongest Go player, starting completely from scratch, without any human knowledge.

Nature-17 (info)

AlphaGo defeats a human professional player for the first time, by combining deep neural networks and tree search.

Nature-16 (info) ICLR-15 (older)

A single neural network architecture learns to play many different Atari games to human level, directly from video input and joystick output.

Nature-15 NIPS-18 AAAI-17 ICLR-17 NIPS-16 AAAI-16 ICLR-16 ICML-DLW-15 NIPS-DLW-13

Demo Source

First results on the new StarCraft II environment for reinforcement learning.


Deep reinforcement learning solves a variety of continuous manipulation and locomotion problems, using a single neural network architecture.

arXiv-17 ICLR-16 NIPS-16 NIPS-15


Deep reinforcement learning approaches superhuman performance in poker, without domain knowledge.


SmooCT wins three silver medals at the Computer Poker Competition.


Monte-Carlo search in Civilization II beats the built-in AI.


Demo Source

Real-time planning in games with hidden state, using partially observable Monte-Carlo planning (POMCP).


Demo Source

Joel Veness’ Meep is the first master-level chess program with an evaluation function that was learnt entirely from self-play, by bootstrapping from deep searches.


RLGO is a Go program based on reinforcement learning techniques. It combines TD learning and TD search, using a million binary features matching simple patterns of stones. RLGO outperformed traditional (pre-Monte-Carlo) programs in 9×9 Go.

Source MLJ PhD ICML-08 IJCAI-07

Sylvain Gelly’s MoGo (2007) is a Go program based on Monte-Carlo tree search. It was the world’s first master-level 9×9 Computer Go program, and the first program to beat a human professional in even games on 9×9 boards and in handicap games on 19×19 boards.


Real-time strategy games are often plagued by pathfinding problems when large numbers of units move around the map. Cooperative pathfinding allows multiple units to coordinate their routes effectively in both space and time.

Demo AIIDE-05 AIW-06

In a previous life, I was CTO for Elixir Studios and lead programmer on the PC strategy game Republic: The Revolution.