Publications

Fast Reinforcement Learning with Generalized Policy Updates. A. Barreto, S. Hou, D. Borsa, D. Silver, D. Precup. PNAS 2020.

Discovering Reinforcement Learning Algorithms. J. Oh, M. Hessel, W. M. Czarnecki, Z. Xu, H. van Hasselt, S. Singh, D. Silver. arXiv 2020.

Meta-Gradient Reinforcement Learning with an Objective Discovered Online. Z. Xu, H. van Hasselt, M. Hessel, J. Oh, S. Singh, D. Silver. arXiv 2020.

Expected Eligibility Traces. H. van Hasselt, S. Madjiheurem, M. Hessel, D. Silver, A. Barreto, D. Borsa. arXiv 2020.

The Value-Improvement Path Towards Better Representations for Reinforcement Learning. W. Dabney, A. Barreto, M. Rowland, R. Dadashi, J. Quan, M. G. Bellemare, D. Silver. arXiv 2020.

Self-Tuning Deep Reinforcement Learning. T. Zahavy, Z. Xu, V. Veeriah, M. Hessel, J. Oh, H. van Hasselt, D. Silver, S. Singh. arXiv 2020.

Value-driven Hindsight Modelling. A. Guez, F. Viola, T. Weber, L. Buesing, S. Kapturowski, D. Precup, D. Silver, N. Heess. arXiv 2020.

Improved Protein Structure Prediction Using Potentials from Deep Learning. A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D.T. Jones, D. Silver, K. Kavukcuoglu, D. Hassabis. Nature, 2020.

What Can Learned Intrinsic Rewards Capture? Z. Zheng, J. Oh, M. Hessel, Z. Xu, M. Kroiss, H. van Hasselt, D. Silver, S. Singh. ICML 2020.
Protein Structure Prediction Using Multiple Deep Neural Networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13). A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, D. Hassabis. Proteins: Structure, Function, and Bioinformatics, 2019.
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, D. Silver. arXiv 2019.
Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning. O. Vinyals, I. Babuschkin, W.M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D.H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J.P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, D. Silver. Nature 2019.
Behaviour Suite for Reinforcement Learning. I. Osband, Y. Doron, M. Hessel, J. Aslanides, E. Sezener, A. Saraiva, K. McKinney, T. Lattimore, C. Szepesvari, S. Singh, B. Van Roy, R. Sutton, D. Silver, H. van Hasselt. ICLR 2020.
On Inductive Biases in Deep Reinforcement Learning. M. Hessel, H. van Hasselt, J. Modayil, D. Silver. arXiv 2019.
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement. A. Barreto, D. Borsa, J. Quan, T. Schaul, D. Silver, M. Hessel, D. Mankowitz, A. Žídek, R. Munos. ICML 2018.
An Investigation of Model-Free Planning. A. Guez, M. Mirza, K. Gregor, R. Kabra, S. Racanière, T. Weber, D. Raposo, A. Santoro, L. Orseau, T. Eccles, G. Wayne, D. Silver, T. Lillicrap. ICML 2019.
Credit Assignment Techniques in Stochastic Computation Graphs. T. Weber, N. Heess, L. Buesing, D. Silver. AISTATS 2019.

The Option Keyboard: Combining Skills in Reinforcement Learning. A. Barreto, D. Borsa, S. Hou, G. Comanici, E. Aygün, P. Hamel, D. Toyama, S. Mourad, D. Silver, D. Precup. NeurIPS 2019.

Discovery of Useful Questions as Auxiliary Tasks. V. Veeriah, M. Hessel, Z. Xu, J. Rajendran, R. L. Lewis, J. Oh, H. van Hasselt, D. Silver, S. Singh. NeurIPS 2019.

Universal Successor Features Approximators. D. Borsa, A. Barreto, J. Quan, D. Mankowitz, R. Munos, H. van Hasselt, D. Silver, T. Schaul. ICLR 2019.

Bayesian Optimization in AlphaGo. Y. Chen, A. Huang, Z. Wang, I. Antonoglou, J. Schrittwieser, D. Silver, N. de Freitas. arXiv 2018.

Introduction to the Special Issue on Deep Reinforcement Learning: An Editorial. R. Sun, D. Silver, G. Tesauro, G.B. Huang. Neural Networks 107:1-2 (2018).

Distributed Prioritized Experience Replay. D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, D. Silver. ICLR 2018.

De Novo Structure Prediction with Deep-Learning Based Scoring. R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, D. T. Jones, D. Silver, K. Kavukcuoglu, D. Hassabis, A. W. Senior. Annu Rev Biochem 2018.

Meta-Gradient Reinforcement Learning. Z. Xu, H. van Hasselt, D. Silver. NeurIPS 2018.
Human-Level Performance in First-Person Multiplayer Games with Population-Based Deep Reinforcement Learning. M. Jaderberg, W. Czarnecki, I. Dunning, L. Marris, G. Lever, A. Castaneda, C. Beattie, N. Rabinowitz, A. Morcos, A. Ruderman, N. Sonnerat, T. Green, L. Deason, J. Leibo, D. Silver, D. Hassabis, K. Kavukcuoglu, T. Graepel. arXiv 2018.
Learning to Search with MCTSnets. A. Guez, T. Weber, I. Antonoglou, K. Simonyan, O. Vinyals, D. Wierstra, R. Munos, D. Silver. ICML 2018.
Implicit Quantile Networks for Distributional Reinforcement Learning. W. Dabney, G. Ostrovski, D. Silver, R. Munos. ICML 2018.
Unsupervised Predictive Memory in a Goal-Directed Agent. G. Wayne, C. Hung, D. Amos, M. Mirza, A. Ahuja, A. Grabska-Barwinska, J. Rae, P. Mirowski, J. Leibo, A. Santoro, M. Gemici, M. Reynolds, T. Harley, J. Abramson, S. Mohamed, D. Rezende, D. Saxton, A. Cain, C. Hillier, D. Silver, K. Kavukcuoglu, M. Botvinick, D. Hassabis, T. Lillicrap. arXiv 2018.

Unicorn: Continual Learning with a Universal, Off-Policy Agent. D. Mankowitz, A. Žídek, A. Barreto, D. Horgan, M. Hessel, J. Quan, J. Oh, H. van Hasselt, D. Silver, T. Schaul. arXiv 2018.

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, D. Hassabis. Science 2018.

Mastering the Game of Go without Human Knowledge. D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, D. Hassabis. Nature 550 (7676): 354-359 (2017).

Successor Features for Transfer in Reinforcement Learning. A. Barreto, R. Munos, T. Schaul, D. Silver. NIPS 2017.
Natural Value Approximators. Z. Xu, J. Modayil, H. van Hasselt, A. Barreto, D. Silver, T. Schaul. NIPS 2017.
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver, T. Graepel. NIPS 2017.
Rainbow: Combining Improvements in Deep Reinforcement Learning. M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, D. Silver. AAAI 2018.
StarCraft II: A New Challenge for Reinforcement Learning. O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A. Vezhnevets, M. Yeo, A. Makhzani, H. Kuttler, J. Agapiou, J. Schrittwieser, J. Quan, S. Gaffney, S. Petersen, K. Simonyan, T. Schaul, H. van Hasselt, D. Silver, T. Lillicrap, K. Calderone, P. Keet, A. Brunasso, D. Lawrence, A. Ekermo, J. Repp, R. Tsing. arXiv 2017.
Emergence of Locomotion Behaviours in Rich Environments. N. Heess, T.B. Dhruva, S. Sriram, J. Lemmon, J. Merel, G. Wayne, Y. Tassa, T. Erez, Z. Wang, A. Eslami, M. Riedmiller, D. Silver. arXiv 2017.
FeUdal Networks for Hierarchical Reinforcement Learning. A. Vezhnevets, S. Osindero, T. Schaul, N. Heess, M. Jaderberg, D. Silver, K. Kavukcuoglu. ICML 2017.
Decoupled Neural Interfaces using Synthetic Gradients. M. Jaderberg, W. Czarnecki, S. Osindero, O. Vinyals, A. Graves, D. Silver, K. Kavukcuoglu. ICML 2017.
The Predictron: End-to-End Learning and Planning. D. Silver, H. van Hasselt, M. Hessel, T. Schaul, A. Guez, T. Harley, G. Dulac-Arnold, D. Reichert, N. Rabinowitz, A. Barreto, T. Degris. ICML 2017.
Reinforcement Learning with Unsupervised Auxiliary Tasks. M. Jaderberg, V. Mnih, W. Czarnecki, T. Schaul, J. Leibo, D. Silver, K. Kavukcuoglu. ICLR 2017.
Learning Values Across Many Orders of Magnitude. H. van Hasselt, A. Guez, M. Hessel, D. Silver. NIPS 2016.
Asynchronous Methods for Deep Reinforcement Learning. V. Mnih, A. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu. ICML 2016.
Mastering the Game of Go with Deep Neural Networks and Tree Search. D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis. Nature 2016.
Continuous Control with Deep Reinforcement Learning. T. Lillicrap, J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra. ICLR 2016.
Prioritized Experience Replay. T. Schaul, J. Quan, I. Antonoglou, D. Silver. ICLR 2016.
Deep Reinforcement Learning with Double Q-Learning. H. van Hasselt, A. Guez, D. Silver. AAAI 2016.
Learning Continuous Control Policies by Stochastic Value Gradients. N. Heess, G. Wayne, D. Silver, T. Lillicrap, T. Erez, Y. Tassa. NIPS 2015.

Memory-Based Control with Recurrent Neural Networks. N. Heess, J. Hunt, T. Lillicrap, D. Silver. NIPS-DRLW 2015.

Reinforced Variational Inference. T. Weber, N. Heess, S. Eslami, J. Schulman, D. Wingate, D. Silver. NIPS ABIW 2015.
Massively Parallel Methods for Deep Reinforcement Learning. A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. De Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, S. Legg, V. Mnih, K. Kavukcuoglu, D. Silver. ICML-DLW 2015.
Universal Value Function Approximators. T. Schaul, D. Horgan, K. Gregor, D. Silver. ICML 2015.
Fictitious Self-Play in Extensive-Form Games. J. Heinrich, M. Lanctot, D. Silver. ICML 2015.
Smooth UCT Search in Computer Poker. J. Heinrich, D. Silver. IJCAI 2015.
Value Iteration with Options and State Aggregation. K. Ciosek, D. Silver. ICAPS-WPL 2015.
Human-level Control Through Deep Reinforcement Learning. V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis. Nature 2015.
Move Evaluation in Go Using Deep Convolutional Neural Networks. C. Maddison, A. Huang, I. Sutskever and D. Silver. ICLR 2015.
Bayes-Adaptive Simulation-based Search with Value Function Approximation. A. Guez, N. Heess, D. Silver and P. Dayan. NIPS 2014.
Deterministic Policy Gradient Algorithms. D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, M. Riedmiller. ICML 2014.
Unit Tests for Stochastic Optimization. T. Schaul, I. Antonoglou and D. Silver. ICLR 2014.
Better Optimism By Bayes: Adaptive Planning with Rich Models. A. Guez, D. Silver and P. Dayan. arXiv 2014.
Playing Atari with Deep Reinforcement Learning. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. NIPS-DLW 2013.
Concurrent Reinforcement Learning from Customer Interactions. D. Silver, L. Newnham, D. Barker, S. Weller and J. McFall. ICML 2013.
Actor-Critic Reinforcement Learning with Energy-Based Policies. N. Heess, D. Silver and Y. W. Teh. EWRL 2012.
Compositional Planning Using Optimal Option Models. D. Silver and K. Ciosek. ICML 2012.
Learning to Win by Reading Manuals in a Monte-Carlo Framework. S. Branavan, R. Barzilay and D. Silver. JAIR 2012.
Temporal-Difference Search in Computer Go. D. Silver, R. Sutton and M. Müller. MLJ 2012.
The Grand Challenge of Computer Go: Monte-Carlo Tree Search and Extensions. S. Gelly, L. Kocsis, M. Schoenauer, M. Sebag, D. Silver, C. Szepesvari and O. Teytaud. CACM 2012.
Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go. S. Gelly and D. Silver. AIJ 2011. Prominent paper award, 2016.
Non-linear Monte-Carlo Search in Civilization II. S. Branavan, D. Silver and R. Barzilay. IJCAI 2011.
Playing Games with Language in a Monte-Carlo Framework. S. Branavan, R. Barzilay and D. Silver. ACL 2011.
A Monte-Carlo AIXI Approximation. J. Veness, K. S. Ng, M. Hutter, W. Uther and D. Silver. JAIR 2011.
Monte-Carlo Planning in Large POMDPs. D. Silver and J. Veness. NIPS 2010.
Reinforcement Learning via AIXI Approximation. J. Veness, K. S. Ng, M. Hutter and D. Silver. AAAI 2010.
Reinforcement Learning and Simulation-Based Search. D. Silver. Ph.D. thesis, 2009.
Bootstrapping from Game Tree Search. J. Veness, D. Silver, A. Blair and W. Uther. NIPS 2009.
Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. R. Sutton, H. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvari and E. Wiewiora. NIPS 2009.
Monte-Carlo Simulation Balancing. D. Silver and G. Tesauro. ICML 2009.
Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation. R. Sutton, H. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvari and E. Wiewiora. ICML 2009.
Achieving Master-Level Play in 9×9 Computer Go. S. Gelly and D. Silver. AAAI Nectar 2008.
Sample-Based Learning and Search with Permanent and Transient Memories. D. Silver, R. Sutton and M. Müller. ICML 2008.
Combining Online and Offline Learning in UCT. S. Gelly and D. Silver. ICML 2007. Test of time award, 2017.
On the Role of Tracking in Stationary Environments. R. Sutton, A. Koop and D. Silver. ICML 2007.
Reinforcement Learning of Local Shape in the Game of Go. D. Silver, R. Sutton and M. Müller. IJCAI 2007.
Cooperative Pathfinding. D. Silver. AIIDE 2005.