Campbell, M., Hoane, A. J. Jr & Hsu, F.-H. Deep Blue. Artif. Intell. 134, 57–83 (2002).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).
Machado, M. C. et al. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi and Go through self-play. Science 362, 1140–1144 (2018).
Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).
Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).
Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).
Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).
Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).
Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).
Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).
Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).
Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).
Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).
Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).
Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).
Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).
Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st edn (John Wiley & Sons, 1994).
Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).
Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).
Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).
Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS'18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).
Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning: Vol. 97 Proceedings of Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).
van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).
Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).
Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).
Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics: Vol. 54 Proceedings of Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).
Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).
Farquhar, G., Rocktäschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree-structured models for deep reinforcement learning. In International Conference on Learning Representations (2018).
Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).
He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).
Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).
Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).
Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).
Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).
Schadd, M. P., Winands, M. H., van den Herik, H. J., Chaslot, G. M.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).
Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).
Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).
Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).
Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).
Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).
Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).