Mastering Atari, Go, chess and shogi by planning with a learned model


  • 1.

    Campbell, M., Hoane, A. J. Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).

  • 2.

    Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  • 3.

    Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  • 4.

    Machado, M. et al. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).

  • 5.

    Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi and Go through self-play. Science 362, 1140–1144 (2018).

  • 6.

    Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).

  • 7.

    Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).

  • 8.

    Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).

  • 9.

    Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).

  • 10.

    Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  • 11.

    Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).

  • 12.

    Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).

  • 13.

    Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).

  • 14.

    Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).

  • 15.

    Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).

  • 16.

    Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).

  • 17.

    Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).

  • 18.

    Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).

  • 19.

    Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).

  • 20.

    Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).

  • 21.

    Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st edn (John Wiley & Sons, 1994).

  • 22.

    Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).

  • 23.

    Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).

  • 24.

    Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS'15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).

  • 25.

    Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS'18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).

  • 26.

    Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning Vol. 97 of Proc. Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).

  • 27.

    van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).

  • 28.

    Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).

  • 29.

    Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).

  • 30.

    Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics Vol. 54 of Proc. Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).

  • 31.

    Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).

  • 32.

    Farquhar, G., Rocktäschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree-structured models for deep reinforcement learning. In International Conference on Learning Representations (2018).

  • 33.

    Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).

  • 34.

    Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).

  • 35.

    He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).

  • 36.

    Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).

  • 37.

    Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).

  • 38.

    Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).

  • 39.

    Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  • 40.

    OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).

  • 41.

    Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • 42.

    Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).

  • 43.

    Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • 44.

    Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).

  • 45.

    Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).

  • 46.

    Schadd, M. P., Winands, M. H., van den Herik, H. J., Chaslot, G. M.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).

  • 47.

    Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).

  • 48.

    Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).

  • 49.

    Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).

  • 50.

    Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).

  • 51.

    Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).

  • 52.

    Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).