WSEAS Transactions on Systems and Control


Print ISSN: 1991-8763
E-ISSN: 2224-2856

Volume 12, 2017




A Modular Deep-learning Environment for Rogue

AUTHORS: Andrea Asperti, Carlo De Pieri, Mattia Maldini, Gianmaria Pedrini, Francesco Sovrano


ABSTRACT: Rogue is a famous dungeon-crawling video game of the 1980s and the ancestor of its genre. Due to their nature, and in particular to the need to explore partially observable and always different labyrinths (no level replay), roguelike games are a very natural and challenging task for reinforcement learning and Q-learning, requiring the acquisition of complex, non-reactive behaviours involving memory and planning. In this article we present Rogueinabox, an environment that provides simple interaction with the Rogue game and is designed especially for defining automatic agents and training them via deep-learning techniques. We also show a few initial examples of agents, discuss their architecture and illustrate their behaviour.

KEYWORDS: Machine Learning, Deep Learning, Reinforcement Learning, Q-Learning, Hierarchical Reinforcement Learning, Planning, Imagination Augmentation, Neural Network, Artificial Intelligence, Rogue, Labyrinth, Dungeon, Game, Situations, Asynchronous Actor-Critic Agents, Auxiliary Tasks, Intrinsic Reward, Sparse Reward
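To make the setting concrete for readers unfamiliar with this style of framework, the sketch below shows the generic reset/step interaction loop that environments of this kind typically expose, combined with a tabular Q-learning update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) on a toy task with a sparse goal reward. The RogueLikeEnv class, its methods, and the corridor task are illustrative placeholders under assumed names, not the actual Rogueinabox API, which is described in [10] and in the project repository.

# Minimal, self-contained sketch of a gym-style interaction loop plus
# tabular Q-learning. NOTE: RogueLikeEnv and its reset/step methods are
# hypothetical stand-ins for illustration only, NOT the Rogueinabox API.
import random

class RogueLikeEnv:
    """Toy stand-in: a 1-D corridor the agent must cross to reach the exit."""
    def __init__(self, length=10):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (observation, reward, done)."""
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length, self.position + move))
        done = self.position == self.length
        reward = 1.0 if done else -0.01  # sparse goal reward, small per-step penalty
        return self.position, reward, done

def q_learning(env, episodes=500, max_steps=200, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular epsilon-greedy Q-learning:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = {}
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            if random.random() < eps:
                a = random.choice((0, 1))                          # explore
            else:
                a = max((0, 1), key=lambda b: Q.get((s, b), 0.0))  # exploit
            s2, r, done = env.step(a)
            target = r if done else r + gamma * max(Q.get((s2, b), 0.0) for b in (0, 1))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s = s2
            if done:
                break
    return Q

if __name__ == "__main__":
    env = RogueLikeEnv()
    Q = q_learning(env)
    # After training, a greedy rollout should walk straight to the exit.
    s, done, steps = env.reset(), False, 0
    while not done and steps < 50:
        a = max((0, 1), key=lambda b: Q.get((s, b), 0.0))
        s, _, done = env.step(a)
        steps += 1
    print("greedy rollout:", steps, "steps, reached exit:", done)

The sparse goal reward and small per-step penalty in this toy task mirror, in miniature, the sparse-reward issue highlighted in the keywords above.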

REFERENCES:

[1] B. Edwards, “The ten greatest PC games ever,” http://www.pcworld.com/article/158850/best_pc_games.html, 2009.

[2] M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo, D. Silver, and K. Kavukcuoglu, “Reinforcement learning with unsupervised auxiliary tasks,” CoRR, vol. abs/1611.05397, 2016. [Online]. Available: http://arxiv.org/abs/1611.05397

[3] N. Dilokthanakul, C. Kaplanis, N. Pawlowski, and M. Shanahan, “Feature control as intrinsic motivation for hierarchical reinforcement learning,” CoRR, vol. abs/1705.06769, 2017. [Online]. Available: http://arxiv.org/abs/1705.06769

[4] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The arcade learning environment: An evaluation platform for general agents,” J. Artif. Intell. Res. (JAIR), vol. 47, pp. 253–279, 2013. [Online]. Available: http://dx.doi.org/10.1613/jair.3912

[5] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Universe,” https://github.com/openai/universe, 2016.

[6] ——, “OpenAI Gym,” CoRR, vol. abs/1606.01540, 2016. [Online]. Available: http://arxiv.org/abs/1606.01540

[7] M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaśkowski, “ViZDoom: A Doom-based AI research platform for visual reinforcement learning,” CoRR, vol. abs/1605.02097, 2016. [Online]. Available: http://arxiv.org/abs/1605.02097

[8] V. Cerny and F. Dechterenko, “Rogue-like games as a playground for artificial intelligence – evolutionary approach,” in International Conference on Entertainment Computing. Springer, 2015, pp. 261–271.

[9] krajj7, “BotHack,” https://github.com/krajj7/BotHack, 2015.

[10] A. Asperti, C. D. Pieri, and G. Pedrini, “Rogueinabox: an environment for roguelike learning,” International Journal of Computers, vol. 2, pp. 146–154, 2017. [Online]. Available: http://www.iaras.org/iaras/home/cijc/rogueinabox-an-environment-for-roguelike-learning

[11] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.

[12] M. Wiering and J. Schmidhuber, “Solving POMDPs with Levin search and EIRA,” in Machine Learning, Proceedings of the Thirteenth International Conference (ICML ’96), Bari, Italy, July 3-6, 1996, L. Saitta, Ed. Morgan Kaufmann, 1996, pp. 534–542.

[13] D. Wierstra, A. Förster, J. Peters, and J. Schmidhuber, “Solving deep memory POMDPs with recurrent policy gradients,” in Artificial Neural Networks - ICANN 2007, 17th International Conference, Porto, Portugal, September 9-13, 2007, Proceedings, Part I, ser. Lecture Notes in Computer Science, J. M. de Sá, L. A. Alexandre, W. Duch, and D. P. Mandic, Eds., vol. 4668. Springer, 2007, pp. 697–706.

[14] A. Tamar, S. Levine, and P. Abbeel, “Value iteration networks,” CoRR, vol. abs/1602.02867, 2016. [Online]. Available: http://arxiv.org/abs/1602.02867

[15] A. S. Klyubin, D. Polani, and C. L. Nehaniv, “Empowerment: a universal agent-centric measure of control,” in Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2005, 2-4 September 2005, Edinburgh, UK, 2005, pp. 128–135. [Online]. Available: https://doi.org/10.1109/CEC.2005.1554676

[16] T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. B. Tenenbaum, “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation,” CoRR, vol. abs/1604.06057, 2016. [Online]. Available: http://arxiv.org/abs/1604.06057

[17] A. S. Vezhnevets, S. Osindero, T. Schaul, N. Heess, M. Jaderberg, D. Silver, and K. Kavukcuoglu, “Feudal networks for hierarchical reinforcement learning,” CoRR, vol. abs/1703.01161, 2017. [Online]. Available: http://arxiv.org/abs/1703.01161

[18] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735

[19] F. A. Gers, J. Schmidhuber, and F. A. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural Computation, vol. 12, no. 10, pp. 2451–2471, 2000. [Online]. Available: https://doi.org/10.1162/089976600300015015

[20] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, “Gated feedback recurrent neural networks,” in Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, ser. JMLR Workshop and Conference Proceedings, F. R. Bach and D. M. Blei, Eds., vol. 37. JMLR.org, 2015, pp. 2067–2075. [Online]. Available: http://jmlr.org/proceedings/papers/v37/chung15.html

[21] M. J. Hausknecht and P. Stone, “Deep recurrent Q-learning for partially observable MDPs,” CoRR, vol. abs/1507.06527, 2015. [Online]. Available: http://arxiv.org/abs/1507.06527

[22] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. A. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. [Online]. Available: https://doi.org/10.1038/nature14236

[23] G. Lample and D. S. Chaplot, “Playing FPS games with deep reinforcement learning,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, S. P. Singh and S. Markovitch, Eds. AAAI Press, 2017, pp. 2140–2146. [Online]. Available: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14456

[24] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds., 2015, pp. 2017–2025. [Online]. Available: http://papers.nips.cc/paper/5854-spatial-transformer-networks

[25] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017. [Online]. Available: https://doi.org/10.1109/TPAMI.2016.2572683

[26] M. McPartland and M. Gallagher, “Creating a multi-purpose first person shooter bot with reinforcement learning,” in Proceedings of the 2008 IEEE Symposium on Computational Intelligence and Games, CIG 2008, Perth, Australia, 15-18 December, 2008, P. Hingston and L. Barone, Eds. IEEE, 2008, pp. 143–150. [Online]. Available: https://doi.org/10.1109/CIG.2008.5035633

[27] B. Tastan, Y. Chang, and G. Sukthankar, “Learning to intercept opponents in first person shooter games,” in 2012 IEEE Conference on Computational Intelligence and Games, CIG 2012, Granada, Spain, September 11-14, 2012. IEEE, 2012, pp. 100–107. [Online]. Available: https://doi.org/10.1109/CIG.2012.6374144

[28] F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.

[29] M. L. Mauldin, G. Jacobson, A. Appel, and L. Hamey, “Rog-o-matic: A belligerent expert system,” in Fifth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, London, Ontario, May 16, 1984.

[30] B. Harrison, “Angband Borg.” [Online]. Available: http://www.thangorodrim.net/borg.html

[31] M. J. Hausknecht and P. Stone, “The impact of determinism on learning Atari 2600 games,” in Learning for General Competency in Video Games, Papers from the 2015 AAAI Workshop, Austin, Texas, USA, January 26, 2015, ser. AAAI Workshops, M. Bowling, M. G. Bellemare, E. Talvitie, J. Veness, and M. C. Machado, Eds., vol. WS-15-10. AAAI Press, 2015. [Online]. Available: http://aaai.org/ocs/index.php/WS/AAAIW15/paper/view/9564

[32] Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e-prints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688

[33] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: http://tensorflow.org/

[34] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” CoRR, vol. abs/1602.01783, 2016. [Online]. Available: http://arxiv.org/abs/1602.01783

[35] T. Weber, S. Racanière, D. P. Reichert, L. Buesing, A. Guez, D. J. Rezende, A. P. Badia, O. Vinyals, N. Heess, Y. Li, R. Pascanu, P. Battaglia, D. Silver, and D. Wierstra, “Imagination-augmented agents for deep reinforcement learning,” CoRR, vol. abs/1707.06203, 2017. [Online]. Available: http://arxiv.org/abs/1707.06203

[36] R. Pascanu, Y. Li, O. Vinyals, N. Heess, L. Buesing, S. Racanière, D. P. Reichert, T. Weber, D. Wierstra, and P. Battaglia, “Learning model-based planning from scratch,” CoRR, vol. abs/1707.06170, 2017. [Online]. Available: http://arxiv.org/abs/1707.06170

WSEAS Transactions on Systems and Control, ISSN / E-ISSN: 1991-8763 / 2224-2856, Volume 12, 2017, Art. #39, pp. 362-373


Copyright © 2017. The Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
