A Greek Voice Recognition Interface for ROV Applications, Using Machine Learning Technologies and the CMU Sphinx Platform

WSEAS Transactions on Systems and Control

Print ISSN: 1991-8763
E-ISSN: 2224-2856

Volume 13, 2018

AUTHORS: Fotios K. Pantazoglou, Georgios P. Kladis, Nikolaos K. Papadakis

ABSTRACT: Finding new technical solutions for commanding and controlling smart, high-technology devices has become a necessity. Manual control of these devices can become cumbersome for the operator, especially when he or she is handling numerous tasks at once. A common way to overcome this is to use Automatic Speech Recognition (ASR), which is the main topic of this article. We present the implementation of a Greek CMU Sphinx model that can be used in Remotely Operated Vehicle (ROV) operations and applications. In particular, this work focuses on the development and training of the CMU Sphinx platform for the Greek language using well-established machine learning tools and technologies. The generic Greek model and the Greek model for ROV applications are freely available in an international repository (https://gitlab.sse.gr/fpantazoglou/omilia and https://goo.gl/9v3QqG).

KEYWORDS: Human-machine interface, Machine learning, CMU Sphinx, Pocketsphinx, Greek language, Automatic speech recognition, Hidden Markov models, Remotely operated vehicles

WSEAS Transactions on Systems and Control, ISSN / E-ISSN: 1991-8763 / 2224-2856, Volume 13, 2018, Art. #63, pp. 550-560

Copyright © 2018 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
