WSEAS Transactions on Information Science and Applications


Print ISSN: 1790-0832
E-ISSN: 2224-3402

Volume 14, 2017




A Corpus for Investigating the Multimodal Nature of Multi-Speaker Spontaneous Conversations – EVA Corpus

AUTHORS: Izidor Mlakar, Zdravko Kačič, Matej Rojc


ABSTRACT: Multimodality and multimodal communication form a rapidly evolving research field addressed from many perspectives, ranging from psycho-sociology, anthropology and linguistics to communication studies, multimodal interfaces, companions, smart homes and ambient assisted living. Multimodality in human-machine interaction is not merely an add-on or a style of information presentation; it goes well beyond semantics and semiotic artefacts, contributing significantly both to the representation of information and to the interpersonal and textual functions of communication. The study in this paper is part of an ongoing effort to empirically investigate, in detail, the relations between verbal and co-verbal behavior expressed during highly spontaneous, live multi-speaker conversations. It takes a multimodal approach to investigating the relations between traditional linguistic features (paragraphs, sentences, sentence types, words, POS tags, etc.), prosodic features (phrase breaks, prominence, durations, and pitch), and paralinguistic features traditionally interpreted as non-verbal communication or co-verbal behavior (dialog role, semiotic classification of behavior, emotions, facial expressions, head movement, gaze, and hand gestures). The main motivation for the study is to understand the informal nature of human-human communication and to create co-verbal resources for the automatic synthesis of highly natural co-verbal behavior from un-annotated text, expressed through embodied conversational agents. The EVA corpus, annotated according to a novel EVA annotation scheme, represents a rich empirical resource for studying conversational phenomena that manifest themselves in highly spontaneous face-to-face conversations. A preliminary analysis of emotions within conversations has also been conducted and is presented in the paper.

KEYWORDS: conversation analysis, informal conversation, emotions, multiparty dialog, language and social interaction, multimodality, pragmatics, verbal and non-verbal interaction, co-verbal behavior
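
Note: the abstract describes annotations organized in parallel linguistic, prosodic and paralinguistic layers. As an illustration only (not part of the paper), the sketch below shows one possible way such multi-tier annotations could be inspected, assuming the sessions are stored as ELAN (.eaf) files, as the citation of ELAN in [36] suggests; the file name and tier names are hypothetical placeholders, not the actual EVA corpus labels.

    # Hypothetical sketch: inspecting a multi-tier ELAN annotation file with pympi-ling.
    import pympi  # common Python library for reading ELAN (.eaf) files

    eaf = pympi.Elan.Eaf("eva_session_01.eaf")  # hypothetical session file name

    # List every annotation tier (e.g. words, POS tags, phrase breaks, gaze, gestures,
    # emotions) together with the number of time-aligned annotations it contains.
    for tier in eaf.get_tier_names():
        annotations = eaf.get_annotation_data_for_tier(tier)  # [(start_ms, end_ms, value), ...]
        print(f"{tier}: {len(annotations)} annotations")

    def overlapping(tier_a, tier_b):
        """Pairs of annotations from two tiers whose time spans overlap."""
        return [(a, b) for a in tier_a for b in tier_b if a[0] < b[1] and b[0] < a[1]]

    # Example: co-occurrence of (assumed) 'emotion' and 'hand_gesture' tiers, the kind
    # of verbal/co-verbal alignment the study investigates.
    names = eaf.get_tier_names()
    if "emotion" in names and "hand_gesture" in names:
        pairs = overlapping(eaf.get_annotation_data_for_tier("emotion"),
                            eaf.get_annotation_data_for_tier("hand_gesture"))
        print(f"co-occurring emotion/gesture spans: {len(pairs)}")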

REFERENCES:

[1] Mondada, L. (2017). New Challenges for Conversation Analysis: The Situated and Systematic Organization of Social Interaction. Langage et société, (2), 181-197.

[2] Bonsignori, V., & Camiciottoli, B. C. (Eds.). (2017). Multimodality Across Communicative Settings, Discourse Domains and Genres. Cambridge Scholars Publishing.

[3] Velentzas, J., & Broni, D. G. (2014). Communication cycle: Definition, process, models and examples. In Proceedings of the 5th International Conference on Finance, Accounting and Law (ICFA '14) (Vol. 17, pp. 117-131).

[4] Kleckova, J., & Mahdian, B. (2004). Nonverbal Communication in Spontaneous Speech Recognition. WSEAS Transactions on Electronics, 1(3), 531-536.

[5] Esposito, A., Vassallo, J., Esposito, A. M., & Bourbakis, N. (2015, November). On the Amount of Semantic Information Conveyed by Gestures. In 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI) (pp. 660-667). IEEE.

[6] Dafinoiu, I., & Rotaru, T. S. (2011). The use of suggestive influences in promoting environmental behaviours. In Proceedings of the 7th WSEAS/IASME International Conference on Educational Technologies (EDUTE '11), Iasi.

[7] Chen, C. L., & Herbst, P. (2013). The interplay among gestures, discourse, and diagrams in students’ geometrical reasoning. Educational Studies in Mathematics, 83(2), 285-307.

[8] Colletta, J. M., Guidetti, M., Capirci, O., Cristilli, C., Demir, O. E., Kunene-Nicolas, R. N., & Levine, S. (2015). Effects of age and language on co-speech gesture production: an investigation of French, American, and Italian children's narratives. Journal of child language, 42(1), 122-145.

[9] Allwood, J. (2013). A framework for studying human multimodal communication. Coverbal Synchrony in Human-Machine Interaction, 17.

[10] McNeill, D. (2015). Why we gesture: The surprising role of hand movements in communication. Cambridge University Press.

[11] Duncan, S. D., Cassell, J., & Levy, E. T. (Eds.). (2007). Gesture and the dynamic dimension of language: Essays in honor of David McNeill (Vol. 1). John Benjamins Publishing.

[12] Bozkurt, E., Yemez, Y., & Erzin, E. (2016). Multimodal analysis of speech and arm motion for prosody-driven synthesis of beat gestures. Speech Communication, 85, 29-42.

[13] Poggi, I. (2007). Hands, mind, face and body: A goal and belief view of multimodal communication. Berlin: Weidler.

[14] Holler, J., & Bavelas, J. (2017). Multi-modal communication of common ground. Why Gesture?: How the hands function in speaking, thinking and communicating, 7, 213.

[15] Salama, M., & Shawish, A. (2013). A Comprehensive Mobile-Based Companion for Diabetes Management. In 7th WSEAS European Computing Conference, Dubrovnik.

[16] Tsiourti, C., Moussa, M. B., Quintas, J., Loke, B., Jochem, I., Lopes, J. A., & Konstantas, D. (2016, September). A virtual assistive companion for older adults: design implications for a real-world application. In Proceedings of SAI Intelligent Systems Conference (pp. 1014-1033). Springer, Cham.

[17] Bergmann, K., & Kopp, S. (2010). Systematicity and Idiosyncrasy in Iconic Gesture Use: Empirical Analysis and Computational Modeling. In Kopp, S., & Wachsmuth, I. (Eds.), GW 2009. LNCS, vol. 5934, pp. 182-194. Springer, Heidelberg.

[18] Wagner, P., Malisz, Z., & Kopp, S. (2014). Gesture and speech in interaction: An overview. Speech Communication, 57, 209-232.

[19] Jokinen, K., & Pelachaud, C. (2013). From Annotation to Multimodal Behavior. In Coverbal Synchrony in Human-Machine Interaction, Rojc, M., & Campbell, N. (Eds.). CRC Press. ISBN: 978-1-4665-9825-6.

[20] Rojc, M., Mlakar, I., & Kačič, Z. (2017). The TTS-driven affective embodied conversational agent EVA, based on a novel conversational-behavior generation algorithm. Engineering Applications of Artificial Intelligence, 57, 80-104.

[21] Yumak, Z., & Magnenat-Thalmann, N. (2016). Multimodal and multi-party social interactions. In Context Aware Human-Robot and Human-Agent Interaction (pp. 275-298). Springer International Publishing.

[22] Li, Y., Tao, J., Chao, L., Bao, W., & Liu, Y. (2016). CHEAVD: a Chinese natural emotional audio-visual database. Journal of Ambient Intelligence and Humanized Computing, 1-12.

[23] Martin, J. C., Caridakis, G., Devillers, L., Karpouzis, K., & Abrilian, S. (2009). Manual annotation and automatic image processing of multimodal emotional behaviors: validating the annotation of TV interviews. Personal and Ubiquitous Computing, 13(1), 69-76.

[24] Koutsombogera, M., Touribaba, L., & Papageorgiou, H. (2008, May). Multimodality in conversation analysis: a case of Greek TV interviews. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008) Workshop on Multimodal Corpora: From Models of Natural Interaction to Systems and Applications (pp. 12-15).

[25] Caridakis, G., Wagner, J., Raouzaiou, A., Lingenfelser, F., Karpouzis, K., & Andre, E. (2013). A cross-cultural, multimodal, affective corpus for gesture expressivity analysis. Journal on Multimodal User Interfaces, 7(1-2), 121-134.

[26] Paggio, P., & Navarretta, C. (2016). The Danish NOMCO corpus: multimodal interaction in first acquaintance conversations. Language Resources and Evaluation, 1-32.

[27] Lin, Y. L. (2017). Co-occurrence of speech and gestures: A multimodal corpus linguistic approach to intercultural interaction. Journal of Pragmatics, 117, 155-167.

[28] Zhang, Z., Girard, J. M., Wu, Y., Zhang, X., Liu, P., Ciftci, U., & Cohn, J. F. (2016). Multimodal spontaneous emotion corpus for human behavior analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3438-3446).

[29] Fitzpatrick, E. (Ed.). (2007). Corpus linguistics beyond the word: corpus research from phrase to discourse (Vol. 60).

[30] Liu, I. T., & Sun, C. S. (2007). Association between emotional reaction and visual symbols. In 3rd WSEAS/IASME International Conference on Educational Technologies (EDUTE'07).

[31] Ridderinkhof, K. R. (2017). Emotion in action: A predictive processing perspective and theoretical synthesis. Emotion Review, 1754073916661765.

[32] Zhu, L. (2016). Language, emotion and metapragmatics: A theory based on typological evidence. International Journal of Society, Culture & Language, 4(2), 119-134.

[33] Mlakar, I., Kačič, Z., & Rojc, M. (2012). Form-oriented annotation for building a functionally independent dictionary of synthetic movement. Cognitive Behavioural Systems, 251-265.

[34] Rojc, M., & Mlakar, I. (2016). An expressive conversational-behavior generation model for advanced interaction within multimodal user interfaces (Computer Science, Technology and Applications). New York: Nova Science Publishers, XIV, 234 pp. ISBN 978-1-63482-955-7, ISBN 978-1-63484-084-2.

[35] Walsh, M. (2010). Multimodal literacy: What does it mean for classroom practice? Australian Journal of Language and Literacy, 33(3), 211.

[36] Sloetjes, H., & Wittenburg, P. (2008). Annotation by category – ELAN and ISO DCR. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

[37] Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.

[38] Allwood, J., Nivre, J., & Ahlsen, E. (1993). On the semantics and pragmatics of linguistic feedback. Journal of Semantics, 9(1), 1–26.

[39] Keltner, D., & Cordaro, D. T. (2017). Understanding Multimodal Emotional Expressions. The science of facial expression, 1798.

[40] Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.

[41] Allwood, J., Lanzini, S., & Ahlsén, E. (2014). Contributions of different modalities to the attribution of affective-epistemic states. In P. Paggio & B. N. Wessel-Tolvig (Eds.), Proceedings from the 1st European Symposium on Multimodal Communication, University of Malta (pp. 1–6).

[42] Laycraft, K. C. (2014). Creativity As An Order Through Emotions: A Study of Creative Adolescents and Young Adults. BookBaby.

[43] Seligman, M. E., & Csikszentmihalyi, M. (2014). Positive psychology: An introduction. In Flow and the foundations of positive psychology (pp. 279-298). Springer Netherlands.

WSEAS Transactions on Information Science and Applications, ISSN / E-ISSN: 1790-0832 / 2224-3402, Volume 14, 2017, Art. #23, pp. 213-226


Copyright © 2017. The Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
