Multimodal research projects and useful links:

Multimodal corpora (datasets):

Corpus name | Details | Tools used
IFA Dialog Video corpus (van Son et al., 2008) | Twenty face-to-face dialogues of circa 15 minutes each from the Spoken Dutch Corpus (CGN), with functional annotation of dialogue utterances and annotated gaze direction. | Praat TextGrid for transcription; Praat TextGrid and ELAN for annotation.
Eye-tracking in Multimodal Interaction Corpus (Holler & Kendrick, 2015) | English language corpus comprising 20 groups of participants engaging in casual conversation (10 triadic and 10 dyadic). Conversations are circa 20 minutes in length and were recorded in laboratory settings using eye-tracking equipment. Question-response sequences are annotated, along with gaze shifts, in-breaths, head movements and other non-verbal phenomena. | Manually annotated by an experienced conversation analyst and imported into ELAN.
Corpus d’interactions dialogales (Blache et al., 2017) | An eight-hour French language dialogue corpus containing prosodic, gestural, syntactic and interpausal unit annotation. | Transcription and annotation using ANVIL.
BAS SmartWeb Video corpus (Schiel, 2009) | A 36-hour German language corpus containing user queries to a naturally spoken Web interface. This corpus is annotated for orthography, phonology, speaker turn, noise, prosody and gaze direction. | Transcriptions and annotations provided in XML.
Natural Media Motion-Capture Corpus (Schueller et al., 2017) | A three-hour German language corpus comprising data from 18 participants carrying out a pre-defined task. This corpus includes annotations for gesture types and meta-information about encoding. | Data recorded using the VICON motion capture system. Annotated using ELAN (720 annotation files, .eaf).
BAS SmartKom Public Video and Gesture corpus (Schiel et al., 2002) | A 15-hour German language corpus containing recordings of 86 actors using the SmartKom system (i.e. a system similar to a public phonebooth). Participants were asked to solve tasks prescribed by an instructor. Annotations include orthography, phonology, speaker turn, noise, prosody, emotion, hand gesture and facial expression. | No details given; annotations are saved in the bespoke BAS Partitur Format (BPF).
Bielefeld Speech and Gesture Alignment Corpus (Lücking et al., 2013) | A German and English corpus containing 25 dialogues between 50 interlocutors engaging in a spatial communication task. This corpus contains an alignment of speech and gestures. | Praat for speech transcription and ELAN for gesture annotation.
Multimodality and multiparty corpus of text comprehension interactions (Koutsombogera et al., 2016) | This corpus comprises reading comprehension exercises in school settings (between a teacher and a student). The data include orthographic transcription and annotations for gaze, head, eye and lip movements. | ELAN
Hungarian Multimodal Corpus (Szekrényes, 2014) | Video and audio recordings of a simulated job interview and a guided dialogue about personal topics, covering 121 university students in 50 hours of recording. Non-verbal and verbal elements of communication are annotated in this corpus. | ELAN
PoliModal Corpus (Trotta et al., 2020) | A corpus comprising 54 transcripts of 56 face-to-face TV interviews (14 hours, circa 100,870 tokens) from an Italian political talk show. This corpus includes annotations for utterance phenomena and for facial displays, hand gestures and body posture. | Annotations undertaken in XML, following the TEI (Text Encoding Initiative) guidelines, facilitated by the use of ANVIL.
Multimodal corpus EVA 1.0 (Mlakar et al., 2019) | This corpus includes a single 57-minute recording of multi-party spontaneous discourse from an evening talk show. The corpus contains annotations for morphosyntax and non-verbal and verbal elements of communication. | Transcriptions annotated using ELAN.
Video-linked Thai/Swedish child data corpus (Zlatev et al., 2006) | This is a Swedish and Thai corpus containing 60 transcripts from interactions in everyday contexts between six children and their caregivers, recorded longitudinally. This corpus contains video-transcription alignment, word segmentation and phonetic transcription. | CLAN


Blache, P., Bertrand, R., Ferré, G., Pallaud, B., Prévot, L., & Rauzy, S. (2017). The Corpus of Interactional Data: A Large Multimodal Annotated Resource. In Handbook of Linguistic Annotation (pp. 1323–1356). Springer Netherlands. 

Holler, J., & Kendrick, K. H. (2015). Unaddressed participants' gaze in multi-person interaction: optimizing recipiency. Frontiers in Psychology, 6.

Kong, A. P.-H., Law, S.-P., Kwan, C. C.-Y., Lai, C., & Lam, V. (2015). A Coding System with Independent Annotations of Gesture Forms and Functions During Verbal Communication: Development of a Database of Speech and GEsture (DoSaGE). Journal of Nonverbal Behavior, 39(1), 93–111. 

Koutsombogera, M., Deligiannis, M., Giagkou, M., & Papageorgiou, H. (2016). Towards Modelling Multimodal and Multiparty Interaction in Educational Settings (pp. 165–184). 

Lücking, A., Bergmann, K., Hahn, F., Kopp, S., & Rieser, H. (2013). Data-based analysis of speech and gesture: the Bielefeld Speech and Gesture Alignment corpus (SaGA) and its applications. Journal on Multimodal User Interfaces, 7(1–2), 5–18.

Mlakar, I., Verdonik, D., Majhenič, S., & Rojc, M. (2019). Towards Pragmatic Understanding of Conversational Intent: A Multimodal Annotation Approach to Multiparty Informal Interaction – The EVA Corpus (pp. 19–30). 

Schiel, F. (2009). The SmartWeb Corpora: Multimodal Access to the Web in Natural Environments (pp. 1–17). 

Schiel, F., Steininger, S., & Türk, U. (2002). The SmartKom multimodal corpus at BAS. Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002).

Schueller, D., Beecks, C., Hassani, M., Hinnell, J., Brenger, B., Seidl, T., & Mittelberg, I. (2017). Automated Pattern Analysis in Gesture Research: Similarity Measuring in 3D Motion Capture Models of Communicative Action. DHQ: Digital Humanities Quarterly, 11(2). 

Szekrényes, I. (2014). Annotation and interpretation of prosodic data in the HuComTech corpus for multimodal user interfaces. Journal on Multimodal User Interfaces, 8(2), 143–150.

Trotta, D., Palmero Aprosio, A., Tonelli, S., & Elia, A. (2020). Adding Gesture, Posture and Facial Displays to the PoliModal Corpus of Political Interviews. Language Resources and Evaluation Conference (LREC), 4320–4326.

van Son, R. J. J. H., Wesseling, W., Sanders, E., & van den Heuvel, H. (2008). The IFADV corpus: A free dialog video corpus. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), 2008 [Online].

Zlatev, J., Andrén, M., & Osathanonda, S. (2006). A video-linked Thai/Swedish child data corpus: A tool for the study of comparative semiotic development. http://project.sol.lu.se/sedsu/