The aim of this task is to estimate the level of conversational engagement of participants in group video-mediated communication. The dataset for this task consists of several audio, video, and gaze recordings from a prospective home teleconference system (see [1, 2]). Each recording captures the interaction between a group of co-located participants and one remote participant, engaged in activities ranging from casual conversation to simple social games. The audio-visual recordings are accompanied by gaze recordings of the remote participant, manually annotated head positions, and voice-activity annotations. The experiments will be carried out for the remote participant, for whom gaze data is available.

Evaluation metric

A ground-truth annotation for the training part of the dataset will be made available to the participants. Ground truth for the testing part of the dataset will be released after the challenge. In their submissions, participants are expected to provide a short description of their system, together with its outputs for short intervals of the testing data in a defined simple format for evaluation. The official metric used in the evaluations will be a weighted classification cost reflecting the similarities between the different levels of engagement. The weights will be made public together with the training data. Additionally, DET (Detection Error Tradeoff) curves and confusion matrices will be generated. A baseline two-class engagement recognizer achieved an EER of 74% [3].
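To make the metric concrete, the weighted classification cost can be computed as the mean, over all evaluated intervals, of a per-decision cost looked up in a weight matrix. The sketch below is illustrative only: the engagement levels and the weight matrix `W` are hypothetical placeholders, since the actual label set and weights will be released with the training data. The matrix encodes the idea stated above that confusing similar engagement levels is penalized less than confusing distant ones.

```python
import numpy as np

# Hypothetical engagement levels; the real label set will be
# released with the training data.
LEVELS = ["low", "medium", "high"]

# Illustrative weight matrix W[i, j]: cost of predicting level j
# when the true level is i. Adjacent (similar) levels incur a
# smaller cost than distant ones; correct decisions cost nothing.
W = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.5, 0.0],
])

def weighted_classification_cost(y_true, y_pred, weights=W):
    """Mean per-interval cost of the predictions under `weights`."""
    idx = {lvl: i for i, lvl in enumerate(LEVELS)}
    costs = [weights[idx[t], idx[p]] for t, p in zip(y_true, y_pred)]
    return float(np.mean(costs))

# Example: one adjacent-level confusion among four intervals.
y_true = ["low", "high", "medium", "high"]
y_pred = ["low", "medium", "medium", "high"]
print(weighted_classification_cost(y_true, y_pred))  # 0.125
```

With uniform off-diagonal weights this reduces to the ordinary classification error rate; the graded weights are what let the metric reward systems that are "close" even when they are not exactly right.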


For further details, please see [1].


[2] M. Hradis, S. Eivazi, R. Bednarik. Voice Activity Detection in Video-Mediated Communication from Gaze. In Proceedings of ETRA '12, pp. 329-332, 2012.
[3] R. Bednarik, S. Eivazi, M. Hradis. Gaze and Conversational Engagement in Multiparty Video Conversation: An Annotation Scheme and Classification of High and Low Levels of Engagement. In Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction, Article 10, 2012.