Au cours de la prochaine réunion IC le 24 juin prochain à 16h sur Teams (équipe IC), Muhammad Usman Malik, postdoc du L3i, présentera ses travaux de thèse sur l’interaction homme-machine.
Title : Learning Multimodal Interaction Models in Mixed Societies"
Human interactions involve multiple modalities, which can be verbal such as speech and text, as well as non-verbal i.e. facial expressions, gaze, head and hand gestures, etc. To mimic real-time human-human interaction within human-agent interaction, multiple interaction modalities can be exploited. With the availability of multimodal human-human and human-agent interaction corpora, machine learning techniques can be used to develop various interrelated human-agent interaction models. In this regard, our research work proposes original models for addressee detection, turn change and next speaker prediction, and finally visual focus of attention behaviour generation, in multiparty interaction.
Our addressee detection model predicts the addressee of an utterance during interaction involving more than two participants. The addressee detection problem has been tackled as a supervised multiclass machine learning problem. Various machine learning algorithms have been trained to develop addressee detection models. The results achieved show that the proposed addressee detection algorithms outperform a baseline.
The second model we propose concerns the turn change and next speaker prediction in multiparty interaction. Turn change prediction is modeled as a binary classification problem whereas the next speaker prediction model is considered as a multiclass classification problem. Machine learning algorithms are trained to solve these two interrelated problems. The results depict that the proposed models outperform baselines.
Finally, the third proposed model concerns the visual focus of attention (VFOA) behaviour generation problem for both speakers and listeners in multiparty interaction. This model is divided into various sub-models that are trained via machine learning as well as heuristic techniques. The results testify that our proposed systems yield better performance than the baseline models developed via random and rule-based approaches.
The results show that the VFOA behaviour generated via the proposed VFOA model is perceived more natural than the baselines and as equally natural as real VFOA behaviour.