2019-01562 - PhD Position F/M Multimodal perception of social and pedagogical classroom interactions using a privacy-safe non-individual approach [PhD Campaign 2019 - Grenoble Rhône-Alpes]
The job description below is in English.

Contract type: Public service fixed-term contract

Required level of education: Master's degree (Bac + 5) or equivalent

Position: PhD student

About the research centre or functional department

Grenoble Rhône-Alpes Research Center brings together almost 800 people in 35 research teams and 9 research support departments.

Staff are located on 5 campuses in Grenoble and Lyon and work in close collaboration with laboratories, research and higher education institutions in both cities, as well as with local economic players.

The centre is active in the fields of software, high-performance computing, the Internet of Things, image and data, as well as simulation in oceanography and biology, and it takes part in top-level international scientific achievements and collaborations in Europe and worldwide.

Context and assets of the position

The PhD thesis will be co-advised by Dominique Vaufreydaz (Pervasive team, LIG/Inria, Univ. Grenoble Alpes, https://research.vaufreydaz.org/) and Philippe Dessus (LaRAC laboratory, Univ. Grenoble Alpes, https://pdessus.fr/). The Pervasive team (https://team.inria.fr/pervasive/) has a long track record in computer vision, multimodal perception, multimodal interaction and affective computing. LaRAC (http://larac.univ-grenoble-alpes.fr/) investigates learning and instruction at all school levels from an interdisciplinary perspective (cognitive science, social psychology).

Assignment

Recent research advances in multimodal perception of and interaction with humans have led to major achievements in multiple domains, such as computer vision (e.g., detecting faces or people in images) and speech recognition (e.g., large-vocabulary, multi-speaker recognition on smartphones), mainly thanks to Deep Learning. These achievements are now influencing other research domains such as affective computing (emotion or sentiment analysis) and behavioral modeling for human behavior detection and prediction.

In the Teaching Lab project (Idex Formation grant, Univ. Grenoble Alpes), we aim to develop a pervasive smart-classroom system that provides teachers with delayed feedback on how they manage their instruction. The goal is to help beginning teachers become more aware of their class while teaching. To do so, we need to capture cues for analyzing teacher–students relationships: current teacher activity, current teaching episode, class engagement, students' attention or engagement, class ambiance, etc. These cues will be computed using signal processing and machine learning techniques, and we will draw on cognitive science to interpret them and to build a multimodal model of classroom interactions.

However, this analysis raises privacy and ethical issues. The goal of the system is to analyze the underlying teaching processes (e.g., teacher–students interaction, misbehavior management, …), not to monitor individual behaviors per se, even inappropriate ones. The multimodal perception system will thus observe the whole classroom at a glance to help teachers improve their instruction afterwards. Hence, this system is not intended to detect and track inattentive or disruptive students.

Most current state-of-the-art systems, notably Deep Learning systems, focus on humans as individuals, i.e., each individual is processed as a separate entity. For instance, to detect whether a photo as a whole conveys a happy mood, such systems first detect faces, then smiles on those faces; averaging the smile detections over the photo yields the decision: happy photo or not. Starting from these individual-based systems, we aim to create and test new multimodal models that capture global moods from multi-view footage of the whole classroom. The underlying idea is to move away from individual analysis and to compute global scores for the class instead of relying on the sum of individual detections.
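
To make the individual-based baseline described above concrete, here is a minimal Python sketch of its final aggregation step only. It assumes a hypothetical upstream face detector and smile classifier that return one smile probability per detected face; the function name `is_happy_photo` and both thresholds are illustrative assumptions, not part of the project.

```python
from typing import Sequence


def is_happy_photo(smile_scores: Sequence[float],
                   smile_threshold: float = 0.5,
                   ratio_threshold: float = 0.5) -> bool:
    """Individual-based baseline: one score per detected face, then aggregate.

    smile_scores: hypothetical per-face smile probabilities in [0, 1], as a
    face detector followed by a smile classifier might produce (assumption).
    """
    if not smile_scores:
        # No face detected: the individual-based pipeline has nothing to average.
        return False
    # Count faces whose smile probability passes the per-face threshold.
    smiling = sum(1 for s in smile_scores if s >= smile_threshold)
    # The photo-level decision is driven entirely by individual detections.
    return smiling / len(smile_scores) >= ratio_threshold
```

For example, `is_happy_photo([0.9, 0.8, 0.2])` returns `True` (two of three faces smile), while an image in which no face is found returns `False`: the decision depends entirely on how many individuals are detected, which is the dependence the thesis aims to move away from.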

The research question addressed in this thesis is the following: can we analyze global cues about instructional episodes (e.g., engagement, attention level) from still images or video sequences coupled with acoustic features? The ways to address this question are still open and will be explored in the thesis (deep fusion, reinforcement learning, generative adversarial networks, …). This work will be evaluated along two axes: standard performance evaluation of perception systems, and the pedagogical benefits of the generated feedback for teachers.

Keywords: Multimodal Perception, Deep Learning, Affective Computing, Behavioral Computing, Teaching Analytics.



Main activities

Core activities:

  • Review the state-of-the-art literature on multimodal perception, affective computing and behavioral computing
  • Design and implement a multimodal multi-view perception system building on software already developed within the team
  • Collect ecological corpora by recording classroom sessions, in collaboration with a PhD student already working on the Teaching Lab project (on teacher–students interactions in a context-aware classroom)
  • Propose, implement and validate new multimodal perception models

Additional activities:

  • Write articles and present results at conferences and workshops
  • Participate in project meetings


Skills

The candidate must hold a Master's degree in Computer Science or Applied Mathematics, ideally with a background in signal processing and/or machine learning. Good programming skills (C++, Python) are also required.


Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage


Remuneration

Salary (before taxes): €1,982 gross per month for the 1st and 2nd years; €2,085 gross per month for the 3rd year.