Contract type : Public service fixed-term contract
Level of qualifications required : Graduate degree or equivalent
Function : PhD Position
One of the main objectives of social robotics research is to design and develop robots that can engage in social environments in a way that is appealing and familiar to humans. However, interaction is often difficult because users do not understand the robot's internal states, intentions, actions, and expectations. Thus, to facilitate successful interaction, social robots should provide communicative functionality that is both natural and intuitive. Given their design, humanoid robots are typically expected to exhibit human-like communicative behaviors, using speech and non-verbal expressions just as humans do. Gestures help convey information that speech alone cannot provide, such as referential, spatial or iconic information [HAB11]. Moreover, providing multiple modalities helps resolve the ambiguity typical of unimodal communication and, as a consequence, increases the robustness of communication. In multimodal communication, gestures can make interaction with robots more effective. In fact, gestures and speech interact: they are linked in language production and perception, and their interaction contributes to effective communication [WMK14]. In oral communication, human listeners have been shown to be highly attentive to the information conveyed by such non-verbal behaviors in order to better understand the acoustic message [GM99].
This topic can be addressed in the field of robotics, where few approaches incorporate both speech and gesture analysis and synthesis [GBK06, SL03], but also in the field of virtual conversational agents (talking avatars), where the challenge of generating speech and co-verbal gesture has already been tackled in various ways [NBM09, KW04, KBW08].
For virtual agents, most existing systems simplify gesture-augmented communication by relying on lexicons of words and presenting non-verbal behaviors as pre-produced gestures [NBM09]. For humanoid robots, existing models of gesture synthesis mainly focus on the technical aspects of generating robotic motion that fulfills some communicative function; they either do not combine the generated gestures with speech, or use pre-recorded gestures that are not generated on-line but simply replayed during human-robot interaction.
The goal of this thesis is to develop a gesture model for credible communicative robot behavior during speech. The generation of gestures will be studied both when the robot is a speaker and when it is a listener. In the context of this thesis, the robot will be replaced by an embodied virtual agent. This allows the outcome of this work to be applied in both the virtual and the real world. It is possible to test the results of this work on a real robot by transferring the virtual agent's behavior to the robot, when possible, but this is not an end in itself.
In this thesis, two main topics will be addressed: (1) predicting the realization and timing of communication-related gestures from speech, and (2) generating the appropriate gestures during speech synthesis. When the virtual agent is listening to a human interlocutor, head movement is an important communicative gesture: it may give the impression that the virtual agent understands what is said to it, and it may make the interaction with the agent more effective. One challenge is to extract both acoustic and linguistic cues from speech [KA04] in order to characterize the pronounced utterance and to predict the right gesture to generate (head posture, facial expressions and eye gaze [KCD14]). Synchronizing the gestures with the interlocutor's speech is critical: any desynchronization may make the virtual agent's reaction ambiguous. The timing of gestures in relation to speech will therefore be studied. When the agent is speaking, generating the appropriate gestures during speech synthesis, mainly head posture, facial expressions and eye gaze, will be addressed.
To achieve these goals, motion-capture data will be acquired synchronously with the acoustic speech signal. Different contexts will be considered so as to collect sufficiently rich data. These data will be used to identify suitable features to be integrated within machine learning techniques. As the data is multimodal (acoustic, visual, gestural), each modality will be exploited to provide complementary information: the speech signal will be processed by a speech-recognition system to extract linguistic information, while acoustic features, such as the fundamental frequency (F0), will provide non-linguistic information. The correlation between gestures and the speech signal will also be studied. The aim of these analyses is to contribute to the understanding of the mechanisms of oral communication combined with gestures, and to develop a model that can predict the generation of gestures in both speaking and listening contexts.
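To give an idea of the kind of acoustic feature mentioned above, the sketch below estimates F0 for a single voiced frame using a simple autocorrelation method. This is an illustrative toy on a synthetic signal, not the toolchain the project would use; the function name and parameter choices are the author's own assumptions.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Toy F0 estimate for one voiced frame via autocorrelation.

    Searches for the autocorrelation peak between the lags that
    correspond to fmax (shortest period) and fmin (longest period).
    """
    frame = frame - frame.mean()                 # remove DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)                     # shortest period considered
    lag_max = int(sr / fmin)                     # longest period considered
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sr / lag

# Synthetic 40 ms "voiced" frame at 200 Hz
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * 200.0 * t)
f0 = estimate_f0(frame, sr)                      # ~200 Hz
```

Real systems typically use more robust trackers (e.g. autocorrelation with voicing decisions, or probabilistic YIN-style methods) and run them frame-by-frame over the whole utterance, but the lag-peak idea is the same.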
- [GBK06] Gorostiza J, Barber R, Khamis A, Malfaz M, Pacheco R, Rivas R, Corrales A, Delgado E, Salichs M (2006) Multimodal human-robot interaction framework for a personal robot. In: RO-MAN 06: Proc of the 15th IEEE international symposium on robot and human interactive communication
- [GM99] Goldin-Meadow S (1999) The role of gesture in communication and thinking. Trends Cogn Sci 3:419–429
- [HAB11] Hostetter AB (2011) When do gestures communicate? A meta-analysis. Psychol Bull 137(2):297–315
- [KA04] Kendon A (2004) Gesture: Visible Action as Utterance. Cambridge University Press
- [KBW08] Kopp S, Bergmann K, Wachsmuth I (2008) Multimodal communication from multimodal thinking—towards an integrated model of speech and gesture production. Semant Comput 2(1):115–136
- [KCD14] Kim J, Cvejic E, Davis C (2014) Tracking eyebrows and head gestures associated with spoken prosody. Speech Commun 57
- [KW04] Kopp S, Wachsmuth I (2004) Synthesizing multimodal utterances for conversational agents. Comput Animat Virtual Worlds 15(1):39–52
- [NBM09] Niewiadomski R, Bevacqua E, Mancini M, Pelachaud C (2009) Greta: an interactive expressive ECA system. In: Proc of the 8th int conf on autonomous agents and multiagent systems (AAMAS 2009), pp 1399–1400
- [SL03] Sidner C, Lee C, Lesh N (2003) The role of dialog in human robot interaction. In: International workshop on language understanding and agents for real world interaction
- [WMK14] Wagner P, Malisz Z, Kopp S (2014) Gesture and speech in interaction: an overview. Speech Commun 57:209–232
Master's degree in computer science, with a good background in modeling, data analysis and machine learning. Prior experience in speech recognition or with deep learning techniques will be appreciated.
French or English.
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Salary: 1982€ gross/month for the 1st and 2nd year; 2085€ gross/month for the 3rd year.
Monthly salary after taxes: around 1596.05€ for the 1st and 2nd year; 1678.99€ for the 3rd year (medical insurance included).
- Town/city : Villers-lès-Nancy
- Inria Center : CRI Nancy - Grand Est
- Starting date : 2019-10-01
- Duration of contract : 3 years
- Deadline to apply : 2019-05-01
- Inria Team : MULTISPEECH
PhD Supervisor :
Slim Ouni / firstname.lastname@example.org
Application deadline: May 1st, 2019 (midnight, Paris time)
How to apply
Upload your file on jobs.inria.fr in a single PDF or ZIP file, and send it as well by email to email@example.com. Your file should contain the following documents:
- Your CV.
- A cover/motivation letter describing your interest in this topic.
- A short (max one page) description of your Master thesis (or equivalent) or of the work in progress if not yet completed.
- Your degree certificates and transcripts for Bachelor and Master (or the last 5 years).
- Master's thesis (or equivalent) if already completed, and publications if any (you are not expected to have any). Web links to these documents are preferable, if possible.
In addition, one recommendation letter from the person who supervises or supervised your Master's thesis (or research project or internship) should be sent directly by its author to firstname.lastname@example.org
Applications are to be sent as soon as possible.
Inria, the French national research institute for the digital sciences, promotes scientific excellence and technology transfer to maximise its impact. It employs 2,400 people. Its 200 agile project teams, generally with academic partners, involve more than 3,000 scientists in meeting the challenges of computer science and mathematics, often at the interface of other disciplines. Inria works with many companies and has assisted in the creation of over 160 startups. It strives to meet the challenges of the digital transformation of science, society and the economy.
Instructions to apply
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.