PhD Position F/M Foundation Models and Natural Language Communication for Human-Robot Collaboration

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Function : PhD Position

Context

The HUCEBOT team is a new team of the Inria Center at the University of Lorraine.

The team is dedicated to advancing algorithms for human-centered robots: robots that do not work autonomously in isolation, but instead react, interact, collaborate, and assist humans. To do so, these robots need to intertwine a multi-contact whole-body controller, a digital simulation of the interacting humans, and machine learning models to predict and respond to human movements and intentions. In a crescendo of complexity, the team will tackle scenarios that involve collaboration with cobots, assistance with exoskeletons, and whole-body teleoperation of humanoid robots. The application domains span from industrial robotics to space teleoperation.

The main robots of the team are the Tiago++ bimanual mobile manipulator, the Unitree G1 humanoid, and the Talos humanoid robot. The team also works with Franka cobots and exoskeletons.

The team currently consists of about 25 members, including permanent researchers, PhD students, and post-doctoral researchers.

Serena Ivaldi, head of HUCEBOT, holds the chair in Robotics and AI of the Cluster IA ENACT project (https://cluster-ia-enact.ai/), which funds this PhD thesis. Within this chair, she aims to advance research on natural language interaction to assist humans in different scenarios of collaboration with robots, where safety is paramount.

Assignment

Most work on LLMs for robotics has focused on generating sequences of actions and plans from high-level goals, offline, targeting only autonomous robots isolated from humans. A critical limitation to deploying LLMs on robots that collaborate with humans is the ability to use them online, in a human-in-the-loop scenario, to generate suitable motions and "safe" robot policies.

Here, we use LLMs to generate a robot's motions online in collaborative scenarios where safety is critical: active exoskeletons and mobile manipulators assisting humans in object manipulation. The human vocally commands the robot interactively, online, to control the generation of its motion at the low level: start, stop, direct, and change its low-level parametrization (e.g., compliant behavior, velocity, maximal assistance torque).

The first objective is to design the robot's controller with the natural language interaction feature in mind: the human's commands, corrections, and Approximate Numerical Expressions must be translated into meaningful quantities, consistent with the physics of the problem. What do "faster", "a bit higher", "a little to the right", and "more assistance" mean?
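
As a toy illustration of this grounding problem, the Python sketch below maps a few such expressions to bounded controller-parameter updates; the parameter names, scaling factors, and safety ranges are hypothetical assumptions, not the project's actual design.

```python
# Minimal sketch: grounding Approximate Numerical Expressions (ANEs)
# into bounded controller-parameter updates. All fields, factors, and
# safety ranges are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ControllerParams:
    velocity: float           # end-effector speed [m/s]
    target_height: float      # task-space target height [m]
    assistance_torque: float  # maximal assistance torque [N.m]

# One possible grounding: relative updates, clipped to safe ranges.
ANE_UPDATES = {
    "faster":          ("velocity", "scale", 1.25),
    "a bit higher":    ("target_height", "offset", 0.05),
    "more assistance": ("assistance_torque", "scale", 1.20),
}

SAFE_RANGES = {
    "velocity": (0.0, 0.5),
    "target_height": (0.0, 1.5),
    "assistance_torque": (0.0, 20.0),
}

def apply_ane(params: ControllerParams, phrase: str) -> ControllerParams:
    """Apply a vague verbal correction as a clipped parameter update."""
    field, mode, value = ANE_UPDATES[phrase]
    current = getattr(params, field)
    updated = current * value if mode == "scale" else current + value
    lo, hi = SAFE_RANGES[field]
    setattr(params, field, min(max(updated, lo), hi))
    return params
```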

The second objective is to design new multimodal models fusing LLMs and visual pipelines to predict the human's intent and minimize the need for corrections. Natural language instructions may be incomplete or unclear, but cameras could provide sufficient contextual information to generate an appropriate motion. For example, "take that" could easily be translated into "grasp the bottle" if the bottle is the only item in front of the robot.
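
A toy Python sketch of this disambiguation idea follows, assuming a hypothetical list of detected objects coming from the visual pipeline; the function and field names are illustrative placeholders, not the project's actual components.

```python
# Toy sketch: resolving an under-specified ("deictic") command using
# the visual pipeline's detections. Names and logic are illustrative.
from typing import Optional

DEICTIC_WORDS = {"that", "this", "it"}

def resolve_command(utterance: str, detected_objects: list[str]) -> Optional[str]:
    """Ground a vague grasp command in the scene context."""
    if set(utterance.lower().split()) & DEICTIC_WORDS:
        if len(detected_objects) == 1:
            # Unambiguous scene: ground the pronoun to the only object.
            return f"grasp the {detected_objects[0]}"
        return None  # ambiguous scene: ask the human instead of guessing
    return utterance

# resolve_command("take that", ["bottle"]) -> "grasp the bottle"
```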

The third objective is to detect emergency commands, leveraging both LLMs and audio processing models for nonverbal communication, and to generate suitable reactive robot behaviors. Humans are often unable to speak clearly when they interact with a robot: sometimes fear takes over and they do not speak at all, mumble, or scream, when they could just say a clear "stop". Detecting emergency commands is critical for deploying robots in the real world.
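
As a minimal sketch of such a fusion, the snippet below combines a hypothetical speech-recognizer transcript with a distress score from an assumed nonverbal audio classifier; the keywords, threshold, and components are illustrative assumptions.

```python
# Minimal sketch: fusing a verbal keyword check with a nonverbal audio
# cue (e.g., a scream/distress score from an assumed audio classifier).
# All names, words, and thresholds are illustrative assumptions.
EMERGENCY_WORDS = {"stop", "halt", "wait"}

def is_emergency(transcript: str, distress_score: float,
                 threshold: float = 0.8) -> bool:
    """Trigger on a clear verbal command OR a strong nonverbal cue."""
    verbal = any(w in transcript.lower().split() for w in EMERGENCY_WORDS)
    nonverbal = distress_score >= threshold  # screaming, panicked mumbling
    return verbal or nonverbal

# A positive detection would preempt the current motion, e.g., by
# switching the controller to a compliant, safe stop.
```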

The PhD student will carry out research on the aforementioned objectives, and will benefit from our collaboration with E. Zibetti (Paris 8, SHS), an expert in Approximate Numerical Expressions in psychology, and D. Sadigh (Stanford University), who leads research on LLMs for robot actions.

Real-world demonstrations, with real robots and real humans interacting with them, are mandatory in this PhD.

Main activities

Implement, test, and develop novel algorithms for robots that use language models and foundation models. Write papers and present them at conferences. Write, test, validate, and document the associated software. Experiments with real robots are mandatory.

The PhD student will also be involved in the activities organized by the Cluster-AI project ENACT, which may involve dissemination actions, meetings, and presentations to relevant stakeholders (Europe, France, industry, etc.).

The PhD student will also participate, with the HUCEBOT team, in robotics competitions and hackathons organized by the European project euROBIN, with demonstrations of the robots' skills at the European Parliament in 2026 and at ICRA 2026.

Skills

Good skills in Python (PyTorch). Ideally, prior experience with LLMs, VLMs, and foundation models.

Good understanding of robotics.

Languages: English (English is the official language of the team and many members do not speak French).

Proactivity, curiosity, and the ability to work in a team are fundamental.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

2200€ gross/month