PhD Position F/M: Data-driven methods for vision-based robotic motion control
Contract type: Fixed-term contract (CDD)
Level of qualifications required: Master's degree or equivalent (Bac + 5)
Position: PhD student
Context and assets of the position
The Willow team will be a resourceful environment in which to carry out this project, as it is recognized for its contributions to computer vision and robotics. The research conducted during this PhD will also benefit from the regional AI ecosystem via the Parisian AI institute PR[AI]RIE-PSAI.
Non-discrimination, openness and transparency: Partners of PR[AI]RIE-PSAI are committed to supporting and promoting equality, diversity, and inclusion within their communities. We encourage applications from diverse backgrounds, which we will ensure are selected through an open and transparent recruitment process.
Assignment
Context: Motion control in robotics is subject to Moravec's paradox: robots can execute physically impressive motions, yet they fail at seemingly simple tasks. The most impressive humanoid robot today, the Atlas from Boston Dynamics, came to fame by performing athletic back-flips; yet it would not be able to get up from lying in bed, unless specialized engineers worked on implementing that new behavior. All its motions are executed under a set of assumptions about the robot's environment, for instance: the robot has its torso tilted less than 90 degrees from gravity, has its feet on a flat floor or a moderately-tilted terrain, is facing stairs with a specific number of steps, etc.
This approach has led to a decoupling between perception and locomotion. On the one hand, perception experts work on geometric exteroception problems such as detecting walls and steppable surfaces; on the other hand, locomotion experts implement control strategies that assume knowledge of the environment and focus on proprioception (torque and force measurements, inertial measurements, etc.). The main drawback of this decoupling, however, is that it makes both the vision and locomotion problems harder than they would be if they were addressed jointly. For instance, in a study of a quadruped robot walking in a forest, Miki et al. [1] observed that surface reconstruction methods would frequently fail, in which case the locomotion behavior essentially downgraded to blind locomotion. (Needless to say, walking blindly in a forest is high-risk, even for humans.)
The current paradigm to make robots walk outside of controlled lab environments relies on deep reinforcement learning from massively-parallel simulations [2, 3]. It does not revisit the extero-proprioceptive decoupling: rather, locomotion policies are made robust against defective perception via domain randomization. Including vision involves a major update to this paradigm, as the ability to see objects from afar fundamentally creates a synchronization bottleneck that breaks simulation parallelism. In this thesis, our plan is to explore a line of ideas orthogonal to the simulation-intensive approach.
Scientific objectives: This project explores questions that arise when relaxing assumptions about the structure of the world that are at the core of the extero-proprioceptive decoupling. What if locomotion were allowed to decide motions from implicit rather than explicit representations? What if vision contributed to locomotory decisions, and not only the other way round?
This thesis is articulated around three main axes: perception, control, and learning from real-robot data. Scientifically, our key idea is to leverage physical contacts on legged robots to establish ground-truth validation between vision and proprioception. We will focus on real-robot data rather than massively-parallel simulation, using low-cost open-source robots for rapid prototyping and data collection. Our first application will be in perception, where we will study the question of contact estimation using both visual and motor data. We will then consider the broader topic of including visual inputs to extend model predictive control into interpretable motion policies. Our overall objective throughout the work will be to evaluate how learning from limited real-robot data, but including visual inputs, can be applied on real robots to solve challenging tasks such as agile locomotion.
Application process: Applications will only be considered if they are submitted online from the Inria website. The deadline for submitting an application is May 15th, 2025. After this deadline, a screening process will take place and results will be communicated in two stages:
- Pre-screening until May 30th, 2025, at 1:00pm CEST.
- Final selection by a PRAIRIE-PSAI committee before June 15th, 2025.
Each application should include:
- An up-to-date CV
- A one-page motivation letter covering (1) the candidate's ambitions for this topic and (2) the candidate's fit with the PhD topic described below.
- Scans of the latest diplomas.
Main activities
Research plan
Axis 1: Connecting vision and proprioception through contact estimation
One goal of this project will be to handle visual and proprioceptive data jointly when learning new motion policies. We will consider contact estimation as an approach to validate the connection between the two. For limbed robots, contact estimation is the problem of determining whether any part of a limb, for instance the sole of a foot on a humanoid leg, is in contact with the environment. When it comes to estimating contact, vision validates the no-contact hypothesis (if one sees space between two bodies, they are not touching) whereas proprioception validates the in-contact hypothesis (if one feels resistance below their feet, they are on the ground). Contact estimation is commonly solved using prior models, such as in contact-aided invariant Kalman filtering [4] or probabilistic contact estimation [5, 6]. These methods rely on priors due to data scarcity, as collecting data from large, expensive robots operated by specialized technicians is costly [4, 7, 8].
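To make the complementarity of the two cues concrete, below is a minimal illustrative sketch in Python (the sensor values, decay length and force threshold are assumptions chosen for the example, not the estimator to be developed in this thesis): vision down-weights contact when it sees clearance, proprioception up-weights it when it feels a normal force, and the two cues are fused into a single probability.

```python
# Minimal sketch (hypothetical sensor values and thresholds): fusing visual and
# proprioceptive evidence into a single contact probability for one foot/wheel.
import numpy as np

def contact_probability(visual_clearance_m: float, normal_force_n: float) -> float:
    """Combine two complementary cues:
    - vision supports the no-contact hypothesis when it sees free space
      between the limb and the environment;
    - proprioception supports the in-contact hypothesis when it feels a
      resisting normal force.
    """
    # Vision: the larger the observed gap, the less likely a contact (2 cm decay, assumed).
    p_contact_vision = np.exp(-visual_clearance_m / 0.02)
    # Proprioception: logistic on the measured normal force (assumed 5 N threshold).
    p_contact_force = 1.0 / (1.0 + np.exp(-(normal_force_n - 5.0)))
    # Naive Bayes-style fusion of the two cues, treated as independent.
    odds = (p_contact_vision / (1.0 - p_contact_vision + 1e-9)) * \
           (p_contact_force / (1.0 - p_contact_force + 1e-9))
    return odds / (1.0 + odds)

print(contact_probability(visual_clearance_m=0.10, normal_force_n=0.5))   # likely no contact
print(contact_probability(visual_clearance_m=0.00, normal_force_n=40.0))  # likely in contact
```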
Our proposal is to collect a joint visual and proprioceptive dataset from real-robot data, from which we may learn visual representations and motion control simultaneously. Technically, we will be able to collect larger datasets on open-source wheeled-legged robots available at Inria Paris, which are easier to operate and cheaper to maintain than large-scale legged robots, yet have the same challenging properties for locomotion (underactuated dynamics, importance of collision avoidance, ...). Scientifically, we will follow up on the idea of encoding visual inputs to a latent space and learning latent-space dynamics [9, 10], with the novelty of taking contact constraints into account. For instance, if the robot is making contact with a wall in front of it, the learned dynamics will be trained to predict that trying to go forward will result in increased proprioceptive contact forces and marginal visual changes (and conversely, visual motion and marginal force increase in the absence of contact).
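As an illustration of this latent-dynamics idea, the sketch below (a hypothetical PyTorch architecture with assumed layer sizes and loss weights, not the model that will actually be trained) encodes an image into a latent state and learns a dynamics head that predicts both the next latent state and the proprioceptive contact force, so that pushing against a wall must be explained by a large force and marginal visual change.

```python
# Illustrative PyTorch sketch of latent-space dynamics with a contact-force head.
# All layer sizes and loss weights are assumptions made for this example.
import torch
import torch.nn as nn

class LatentContactDynamics(nn.Module):
    def __init__(self, latent_dim=32, action_dim=4):
        super().__init__()
        # Encoder: image observation -> latent state z_t.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )
        # Dynamics: (z_t, u_t) -> (z_{t+1}, predicted contact force).
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim + 1),
        )

    def forward(self, image, action):
        z = self.encoder(image)
        out = self.dynamics(torch.cat([z, action], dim=-1))
        return z, out[:, :-1], out[:, -1]

def training_loss(model, image_t, action_t, image_t1, force_t1):
    z_t, z_next_pred, force_pred = model(image_t, action_t)
    with torch.no_grad():
        z_next = model.encoder(image_t1)  # target latent from the next image
    # Visual latent prediction + proprioceptive contact-force prediction: against
    # a wall, the model must predict high force together with small latent motion.
    return nn.functional.mse_loss(z_next_pred, z_next) + \
           0.1 * nn.functional.mse_loss(force_pred, force_t1)
```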
Our goal will thus be to lay the foundations for an implicit representation shared between vision and locomotion components, attacking the problem through the well-defined question of contact estimation, where we will have existing baselines to compare against, and an original angle in terms of methods, with data-based machine learning rather than model-based state estimation.
Axis 2: Model predictive control with visual inputs
Controlling physical robots means dealing with complex and varied sensory inputs: vision, velocity, acceleration and force measurements, etc. In agile robot locomotion, optimal control has stood out as a relevant paradigm to derive effective controllers, whether via model predictive control and online numerical optimization [11, 7] or via reinforcement learning [12, 3]. The main driver behind this adoption is that optimal control represents and adapts to the physics underlying the problem at hand. Yet, optimal control requires a model of forward dynamics ẋ = f(x, u), which is discretized as x_{t+1} = f_d(x_t, u_t) and unrolled either directly in model predictive control optimizations or indirectly in a simulator training a parameterized policy by reinforcement learning. This pipeline has worked successfully for low-dimensional proprioceptive inputs that map nicely to states x_t, yet it leads to blind policies. How can we deal with visual inputs to train perceptive policies?
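For reference, the sketch below shows this standard construction on a toy double integrator (an example chosen for illustration, not a robot model from this project): the continuous dynamics ẋ = f(x, u) are discretized with forward Euler into x_{t+1} = f_d(x_t, u_t) and unrolled over a control sequence, exactly as a predictive controller or a simulator would do.

```python
# Generic sketch: discretizing continuous dynamics xdot = f(x, u) with forward
# Euler, x_{t+1} = f_d(x_t, u_t) = x_t + dt * f(x_t, u_t), and unrolling it.
import numpy as np

def f(x, u):
    """Continuous dynamics of a toy double integrator: x = (position, velocity)."""
    position, velocity = x
    return np.array([velocity, u])

def f_d(x, u, dt=0.01):
    """Discretized dynamics used by the predictive controller or the simulator."""
    return x + dt * f(x, u)

def rollout(x0, controls, dt=0.01):
    """Unroll the discretized dynamics over a control sequence."""
    states = [x0]
    for u in controls:
        states.append(f_d(states[-1], u, dt))
    return np.array(states)

states = rollout(np.zeros(2), controls=[1.0] * 50)  # accelerate for 0.5 s
print(states[-1])  # final (position, velocity)
```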
In this axis, we will explore machine learning of maps from visual inputs to not only system dynamics (as in Axis 1) but full-fledged optimal control problems. Advances on this topic have been made possible thanks to recent works on the differentiation of convex optimization problems [13] and convex optimal control [14, 10]. We will focus on model predictive control, where optimal control problems are solved repeatedly over a receding horizon. The receding horizon consists of two parts: the near future, where dynamics and constraints are fully taken into account while optimizing an objective function, and the post-horizon, where an approximation of the value function at the terminal state is used to approximate infinite-time optimization. We will explore how visual inputs can map to both of these regimes: in the near-future regime, via (i) the initial state, (ii) the objective function, (iii) the system dynamics and (iv) the constraints; and in the post-horizon regime, via (v) the value function approximation. Preliminary results [14, 15] showed that nonlinear tasks could be approximated by a convex model predictive controller using a vision-trained map to the objective function (ii). In this axis, we will consider the whole spectrum (i)–(v) where vision can enrich model predictive control to solve more complex perceptive tasks.
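The split receding horizon can be sketched as a small convex MPC problem. In the example below (written with cvxpy, with placeholder dynamics, weights and polytope chosen purely for illustration), comments mark the five slots (i)–(v) that vision-trained maps could fill: the near-future part handles the initial state, objective, dynamics and constraints, while the post-horizon part is summarized by a quadratic terminal value approximation.

```python
# Schematic convex MPC with a split receding horizon (placeholder model and data).
# Comments mark where vision-trained maps (i)-(v) could plug in.
import cvxpy as cp
import numpy as np

n, m, N, dt = 4, 2, 20, 0.05                      # state dim, control dim, horizon, time step
A = np.eye(n); A[:2, 2:] = dt * np.eye(2)         # toy planar double integrator...
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])  # ...(iii) system dynamics
Q, R = np.eye(n), 0.1 * np.eye(m)                 # (ii) running objective
P = 10.0 * np.eye(n)                              # (v) terminal value function approximation
x0 = np.zeros(n)                                  # (i) initial state
x_goal = np.array([1.0, 0.5, 0.0, 0.0])
F, g = np.vstack([np.eye(2), -np.eye(2)]), 2.0 * np.ones(4)  # (iv) free-space polytope F p <= g

x = cp.Variable((N + 1, n))
u = cp.Variable((N, m))
cost, constraints = 0, [x[0] == x0]
for t in range(N):
    cost += cp.quad_form(x[t] - x_goal, Q) + cp.quad_form(u[t], R)
    constraints += [x[t + 1] == A @ x[t] + B @ u[t]]   # near-future dynamics
    constraints += [F @ x[t, :2] <= g]                 # stay inside the seen free space
cost += cp.quad_form(x[N] - x_goal, P)                 # post-horizon value approximation
cp.Problem(cp.Minimize(cost), constraints).solve()
print(u.value[0])  # first control of the receding-horizon plan
```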
Unlike reinforcement learning of black-box function approximators, the problems we will predict from vision will also be interpretable. For instance, a collision avoidance task mapped to (iv) constraints will produce a polytope (hence a volume of space that we can visualize) that the model predictive controller will certifiably avoid. We will include in our study questions not only of optimizing task performance, but also of trading it off with model interpretability, with applications to user feedback.
Axis 3: Sim-to-real transfer of vision-based policies
Experimental validation in simulation is an important step to assess the robustness and capabilities of a proposed solution. In robotics, the sim-to-real gap is particularly dominant: mathematical models of the robot, the environment and their interactions are usually simplified, prompting the need to validate solutions on real hardware as often as possible. In this project, we will work with hardware and real-robot data distributions right from the start. The data necessary to train our controllers will be gathered from wheeled-biped robots built and maintained at Inria.
We will consider tasks that are challenging to transfer to real robots, such as stair climbing. Stair climbing is challenging even for seasoned roller skaters, and has not yet been demonstrated dynamically on wheeled bipeds. It also encompasses the salient components of visual predictive control: stairs are only visible from afar, prompting practical confrontation with a split receding horizon. As a first challenge, we will consider the task of continuous stair climbing by always keeping both wheels on the ground. This approach will require fine contact estimation, as the robot will need to be able to discriminate between vertical and horizontal forces exerted on its wheels, allowing us to evaluate the effectiveness of the representations built in Axis 1. We will then further consider the question of dynamic stair climbing, the richer behavior where the robot is allowed to lift its legs and cannot stop mid-step. The dynamic version of the task is achievable, as demonstrated by seasoned roller skaters, yet has not been demonstrated on any wheeled-biped robot so far.
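As a toy illustration of the force discrimination involved (the frame convention and thresholds are assumptions for this example, not the estimator from Axis 1), the sketch below classifies a measured wheel contact force as tread support, riser push, or no contact.

```python
# Hedged sketch (assumed force frame and thresholds): telling whether a wheel is
# supported by a stair tread or pushing against a riser, from the measured
# contact force expressed in a world-aligned frame.
import numpy as np

def classify_wheel_contact(force_world: np.ndarray, min_force: float = 5.0) -> str:
    """force_world = (f_x, f_z): horizontal and vertical force components at the wheel."""
    f_x, f_z = force_world
    if np.hypot(f_x, f_z) < min_force:
        return "no contact"
    # Compare vertical support against horizontal push to tell tread from riser.
    return "tread (vertical support)" if abs(f_z) >= abs(f_x) else "riser (horizontal push)"

print(classify_wheel_contact(np.array([2.0, 40.0])))   # rolling on a tread
print(classify_wheel_contact(np.array([25.0, 6.0])))   # bumping into a riser
```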
References
- [1] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning robust perceptive locomotion for quadrupedal robots in the wild,” Science Robotics, vol. 7, no. 62, 2022.
- [2] A. Kumar, Z. Fu, D. Pathak, and J. Malik, “RMA: Rapid motor adaptation for legged robots,” in Robotics: Science and Systems, 2021.
- [3] N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Proceedings of the 5th Conference on Robot Learning, vol. 164 of Proceedings of Machine Learning Research, pp. 91–100, PMLR, 2022.
- [4] R. Hartley, M. G. Jadidi, J. Grizzle, and R. M. Eustice, “Contact-aided invariant extended Kalman filtering for legged robot state estimation,” in Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania, June 2018.
- [5] J. Hwangbo, C. D. Bellicoso, P. Fankhauser, and M. Hutter, “Probabilistic foot contact estimation by fusing information from dynamics and differential/forward kinematics,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3872–3878, IEEE, 2016.
- [6] U. B. Gökbakan, F. Dümbgen, and S. Caron, “A data-driven contact estimation method for wheeled-biped robots,” in IEEE International Conference on Robotics and Automation (ICRA), May 2025.
- [7] S. Caron, A. Kheddar, and O. Tempier, “Stair climbing stabilization of the HRP-4 humanoid robot using whole-body admittance control,” in 2019 International Conference on Robotics and Automation (ICRA), pp. 277–283, IEEE, 2019.
- [8] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science Robotics, vol. 5, no. 47, 2020.
- [9] N. Hansen, X. Wang, and H. Su, “Temporal difference learning for model predictive control,” in ICML, 2022.
- [10] O. Bounou, J. Ponce, and J. Carpentier, “Learning system dynamics from sensory input under optimal control principles,” in Conference on Decision and Control (CDC), 2024.
- [11] J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim, “Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1–9, IEEE, 2018.
- [12] J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai, D. Hafner, S. Bohez, and V. Vanhoucke, “Sim-to-real: Learning agile locomotion for quadruped robots,” arXiv preprint arXiv:1804.10332, 2018.
- [13] A. Bambade, F. Schramm, A. Taylor, and J. Carpentier, “QPLayer: efficient differentiation of convex quadratic optimization,” 2023.
- [14] A. Meduri, H. Zhu, A. Jordana, and L. Righetti, “MPC with sensor-based online cost adaptation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 996–1002, IEEE, 2023.
- [15] V. Tordjman-Levavasseur and S. Caron, “Collision avoidance from monocular vision trained with novel view synthesis,” preprint, Mar. 2025.
Skills
- Skills: robotics (M2), computer vision (M2), machine learning (M2), Python (advanced)
- Language: English (French is a plus)
- Additional skills (not required but appreciated): convex optimization, C++, Linux, Git
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
General information
- Theme/Domain: Vision, perception and multimedia interpretation / Scientific computing (BAP E)
- Town/city: Paris
- Inria Center: Centre Inria de Paris
- Desired starting date: 2025-09-01
- Contract duration: 3 years
- Application deadline: 2025-05-15
Warning: applications must be submitted online via the Inria website. Applications sent through any other channel are not guaranteed to be processed.
Instructions for applying
Defence and security:
This position may be assigned to a restricted-access zone (ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorization to access such a zone is granted by the head of the institution, following a favorable ministerial opinion, as defined in the order of July 3rd, 2012 on the PPST. An unfavorable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: WILLOW
- PhD supervisor: Caron Stephane / stephane.caron@inria.fr
The keys to success
Carrying out this PhD project will require in particular:
- Integrating into a dynamic scientific environment: having an analytical mind, but also a taste for learning, listening and sharing thoughts will be essential.
- Past studies at the M2 level in machine learning or robotics (including motion planning, kinematics and dynamics modeling) as well as in computer vision.
- Previous experience of scientific research during an M2 internship.
- Appetite for experimenting on real robots.
About Inria
Inria is the French national research institute for digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents across more than forty different professions. 900 research and innovation support staff contribute to the emergence and growth of scientific and entrepreneurial projects with worldwide impact. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.