Internship Master 2: Adding detail to human feed-forward models

The description of the offer below is in English.

Required degree level: Bac + 5 (Master's degree) or equivalent

Position: Research intern

About the center or functional department

The Inria Grenoble research center groups together almost 600 people in 27 research teams and 8 research support departments.

Staff is present on three campuses in Grenoble, in close collaboration with other research and higher education institutions (University Grenoble Alpes, CNRS, CEA, INRAE, …), but also with key economic players in the area.

Inria Grenoble is active in the fields of high-performance computing, verification and embedded systems, modeling of the environment at multiple levels, and data science and artificial intelligence. The center is a top-level scientific institute with an extensive network of international collaborations in Europe and the rest of the world.

Assignment

 

Context

Recent work in AI-powered computer vision has produced a new family of so-called coordinate regression methods, such as DUSt3R [1], AceZero [2], MASt3R [3] and MonST3R [4]. These methods have reshaped 3D vision by introducing a feed-forward neural paradigm for 3D reconstruction from a pair of uncalibrated input images. They show that fundamental tasks such as depth map estimation, point matching and triangulation can be addressed efficiently with simple neural architectures: twin vision transformers are paired, and their cross-attention does the heavy lifting, automatically learning and inferring the cross-image characteristics needed to solve such tasks.
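As a purely illustrative aid for candidates, the cross-attention mechanism mentioned above can be sketched in a few lines. This is a minimal sketch in plain Python, not the implementation of any of the cited methods: the token values are made up, the function names are hypothetical, and the learned query/key/value projections and multi-head structure of a real transformer are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each query token (view A) attends to all key/value tokens (view B).

    Returns the attended features and the attention weights.
    Assumption: no learned projections, single head.
    """
    dim = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Scaled dot-product similarity between this view-A token
        # and every view-B token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        # Weighted average of view-B values: information flows from B into A.
        out = [
            sum(wj * v[d] for wj, v in zip(w, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy token features (hypothetical values): 2 tokens from view A, 3 from view B.
tokens_a = [[1.0, 0.0], [0.0, 1.0]]
tokens_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

attended_a, attn = cross_attention(tokens_a, tokens_b, tokens_b)
```

Each row of `attn` is a probability distribution over the tokens of the other view, which is what lets a network relate image regions across the pair. In the twin-transformer setting described above, the same operation would also run in the other direction, from view B to view A.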

More recent specializations of coordinate regression approaches hint that these neural architectures can tackle more complex inference spaces. Examples include the estimation of full-body human shape parameters (SMPL [5]) in proposals like HAMSt3R [6] or Human3R [7], and paired inference across a posed space (the body deformation in the current frame) and a canonical body space (the intrinsic shape, expressed independently of any given pose or deformation) in works like DualPM. These offer very promising paths to fast, few-camera human shape inference from images, but they currently lack the expressiveness and training needed to refine surface shape detail in human models.
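To make the distinction between canonical and posed space concrete, here is a toy 2D sketch of linear blend skinning, the kind of mapping a skinned model such as SMPL uses to carry a point from the rest pose to the current pose. It is an illustration only: the two-bone arm, the weights and the function names are hypothetical, and SMPL itself works in 3D with learned shape and pose correctives that are left out here.

```python
import math

def rotation(angle_rad, pivot):
    """Return a function rotating a 2D point by angle_rad around pivot."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    px, py = pivot

    def apply(p):
        x, y = p[0] - px, p[1] - py
        return (c * x - s * y + px, s * x + c * y + py)

    return apply

def skin(point, weights, bone_transforms):
    """Linear blend skinning: weighted blend of each bone's transform
    applied to the canonical point."""
    x = y = 0.0
    for w, transform in zip(weights, bone_transforms):
        tx, ty = transform(point)
        x += w * tx
        y += w * ty
    return (x, y)

def identity(p):
    """Upper arm stays fixed in this toy example."""
    return p

# Hypothetical two-bone arm lying along the x axis in the canonical pose.
# The forearm rotates by 90 degrees around the elbow at (1, 0).
elbow_bend = rotation(math.pi / 2, pivot=(1.0, 0.0))

canonical_hand = (2.0, 0.0)      # fully attached to the forearm
canonical_shoulder = (0.0, 0.0)  # fully attached to the upper arm

posed_hand = skin(canonical_hand, [0.0, 1.0], [identity, elbow_bend])
posed_shoulder = skin(canonical_shoulder, [1.0, 0.0], [identity, elbow_bend])
```

The canonical coordinates stay the same whatever the pose, while the posed coordinates change with each frame. Predicting both for the same image points is what the paired inference described above refers to.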

Mission

This master's internship tackles exactly these limitations of human shape regression methods. It proposes to explore how to add realistic detail to inferred human shapes, so that they better account for fine clothing, facial or hair details, by adding a layer or architectural feature to a state-of-the-art method such as HAMSt3R [6].

Main activities

During the internship, the master's candidate is expected to carry out the following tasks:

  • establish a more complete bibliography of relevant methods, starting from the suggested references below
  • propose and discuss plausible and achievable methodological and architectural innovations or reparametrizations that make it possible to model detail, grounded in this existing work
  • propose and discuss dataset enhancements that would enrich training and improve performance on these tasks
  • identify existing datasets that are relevant for the comparative evaluation of the proposals, including in-house datasets such as 4DHumanOutfit [8], acquired by Morpheo and Naver Labs with the Kinovis platform at the Centre Inria of University Grenoble Alpes in Montbonnot

References

[1] Wang, Shuzhe / Leroy, Vincent / Cabon, Yohann / Chidlovskii, Boris / Revaud, Jérôme
DUSt3R: Geometric 3D Vision Made Easy
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

[2] Brachmann, Eric / Wynn, Jamie / Chen, Shuai / Cavallari, Tommaso / Monszpart, Áron / Turmukhambetov, Daniyar / Prisacariu, Victor Adrian
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
2024 CoRR, Vol. abs/2404.14351

[3] Leroy, Vincent / Cabon, Yohann / Revaud, Jérôme
Grounding Image Matching in 3D with MASt3R
2024 CoRR, Vol. abs/2406.09756

[4] Zhang, Junyi / Herrmann, Charles / Hur, Junhwa / Jampani, Varun / Darrell, Trevor / Cole, Forrester / Sun, Deqing / Yang, Ming-Hsuan
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
2024 CoRR, Vol. abs/2410.03825

[5] Loper, Matthew / Mahmood, Naureen / Romero, Javier / Pons-Moll, Gerard / Black, Michael J.
SMPL: A Skinned Multi-Person Linear Model
2015 ACM Trans. Graph., Vol. 34, No. 6, p. 248:1-248:16

[6] Rojas, Sara / Armando, Matthieu / Ghanem, Bernard / Weinzaepfel, Philippe / Leroy, Vincent / Rogez, Grégory
HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction
2025 International Conference on Computer Vision (ICCV), Honolulu, Hawaii, 19-23 October 2025

[7] Chen, Yue / Chen, Xingyu / Xue, Yuxuan / Chen, Anpei / Xiu, Yuliang / Pons-Moll, Gerard
Human3R: Everyone Everywhere All at Once
2025 arXiv

[8] Armando, Matthieu / Boissieux, Laurence / Boyer, Edmond / Franco, Jean-Sébastien / Humenberger, Martin / Legras, Christophe / Leroy, Vincent / Marsot, Mathieu / Pansiot, Julien / Pujades, Sergi / Rekik, Rim / Rogez, Grégory / Swamy, Anilkumar / Wuhrer, Stefanie
4DHumanOutfit: A multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements
2023 Computer Vision and Image Understanding, Vol. 237

Skills

 

This internship is aimed at M1/M2 candidates, preferably with some skills in the following areas:

  • computer vision, image processing background
  • AI / machine learning / deep learning background
  • some Python / PyTorch experience
  • scientific curiosity, with a taste for and autonomy in exploratory tasks and problems

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities

Remuneration

Legal minimum internship stipend