Contract type : Public service fixed-term contract
Level of qualifications required : Graduate degree or equivalent
Function : PhD Position
This PhD thesis is proposed in the context of the PIMPE project: Physical complex Interactions and Multi-person Pose Estimation. The project is funded by the IDEX of the Communauté Université Grenoble Alpes, under the International Strategic Partnerships program. The PhD funding is therefore already secured, and the student will spend half of the PhD in Grenoble and half in Barcelona. In Grenoble, the PhD student will be hosted by the Perception Team of Inria/LJK (https://team.inria.fr/perception/). In Barcelona, the PhD student will be hosted at IRI-UPC (https://www.iri.upc.edu/).
The estimation of the full human body pose is a fundamental low-level computer vision task, with potential applications in a wide variety of fields including the entertainment industry, sports technology, physical therapy and medical diagnosis. Seminal works established the estimation of the pose of a single person [1,2], which is still a very active line of research [3-8]. More recent works on human pose estimation can be roughly split into two categories. On the one side, methods targeting multi-person pose estimation in non-interactive scenarios [9-11], meaning people acting almost independently of each other. On the other side, there are a few works devoted to extreme/uncommon body poses, which could have a significant impact on detecting abnormal situations (e.g. helping to diagnose patients with partial motion impairments or detecting fights in surveillance footage) or on understanding team activities (e.g. in sports and art performances). Full body pose estimation is often cast as a multivariate regression problem, for which robust (deep) regression techniques are the state of the art [14,15].
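To make the multivariate-regression view concrete: a body pose can be represented as a vector of joint coordinates, and regressed poses are typically evaluated with the mean per-joint position error (MPJPE), i.e. the average Euclidean distance between predicted and ground-truth joints. The sketch below is purely illustrative (the 4-joint skeleton and its coordinates are made up), not part of the project's methodology:

```python
import math

def mpjpe(predicted, ground_truth):
    """Mean per-joint position error: average Euclidean distance over joints."""
    assert len(predicted) == len(ground_truth)
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)

# Illustrative 4-joint 2D skeleton; only the third joint is mispredicted (off by 1.0).
pred = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
gt   = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
print(mpjpe(pred, gt))  # -> 0.25 (one joint off by 1.0, averaged over 4 joints)
```

The same formulation extends directly to 3D joints; a deep regressor is then trained to map an image to this coordinate vector.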
Nevertheless, we identify a lack of methodological approaches able to capture the complexity of visual human pose in multi-person interactive scenarios. In other words, current methods do not provide any means for the analysis or synthesis of images depicting complex multi-person interactions that are faithful and realistic in terms of human body pose. We hypothesize that this is partly due to the lack of large training sets (manually) annotated for such scenarios, which makes it impossible to train learning models in general, and deep neural architectures in particular.
In this project we aim to develop machine learning techniques able to jointly estimate the full body pose of several persons involved in the same physical interaction. We are interested in a wide range of application scenarios, from industrial worker cooperation to sports, the arts and anomaly detection. To overcome the data availability issue, we propose to design learning strategies for generating realistic and controllable multi-person full-body extreme pose datasets: realistic, so as to minimize the data distribution gap between training and real scenarios; controllable, so as to generate a wide variability of poses, thus allowing for generalization to complex body motions and inter-body physical interactions.
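One common ingredient in controllable pose generation is to sample joint angles within anatomical limits and map them to joint positions via forward kinematics, so that every generated pose is physically plausible by construction (in the spirit of pose-conditioned joint-angle limits [12]). The toy sketch below, with a made-up 2-link planar "arm" and invented angle limits, only illustrates the idea; the project itself targets full bodies and multi-person interactions:

```python
import math
import random

def forward_kinematics(theta1, theta2):
    """Joint positions of a 2-link planar chain with unit-length links,
    rooted at the origin; angles in radians."""
    elbow = (math.cos(theta1), math.sin(theta1))
    wrist = (elbow[0] + math.cos(theta1 + theta2),
             elbow[1] + math.sin(theta1 + theta2))
    return elbow, wrist

def sample_pose(rng, limits=((-math.pi / 2, math.pi / 2), (0.0, 2.5))):
    """Sample joint angles within (made-up) anatomical limits, then map
    them to joint positions, so the pose is plausible by construction."""
    theta1 = rng.uniform(*limits[0])
    theta2 = rng.uniform(*limits[1])
    return forward_kinematics(theta1, theta2)

rng = random.Random(0)
elbow, wrist = sample_pose(rng)
# Every sampled pose respects the kinematic chain: the elbow lies on the
# unit circle and the wrist stays within reach (radius 2).
assert abs(math.hypot(*elbow) - 1.0) < 1e-9
assert math.hypot(*wrist) <= 2.0 + 1e-9
```

Controllability here comes from the angle limits: narrowing or shifting them steers the generator toward specific families of poses, which is the property the project seeks at full-body, multi-person scale.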
In order to tackle the main challenges of the project, we will rely on two important methodological tools. First, deep neural architectures, which have demonstrated great potential for visual data generation [17,18,19] and whose evaluation for regression tasks is still an active area of research. Second, the resources available from the two partners, namely the Kinovis room at Inria and an Xsens suit at UPC.
[1] Yang, Y., Ramanan, D. Articulated pose estimation with flexible mixtures-of-parts. In IEEE CVPR, 2011.
[2] Ferrari, V., Marin-Jimenez, M., Zisserman, A. Progressive search space reduction for human pose estimation. In IEEE CVPR, 2008.
[3] Bulat, A., Tzimiropoulos, G. Human pose estimation via convolutional part heatmap regression. In ECCV, 2016.
[4] Toshev, A., Szegedy, C. DeepPose: Human pose estimation via deep neural networks. In IEEE CVPR, 2014.
[5] Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J. Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In ECCV, 2016.
[6] Martinez, J., Hossain, R., Romero, J., Little, J. A simple yet effective baseline for 3D human pose estimation. In IEEE ICCV, 2017.
[7] Moreno-Noguer, F. 3D human pose estimation from a single image via distance matrix regression. In IEEE CVPR, 2017.
[8] Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K. Coarse-to-fine volumetric prediction for single-image 3D human pose. In IEEE CVPR, 2017.
[9] Cao, Z., Simon, T., Wei, S.E., Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In IEEE CVPR, 2017.
[10] Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., Schiele, B. DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. In ECCV, 2016.
[11] Papandreou, G., Zhu, T., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C., Murphy, K. Towards accurate multi-person pose estimation in the wild. In IEEE CVPR, 2017.
[12] Akhter, I., Black, M. Pose-conditioned joint angle limits for 3D human pose reconstruction. In IEEE CVPR, 2015.
[13] Belagiannis, V., Zisserman, A. Recurrent human pose estimation. In IEEE FG, 2017.
[14] Belagiannis, V., Rupprecht, C., Carneiro, G., Navab, N. Robust optimization for deep regression. In IEEE ICCV, 2015.
[15] Lathuilière, S., Mesejo, P., Alameda-Pineda, X., Horaud, R. DeepGUM: Learning deep robust regression with a Gaussian-uniform mixture model. In ECCV, 2018.
[16] Sadeghian, A., Alahi, A., Savarese, S. Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In IEEE ICCV, 2017.
[17] Wang, W., Alameda-Pineda, X., Xu, D., Fua, P., Ricci, E., Sebe, N. Every smile is unique: Landmark-guided diverse smile generation. In IEEE CVPR, 2018.
[18] Pumarola, A., Agudo, A., Sanfeliu, A., Moreno-Noguer, F. Unsupervised person image synthesis in arbitrary poses. In IEEE CVPR, 2018.
[19] Pumarola, A., Agudo, A., Martinez, A.M., Sanfeliu, A., Moreno-Noguer, F. GANimation: Anatomically-aware facial animation from a single image. In ECCV, 2018.
[20] Lathuilière, S., Mesejo, P., Alameda-Pineda, X., Horaud, R. A comprehensive analysis of deep regression. IEEE TPAMI, 2019.
In the first months the student will have three main tasks: a state-of-the-art survey on physical interaction modeling (directly related to CH1), software exploration, and dataset acquisition, the latter two being useful for the entire PhD. First, the student will need to survey the literature to understand the state of the art in modeling humans and their physical interactions, in natural scenarios related to the applications mentioned in the project proposal. This will give the student a good understanding of the methodological choices present in the literature. Second, the student will need to explore existing software, both well-established libraries and publicly available related software projects. This will give the student the necessary tools and practical knowledge for the rest of the thesis. Third, the student will need to devote time to data acquisition, both at Inria/LJK exploiting the Kinovis platform and at IRI-UPC using the Xsens suit.
Research Master's degree, or equivalent, in a discipline related to computer vision and machine learning. A particular interest in, and experience with, computer vision tasks and machine learning problems, especially deep learning, is a must. Strong motivation for research work. Ability both to work independently and to collaborate within a small team. Computer skills: MATLAB, Python, deep learning toolkits (e.g. Keras, PyTorch). Oral and written English communication skills are mandatory.
- Partial reimbursement of public transport costs
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Social security coverage
1768.55 € / month before taxes
- Theme/Domain :
Vision, perception and multimedia interpretation
Information system (BAP E)
- Town/city : Montbonnot
- Inria Center : CRI Grenoble - Rhône-Alpes
- Starting date : 2019-09-02
- Duration of contract : 3 years
- Deadline to apply : 2019-06-30
The keys to success
The PhD student will start in September 2019, for a period of three years (until August 2022). The PhD will be split into two halves: first at the Perception Team of Inria and LJK (until April 2021), then at IRI-UPC. The PhD student will be co-supervised by Dr. Moreno-Noguer (https://www.iri.upc.edu/people/fmoreno/index.html) and Dr. Xavier Alameda-Pineda (http://xavirema.eu) with the support of Dr. Antonio Agudo (http://www.iri.upc.edu/people/aagudo/) and Dr. Radu Horaud (https://team.inria.fr/perception/team-members/radu-patrice-horaud/).
Inria, the French national research institute for the digital sciences, promotes scientific excellence and technology transfer to maximise its impact. It employs 2,400 people. Its 200 agile project teams, generally with academic partners, involve more than 3,000 scientists in meeting the challenges of computer science and mathematics, often at the interface of other disciplines. Inria works with many companies and has assisted in the creation of over 160 startups. It strives to meet the challenges of the digital transformation of science, society and the economy.
Instructions to apply
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.