2022-05452 - Post-Doctoral Research Visit F/M Statistics for improving sampling quality, applied to the parameters of a generative model for 3D scenes

Contract type : Fixed-term contract

Level of qualifications required : PhD or equivalent

Position : Post-Doctoral Research Visit

About the research centre or Inria department

The Inria Université Côte d’Azur center has 36 research teams and 7 support departments. Its staff (about 500 people, including 320 Inria employees) is made up of scientists of many nationalities (250 foreign nationals of 50 nationalities), engineers, technicians and administrative staff. One third of the staff are civil servants; the others are contract agents. Most of the center’s research teams are located in Sophia Antipolis and Nice, in the Alpes-Maritimes. Four teams are based in Montpellier, and two teams are hosted abroad, in Bologna (Italy) and Athens (Greece). The center is a founding member of Université Côte d'Azur and a partner of the I-site MUSE, supported by the University of Montpellier.


The AI Verse technology (http://www.ai-verse.com/) is designed to create an unlimited number of random yet semantically consistent 3D scenes. Generation is fast, taking less than 4 seconds per labeled image. From these 3D scenes, the system builds high-quality synthetic images that come with rich labels which, unlike manually annotated labels, are unbiased.
As with real data, no metric exists to evaluate how well a synthetic dataset performs for training a neural network. We therefore tend to favor the photorealism of the images, but this criterion is far from the best one. The current technology can control a rich list of additional parameters (lighting quality, trajectory and intrinsic parameters of the virtual camera, selection and placement of assets, degree of occlusion of objects, choice of materials, etc.). Since the generation engine can vary all of these parameters at will to generate many samples, we will explore optimization methods to improve the sampling quality.

The candidate will work with research experts in geometry, generative models and deep learning.

The research activities will take place at Inria and at AI Verse, both in Sophia Antipolis (the two sites are within walking distance of each other). Some visits to the LISN laboratory in Saclay (expertise in generative models and deep learning) are also planned.


A set of samples drawn at random from the generative model most likely does not cover the whole space of interesting situations well, because of unsuitable sampling laws or of practical sampling issues in high dimensions. The main question is how to improve the quality of this generated dataset, which should ideally be close, in some sense, to a given target dataset (consisting of examples of the images one would like to generate). This requires statistical analyses of the two datasets and of their differences, in order to spot possible issues such as strongly under-represented areas of the target domain. The sampling laws can then be adjusted accordingly, for instance by optimizing their hyper-parameters, if any.
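A minimal sketch of such a check, assuming each image is summarized by a single scalar descriptor normalized to [0, 1] (for instance a hypothetical per-image degree of occlusion). Real descriptors are multi-dimensional, so this only examines one marginal at a time. It bins both datasets and flags the bins where the generated share falls below a chosen fraction (the hypothetical `ratio_threshold`) of the target share. The function names and the synthetic data are illustrative assumptions, not part of the AI Verse engine.

```python
import random

def bin_proportions(values, n_bins, lo=0.0, hi=1.0):
    """Fraction of `values` falling in each of `n_bins` equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp the index so that out-of-range values land in the edge bins.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def under_represented_bins(generated, target, n_bins=10, ratio_threshold=0.5):
    """Indices of bins whose generated share is below ratio_threshold x target share."""
    p_gen = bin_proportions(generated, n_bins)
    p_tgt = bin_proportions(target, n_bins)
    return [
        i for i, (g, t) in enumerate(zip(p_gen, p_tgt))
        if t > 0 and g < ratio_threshold * t
    ]

# Illustrative data only: the generator (hypothetically) never produces a
# descriptor above 0.6, while the target covers the whole [0, 1] range.
rng = random.Random(0)
generated = [rng.uniform(0.0, 0.6) for _ in range(5000)]
target = [rng.random() for _ in range(5000)]
flagged = under_represented_bins(generated, target)
print("under-represented bins:", flagged)
```

The flagged bins point to the range of the parameter whose sampling law could be reweighted. The number of bins and the threshold are arbitrary choices in this sketch.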

Main activities

Technical goals or research tracks:
- visualization tools based on dimensionality reduction techniques (such as PCA, t-SNE, UMAP, etc.) can help identify possible mismatches between the two distributions (that of the generated samples vs. the target one)
- statistics or statistical tests on these distributions may also help quantify how different they are and spot areas for possible improvement
- joint statistical tests, such as the proportion of nearest neighbors that fall in the same dataset rather than in the other one, also give valuable information
- parameterizing the sampling laws and optimizing the associated hyper-parameters. This requires designing a proper criterion (a "loss" in machine learning terms), e.g. based on the concepts above, that defines a metric between distributions, and choosing and applying a suitable optimization technique.
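The nearest-neighbor statistic and the hyper-parameter loop described above can be sketched as follows. This is an illustration under stated assumptions, not the project's method. The "sampling law" is a hypothetical isotropic 2-D Gaussian with a single hyper-parameter `sigma`. The loss is the distance of the k-nearest-neighbor same-set fraction from 0.5, which is roughly its value when two equal-size samples come from the same distribution. The optimizer is a plain grid search. The names `knn_same_fraction`, `sample_gaussian_2d`, `distribution_loss` and `tune_sigma` are hypothetical, and distances are computed by brute force, which is only viable for small samples.

```python
import math
import random

def knn_same_fraction(set_a, set_b, k=5):
    """Fraction of k-nearest-neighbour links in the pooled sample (self excluded)
    that connect two points of the same set."""
    pooled = [(p, 0) for p in set_a] + [(p, 1) for p in set_b]
    same = 0
    for i, (p, label) in enumerate(pooled):
        # Brute force: sort distances from p to every other pooled point.
        neighbours = sorted(
            (math.dist(p, q), other) for j, (q, other) in enumerate(pooled) if j != i
        )
        same += sum(1 for _, other in neighbours[:k] if other == label)
    return same / (k * len(pooled))

def sample_gaussian_2d(sigma, n, rng):
    """Hypothetical sampling law: isotropic 2-D Gaussian with standard deviation sigma."""
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]

def distribution_loss(generated, target, k=5):
    """Close to 0 when two equal-size sets are well mixed, 0.5 when fully separated."""
    return abs(knn_same_fraction(generated, target, k) - 0.5)

def tune_sigma(target, candidates, n, seed=0):
    """Plain grid search over the single hyper-parameter sigma."""
    rng = random.Random(seed)
    losses = {s: distribution_loss(sample_gaussian_2d(s, n, rng), target) for s in candidates}
    return min(losses, key=losses.get), losses

# Illustrative target: 150 points from a unit-variance Gaussian.
target = sample_gaussian_2d(1.0, 150, random.Random(42))
best_sigma, losses = tune_sigma(target, [0.25, 1.0, 4.0], n=len(target))
print("losses:", losses)
print("best sigma:", best_sigma)
```

In practice the points would be image descriptors in a much higher dimension. A spatial index or an approximate nearest-neighbor library would then replace the brute-force loop, and a derivative-free optimizer would replace the grid search.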

Keywords: sampling, distributions, statistics, relatively high dimension, dimension reduction, optimization, visualization


Required: knowledge of statistics, in particular sampling, distributions and high-dimensional issues
- data science skills are a plus
- numerical optimization skills are a plus
- machine learning skills are a plus, but not required

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage


Gross Salary: 2746 € per month