PhD Position F/M: Robust approaches in sequential decision and learning problems

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Function : PhD Position

About the research centre or Inria department

The Inria University of Lille centre, created in 2008, employs 360 people including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-de-France region, the Inria University of Lille centre pursues a close relationship with large companies and SMEs. By promoting synergies between researchers and industrialists, Inria participates in the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region.

For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of Frenchtech, with a technology showroom based on Avenue de Bretagne in Lille, on the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).


Multi-armed bandit theory has witnessed tremendous progress over the last decade, yielding algorithms that achieve strong learning guarantees (regret minimization, best-arm identification) in increasingly challenging contexts involving sequential decision-making in uncertain environments. In particular, recent works obtained optimal non-parametric algorithms, enabling the use of multi-armed bandits in a large range of applications where reward distributions are not easily modelled with classical families.

Recently, a bandit model considering Huber-outlier distributions (a mixture between a parametric distribution of interest and an arbitrary one) has been studied. It offers an interesting perspective that is complementary to non-parametric assumptions and models arbitrarily bad outlier observations, something often encountered in real applications.
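To make the Huber-outlier model concrete, here is a minimal illustrative sketch in Python (standard library only). All names and numerical values (`sample_huber`, the Gaussian inlier with mean 1, the outlier value 1000, the contamination level 0.05) are assumptions chosen for illustration and do not come from the project. It draws rewards from a mixture of a parametric distribution of interest and an arbitrary outlier source, and compares the empirical mean with the empirical median.

```python
import random
import statistics


def sample_huber(n, eps, inlier, outlier, rng):
    """Draw n observations from the mixture (1 - eps) * inlier + eps * outlier.

    `inlier` and `outlier` are callables taking the random generator and
    returning one observation (hypothetical interface, for illustration).
    """
    return [outlier(rng) if rng.random() < eps else inlier(rng) for _ in range(n)]


rng = random.Random(0)
rewards = sample_huber(
    n=2000,
    eps=0.05,                              # assumed contamination level
    inlier=lambda r: r.gauss(1.0, 1.0),    # distribution of interest, mean 1
    outlier=lambda r: 1000.0,              # arbitrarily bad observations
    rng=rng,
)

# The empirical mean is dragged far away from 1 by the outliers,
# whereas the empirical median stays close to the inlier mean.
empirical_mean = statistics.fmean(rewards)
empirical_median = statistics.median(rewards)
print(empirical_mean, empirical_median)
```

Even a small fraction of outliers is enough to make the empirical mean, on which many classical bandit indices rely, uninformative about the distribution of interest.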

This Ph.D. will focus on the problem of model misspecification in the contextual setting, i.e. we study sequential learning when a context (or feature) vector is available, but we do not want to impose strong assumptions on it and assume only a nonparametric or even corrupted model for the context. To summarize, we want to study both regret minimization and best-arm identification objectives in the contextual setup, assuming either Huber-outlier or nonparametric distributions.


The Ph.D. focuses on basic research (theory of multi-armed bandits, nonparametric and robust statistics); for this reason, the candidate will have to become familiar with the literature on multi-armed bandits as well as the literature on nonparametric and robust statistics.

Main activities

The suggested plan for this academic thesis consists of three tasks, or work packages, mainly oriented towards the development of fundamental approaches. Each will be accompanied by publications in top-rated international journals and conferences, in the spirit of open and reproducible science (open-source code, etc.).

Task 1: Exploitation of uncertain or corrupted context in linear bandits.

When the context is corrupted or uncertain, exploiting the information it carries is not obvious, and taking a non-informative context into account may even be detrimental, as it makes the learning process more complicated. In this first task, we want to study the specific case of linear bandits, which feature a simple link between context and reward and are a natural extension from the univariate to the multivariate setting, in order to begin understanding uncertain contexts in the sequential setting. Complementary to the linear case, bandits with auxiliary information may also be a relevant structure to explore.
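As a reminder of what the linear bandit setting looks like without any corruption, here is a minimal sketch of a LinUCB-style strategy in Python (standard library only). It is a textbook-style illustration, not the approach to be developed in the thesis. The class name `LinUCB`, the parameter vector `theta`, the arm features, the noise level and the exploration parameter `alpha` are all hypothetical choices made for this example.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


class LinUCB:
    """Ridge-regression estimate of the parameter plus an optimistic bonus."""

    def __init__(self, dim, reg=1.0, alpha=1.0):
        self.alpha = alpha
        # Inverse of the regularised design matrix, initialised to (reg * I)^-1.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def estimate(self):
        return matvec(self.A_inv, self.b)

    def select(self, arms):
        theta_hat = self.estimate()

        def index(x):
            width = math.sqrt(max(dot(x, matvec(self.A_inv, x)), 0.0))
            return dot(theta_hat, x) + self.alpha * width

        return max(range(len(arms)), key=lambda k: index(arms[k]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse design matrix.
        Ax = matvec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


rng = random.Random(1)
theta = [1.0, -0.5]                          # unknown parameter (assumed)
arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # feature vectors (assumed)
agent = LinUCB(dim=2)
pulls = [0] * len(arms)
for _ in range(2000):
    k = agent.select(arms)
    reward = dot(theta, arms[k]) + rng.gauss(0.0, 0.1)  # linear reward + noise
    agent.update(arms[k], reward)
    pulls[k] += 1
print(pulls)
```

In this clean setting the strategy concentrates its pulls on the first arm, which has the largest expected reward. The question raised in Task 1 is what remains of this behaviour when the feature vectors themselves are uncertain or corrupted.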

Task 2: Sequential optimality in corrupted environment.

The goal is to establish the optimality of an algorithm in a corrupted environment. Even for the simple problem of estimating a univariate mean in a corrupted environment, optimal strategies were discovered only very recently. We propose to extend recent works on optimal estimation in corrupted environments to the problem of sequential estimation, in order to better understand the link between the sequential setting and the loss of information caused by corruption.
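To illustrate the univariate problem mentioned above, here is a minimal Python sketch (standard library only) comparing the naive empirical mean with a trimmed mean on a corrupted sample. The trimmed mean is used only as a simple, well-known robust estimator; it is not claimed to be one of the optimal strategies referred to above. The function name `trimmed_mean`, the sample sizes, the corruption value 50 and the trimming level are assumptions chosen for illustration.

```python
import random
import statistics


def trimmed_mean(xs, trim):
    """Mean of xs after discarding a fraction `trim` of the points in each tail."""
    k = int(trim * len(xs))
    kept = sorted(xs)[k:len(xs) - k]
    return statistics.fmean(kept)


rng = random.Random(42)
eps = 0.1                                             # assumed corruption level
clean = [rng.gauss(0.0, 1.0) for _ in range(900)]     # true mean is 0
corrupted = [50.0] * 100                              # corruption, all on one side
sample = clean + corrupted

naive = statistics.fmean(sample)          # pulled towards the corrupted points
robust = trimmed_mean(sample, trim=eps)   # much closer to 0, but still biased
print(naive, robust)
```

Trimming removes most of the effect of the corrupted points, but some bias generally remains because clean points are discarded on the opposite tail. Quantifying the best achievable error, and how it interacts with sequential sampling, is the kind of question addressed in this task.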

Task 3: Implementation of nonparametric and robust methods.

All the algorithms described previously should be implemented, tested against state-of-the-art algorithms, and shown to perform well in practice. By testing on realistic environments (already developed by the host team) that potentially have a complex structure, we want to gather practical evidence on the interplay between uncertain context and sequential learning, to better understand which theoretical questions should be asked in order to ascertain the efficiency of our algorithms in real applications. We are particularly interested in applications to agriculture, because agriculture features very uncertain contexts and, up to now, the usual bandit strategies have struggled to be efficient on such nonparametric problems.


Skills required : Master's-level skills in mathematics and statistics are required. As for computer skills, familiarity with LaTeX and with Python and/or R is important.

Languages : English



Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage


1st & 2nd year : 2100 € (gross monthly salary)

3rd year : 2190 € (gross monthly salary)