PhD Position F/M Study of the estimation and control principle for Markov decision processes (IDP 2024)

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Function : PhD Position


The Inria team Astral is a joint Inria-Naval Group project team; Naval Group is a French industrial group specializing in naval defense construction. With this thesis, we aim to carry out exploratory and preparatory theoretical studies that could have an impact on the work carried out with Naval Group, although there is no guarantee of direct applications in the short or medium term.


Markov decision processes are non-diffusive stochastic processes whose defining parameters (jump rates, transition measures, flows) depend on a variable that one can act upon in order to steer the process toward a given goal. In practice, these processes may depend on parameters that are unknown a priori and whose values one may want to estimate. If the process must also be controlled in real time, the estimation must be carried out in real time as well, and decisions must adapt to the current parameter estimates. This is where the principle of estimation and control comes into play: it links the choice of estimators to the choice of strategies, and vice versa.

The principle of estimation and control for discrete-time Markov decision processes has been the subject of numerous studies. These studies show that minimum contrast estimators form a class of estimators that makes it possible to estimate the parameters of the observed process while controlling it, through the construction of asymptotically optimal policies, at least for the total discounted reward criterion (and also when the time horizon is finite).
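To make the idea concrete, here is a minimal sketch of an estimate-then-control loop on a toy two-state, two-action discrete-time model with a discounted reward. Everything in it is an assumption made for illustration and is not taken from the cited works: the transition model, the reward, the finite candidate set `THETAS`, and the function names are hypothetical. The contrast used is the cumulative negative log-likelihood, which is one standard example of a contrast function. At each step the controller applies the policy that is optimal for the current estimate, observes the transition, and updates the estimate.

```python
import math
import random

STATES = (0, 1)
ACTIONS = (0, 1)
GAMMA = 0.9                  # discount factor (assumed value)
THETAS = (0.2, 0.5, 0.8)     # hypothetical finite set of candidate parameters


def prob_next_is_1(theta, s, a):
    """Toy model: P(next state = 1 | s, a) under parameter theta."""
    return theta if a == 0 else 1.0 - theta


def reward(s, a):
    """Toy reward: being in state 1 pays 1, state 0 pays 0."""
    return 1.0 if s == 1 else 0.0


def q_value(theta, s, a, v):
    """One-step lookahead value of action a in state s."""
    p1 = prob_next_is_1(theta, s, a)
    return reward(s, a) + GAMMA * ((1.0 - p1) * v[0] + p1 * v[1])


def greedy_policy(theta, n_iter=200):
    """Value iteration for the model indexed by theta, then a greedy policy."""
    v = [0.0, 0.0]
    for _ in range(n_iter):
        v = [max(q_value(theta, s, a, v) for a in ACTIONS) for s in STATES]
    return [max(ACTIONS, key=lambda a: q_value(theta, s, a, v)) for s in STATES]


def run(true_theta=0.8, n_steps=500, seed=0):
    """Simulate the adaptive loop; return the final estimate and its policy."""
    rng = random.Random(seed)
    policies = {th: greedy_policy(th) for th in THETAS}  # one policy per candidate
    contrast = {th: 0.0 for th in THETAS}                # cumulative -log-likelihood
    theta_hat = THETAS[0]                                # arbitrary initial guess
    s = 0
    for _ in range(n_steps):
        a = policies[theta_hat][s]                       # act as if theta_hat were true
        s_next = 1 if rng.random() < prob_next_is_1(true_theta, s, a) else 0
        for th in THETAS:                                # update every candidate's contrast
            p1 = prob_next_is_1(th, s, a)
            contrast[th] -= math.log(p1 if s_next == 1 else 1.0 - p1)
        theta_hat = min(THETAS, key=contrast.get)        # minimum contrast estimate
        s = s_next
    return theta_hat, policies[theta_hat]


if __name__ == "__main__":
    print(run())
```

In this toy model every action is informative about the parameter, so the estimate settles on the true value whatever policy is followed. In general this identifiability is not automatic, which is one reason the choice of estimators and the choice of strategies have to be studied together.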

Depending on the candidate's profile, theoretical or practical aspects should be developed in this area of research.

Main activities

Here are a few theoretical research directions that could be pursued:
- the assumptions made in [H12] about the characteristics of the process need to be weakened to cover a wider range of situations.
- the asymptotic properties of the proposed estimators need to be studied in greater depth.
- the central limit theorem has not been obtained for these estimators and thus deserves to be studied.
- the work initiated in [Maigret79] around the large deviations principle deserves to be explored further and extended to the context of Markov decision processes.

In this vein, we have recently obtained results that extend the principle of estimation and control to the framework of continuous-time Markov decision processes; see [CG23, CDG23]. The research program outlined above in discrete time is of course also to be developed in this technically more demanding setting, where the added difficulty is due notably to the presence of forced jumps at the boundary.

During this thesis, the practical aspect should not be neglected: the numerical implementation of the studied estimators and of the resulting optimal policies will make it possible to illustrate their properties. This is an important point that will demonstrate the usefulness of the theoretical studies and of the methods developed. In this context, one can look at various classic problems related to target tracking, as explained in [Zhang17], which can be modelled using Markov decision processes with adaptation.

[CG23] Costa, O., & Dufour, F. (2023). Adaptive discounted control for piecewise deterministic Markov processes. Journal of Mathematical Analysis and Applications, 127517.
[CDG23] Costa, O., Dufour, F., & Génadot, A. (2023). Minimum Contrast Estimators for Piecewise Deterministic Markov Processes. Submitted.
[Maigret79] Maigret, N. (1979). Majorations de Chernoff et statistique séquentielle pour des chaînes de Markov récurrentes au sens de Doeblin. Astérisque, 68, 125-142.
[H12] Hernández-Lerma, O. (2012). Adaptive Markov control processes (Vol. 79). Springer Science & Business Media.
[Zhang17] Zhang, H., Dufour, F., Anselmi, J., Laneuville, D., & Nègre, A. (2017). Piecewise optimal trajectories of observer for bearings-only tracking by quantization. In 2017 20th International Conference on Information Fusion (Fusion) (pp. 1-7). IEEE.


The candidate should have a solid background in probability theory, in particular in the theory of Markov processes. Prior coursework in control theory (deterministic or stochastic) would be a plus. The ability to develop numerical examples is also expected.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage


  • 2100€ / month (before taxes) during the first 2 years,
  • 2190€ / month (before taxes) during the third year.