Post-Doctoral Research Visit F/M Local search for combinatorial multi-armed bandits

Contract type : Fixed-term contract

Level of qualifications required : PhD or equivalent

Fonction : Post-Doctoral Research Visit

Level of experience : Recently graduated

About the research centre or Inria department

The Inria Centre at Rennes University is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative PMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, technological research institute, etc

Context

The person will be recruited into the LACODAM team at the Inria Rennes research center. This research center is distinguished by its excellence in digital research and technological innovation, driving major advances in fields such as artificial intelligence, cybersecurity, and digital health, while working closely with industrial and academic partners to shape the technologies of tomorrow.

The recruited individual will be supervised by Élisa Fromont and Romaric Gaudel, and will collaborate with Paul Viallard. All three are researchers in Machine Learning with publications in leading conferences on the theoretical aspects of Machine Learning, particularly on multi-armed bandit problems.

The goal of the project is to solve a class of combinatorial optimization problems with multi-armed bandit feedback through approaches based on local search techniques. The envisioned approach falls within the framework of reinforcement learning and significantly reduces both the regret and computational cost associated with finding an optimal policy.

The specific application problem under consideration is the selection of a connected subgraph in order to maximize flow. More precisely, we are faced with a set of nodes that wish to communicate and a set of possible connections between these nodes. The objective is to maximize the communication flow while minimizing the number of edges between the nodes. Therefore, the task is to identify the edges to retain for this purpose.

This research project is funded as part of a partnership agreement between Inria and the Defense Innovation Agency of the French Ministry of the Armed Forces.

Assignment

This optimization problem is part of the family of Survivable Network Design (SND) problems [1]. Our work on the subject is distinguished by the fact that

  • (i) the parameters of the underlying optimization problem are unknown,
  • (ii) at each time step, the algorithm selects a sub-network and observes the flow on this sub-network, and
  • (iii) the goal of the algorithm is to maximize the observed average flow.

In this context, the algorithm estimates the parameters of the underlying optimization problem based on the behavior of the flow observed at each time step. However, since the information gathered is limited to the edges of the chosen sub-network, we are faced with two opposing options: selecting, at each time step, a sub-network that is effective according to the data collected so far, or "exploring" (i.e., selecting) a sub-network that includes edges that have been little tested to check their properties. This corresponds to the framework of multi-armed bandit problems [2] in a combinatorial space.

The multi-armed bandit problem is fundamental in reinforcement learning. It models a wide range of applications in which an artificial agent selects an action at each time step and receives a random reward that depends on that action. The distribution of these rewards is unknown, and the agent must learn it based solely on the rewards it receives. This framework covers, for example, the selection of information to display on a webpage or the recommendation of videos.

The resolution of combinatorial problems, such as the traveling salesman problem and logistics problems (in the presence of uncertainty about the problem's parameters), fits perfectly into this framework. However, these problems are less addressed in the state of the art because they are more difficult to solve, particularly due to the very large number of options to consider. The project will therefore propose dedicated multi-armed bandit algorithms for the SND problem.

References

  • [1] Lap Chi Lau, Joseph (Seffi) Naor, Mohammad R. Salavatipour, Mohit Singh. Survivable Network Design with Degree or Order Constraints. STOC’07.
  • [2] Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2): 235–256.
  • [3] Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont, Aser-Boammani Lompo. Parametric Graph for Unimodal Ranking Bandit. ICML'21
  • [4] Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont. UniRank: Unimodal Bandit Algorithm for Online Ranking. ICML'22
  • [5] Richard Combes and Alexandre Proutière. Unimodal bandits: Regret lower bounds and optimal algorithms. ICML'14

Main activities

The project will first propose a multi-armed bandit algorithm based on an approximate optimization method for the SND problem.

The project will then study an algorithm based on local search.

The algorithms will be designed, proven, implemented, and tested.

Skills

Required skills:

  • prior experience in machine learning,
  • being a proficient programmer,
  • being comfortable with mathematical formalism,
  • and not being intimidated by mathematical proofs.

Knowledge of multi-armed bandits or reinforcement learning is appreciated.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

Monthly gross salary : 2788€