2023-06335 - PhD Position F/M Contention-Aware Scheduling of Storage Resources on Exascale Systems

The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams.



This thesis is part of the PEPR NumPEx (https://numpex.fr/), whose goal is to co-design the exascale software stack and prepare applications for the exascale era. It will be co-supervised by Inria and CEA, at the Inria center at the University of Rennes and the CEA center at Bruyères-le-Châtel, near Paris, respectively. Beyond the supervision, collaborations with the other laboratories of the PEPR consortium are to be expected.

PhD Advisors

  • François Tessier (Inria KerData team)
  • Gabriel Antoniu (Inria KerData team)
  • Philippe Deniel (CEA)
  • Thomas Leibovici (CEA)

Location and Mobility

The thesis, which will be co-supervised by Inria and CEA, will be hosted by the KerData team at Inria Rennes Bretagne Atlantique and will include regular visits to the CEA Center of Bruyères-le-Châtel. It may also include collaborations with European and/or international partners such as University of Madrid (Spain), University of Bristol (UK) or Argonne National Lab (USA), to name a few.

Nowadays, there are many scientific fields where the need for computing power and data processing capacity goes beyond what current machines can provide. In radio astronomy, for example, the international SKA project aims to create the largest telescope in the world in order to observe a part of the Universe. A very large volume of data is generated at the telescope level and then transits to geo-distributed data centers to be pre-processed (filtering, reduction) in real time at a rate of 10 TB/s. The output data is then sent to a supercomputer to be saved and fed into numerical simulations. At this stage, the computing power and storage resources required are such that machines capable of reaching the exascale become necessary. To date, only a few supercomputers such as Frontier at Oak Ridge National Laboratory (USA) have this capability, but in the coming months, new systems will be deployed. However, the efficient use of these systems raises new challenges, especially regarding data management.

Indeed, even though HPC systems are increasingly powerful, there has been a relative decline in I/O bandwidth. Over the past ten years, the ratio of I/O bandwidth to computing power of the top three supercomputers has been divided by 10, while in some scientific computing centers the volume of data stored has been multiplied by 41 [1]. The design of the machines themselves accentuates this gap: while it is common for HPC systems to provide exclusive and dynamic access to compute nodes through a batch scheduler, storage resources are usually global and shared by concurrent applications, leading to congestion and performance variability [2,3]. To mitigate this congestion, new tiers of memory and storage have been added to recently deployed supercomputers, increasing their complexity. These new tiers can take the form of node-local SSDs, burst buffers or dedicated storage nodes with network-attached storage technologies, to name a few. Harnessing this additional storage capacity is an active research topic, but little has been done about how to provision it efficiently [4,5].

Thesis proposal

Dealing with this high degree of storage heterogeneity is a real challenge for scientific workflows and applications. This PhD thesis aims to address this issue from the point of view of resource provisioning.

Main activities

Through intelligent scheduling algorithms, the goal of the thesis is to enable applications and workflows to seamlessly use storage systems [8] on Exascale systems and beyond (Cloud). Beyond resource contention alone, multiple criteria can be taken into account, such as financial cost or energy. These algorithms will need to rely on a resource abstraction model that also needs to be devised. The evaluation of these algorithms and the implementation of these models will be done in an existing WRENCH-based [6] simulator, called StorAlloc [5], developed in the team. Tools developed by the CEA, including the Robinhood policy engine [7], and the outcomes of the IO-SEA European Project [9] will also be used. For this work, a strong emphasis will be put on international collaborations (University of Hawaii at Manoa (HI, USA), for instance).
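To give a rough idea of what contention-aware provisioning of storage resources can look like, here is a minimal sketch in Python. It is purely illustrative and is not the StorAlloc algorithm nor any method fixed for the thesis: the StorageNode class, the allocate function, the node names and the equal-sharing bandwidth model are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage resource (e.g. a burst buffer or a storage node)."""
    name: str
    capacity_gb: float
    bandwidth_gbps: float
    allocations: list = field(default_factory=list)  # (job_id, size_gb) pairs

    @property
    def free_gb(self) -> float:
        # Capacity not yet reserved by any job.
        return self.capacity_gb - sum(size for _, size in self.allocations)

    def bandwidth_share(self) -> float:
        # Assumption: bandwidth is split equally among co-located jobs.
        # This is the share each job would get if one more job were placed here.
        return self.bandwidth_gbps / (len(self.allocations) + 1)


def allocate(nodes, job_id, size_gb):
    """Greedy placement: among nodes with enough free capacity, pick the one
    offering the highest per-job bandwidth share. Returns the node name,
    or None if the request cannot be satisfied."""
    candidates = [n for n in nodes if n.free_gb >= size_gb]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.bandwidth_share())
    best.allocations.append((job_id, size_gb))
    return best.name


# Two hypothetical burst buffers and four identical 400 GB requests.
nodes = [
    StorageNode("bb0", capacity_gb=1000, bandwidth_gbps=10),
    StorageNode("bb1", capacity_gb=500, bandwidth_gbps=10),
]
placements = [allocate(nodes, f"job{i}", 400) for i in range(4)]
print(placements)
```

In this toy setting the second request avoids the node already serving the first one, and the last request is rejected for lack of capacity. Additional criteria mentioned above, such as financial cost or energy, could be folded into the selection key, which is the kind of multi-criteria question the thesis would investigate.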

The PhD position is mainly based in Rennes, at IRISA/Inria within the KerData research team, and regular visits will be organized to the CEA Center near Paris. The selected candidate will have the opportunity to join a very dynamic group in a stimulating work environment with many active national, European and international collaborations as part of cutting-edge international projects in the areas of Exascale Computing, Cloud Computing, Big Data and Artificial Intelligence. The candidate will also have the opportunity to be hosted for 3- to 6-month internships abroad to strengthen the international visibility of his/her work and benefit from the expertise of other researchers in the field.


  • An excellent Master's degree in computer science or equivalent
  • Strong knowledge of distributed systems
  • Knowledge of storage and (distributed) file systems
  • Ability and motivation to conduct high-quality research, including publishing the results in relevant venues
  • Strong programming skills (Python, C/C++)
  • Working experience in the areas of HPC and Big Data management is an advantage
  • Very good communication skills in oral and written English
  • Open-mindedness, strong integration skills and team spirit

