R&D Engineer - Exascale storage

Contract type: Fixed-term contract

Renewable contract: Yes

Level of qualifications required: PhD or equivalent

Function: Temporary scientific engineer

About the research centre or Inria department

Inria, the French national research institute for the digital sciences, promotes scientific excellence and technology transfer to maximise its impact.
It employs 2,400 people. Its 200 agile project teams, generally with academic partners, involve more than 3,000 scientists in meeting the challenges of computer science and mathematics, often at the interface of other disciplines.
Inria works with many companies and has assisted in the creation of over 160 startups.
It strives to meet the challenges of the digital transformation of science, society and the economy.

Context

About Inria, the team and the position

Inria is the only French public research body fully dedicated to computational sciences. Inria's missions are to produce outstanding research in the computing and mathematical fields of digital sciences and to ensure its impact on the economy and society through technology transfer and innovation. Across its 8 research centres and approximately 200 project teams, Inria has a workforce of 3,400 scientists and an annual budget of 265 million euros, 29% of which comes from its own resources. Inria Rennes Bretagne-Atlantique is one of the eight sites of Inria. This publicly funded research centre has a workforce of about 620 people, including full-time research scientists, faculty staff, engineers and support staff, distributed across 33 teams and support services.

The hired engineer will be a member of the KerData Inria team (https://team.inria.fr/kerdata/) led by Gabriel Antoniu. KerData is a joint research team of Inria Rennes - Bretagne Atlantique and INSA Rennes, and also a team of the IRISA lab. KerData's main research activities address the area of distributed data management at challenging scales, with a recent focus on hybrid (supercomputer/cloud/edge) infrastructures.

Developed by the KerData research team in the context of the NumPEx program (https://numpex.org/), FIVES (https://github.com/fives-simulator/fives) is a storage resource scheduling simulator for supercomputers based on WRENCH and SimGrid, two state-of-the-art simulation frameworks. In particular, FIVES can model a parallel file system such as Lustre together with a computing partition, and simulate a set of jobs performing I/O on the resulting HPC system. This simulator is the result of a collaboration with the University of Hawaii at Manoa (HI, USA) and uses I/O execution traces from computing centers such as Argonne National Laboratory and NCSA in the US.

Assignment

Mission overview

By joining our team you will work in a dynamic environment with exceptionally talented and friendly coworkers who are committed to high-quality research and development practices. You will collaborate with esteemed researchers from around the world and take technical responsibility for the development of the FIVES software, with the following overall goals:

  • Evolve FIVES into distributable, professional-quality software (technical support, documentation, management of the website);
  • Interact with potential users and build demonstrators to increase the visibility and adoption of FIVES.

Main activities

Detailed missions

- Test and improve the FIVES code, build regression, robustness and performance tests, and set up a continuous integration process;

- Expand the existing early-stage documentation into complete, up-to-date documentation (reference manual, user manual);

- Extend FIVES to support emerging storage systems and to provide locality and power consumption metrics;

- Design and maintain a professional-quality website that facilitates the distribution of the code and its documentation.

Skills

Required qualifications

  • Excellent, demonstrated programming skills in Python (including libraries for parallel processing in Python, e.g., Ray, Dask) and C++;
  • Very good knowledge of hardware and software technologies in the areas of distributed computing;
  • Experience with HPC systems;
  • Very good knowledge of methodologies for managing software projects;
  • Ability to analyze and synthesize user requirements;
  • Ability to communicate and collaborate, in English, with experts in the same area and in other areas;
  • Autonomy in leading and carrying out tasks;
  • Sense of partnership and team spirit;
  • Enthusiasm for sharing knowledge, results and progress;
  • Ability to present results in written and oral form.

Benefits package

  • Subsidised catering service
  • Partially-reimbursed public transport

Remuneration

Monthly gross salary from 2,979 euros, depending on qualifications and experience