Contract type : Fixed-term contract
Level of qualifications required : Graduate degree or equivalent
Position : PhD position
Level of experience : Recently graduated
About the research centre or Inria department
The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.
As Artificial Intelligence has recently gained unprecedented momentum in a rapidly increasing number of application areas, Deep Neural Networks (DNNs) are becoming a pervasive tool across a wide range of domains, including autonomous vehicles, industrial automation and pharmaceutical research, to name just a few.
As these neural network architectures and their training data grow more complex, so do the infrastructures needed to execute them sufficiently fast. Hyperparameter setting and tuning, training, inference and dataset handling all put growing pressure on the underlying compute infrastructure and call for novel approaches at all levels of the workflow: the algorithmic level, the middleware and deployment level, and the resource optimization level.
This thesis is proposed as part of a collaborative project established between Inria and DFKI (the German Research Center for Artificial Intelligence). The goal of this project is to foster efficient collaboration between experts in AI and HPC to address the following specific research questions:
- How can we deal with situations where training or validation data is not available in sufficient quantity or quality? This is the case (1) if generating data is expensive because of sample-creation or measurement costs, (2) if manual data annotation would require an infeasible effort, (3) if the naturally occurring data distribution is unfavorable, i.e., highly relevant situations occur only rarely, or (4) if a phenomenon has been predicted in theory but not yet observed. Our answer here will include the concepts of parametric models and simulations (also known as Digital Reality).
- How can the various possible deployment options of complex AI workflows on the available underlying infrastructure impact performance metrics? How can this infrastructure be best leveraged in practice, potentially through seamless integration of supercomputers, clouds, and fog/edge systems?
This thesis focuses on the second question.
The thesis will focus on the middleware and deployment level. Our objective is to investigate various deployment strategies for complex AI workflows (e.g., potentially combining online training, simulations and inference, all in parallel and in real time) on hybrid execution infrastructures (e.g., combining supercomputers and cloud/fog/edge systems). This requires scalable and reliable experimentation tools. To this end, an important objective is to propose methodologies and supporting tools enabling researchers to:
- describe in a representative way the application behavior;
- reproduce it in a reliable, controlled environment for extensive experiments, and
- understand how the end-to-end performance of applications is correlated to various algorithm-dependent or infrastructure-dependent factors.
The main expected outcomes are: (1) publications describing an experimental, reproducibility-oriented methodology and its validation in practice through the novel insights it enables, potentially leading to novel algorithms for parallel/continual learning and inference across the computing continuum; (2) the associated underlying algorithms; and (3) an adequate software framework for deploying, monitoring and executing experiments at scale on relevant scalable infrastructures (e.g., on experimental platforms such as Grid’5000 [G5K] in a first stage, and on hybrid infrastructures including pre-exascale HPC platforms in a second stage).
To address these challenges, the thesis will leverage the E2Clab approach [E2Clab2020, Ros2020], initiated in the KerData team at Inria to meet the experimentation needs of workloads involving online parallel learning and inference. Beyond potential parallelization strategies for learning and inference tasks, our goal is to enable reproducible experimentation with complex AI workloads across hybrid infrastructures and to help optimize deployment strategies according to multiple factors, including the application characteristics, the target performance metrics and the features of the available execution hardware. In short, the aim is to determine how the various possible deployment options of complex AI workflows impact performance metrics, and how the underlying infrastructure can best be leveraged in practice, potentially through seamless integration of supercomputers, clouds, and fog/edge systems.
International visibility and mobility
The thesis will be conducted in strong collaboration with several partners, including DFKI (contact: René Schubotz), where a paired PhD position will be offered, and Argonne National Laboratory, USA (contact: Bogdan Nicolae). The thesis may include long research stays (1-3 months) with the partners’ teams for joint collaborative work.
How to apply?
In parallel to the online submission on the Inria web site, please send an email with a cover letter, CV, contact addresses of at least two references (internship advisor, teacher in a related field, …) and copies of degree certificates to Dr. Gabriel Antoniu (firstname.lastname@example.org) and Dr. Alexandru Costan (email@example.com). Incomplete applications will be neither considered nor answered.
References
[Ros2020] Daniel Rosendo, Pedro Silva, et al. E2Clab: Exploring the Computing Continuum through Repeatable, Replicable and Reproducible Edge-to-Cloud Experiments. IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, Sep. 2020.
[E2Clab2020] The E2Clab project. https://team.inria.fr/kerdata/e2clab/
[G5K] The Grid’5000 experimental testbed. https://www.grid5000.fr/w/Grid5000:Home
[Sahoo2017] Doyen Sahoo, Quang Pham, Jing Lu, Steven C.H. Hoi. Online Deep Learning: Learning Deep Neural Networks on the Fly. arXiv, 2017. https://arxiv.org/abs/1711.03705
Skills
- Advanced knowledge of computer networks, machine learning and distributed systems
- Ability and motivation to conduct high-quality research, including publishing the results in relevant venues
- Strong programming skills (e.g. C/C++, Java, Python).
- Working experience in machine learning, HPC or distributed systems is an advantage
Benefits package
- Subsidised catering service
- Partially-reimbursed public transport
Remuneration
Monthly gross salary of 1,982 euros for the first and second years and 2,085 euros for the third year.
General information
- Theme/Domain : Distributed and High Performance Computing; System & Networks (BAP E)
- Town/city : Rennes
- Inria Center : CRI Rennes - Bretagne Atlantique
- Starting date : 2021-10-01
- Duration of contract : 3 years
- Deadline to apply : 2021-06-08
The keys to success
- An excellent Master's degree in computer science or equivalent (e.g., engineering)
- Very good communication skills in oral and written English.
- Open-mindedness, strong integration skills and team spirit
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.
Instruction to apply
Please submit online your resume, your cover letter and, if available, letters of recommendation.
In parallel to the online submission on the Inria web site, please send an email with a cover letter, CV, contact address of at least two references (work collaborator, internship advisor, teacher in a related field, …) and copies of degree certificates to Dr. Gabriel Antoniu (firstname.lastname@example.org) and Dr. Alexandru Costan (email@example.com).
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.