PhD Position F/M LLM-Powered Continuous Evolution of Scientific Computing Software
Contract type: Fixed-term contract (CDD)
Required degree level: Master's degree (Bac + 5) or equivalent
Role: PhD student
About the centre or functional department
The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The Inria centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.
Context and assets of the position
Environment
The candidate will join the DiverSE team, a joint team of CNRS (IRISA) and Inria, and the Laboratory in High Performance Computing for Calculation and Simulation (LiHPC) of CEA DAM, affiliated with the University of Paris-Saclay. The candidate will be supervised by Benoit Combemale (https://people.irisa.fr/Benoit.Combemale/) and Djamel Khelladi (http://people.irisa.fr/Djamel-Eddine.Khelladi/) from Inria, and Dorian Leroy from CEA DAM. The candidate can be based either at Inria in Rennes or at CEA DAM in Bruyères-le-Châtel, and will regularly visit the other site.
The PhD will be funded by the NumPeX program (https://numpex.org/).
Assigned mission
The main mission of this thesis is to conduct excellent research, in line with what the DiverSE team strives to achieve.
A state-of-the-art review will be one of the first activities in order to better prepare the ground for the implementation of solutions and prototypes, as well as for conducting empirical experiments for a rigorous evaluation of contributions.
Main activities
Context
Marc Andreessen explained “Why Software Is Eating The World” in the WSJ [1]. This also holds for scientific computing, which leverages computing capabilities to understand and solve complex problems in science (chemistry, physics, maths, biology…), in industry (health, space, aeronautics, etc.), and for public authorities.
In scientific computing, there is a significant disconnect between the lifetime of physics simulation codes (~20 years), HPC programming paradigms (~10 years), and supercomputers (<5 years). This puts a heavy burden on the developers of these applications, as they are primarily physicists and numerical analysts, but nevertheless have to address software engineering and high performance computing (HPC) concerns when coding, and keep pace with advances in those fields [2].
To achieve proper separation of concerns, the use of domain-specific languages (DSLs) tailored to the needs of the domain experts [3] is a promising perspective to allow physicists and numerical analysts to address concerns specific to their domains, while the language developers address the software engineering and HPC concerns.
However, to integrate and experiment with cutting-edge advances in software engineering and HPC, it is not feasible to start from scratch or manually rewrite the existing code, given the long lifetime and size of physics simulation codes. Worse, as both software and hardware capabilities continuously evolve, developers must continuously update and maintain (i.e., co-evolve) their code accordingly. Unfortunately, this task is still manual and burdensome for developers.
Today, advances in AI have shown promising results, and a plethora of LLMs are reshaping developers' daily activities [4]. Scientific computing is no exception, which opens up several perspectives and opportunities for automation.
Objectives
The objective of this PhD thesis is to provide building blocks enabling the rapid evaluation and adoption of cutting-edge advances in software engineering and HPC, in the context of scientific computing software, and simulation codes in particular, by leveraging LLMs.
The overall objectives are, beyond a survey of the state of the art [5, 6] on this topic and in adjacent contexts (i.e., non-scientific software), to explore the feasibility of powering continuous code evolution with LLMs. Among the many existing challenges, we aim to:
- Investigate the balance between human intervention and automation required for this task (e.g., rewrite some parts by hand to kickstart the automated process).
- Investigate and explore what kind of “evolution harness” must be built around the application, and to what extent this can be automated.
- Experiment with various LLM pipelines.
- Investigate and explore how to enable incremental evolution (e.g., composition of components, interoperability), in particular when the target of the evolution is another language: the complete application cannot be evolved all at once, and each evolution increment must be validated. This will also help address the scalability challenge of evolving large, complex code.
- Investigate the extraction of evolutions at the language level from evolutions at the source code level, such as identifying emerging language constructs from source code evolutions.
Prerequisites
- A degree (and strong background) in computer science (esp. software engineering)
- Skills in programming and modeling languages, and supporting environments
- Interest in machine learning (esp. LLMs)
- Professional proficiency in English
- Presentation and writing skills
- Autonomy, rigor, and a strong work ethic
References
[1] Andreessen, M. Why Software Is Eating The World. The Wall Street Journal. https://www.wsj.com/articles/SB10001424053111903480904576512250915629460, https://a16z.com/why-software-is-eating-the-world/
[2] Leroy, D., Sallou, J., Bourcier, J., & Combemale, B. (2021). When scientific software meets software engineering. Computer, 54(12), 60-71.
[3] Fowler, M. (2010). Domain-specific languages. Pearson Education.
[4] Ishaani, M., Omidvar-Tehrani, B., & Anubhai, A. (2024). Evaluating human-AI partnership for LLM-based code migration.
[5] Busch, D., Bainczyk, A., & Steffen, B. (2023, October). Towards LLM-Based System Migration in Language-Driven Engineering. In International Conference on Engineering of Computer-Based Systems (pp. 191-200). Cham: Springer Nature Switzerland.
[6] Almeida, A., Xavier, L., & Valente, M. T. (2024). Automatic Library Migration Using Large Language Models: First Results. arXiv preprint arXiv:2408.16151.
How to apply
Send your CV, motivation letter, and bachelor's and master's transcripts together with the diplomas.
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
Remuneration
Monthly gross salary: 2300 euros
General information
- Theme/Domain: Distributed programming and software engineering
Software engineering (BAP E)
- City: Rennes
- Inria centre: Centre Inria de l'Université de Rennes
- Desired start date: 2026-02-02
- Contract duration: 3 years
- Application deadline: 2025-12-19
Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions to apply
Please submit online: your resume, cover letter, and, if available, letters of recommendation.
Defence security:
This position is likely to be located in a restricted area (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 relating to the protection of the nation's scientific and technical potential (PPST). Authorisation to access such an area is granted by the head of the institution, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position located in a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: DIVERSE
- PhD supervisor: Khelladi Djamel / djamel-eddine.khelladi@irisa.fr
The keys to success
- Striving for excellence
- Skills in programming and modeling languages, and supporting environments
- Interest in machine learning (esp. LLMs)
- Professional proficiency in English
- Presentation and writing skills
- Autonomy, rigor, and a strong work ethic
About Inria
Inria is the national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with worldwide impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society, and the economy.