Post-Doctoral Research Visit F/M Machine Learning for translation between formal mathematics libraries.

Contract type : Fixed-term contract

Level of qualifications required : PhD or equivalent

Fonction : Post-Doctoral Research Visit

About the research centre or Inria department

The Inria research centre in Lyon is the 9th Inria research centre, formally created in January 2022.  It brings together approximately 300 people in 17 research teams and research support services.

Its staff are distributed in Villeurbanne, Lyon Gerland, and Saint-Etienne.

The Lyon centre is active in the fields of software, distributed and high-performance computing, embedded systems, quantum computing and privacy in the digital world, but also in digital health and computational biology.

Context

The postdoc will take place within the framework of the défi Inria LLM4Code, which objectives are to apply machine learning techniques to the automatic generation of code, and in particular for proofs assistants libraries, such as Coq libraries.

The research will be carried out in the LIP laboratory of ENS Lyon under the local supervision of Cyril Cohen, within the CASH team. However, there will be frequent meetings, and regular in person meetings, with participants from the other work packages of the LLM4Code project.

Travel expenses are covered within the limits of the scale in force.

Assignment

Assignments:

With the help of people from the projects and their collaborators, the recruited person will work on the automated translations from, to and between libraries of formalized mathematics using modern AI techniques, and in particular Large Language Models.

Such translations may require nontrivial alteration of the proof assistants at use, such as the creation of plugins, the use of meta-programming, or rephrasing libraries in ways that are better suited for machine learning.

It will also require to set up training, fine-tuning and/or RAG on existing models, and evaluate the relevance of each method.

The expected outcome of this work are research publications and software production.

For a better knowledge of the proposed research subject:

The state of the art, bibliography and scientific references are fast evolving. We suggest to read the recent survey A Survey on Deep Learning for Theorem Proving by Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si, with particular focus on publications about Coq or Lean.

Collaboration:

The recruited person will be mainly in connection with Cyril Cohen, but will also work remotely with other members of the LLM4Code project.

Responsibilities:

The person recruited is responsible for setting up the software infrastructure to conduct the experiments and take initiatives for the choices of software, and available hardware, within the limits allowed by Inria.

Main activities

Main activities:

  • Determine how to train, finetune, or use RAG techniques on, existing models, and combine these methods, to achieve the desired translation.
  • Setup the infrastructure to implement and reproduce these methods.
  • Implement extensions of Coq or Lean in order to extract relevant data.
  • Document the resulting software.
  • Collect outputs, run benchmarks and publish scientific papers.

Additional activities:

  • Design user interfaces to make the resulting tool widely usable.
  • Determine a long term maintainance scheme for these.

Skills

Technical skills and level required :

  • Know the basics of at least one proof assistant, preferably one based on dependent type theory.
  • Understand the difference between various machine learning techniques.
  • Have hands on practice in training models.

Languages :

  • Advanced English.

Relational skills :

  • Communicate about your results.
  • Work autonomously, while talking about strategic decisions with other members of the project.
  • Listen to feedback.

Other valued appreciated :

  • Expertise in LLM.
  • Expertise in Coq or Lean.
  • Git, GitHub, Hugging Face, WandB, and other platforms for code sharing, model sharing, etc.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (90 days / year) and flexible organization of working hours (except for intership)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage under conditions

Remuneration

2788 € gross salary / month