PhD Position F/M: Steering formal reasoning problem generation for LLM reasoning improvement

The job description below is in English

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree (Bac + 5) or equivalent

Role: PhD student

About the centre or functional department

The Inria University of Lille centre, created in 2008, employs 360 people including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-De-France region, the Inria University of Lille centre pursues a close relationship with large companies and SMEs. By promoting synergies between researchers and industrialists, Inria participates in the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region.

For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of Frenchtech, with a technology showroom based on Avenue de Bretagne in Lille, on the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).

Context and advantages of the position

Large Language Models (LLMs) are trained to predict missing words in many situations, which leads them to absorb knowledge, natural language structure, and some (brittle) algorithmic problem-solving capabilities.

By contrast, symbolic AI has developed mature, efficient algorithms that reliably solve various narrow problems (first-order logic, modal logics, planning, constraint satisfaction...), but it is challenging to apply them successfully to real-world problems that require natural language understanding and knowledge that is hard to formalize.

The goal of the Adada project is to construct reasoning examples to infuse symbolic AI into large language models. To do so, we will formalize a general problem generation framework and instantiate multiple types of symbolic problem generators. We will use existing symbolic solvers to obtain solutions and fine-tune language models to match the solver outputs.

We will start problem generation with simple grammars (e.g. context-free grammars). However, most generated problems will be junk (intractable, redundant, or trivial problems). To address this, we will define the desirable properties of generated problems, and we will steer problem generation toward desirable problems with machine learning techniques (guided generation, efficient language models).
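As a rough illustration of the pipeline described above, the following Python sketch samples propositional formulas from a small context-free grammar, labels them with a brute-force truth-table check, and discards trivial or redundant samples. Everything in it is a hypothetical toy chosen for this example and not the project's actual framework: the grammar, the names `sample`, `classify` and `generate`, the `min_tokens` triviality threshold, and the truth-table check, which merely stands in for a real symbolic solver.

```python
import itertools
import random

# Hypothetical toy grammar for propositional formulas (an assumption for this sketch).
GRAMMAR = {
    "F": [["A"], ["~", "F"], ["(", "F", "&", "F", ")"], ["(", "F", "|", "F", ")"]],
    "A": [["p"], ["q"], ["r"]],
}


def sample(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into a list of terminal tokens by random derivation."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, keep only non-recursive rules so sampling terminates.
        rules = [r for r in rules if "F" not in r]
    rule = rng.choice(rules)
    return [tok for part in rule for tok in sample(part, rng, depth + 1, max_depth)]


def classify(tokens):
    """Brute-force truth table, standing in for a real symbolic solver."""
    expr = " ".join({"~": "not", "&": "and", "|": "or"}.get(t, t) for t in tokens)
    atoms = sorted(set(tokens) & {"p", "q", "r"})
    values = [
        eval(expr, {"__builtins__": {}}, dict(zip(atoms, bits)))
        for bits in itertools.product([False, True], repeat=len(atoms))
    ]
    if all(values):
        return "tautology"
    if not any(values):
        return "unsatisfiable"
    return "contingent"


def generate(n, seed=0, min_tokens=4):
    """Collect n distinct, non-trivial (formula, label) pairs."""
    rng = random.Random(seed)
    seen, problems = set(), []
    while len(problems) < n:
        tokens = sample("F", rng)
        text = " ".join(tokens)
        if len(tokens) < min_tokens or text in seen:
            continue  # drop trivial (too short) or redundant (duplicate) samples
        seen.add(text)
        problems.append((text, classify(tokens)))
    return problems


if __name__ == "__main__":
    for formula, label in generate(5):
        print(label, ":", formula)
```

The length and duplicate filters here are fixed rules; the steering discussed in this posting would replace them with learned criteria for what makes a problem desirable.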

This will enable adaptive dataset generation, which will prevent dataset obsolescence and tailor dataset generation to specific applications or to specific models (newer/larger models need harder tasks). This PhD student position will be supported by the Adada ANR project (Adaptive datasets for LLM reasoning enhancement).

Assigned mission

The PhD student will collaborate with Damien Sileo and the Adada consortium (engineers and interns).

The PhD student should work on designing new methods for steerable problem generation (this is related to data value generation: https://arxiv.org/abs/1909.11671).

The core problem is to steer a sampling process so that it produces data points that are both different from each other and interesting (good level of difficulty, close to real-world tasks).

For example, it is easy to generate logic problems that are hard for LLMs to solve, e.g. parity problems at scale (does ~~~~~~~p entail p?). Such problems are difficult for LLMs, but they are not very interesting.
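To make the parity example concrete, here is a minimal Python sketch. The function names `negation_chain` and `entails_atom` are hypothetical and chosen only for this illustration. A chain of n negations applied to p entails p exactly when n is even, so the answer depends only on the parity of n. That makes the problem trivial for a symbolic procedure at any scale, even though long chains can trip up an LLM.

```python
def negation_chain(n, atom="p"):
    """Build the formula made of n negation signs followed by the atom."""
    return "~" * n + atom


def entails_atom(n):
    """Check whether n negations of p entail p, over both truth values of p."""
    for p in (False, True):
        value = p
        for _ in range(n):
            value = not value  # apply one negation
        if value and not p:
            return False  # premise true but conclusion false: no entailment
    return True


if __name__ == "__main__":
    for n in (2, 7, 100, 101):
        print(negation_chain(n) if n < 10 else f"~ x {n} p", "entails p:", entails_atom(n))
```

Difficulty alone is therefore not a sufficient criterion for selecting generated problems.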

Main activities

Survey existing research

Participate in the construction of formal synthetic problem generators (starting with context-free grammars, but also using language models for guidance, with efficiency considerations)

Formalize steerable problem generation driven by contextual problem value (this problem is related to data value generation: https://arxiv.org/abs/1909.11671)

Formulate research questions, design, and conduct controlled experiments

Evaluate generation strategies on multiple external downstream tasks

Write articles and disseminate research results

Skills

Languages: English (French not mandatory)

Programming language: Python

Deep learning and statistics background

Knowledge of logic and symbolic AI is appreciated

Benefits

Subsidized meals
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
Possibility of teleworking and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Social security coverage

Remuneration

1st and 2nd year: 2100 € (gross monthly salary)

3rd year: 2190 € (gross monthly salary)

Apply for this position

General information

Theme/Domain: Data and knowledge representation and processing
Statistics (Big data) (BAP E)
City: Villeneuve d'Ascq
Inria centre: Centre Inria de l'Université de Lille
Desired start date: 2025-01-01
Contract duration: 3 years
Application deadline: 2024-12-11

Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.

Instructions for applying

Defence security:
This position is likely to be assigned to a restricted-access area (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorisation to access such an area is granted by the head of the institution, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.

Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.

Contacts

Inria team: MAGNET
PhD supervisor:
Sileo Damien / damien.sileo@inria.fr

Keys to success

Strong knowledge of deep learning and ideally reinforcement learning

Autonomy, critical thinking, willingness to tackle hard problems

Interest in formal algorithms

Strong scientific background

Knowledge of NLP

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects that have an impact on the world to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.