Engineer Position F/M: Reasoning problem generation for natural language processing

Contract type: Fixed-term contract

Level of qualifications required: Graduate degree or equivalent

Function: Temporary scientific engineer

About the research centre or Inria department

The Inria University of Lille centre, created in 2008, employs 360 people including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-de-France region, the Inria University of Lille centre maintains close relationships with large companies and SMEs. By promoting synergies between researchers and industrialists, Inria participates in the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region.

For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of the French Tech, with a technology showroom on Avenue de Bretagne in Lille, on the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).

Context

Large Language Models (LLMs) are trained to predict missing words in a wide range of contexts, which leads them to absorb knowledge, natural language structure, and some (brittle) algorithmic problem-solving capabilities.

By contrast, symbolic AI has developed mature, efficient algorithms that reliably solve various narrow problems (first-order logic, modal logics, planning, constraint satisfaction, ...), but it is challenging to apply them successfully to real-world problems that require natural language understanding and knowledge that is hard to formalize.

The goal of the Adada project is to construct reasoning examples that infuse symbolic AI into large language models. To do so, we will formalize a general problem generation framework and instantiate multiple types of symbolic problem generators. We will use existing symbolic solvers to obtain solutions and fine-tune language models to match the solver outputs.

We will start by generating problems with simple grammars (e.g. context-free grammars). However, most generated problems will be junk (intractable, redundant, or trivial). To address this, we will define the desirable properties of generated problems and steer generation toward such problems with machine learning techniques (guided generation, efficient language models).

This will enable adaptive dataset generation, which will prevent dataset obsolescence and allow datasets to be tailored to specific applications or to specific models (newer/larger models need harder tasks). This position will be supported by the Adada ANR project (Adaptive datasets for LLM reasoning enhancement).
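To illustrate the generate-then-filter starting point described above, here is a minimal, hypothetical Python sketch. It samples propositional formulas from a toy context-free grammar, labels each one with a brute-force satisfiability check standing in for a symbolic solver, and discards junk with simple heuristics. The grammar, the size limit, and the filters (fewer than two variables counts as trivial, an exact duplicate as redundant, token count as a crude proxy for intractability) are all illustrative assumptions, not the project's method. Note that this sketch uses plain rejection filtering, whereas the project aims to steer generation with learned models.

```python
import itertools
import random

# Toy context-free grammar for propositional formulas (an illustrative
# assumption, not the project's grammar). Keys are nonterminals.
GRAMMAR = {
    "F": [["VAR"], ["not", "F"], ["(", "F", "and", "F", ")"], ["(", "F", "or", "F", ")"]],
    "VAR": [["p"], ["q"], ["r"]],
}
VARIABLES = ["p", "q", "r"]


def expand(symbol, rng, depth=0, max_depth=4):
    """Randomly expand a symbol; past max_depth, only non-recursive rules are allowed."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        rules = [r for r in rules if symbol not in r]
    tokens = []
    for sym in rng.choice(rules):
        tokens.extend(expand(sym, rng, depth + 1, max_depth))
    return tokens


def solve(formula, variables):
    """Brute-force SAT check: return the first satisfying assignment, or None."""
    for values in itertools.product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        # Generated token strings are valid Python boolean expressions.
        if eval(formula, {"__builtins__": {}}, env):
            return env
    return None


def generate_dataset(n, seed=0, max_tokens=25, max_attempts=10_000):
    """Sample formulas and keep up to n that pass the heuristic junk filters."""
    rng = random.Random(seed)
    seen, dataset = set(), []
    for _ in range(max_attempts):
        if len(dataset) == n:
            break
        tokens = expand("F", rng)
        formula = " ".join(tokens)
        used = sorted(set(tokens) & set(VARIABLES))
        if len(used) < 2:  # trivial: fewer than two distinct variables
            continue
        if formula in seen:  # redundant: exact duplicate
            continue
        if len(tokens) > max_tokens:  # proxy for intractable: too large
            continue
        seen.add(formula)
        dataset.append({
            "problem": f"Is the formula {formula} satisfiable?",
            "formula": formula,
            "variables": used,
            "solution": solve(formula, used),
        })
    return dataset
```

For example, `generate_dataset(5)` should return five distinct formulas over at least two variables, each paired with a satisfying assignment, or `None` when the formula is unsatisfiable. In the project's framing, these post-hoc filters would give way to guided generation, so that fewer junk problems are sampled in the first place.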

Keywords: NLP / AI / Reasoning / Logic / Deep Learning

Assignment

The recruited engineer will collaborate with colleagues in the MAGNET team and, more broadly, with the project consortium.

Main activities

Software development: create a generic problem generation framework in which users can quickly implement new tasks (e.g. planning, modal logic) and quickly generate and validate training examples.

This will require designing suitable abstractions and contributing to the framework implementation following good coding practices.

The engineer will call existing LLMs to gauge their accuracy on existing tasks.

The engineer will also contribute to implementing new reasoning tasks in the framework and to scientific publications.
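One possible shape for the task abstraction described in these activities is the following hypothetical Python sketch. Each task supplies `generate`, `solve`, and `render`. The framework then derives training examples and an accuracy score for any model exposed as a plain `str -> str` callable. All class, method, and function names here are assumptions for illustration. The toy `Addition` task only exercises the interface; real tasks such as planning or modal logic would call an external symbolic solver inside `solve`, and an LLM would be wrapped in the `model` callable (not shown).

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Example:
    prompt: str
    answer: Any


class Task(ABC):
    """Hypothetical interface that each new reasoning task implements."""
    name: str

    @abstractmethod
    def generate(self, rng: random.Random) -> Any:
        """Sample a raw problem instance."""

    @abstractmethod
    def solve(self, problem: Any) -> Any:
        """Compute the reference answer (e.g. by calling a symbolic solver)."""

    @abstractmethod
    def render(self, problem: Any) -> str:
        """Turn a problem instance into a natural-language prompt."""

    def score(self, prediction: str, answer: Any) -> bool:
        """Default validation: exact match after stripping whitespace."""
        return prediction.strip() == str(answer)

    def examples(self, n: int, seed: int = 0) -> list[Example]:
        """Generate n (prompt, reference answer) pairs reproducibly."""
        rng = random.Random(seed)
        out = []
        for _ in range(n):
            problem = self.generate(rng)
            out.append(Example(self.render(problem), self.solve(problem)))
        return out


class Addition(Task):
    """Toy task used only to exercise the interface."""
    name = "addition"

    def generate(self, rng):
        return rng.randint(0, 99), rng.randint(0, 99)

    def solve(self, problem):
        a, b = problem
        return a + b

    def render(self, problem):
        a, b = problem
        return f"What is {a} + {b}? Answer with a number."


def accuracy(task: Task, model: Callable[[str], str], n: int = 20) -> float:
    """Fraction of generated examples on which the model's output is scored correct."""
    exs = task.examples(n)
    return sum(task.score(model(ex.prompt), ex.answer) for ex in exs) / len(exs)
```

Keeping the model as a bare callable means the same `accuracy` loop could evaluate a local model, a remote API wrapper, or a fine-tuned checkpoint without changes to the task code.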

Skills

Languages: English (French not mandatory)

Good Python coding skills

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

According to profile