Internship - Formal reasoning examples generation for large language model training (M/F)
Contract type: Internship
Level of qualifications required: Master's or equivalent
Other valued qualifications: MSc, preferably in the final year
Function: Research internship
Level of experience: Recently graduated
About the research centre or Inria department
The Inria University of Lille centre, created in 2008, employs 360 people, including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-de-France region, the centre maintains close relationships with large companies and SMEs. By promoting synergies between researchers and industrialists, Inria contributes to the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region. For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of French Tech, with a technology showroom on Avenue de Bretagne in Lille, at the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).
Context
Large Language Models (LLMs) are trained to predict missing words in many situations, which leads them to absorb knowledge, natural language structure, and some (brittle) algorithmic problem-solving capabilities.
By contrast, symbolic AI has developed mature, efficient algorithms that reliably solve various narrow problems (first-order logic, modal logics, planning, constraint satisfaction...), but it is challenging to apply them successfully to real-world problems that require natural language understanding and knowledge that is hard to formalize.
The goal of the Adada project is to construct reasoning examples that infuse symbolic AI into large language models. To do so, we will formalize a general problem generation framework and instantiate multiple types of symbolic problem generators. We will use existing symbolic solvers to obtain solutions and fine-tune language models to match the solver outputs.
We will start by generating problems with simple grammars (e.g. context-free grammars). However, most generated problems will be junk (intractable, redundant, or trivial). To address this, we will define the desirable properties of generated problems and steer generation toward desirable problems with machine learning techniques (guided generation, efficient language models).
This will enable adaptive dataset generation, which prevents dataset obsolescence and tailors datasets to specific applications or to specific models (newer, larger models need harder tasks).
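As a rough illustration of the generate, filter, and solve loop described above, the sketch below samples propositional formulas from a toy context-free grammar, discards "junk" (duplicates and formulas with fewer than two distinct variables), and labels each remaining formula with a brute-force truth-table solver to produce prompt/answer pairs for fine-tuning. The grammar, the filtering criteria, the prompt wording, and all function names (`expand`, `solve`, `is_desirable`, `generate_dataset`) are hypothetical assumptions made for this example; they are not the Adada project's actual framework or solvers.

```python
import itertools
import random

# Hypothetical toy grammar for propositional formulas (an assumption, not the project's grammar).
GRAMMAR = {
    "F": [["V"], ["not", "F"], ["(", "F", "and", "F", ")"], ["(", "F", "or", "F", ")"]],
    "V": [["p"], ["q"], ["r"]],
}
VARIABLES = {"p", "q", "r"}


def expand(symbol, rng, depth, max_depth):
    """Randomly expand a grammar symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    rules = GRAMMAR[symbol]
    if symbol == "F" and depth >= max_depth:
        rules = [["V"]]  # force a variable to guarantee termination
    tokens = []
    for child in rng.choice(rules):
        tokens.extend(expand(child, rng, depth + 1, max_depth))
    return tokens


def solve(formula):
    """Brute-force truth-table 'solver': classify the formula."""
    variables = sorted(set(formula.split()) & VARIABLES)
    results = []
    for values in itertools.product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        # The formula is built only from our own grammar, so it is valid Python syntax.
        results.append(eval(formula, {"__builtins__": {}}, env))
    if all(results):
        return "tautology"
    if not any(results):
        return "contradiction"
    return "contingent"


def is_desirable(formula, seen):
    """Toy junk filter: reject duplicates and formulas with fewer than 2 variables."""
    return len(set(formula.split()) & VARIABLES) >= 2 and formula not in seen


def generate_dataset(n, seed=0, max_depth=4, max_tries=10_000):
    """Generate up to n (prompt, answer) pairs labelled by the solver."""
    rng = random.Random(seed)
    seen, examples = set(), []
    for _ in range(max_tries):
        if len(examples) >= n:
            break
        formula = " ".join(expand("F", rng, 0, max_depth))
        if not is_desirable(formula, seen):
            continue
        seen.add(formula)
        examples.append({
            "prompt": f"Is the formula {formula} a tautology, a contradiction, or contingent?",
            "answer": solve(formula),
        })
    return examples


if __name__ == "__main__":
    for example in generate_dataset(3):
        print(example)
```

In this sketch, the hard-coded filter stands in for the "desirable properties" mentioned above; in the project, that role would be played by learned steering (guided generation) rather than a fixed rule.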
Assignment
Interns will work on dataset generation for a specific topic that depends on their interests, which can include: modal or non-standard logics, planning, first-order logic with external knowledge, mathematics (with Lean), formal language processing (grammar induction), symbolic regression, and simple visual reasoning.
Main activities
- Survey existing research
- Construct synthetic dataset generators
- Evaluate LLMs on them (zero-shot and after fine-tuning)
Skills
- Python
- English (French optional)
- Solid background in formal reasoning (symbolic AI) or math/probability theory
- Interest in language (knowledge of formal semantics appreciated)
LLM skills alone are not sufficient, and they are not mandatory: this position is mostly about formal semantics and formal reasoning applied to LLMs.
Benefits package
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
General Information
- Theme/Domain: Data and Knowledge Representation and Processing
- Town/city: Villeneuve d'Ascq
- Inria Center: Centre Inria de l'Université de Lille
- Starting date: 2024-12-01
- Duration of contract: 6 months
- Deadline to apply: 2024-12-08
Warning: you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions to apply
Defence Security:
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy:
As part of its diversity policy, Inria makes all of its positions accessible to people with disabilities.
Contacts
- Inria Team : MAGNET
- Recruiter: Sileo Damien / damien.sileo@inria.fr
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.