PhD Position F/M: Exploitation and Structuring of Heterogeneous Geological Data and Knowledge
Contract type: Fixed-term contract
Level of qualifications required: Graduate degree or equivalent
Function: PhD Position
Context
This PhD position is part of a national collaboration between Inria and BRGM (the French geological survey) on augmenting the scientific process of geologists. More specifically, the position is about exploiting the data and knowledge available to geologists before field campaigns about the surveyed location, such as previous reports, maps, scientific publications, and databases.
The PhD will be co-supervised by:
- Pierre Senellart (Valda team at Inria Paris, where the PhD student will be primarily located)
- Ioana Manolescu (Cedar team at Inria Saclay)
- Cécile Gracianne (BRGM)
Assignment
Motivation: Organizing a field campaign requires careful preparation, covering both logistics (acquiring and readying equipment, arranging travel) and scientific considerations. Before each campaign, a preparatory phase draws on existing knowledge to scale the data acquisition effort appropriately. This involves aligning the scientific requirements and objectives of the campaign with the project's constraints, including budget, time, and data management. Geologists use various data sources, whether unstructured, semi-structured, or structured (scientific reports, publications, databases), to deepen their understanding of the study area before selecting and developing the most promising scientific hypotheses to test or confirm during the field campaign. Throughout the campaign, geologists generate data of varying structure from their observations and measurements. Comparing these data with the initial hypotheses in real time during the campaign, rather than after returning, makes it possible to adjust the action plan in response to unforeseen constraints, such as inaccessible measurement sites or changes in the relevance of certain points. The PhD focuses on developing tools and methodologies to select, extract, and link the data needed to set up a field campaign, and to support the on-site use of the data acquired during the campaign.
Challenges: The PhD addresses how to make BRGM's wealth of information more accessible and reusable, by enriching it with metadata or restructuring it for more effective use. This raises several scientific challenges, including the intricate task of extracting information from BRGM's diverse document corpora in order to efficiently add geographical/spatial and other annotations to the data and documents.
Assignment: The PhD student will be tasked with developing a methodology to automatically build a data warehouse from the information available to geologists. Such a warehouse is multimodal, as it combines text, images, and various forms of structured content. Information extraction techniques will be used to extract data from raw documents (e.g., tables of data values from PDFs, coordinates of specific geological features from maps, identification of geological layers from diagrams) and to enrich them with accompanying metadata. Deep learning techniques can be used to construct representations of the different modalities, which will then be combined into a global model for information extraction. Data from different sources will be integrated, and their content semantized, using Open Information Extraction techniques in connection with knowledge bases such as Wikidata, which provide basic knowledge about minerals and locations.
Main activities
Main tasks:
- Acquire a thorough understanding of the literature on information extraction, data warehousing, and data semantization.
- Propose and implement approaches to extract, structure, and exploit different data types, taking their heterogeneity into account.
- Evaluate the proposed solution on publicly available and BRGM-specific benchmarks.
- Keep track of the uncertainty and provenance of data items.
- Write scientific research papers with the objective of publishing them in top data analytics and data management conferences and journals.
Benefits package
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
General Information
- Theme/Domain: Data and Knowledge Representation and Processing
- Town/city: Paris
- Inria Center: Centre Inria de Paris
- Starting date: 2025-03-01
- Duration of contract: 3 years
- Deadline to apply: 2025-02-21
Warning: you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions to apply
Defence Security:
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria Team: VALDA
- PhD Supervisor: Senellart Pierre / Pierre.Senellart@inria.fr
The keys to success
The doctoral student must hold a Master's degree or equivalent in computer science or mathematics. He or she should have taken courses in, and gained initial research experience in, one of the following fields: artificial intelligence, data management, statistical learning, information retrieval. He or she should be comfortable with large-scale data processing and with modern artificial intelligence techniques, particularly deep learning. He or she should be able to read and write research articles in English.
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.