PhD Position F/M Exploitation and Structuring of Heterogeneous Geological Data and Knowledge
Contract type: Fixed-term contract (CDD)
Required level of education: Master's degree (Bac + 5) or equivalent
Position: PhD student
Context and advantages of the position
This PhD position is part of a national collaboration between Inria and BRGM (the French geological survey) on augmenting the scientific process of geologists. More specifically, the position is about exploiting the data and knowledge available to geologists before field campaigns about the surveyed location, in the form of previous reports, maps, scientific publications, databases, etc.
The PhD will be co-supervised by:
- Pierre Senellart (Valda team at Inria Paris, where the PhD student will be primarily located)
- Ioana Manolescu (Cedar team at Inria Saclay)
- Cécile Gracianne (BRGM)
Assignment
Motivation: Organizing a field campaign requires careful preparation, covering logistical tasks (acquiring and readying equipment, arranging travel) as well as scientific ones. Before each field campaign, a preparatory phase draws on existing knowledge to scale the data acquisition effort appropriately. This involves aligning the scientific requirements and objectives of the campaign with the project's constraints, including budget, time, and data management. Geologists use various data sources, whether unstructured, semi-structured, or structured (scientific reports, publications, databases), to improve their understanding of the study area before selecting and developing the most promising scientific hypotheses to test or confirm during the campaign. During the field campaign, geologists generate data of varying structure from their observations and measurements. Comparing this acquired data with the initial hypotheses in real time during the campaign, rather than after returning, makes it possible to adjust the action plan in response to unforeseen constraints, such as inaccessible measurement sites or changes in the relevance of certain points. The PhD focuses on developing tools and methodologies to select, extract, and link the data needed to set up a field campaign, while promoting on-site use of the data acquired during the campaign.
Challenges: The PhD addresses the question of how to make BRGM's wealth of information more accessible and reusable, by enriching it with metadata or restructuring it for more effective use. This raises several scientific challenges, among them the intricate task of extracting information from BRGM's diverse document corpora and efficiently incorporating geographical/spatial and other annotations into the data and documents.
Tasks: The PhD student will be tasked with developing a methodology to automatically build a data warehouse from the information available to geologists. Such a warehouse is multimodal, as it mixes text, images, and different forms of structured content. Information extraction techniques will be used to extract data from raw documents (e.g., tables of data values from PDFs, coordinates of specific geological features from maps, identifications of geological layers from schemas) and to enrich them with accompanying metadata. Deep learning techniques can be used to build representations of the different modalities, which will then be combined in a global model used for information extraction. Integration of data from different sources and semantization of their content will be performed using Open Information Extraction techniques, in connection with knowledge bases such as Wikidata that provide basic knowledge about minerals or locations.
Main activities
Main tasks:
- Acquire a thorough understanding of the literature on information extraction, data warehousing, and data semantization.
- Propose and implement approaches to extract, structure, and exploit different data types, taking their heterogeneity into account.
- Evaluate the proposed solution on publicly available and BRGM-specific benchmarks.
- Keep track of the uncertainty and provenance of data items.
- Write scientific research papers with the objective of publishing them in top data analytics and data management conferences and journals.
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
General information
- Theme/Domain: Data and knowledge representation and processing
- City: Paris
- Inria centre: Centre Inria de Paris
- Desired start date: 2025-03-01
- Contract duration: 3 years
- Application deadline: 2025-02-21
Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Application instructions
Defense security:
This position may be assigned to a restricted-access area (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorization to access such an area is granted by the head of the institution, following a favorable ministerial opinion, as defined in the order (arrêté) of 3 July 2012 on the PPST. An unfavorable ministerial opinion for a position assigned to a ZRR would result in the recruitment being cancelled.
Recruitment policy:
As part of its diversity policy, all Inria positions are open to people with disabilities.
Contacts
- Inria team: VALDA
PhD supervisor:
Senellart Pierre / Pierre.Senellart@inria.fr
Keys to success
The doctoral student must hold a Master's degree or equivalent in computer science or mathematics. He or she should have completed coursework and gained initial research experience in at least one of the following fields: artificial intelligence, data management, statistical learning, information retrieval. He or she should be comfortable with large-scale data processing and with modern artificial intelligence techniques, particularly deep learning, and should be able to read and write research articles in English.
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. Some 900 research and innovation support staff help scientific and entrepreneurial projects with a global impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. In this way, the institute strives to meet the challenges of the digital transformation of science, society and the economy.