Post-Doctoral Research Visit F/M Interpretable Classifier based on Multi-Modal Embeddings for rare classes: Application to identification of marine ecosystems

Contract type : Fixed-term contract

Level of qualifications required : PhD or equivalent

Function : Post-Doctoral Research Visit

Level of experience : Recently graduated

About the research centre or Inria department

The Inria Saclay-Île-de-France Research Centre was established in 2008. It has developed as part of the Saclay site in partnership with Paris-Saclay University and with the Institut Polytechnique de Paris since 2021.

The centre has 39 project teams, 27 of which operate jointly with Paris-Saclay University and the Institut Polytechnique de Paris. Its activities involve over 600 scientists and research and innovation support staff, representing 54 nationalities.

Context

The success of supervised machine learning has given rise to a new priority: being able to interpret the learned models (explainable artificial intelligence, or XAI). The main directions investigated in XAI for computer vision include building saliency maps (for example by exploiting the gradient of the classifier) and identifying relevant sub-concepts for the classes under consideration, based on examples and/or on sub-images of labelled images. New approaches based on multimodal embeddings, such as those integrated into CLIP, avoid the main limitations of the above directions (see [1] and references therein). However, multimodal embeddings reflect common concepts, which hinders their use in specific domains (ranging from biology to industrial environments) where classes involve rare sub-concepts (e.g., "protists"; "planktonic larvae"). Another challenge is to define *frugal* interpretable classifiers that support interaction with researchers and end-users.

The target application is the characterization of marine ecosystems from the wealth of data collected within the Inria Challenge OcéanIA [2] (see also https://oceania.inria.cl/). This project involves Inria teams in Chile, Paris, Saclay, and Sophia-Antipolis, as well as the Fondation Tara Océans, the Center of Mathematical Modeling (CMM, U.Chile), the Pontificia Universidad Católica de Chile (PUC), the GO-SEE CNRS Federation, and the Laboratoire des Sciences du Numérique de Nantes (LS2N).

This application is at the heart of the "AI for the Planet" initiative, where new tools are needed to exploit the mass of data collected by multidisciplinary teams to record current marine ecosystems and how they are changing under the impact of climate change.

[1] Nicolas Atienza, Roman Bresson, Cyriaque Rousselot, Philippe Caillou, Johanne Cohen, Christophe Labreuche, Michele Sebag (2024) Cutting the Black Box: Conceptual Interpretation of a Deep Neural Network with Multi-Modal Embeddings and Multi-criteria Decision Aid. IJCAI 2024.

[2] Sanchez-Pi, N., Martí, L., Abreu, A., Bernard, O., de Vargas, C., Eveillard, D., Maass, A., Marquet, P. A., Sainte-Marie, J., Salomon, J., Schoenauer, M., & Sebag, M. (2021). OcéanIA: AI, Data, and Models for Understanding the Ocean and Climate Change. (N. Sanchez-Pi & L. Martí, Eds.). Lille, Paris, Saclay, Santiago, Sophia-Antipolis: Inria – Institut national de recherche en sciences et technologies du numérique. https://hal.science/hal-03274323/

Assignment

We seek a postdoctoral researcher interested in interpretable supervised learning and in AI for good. The data are available; they include large collections of images (4 to 400 million images) collected during the Tara Oceans expeditions.

The challenge is to extend XAI approaches (including the "Cutting the Black Box" one) to semi-supervised learning (circa 1% of the images are annotated) and to rare domains, where the available multi-modal embeddings and/or LLM resources are hardly effective. Creativity is required to leverage the domain knowledge (diatomea, larvae ontologies) and to ground the relevant sub-concepts in the classifier's latent space.

Main activities

The tasks of the successful candidate will be to:

  • Define a manageable corpus, supporting a sufficiently efficient classifier (current systems exist).
  • Define a dictionary of possibly relevant sub-concepts, based on plankton ontologies and hierarchical classifications, and leveraging existing resources (Wikipedia).
  • Work in collaboration with researchers from other OcéanIA teams, including visits to teams in France and Chile.
  • Co-develop software/functionalities with Inria project teams and participate in their dissemination through OcéanIA meetings and ML conferences.
  • Participate in the production of scientific articles and reports, in collaboration with other project members.

Skills

  • Good knowledge of Machine Learning
  • Fluent in PyTorch/TensorFlow
  • Flair for programming
  • Sense of teamwork

Contacts: Marc.Schoenauer@inria.fr ; sylvain.chevallier@lisn.fr ; sebag@lri.fr

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

2788 € gross/month