R&D Engineer - Feature Extraction for Video Understanding in Group Interaction Scenarios using Transformer-Based Architectures
Contract type: Fixed-term contract (CDD)
Renewable contract: Yes
Required level of qualification: Master's degree (Bac + 5) or equivalent
Position: Contractual scientific engineer
About the centre or functional department
The Inria center at Université Côte d'Azur includes 42 research teams and 9 support services. The center’s staff (about 500 people) is made up of scientists of different nationalities, engineers, technicians and administrative staff. The teams are mainly located on the university campuses of Sophia Antipolis and Nice as well as Montpellier, in close collaboration with research and higher education laboratories and establishments (Université Côte d'Azur, CNRS, INRAE, INSERM ...), but also with the regional economic players.
With a presence in the fields of computational neuroscience and biology, data science and modeling, software engineering and certification, as well as collaborative robotics, the Inria Centre at Université Côte d'Azur is a major player in terms of scientific excellence through its results and collaborations at both European and international levels.
Context and assets of the position
Inria, the French National Institute for Computer Science and Applied Mathematics, promotes “scientific excellence for technology transfer and society”. Graduates from the world’s top universities, Inria's 2,700 employees rise to the challenges of digital sciences. With its open, agile model, Inria can explore original approaches with its partners in industry and academia and provide an efficient response to the multidisciplinary and application challenges of digital transformation. Inria is the source of many innovations that add value and create jobs.
Team
The STARS research team combines advanced theory with cutting-edge practice focusing on cognitive vision systems.
Team website: https://team.inria.fr/stars/
Scientific context
Feature extraction is a challenging computer vision problem that aims to extract relevant information from raw data in order to reduce dimensionality and capture meaningful patterns. When this needs to be done in a dataset- and task-invariant way, it is referred to as general feature extraction. This is a crucial step in machine learning pipelines, and popular methods such as VideoSwin and VideoMAE work well for action recognition and video understanding. However, these works, and also the datasets they are tested on, such as Something-Something and Kinetics, fail to capture information about interactions in daily life.
Towards this research direction, several methods have been proposed to model these complex, fine-grained interactions using datasets such as UDIVA, MPII Group Interactions and EPIC-Kitchens. These datasets, which encompass real-world challenges, share the following characteristics. Firstly, rich multimodal information is available, where each modality provides important information relevant to the labels. Secondly, there is a lot of irrelevant information that has to be ignored, because deep learning models easily latch onto patterns that are merely coincidental (spurious correlations). For example, the colour of a T-shirt could be used to assign a certain personality score to someone if, by coincidence, the majority of the extroverted participants are wearing warm colours. Lastly, the videos in these datasets are generally very long.
So, the main question is:
How can we extract general features from multimodal data that contain a large amount of noise in the form of irrelevant information?
Typical situations that we would like to monitor are daily interactions, responses and reactions, in order to analyse cause and effect in behaviour (whether human-human or human-object interaction).
The system we want to develop will be beneficial for all tasks requiring a focus on interactions, in particular healthcare for psychological disorders, where general feature extraction will allow deep learning models to assist in the various subtasks involved in the diagnostic process.
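To make the notion of general feature extraction concrete, the sketch below shows how clip-level features could be pulled from a pretrained video transformer. It is a minimal illustration only, assuming the publicly available VideoMAE checkpoint from the Hugging Face transformers library and simple mean pooling; the actual backbone, pooling strategy and input pipeline used in the project may differ.

```python
import torch
from transformers import VideoMAEModel

# Assumption: a public VideoMAE checkpoint; any pretrained video transformer could be swapped in.
model = VideoMAEModel.from_pretrained("MCG-NJU/videomae-base")
model.eval()

# Dummy clip with the layout VideoMAE expects: (batch, num_frames, channels, height, width).
clip = torch.randn(1, 16, 3, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values=clip)

# Mean-pool the token embeddings into a single clip-level descriptor.
features = outputs.last_hidden_state.mean(dim=1)  # shape: (1, hidden_size)
```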
Assignment
In this work, we would like to go beyond existing computer vision deep learning models and introduce ways to extend them so that they can exploit information from new modalities, as well as ways to focus on the parts of the input that are relevant to interactions. The system should also take into account the long temporal duration of the videos in the datasets of this domain. These extensions have to be made in a flexible way, so that the original model is changed as little as possible and its pretrained weights remain useful.
Existing methods have mostly focused on modelling the variation of visual cues pertinent to the classes provided for video classification tasks. Although they perform these tasks well, changes in the recording setting or the addition of noise in the form of irrelevant background information make it hard for these models to perform well. So, to obtain a general feature extractor, the models have to be modified to address these shortcomings.
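As a rough illustration of what extending a model with minimal change could look like, the sketch below keeps two pretrained unimodal encoders frozen and trains only a small fusion head on top of their features. The encoder dimensions and the fusion design are illustrative placeholders, not the method to be developed in this work.

```python
import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    """Small trainable head that fuses features from frozen video and audio encoders.
    Dimensions are illustrative placeholders."""
    def __init__(self, video_dim: int = 768, audio_dim: int = 512, out_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(video_dim + audio_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, video_feat: torch.Tensor, audio_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate per-clip features from the two modalities and project them jointly.
        return self.proj(torch.cat([video_feat, audio_feat], dim=-1))

def freeze(module: nn.Module) -> None:
    """Keep pretrained weights untouched; only the fusion head is trained."""
    for p in module.parameters():
        p.requires_grad = False
```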
Main activities
The Inria STARS team is seeking an engineer with a strong background in computer vision, deep learning, and machine learning.
In this work, we focus on two things. First, in group interaction scenarios, utilising all available information to obtain features relevant to multiple downstream tasks while ignoring irrelevant background information. Second, efficient transfer learning to a new recording paradigm, which can include new modalities, changes in the recording setting, and different downstream tasks. The first objective can be tackled by forcing the attention in transformers to attend to the relevant parts of the input and by designing more specific architectures for modelling interactions. The second objective relates to the more general problem of parameter-efficient transfer learning, which has benefited from works such as adapters, prefix tuning and prompt tuning [refs for all three]. These techniques have worked well in NLP and have been adapted to computer vision, but only for specific cases. The theory behind them can be used to develop new methods that serve the second objective of this work.
Large pretrained vision models and their architectures can be used as the backbone for this work.
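As one concrete example of the parameter-efficient transfer learning techniques mentioned above, the sketch below shows a bottleneck adapter in PyTorch that could be inserted into the blocks of a frozen pretrained backbone; its output path is zero-initialised so the pretrained behaviour is preserved at the start of training. The sizes and the freezing helper are assumptions for illustration, not the project's final design.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual down-project / up-project module added inside a frozen transformer block."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, tokens, dim)
        return x + self.up(self.act(self.down(x)))

def train_adapters_only(model: nn.Module) -> None:
    """Freeze the pretrained backbone and train only parameters belonging to adapter modules."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name.lower()
```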
Skills
Candidates must hold a Master's degree or equivalent in Computer Science or a closely related discipline by the start date.
The candidate must be grounded in computer vision basics and have solid mathematical and programming skills.
Theoretical knowledge of computer vision, mathematics and deep learning (PyTorch, TensorFlow) is expected, together with a technical background in C++ and Python programming, OpenCV and Linux.
The candidate must be committed to scientific research and to producing substantial publications.
In order to protect its scientific and technological assets, Inria is a restricted-access establishment. Consequently, it follows special regulations for welcoming any person who wishes to work with the institute. The final acceptance of each candidate thus depends on applying this security and defense procedure.
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Contribution to mutual insurance (subject to conditions)
Remuneration
From 2692 € gross monthly (according to degree and experience)
General information
- Theme/Domain: Vision, perception and multimedia interpretation
- Town/city: Sophia Antipolis
- Inria Centre: Centre Inria d'Université Côte d'Azur
- Desired start date: 2025-02-01
- Duration of contract: 8 months
- Deadline to apply: 2025-01-31
Instructions for applying
Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Defence Security:
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the head of the establishment, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision for a position situated in a ZRR would result in the cancellation of the recruitment.
Recruitment Policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria Team: STARS
- Recruiter: Brémond François / Francois.Bremond@inria.fr
The keys to success
- Essential qualities for fulfilling this assignment are feeling at ease in a dynamic scientific environment and a willingness to learn and to listen.
- A passion for innovation and a willingness to pursue a PhD thesis in the field of Computer Vision and Machine Learning.
- Languages: English
- Relational skills: teamwork
- Other valued qualities: leadership
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact. Inria works with many companies and has assisted in the creation of over 200 start-ups. The institute thereby strives to meet the challenges of the digital transformation of science, society and the economy.