PhD Position F/M Construction of a Generic Multilingual, Multi-Speaker Articulatory Model Using Real-Time MRI Data (PhD offer)
Contract type: Fixed-term contract (CDD)
Required degree level: Master's degree (Bac+5) or equivalent
Role: PhD student
Desired level of experience: Recent graduate
Context and advantages of the position
Background
Speech production requires control of the movements of the articulators (jaw, tongue, lips, etc.) that are used to modify the shape of the vocal tract, and consequently the acoustic properties, including the resonance frequencies of the vocal tract.
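As a rough illustration of how the vocal tract shape sets the resonance frequencies, the sketch below uses the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / (4L). The tube lengths and the speed of sound (350 m/s) are assumed illustrative values, and the function name is hypothetical. A real vocal tract has a varying cross-section, which is what an articulatory model describes.

```python
def uniform_tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]


# Assumed illustrative values: a 17.5 cm tract and c = 350 m/s.
neutral = uniform_tube_resonances(0.175)
# A shorter tract (for example a smaller speaker) shifts every resonance upward.
shorter = uniform_tube_resonances(0.145)

print([round(f) for f in neutral])  # [500, 1500, 2500]
print([round(f) for f in shorter])
```

Changing only the tube length already moves all resonances, which is one reason a speaker-independent model has to account for anatomical differences.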
When learning to speak or acquiring a second language, speakers learn to move and control their articulators to produce the sounds of their language. Articulatory synthesis mimics this process by using the temporal evolution of the vocal tract shape and source parameters as input. The advantage of articulatory synthesis is that it can explain the articulatory origin of phonetic contrasts, manipulate the movement of articulators (or even block one to simulate a speech impairment), adapt to a new speaker by modifying the size and shape of the articulators, and finally, reconstruct the vocal tract shape from the speech signal.
Compared to other synthesis approaches that offer high quality, the main advantage of articulatory synthesis is therefore its ability to control the entire speech production process.
Generating the geometric shape of the vocal tract at every moment of synthesis is the central focus of articulatory synthesis. It most often relies on the use of an articulatory model [1, 2] that determines the shape of the vocal tract using a small number of parameters. This model is almost exclusively constructed either from geometric primitives or from a small number of MRI images of a single speaker and, consequently, for a single language. Recently, we developed a model that uses a large number of dynamic MRI images of a speaker to generate the shape of the vocal tract based on a sequence of phonemes to be articulated [3].
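To make the idea of "a small number of parameters" concrete, here is a minimal sketch of one common family of approaches: a linear component model obtained by principal component analysis of contour coordinates. It does not reproduce the models of [1, 2] or [3]. The data are synthetic, and the numbers of frames, contour points and parameters are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for tracked contours: 200 frames of 30 (x, y) points,
# flattened to 60 values per frame. Real data would come from MRI contours.
n_frames, n_points, n_params = 200, 30, 4
true_basis = rng.normal(size=(n_params, 2 * n_points))
true_params = rng.normal(size=(n_frames, n_params))
mean_shape = rng.normal(size=2 * n_points)
contours = mean_shape + true_params @ true_basis

# PCA via SVD of the centred data: rows of `components` are deformation modes.
mean = contours.mean(axis=0)
_, _, vt = np.linalg.svd(contours - mean, full_matrices=False)
components = vt[:n_params]


def encode(contour):
    """Project a flattened contour onto the few model parameters."""
    return (contour - mean) @ components.T


def decode(params):
    """Rebuild a flattened contour from the model parameters."""
    return mean + params @ components


params = encode(contours[0])
error = np.abs(decode(params) - contours[0]).max()
print(params.shape, error < 1e-8)
```

Because the synthetic contours are generated from exactly four modes, four parameters reconstruct them almost perfectly. Real contours would leave a residual that depends on how many components are kept.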
The objective of this thesis is now to construct a generic model that is independent of the speaker and the language.
Assignment
Work
To this end, we are currently collecting data covering approximately thirty speakers and languages using a real-time acquisition system (at 50 frames per second) for two-dimensional sagittal MRI scans of the vocal tract, as part of a collaboration with the IADI laboratory (INSERM U1254) at the Nancy University Hospital.
These images of the mid-sagittal plane of the vocal tract are of very high quality, making it possible to track the contours of the articulators. Despite the excellent results we have obtained [4], we aim to improve tracking at the tongue tip, which has a significant acoustic impact, for example by using the nnU-Net approach [5].
The thesis work will be divided into two main parts:
- The construction of a generic articulatory model,
- The adaptation of this model to a new language and a new speaker.
For the first part, the work will rely on anatomical normalization across speakers, in order to construct a model for controlling the articulators that accounts for anatomical differences.
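This offer does not specify how the anatomical normalization will be carried out. Purely as an illustration of what normalizing for anatomical differences can mean, the sketch below applies a generic Procrustes (similarity) alignment, a standard technique substituted here as an assumption: it removes differences in position, orientation and overall size between two sets of corresponding 2D landmarks. The landmark coordinates and the function name are hypothetical.

```python
import cmath


def align_similarity(source, target):
    """Least-squares fit of `source` onto `target` (translation, rotation, scale).

    Points are complex numbers (x + 1j * y) listed in corresponding order.
    Returns the transformed source points.
    """
    n = len(source)
    src_mean = sum(source) / n
    tgt_mean = sum(target) / n
    src_c = [p - src_mean for p in source]
    tgt_c = [p - tgt_mean for p in target]
    # A single complex factor encodes rotation (argument) and scale (modulus).
    numerator = sum(t * s.conjugate() for s, t in zip(src_c, tgt_c))
    denominator = sum(abs(s) ** 2 for s in src_c)
    factor = numerator / denominator
    return [factor * s + tgt_mean for s in src_c]


# Hypothetical landmarks for a reference speaker.
reference = [0 + 0j, 2 + 0j, 3 + 1j, 1 + 2j]
# A second "speaker": same shape, but scaled, rotated and shifted.
other = [1.3 * cmath.exp(0.2j) * p + (5 - 2j) for p in reference]

aligned = align_similarity(other, reference)
residual = max(abs(a - r) for a, r in zip(aligned, reference))
print(residual < 1e-9)
```

A similarity transform only handles global differences. Differences in the proportions of individual articulators would need a richer normalization, which is part of the research question.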
For the second part, the work will consist of representing, within the framework of the generic articulatory model, the places of articulation associated with the phonemes of a new language, together with the articulatory movement where necessary (for affricates and diphthongs, for example).
References
[1] B. J. Kröger, V. Graf-Borttscheller, A. Lowit. (2008). Two- and Three-Dimensional Visual Articulatory Models for Pronunciation Training and for Treatment of Speech Disorders. Proc. of Interspeech 2008, Brisbane, Australia
[2] Y. Laprie, J. Busset. (2011). Construction and evaluation of an articulatory model of the vocal tract. In: 19th European Signal Processing Conference (EUSIPCO-2011), Barcelona, Spain
[3] Vinicius Ribeiro, Karyna Isaieva, Justine Leclere, Pierre-André Vuissoz, Yves Laprie. Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated. Speech Communication, 2022, 141, pp.1-13. ⟨10.1016/j.specom.2022.04.004⟩. ⟨hal-03650212⟩
[4] Vinicius Ribeiro, Karyna Isaieva, Justine Leclere, Jacques Felblinger, Pierre-André Vuissoz, et al. Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging. Computer Methods and Programs in Biomedicine, 243 (2), ⟨10.1016/j.cmpb.2023.107907⟩. ⟨hal-04376938⟩
[5] Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18, 203–211 (2021). https://doi.org/10.1038/s41592-020-01008-z
[6] Karyna Isaieva, Yves Laprie, Justine Leclère, Ioannis K Douros, Jacques Felblinger, et al. Multimodal dataset of real-time 2D and static 3D MRI of healthy French speakers. Scientific Data, 2021, 8 (1), pp.258. ⟨10.1038/s41597-021-01041-3⟩. ⟨hal-03507532⟩
[7] Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie. Reconstruction of the Complete Vocal Tract Contour Through Acoustic to Articulatory Inversion Using Real-Time MRI Data. Interspeech 2025, Aug 2025, Rotterdam, Netherlands. pp.978-982, ⟨10.21437/Interspeech.2025-963⟩. ⟨hal-05293831⟩
Main activities
Research Environment
The PhD student will have access to real-time MRI databases already acquired as part of the ANR ArtSpeech project (approximately 10 minutes of speech from 10 speakers [6]) and the Full3FDTalkingHead project (approximately 6 hours of speech from 2 speakers), as well as several other projects. The IADI and Loria laboratories are among the most advanced in the field of real-time MRI data acquisition and analysis, with recent work on acoustic-to-articulatory inversion of speech signals [7]. The PhD student will also be able to take part in the ongoing data collection for several languages using the MRI system available at the IADI laboratory.
The scientific environments of the two teams are highly complementary, with strong expertise in all areas of MRI and anatomy within the IADI laboratory and in deep learning within the Loria MultiSpeech team. The two teams are geographically close (1.5 km). The PhD student will have access to both laboratories and the technical resources (computer, access to computing clusters) needed to work under excellent conditions. A progress meeting will be held weekly, and each team organizes a weekly scientific seminar. The PhD student will also have the opportunity to participate in one or two summer schools and conferences on MRI and automatic speech processing. He/she will also receive assistance in writing conference papers and journal articles.
Funding for this doctoral project has already been secured through the ANR ArtAny project.
Supervisor: Yves Laprie (Loria)
Skills
The candidate must have a strong background in deep learning, applied mathematics, and computer science. Knowledge of speech processing and magnetic resonance imaging (MRI) is also desirable.
Keywords
articulatory synthesis, real-time MRI, articulatory modeling, deep learning, computer science, speech processing, applied mathematics
Languages
French and/or English
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
€2300 gross/month
General information
- Theme/Domain: Language, Speech and Audio; Scientific Computing (BAP E)
- City: Villers-lès-Nancy
- Inria centre: Centre Inria de l'Université de Lorraine
- Desired start date: 2026-10-01
- Contract duration: 3 years
- Application deadline: 2026-05-31
Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions to apply
Defence security:
This position is likely to be assigned to a restricted-access zone (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorization to access such a zone is granted by the head of the institution, following a favorable ministerial opinion, as defined in the order of 3 July 2012 on the PPST. An unfavorable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: MULTISPEECH
- PhD supervisor: Laprie Yves / yves.laprie@loria.fr
About Inria
Inria is the French national research institute for digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talent in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with a worldwide impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.