PhD Position F/M Speaker-independent acoustic to articulatory inversion of the entire vocal tract based on rt-MRI
Contract type : Fixed-term contract
Level of qualifications required : Graduate degree or equivalent
Function : PhD Position
Context
Acoustic-to-articulatory inversion consists of recovering the geometric shape of the vocal tract from the speech signal. This is a major scientific challenge in automatic speech processing. Potential applications include providing articulatory feedback for foreign language learning, or the medical diagnosis of speech articulation disorders. However, this problem has so far been only partially solved: existing inversion techniques can recover only a few articulatory variables of the vocal tract, essentially for the front part of the tongue and the lips.
We have already developed an approach to single-speaker acoustic-to-articulatory inversion, trained on real-time MRI data and the denoised speech signal.
This PhD offer is provided by the ENACT AI Cluster and its partners. Find all ENACT PhD offers and actions on https://cluster-ia-enact.ai/.
Objective
The aim is now to develop multi-speaker inversion of the complete vocal tract. To this end, we have data for about twenty speakers. These data are less complete than those used for the single-speaker inversion, but they will enable us to develop an anatomical normalization procedure that adapts the inversion to a new speaker, and to perform acoustic adaptation.
Environment
Our two teams have already been working together for several years on articulatory modeling, making extensive use of dynamic MRI data. We are one of the leading teams in the use of real-time MRI for automatic speech processing. The PhD student will have access to the databases already acquired as part of the ArtSpeech project (on the order of 10 minutes of speech for 10 speakers) and those of the Full3DTalkingHead project (on the order of 3 hours of speech for 3 speakers). It will also be possible to acquire complementary data using the real-time MRI system available in the IADI laboratory. This PhD project will therefore build on current cooperation, and of course on the data and segmentation tools we have developed and continue to improve.
The scientific environment of the two teams is highly complementary, with very strong expertise in all areas of MRI and anatomy within the IADI laboratory, and in deep learning within the MultiSpeech team at Loria. The two teams are geographically close (1.5 km). The PhD student will have access to technical resources (a computer, access to computing clusters) enabling them to work in excellent conditions. A weekly progress meeting will be held, and each of the two teams organizes a weekly scientific seminar. The PhD student will also have the opportunity to attend one or two summer schools and conferences on MRI and automatic speech processing, and will receive assistance in writing conference and journal articles.
As some of the data concern German, we plan to cooperate with two teams in Germany (Universität des Saarlandes and TU Dresden).
Assignment
Work
The work will comprise four aspects: (i) anatomical adaptation, with or without a static MRI image of the new speaker; (ii) adaptation of the acoustic data of a new speaker so that their speech signal can be inverted with the model built on the other speakers; (iii) the acoustic-to-articulatory inversion itself; and (iv) geometric evaluation, by measuring the deviation from expected vocal tract shapes, or evaluation using articulatory variables that better match phonetic information.
The first part of the project will involve developing an anatomical adaptation method to take a new speaker into account from a static MRI image, as might be envisaged for a medical application. This adaptation is designed to project the new speaker's data into a geometric reference frame. Static and dynamic MRI images are available for this adaptation. It will also be possible to develop an image-free geometric adaptation procedure based on the acoustic signal alone. After adaptation, the inversion results for the new speaker can be projected into the reference anatomical frame.
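One classical way to project a new speaker's vocal-tract landmarks into a reference frame, as described above, is a similarity (Procrustes) alignment that estimates the scale, rotation, and translation mapping the speaker's anatomy onto the reference. The sketch below is illustrative only; the function name and the assumption of 2D landmark sets are ours, not the project's.

```python
import numpy as np

def procrustes_align(src, ref):
    """Estimate the similarity transform (scale, rotation, translation)
    mapping landmark set src (N, 2) onto the reference frame ref (N, 2).
    Returns a function applying that transform to arbitrary points."""
    mu_s, mu_r = src.mean(axis=0), ref.mean(axis=0)
    s, r = src - mu_s, ref - mu_r
    # Optimal rotation via SVD of the cross-covariance matrix
    U, _, Vt = np.linalg.svd(s.T @ r)
    R = U @ Vt
    if np.linalg.det(R) < 0:  # reject reflections
        U[:, -1] *= -1
        R = U @ Vt
    scale = np.trace((s @ R).T @ r) / np.trace(s.T @ s)

    def transform(points):
        return scale * (points - mu_s) @ R + mu_r

    return transform
```

Once estimated on a few anatomical landmarks (e.g. from the static MRI image), the returned transform can be applied to every contour of the new speaker, so that inversion results live in the common anatomical frame.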
The second part of the work will focus on acoustic adaptation so that the inversion can optimally take into account the acoustic data of a new speaker. This adaptation must also compensate for the fact that the acoustic data used to train the inversion were recorded in intense noise (that of the MRI machine) and had to be denoised. Acoustic adaptation is a theme that has given rise to a great deal of work in automatic speech processing, so there are several ways of tackling this issue efficiently.
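Among the many acoustic adaptation techniques mentioned above, one of the simplest is per-utterance cepstral mean and variance normalization (CMVN), which removes much of the speaker- and channel-dependent bias from the features. This is only one plausible baseline, not necessarily the method the project will adopt:

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.
    features is a (frames, coeffs) array of acoustic features
    (e.g. MFCCs); each coefficient track is shifted to zero mean
    and scaled to unit variance."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)
```

Applied both to the (denoised) training recordings and to a new speaker's clean speech, such a normalization reduces part of the recording-condition mismatch before any learned adaptation is attempted.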
For the inversion process itself, current approaches are often based on bidirectional LSTMs and consist of recovering the contours of the articulators. It will be possible to add attention information about the phonetic impact of particular articulators to improve the consistency of the inversion results.
The final aspect concerns evaluation. It can be purely geometric, comparing the position of the expected contour with that of the contour recovered by inversion. It can also rely on articulatory variables that are less precise, but which reflect the expected acoustic properties.
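The purely geometric variant of this evaluation can be illustrated by a symmetric mean nearest-point distance (a Chamfer-style distance) between the predicted and reference contours; this is one plausible metric among several, not necessarily the one the project will retain:

```python
import numpy as np

def mean_contour_distance(pred, ref):
    """Symmetric mean nearest-point distance between two contours,
    each given as an (N, 2) array of points (e.g. in mm).
    Lower is better; 0 means the contours coincide."""
    # Pairwise Euclidean distances between all point pairs
    d = np.linalg.norm(pred[:, None, :] - ref[None, :, :], axis=-1)
    # Average the nearest-neighbour distance in both directions
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Because it matches each point to its nearest neighbour on the other contour, this metric tolerates differences in point sampling along the two curves, which matters when the predicted and segmented contours do not have corresponding vertices.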
Skills
Technical skills and level required :
The applicant should have a solid background in deep learning, applied mathematics and computer science. Knowledge of speech processing and MRI will also be appreciated.
Languages : English
Benefits package
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
€2200 gross/month
General Information
- Theme/Domain : Language, Speech and Audio / Scientific computing (BAP E)
- Town/city : Villers-lès-Nancy
- Inria Center : Centre Inria de l'Université de Lorraine
- Starting date : 2025-10-01
- Duration of contract : 3 years
- Deadline to apply : 2025-03-26
Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instruction to apply
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter such an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria Team : MULTISPEECH
- PhD Supervisor : Laprie Yves / yves.laprie@loria.fr
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.