PhD Position F/M Distributed Training of Heterogeneous Architectures
Contract type: Fixed-term contract (CDD)
Level of qualification required: Master's degree or equivalent (Bac + 5)
Position: PhD student
About the research centre or functional department
Inria is a national research institute dedicated to digital sciences that promotes scientific excellence and transfer. Inria employs 2,400 collaborators organised in research project teams, usually in collaboration with its academic partners.
This agility enables its scientists, drawn from the world's best universities, to tackle the challenges of computer science and mathematics, whether through multidisciplinary work or with industrial partners.
A pioneer in the creation of deep-tech companies, Inria has also supported the launch of more than 150 start-ups from its research teams. Inria effectively addresses the challenges of the digital transformation of science, society, and the economy.
Context and advantages of the position
This PhD thesis is part of the Inria-Nokia Bell Labs research initiative on Federated Learning for Cellular Networks, in close relation with the Inria research initiative on Federated Learning, FedMalin (https://project.inria.fr/fedmalin/).
The PhD candidate will join the Nokia Bell Labs research team in Massy, France (https://www.bell-labs.com/about/locations/paris-saclay-france/) and will also be a member of the Inria project-team NEO (https://team.inria.fr/neo/).
The Machine Learning & Systems team, part of the AI Research Lab at Nokia Bell Labs, is composed of computer scientists and data engineers who develop AI-based systems and algorithms bridging the gap between the promise of limitless capabilities of AI and the constraints imposed by real computing and communication systems.
NEO is positioned at the intersection of Operations Research and Network Science. By using the tools of Stochastic Operations Research, the team members model situations arising in several application domains, involving networking in one way or the other.
The research activity will be supervised by
* Chung Shue Chen (firstname.lastname@example.org)
* Fabio Pianese (email@example.com)
* Giovanni Neglia (firstname.lastname@example.org)
The increasing size of data generated by smartphones and IoT devices has motivated the development of Federated Learning (FL) [LKT+20, KMA+21], a framework for on-device collaborative training of machine learning models. FL algorithms like FedAvg [MMR+17] allow clients to train a common global model without sharing their personal data. FL reduces data collection costs and can help mitigate data privacy issues, making it possible to train models on large datasets that would otherwise be inaccessible. FL is currently used by many big tech companies (e.g., Google, Apple, Facebook) for learning on their users' data, but the research community also envisions promising applications to learning across large data silos, such as hospitals that cannot share their patients' data [RHL20].
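To make the FedAvg scheme mentioned above concrete, here is a minimal numpy sketch of one communication round on a toy linear least-squares model: each client runs a few local gradient steps from the current global model, and the server averages the resulting models weighted by local dataset size. All names (`fedavg_round`, `client_data`, etc.) are ours for illustration, not part of any library or of the thesis proposal.

```python
import numpy as np

def fedavg_round(global_model, client_data, lr=0.1, local_steps=5):
    """One FedAvg round (illustrative sketch, linear least-squares model).

    Each client copies the global model, runs `local_steps` steps of
    gradient descent on its own data, and the server returns the average
    of the client models weighted by local dataset size, as in [MMR+17].
    """
    updates, sizes = [], []
    for X, y in client_data:          # each client holds a private (X, y)
        w = global_model.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5*||Xw - y||^2 / n
            w -= lr * grad
        updates.append(w)
        sizes.append(len(y))
    # server-side weighted averaging of the client models
    return np.average(updates, axis=0, weights=np.asarray(sizes, dtype=float))
```

On noiseless synthetic data, repeating this round drives the global model toward the common minimizer without any client ever transmitting its raw data, only model parameters.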
Most existing federated learning algorithms train the same model architecture for every user/device (personalized FL algorithms allow only the values of the model parameters to differ). In this thesis, we will explicitly consider that heterogeneous devices may not be able to run the same model (because of computational, memory, or battery constraints) and will propose new algorithms to train heterogeneous architectures jointly. A key challenge, in this case, is to design meaningful ways of sharing information across heterogeneous model architectures.
We will consider two different approaches. The first relies on model subsampling: there is a single neural network architecture, but while powerful clients store and use the complete model, less powerful clients only store a subset of the model matching their memory and computation capabilities. This approach has been proposed in [DDT21, HLA+21] for convolutional networks, where each client may use a different number of channels. We will explore to what extent this strategy can be extended to other neural network architectures. We will also address open issues about training with subsampling. For example, results in [HLA+21] suggest that, at some training iterations, more powerful clients should behave as less powerful clients and update only a subset of the architecture. We are missing a solid explanation for these observations and quantitative guidelines about how often clients should train a model different from the largest one they can train. The answer likely depends on the distribution of clients' capabilities and their dataset sizes.
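The width-subsampling idea can be illustrated on a toy two-layer linear network: a less powerful client receives only the leading fraction of hidden units of the global weight matrices, trains that sub-network, and its update is written back into the corresponding slice of the full model. This is a deliberately simplified sketch of the mechanism used in HeteroFL [DDT21] and FjORD [HLA+21], with hypothetical helper names; the actual methods additionally average the overlapping slices contributed by many clients.

```python
import numpy as np

def extract_submodel(W1, W2, capacity):
    """Slice out the leading `capacity` fraction of hidden units.

    W1 has shape (hidden, input), W2 has shape (output, hidden); a client
    with capacity 0.5 keeps only the first half of the hidden units.
    Simplified sketch of HeteroFL/FjORD-style width subsampling.
    """
    k = max(1, int(capacity * W1.shape[0]))
    return W1[:k, :].copy(), W2[:, :k].copy()

def merge_submodel(W1, W2, sub_W1, sub_W2):
    """Write a client's updated sub-weights back into the full model.

    In the real algorithms, overlapping slices from several clients are
    averaged at the server; here we simply overwrite for illustration.
    """
    k = sub_W1.shape[0]
    W1, W2 = W1.copy(), W2.copy()
    W1[:k, :] = sub_W1
    W2[:, :k] = sub_W2
    return W1, W2
```

Note that the hidden units outside the client's slice are left untouched, which is exactly why the open question above arises: powerful clients occasionally training a smaller slice changes how often each part of the network gets updated.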
The second approach we will consider is based on knowledge transfer techniques. For example, reference [LKSJ20] proposes that a central node (the FL aggregator) maintains a model for every possible architecture in the system and uses ensemble distillation to transfer information across the models. While the aggregator is, in general, much more powerful than the FL clients, it may still be the system bottleneck if clients exhibit a large degree of heterogeneity: a large number of different models then need to be stored in memory and jointly trained at the aggregator. Another limit of [LKSJ20] is the need for an unlabeled dataset at the aggregator. The approach in [ZHZ21] replaces the unlabeled dataset with a synthetic one generated by a generative model. Still, this model must be jointly learned and may potentially leak private information, as clients need to reveal their label distribution. We plan to address the limits of the existing approaches by proposing fully decentralized training algorithms, where each client can transfer information from its model to the models of less powerful clients in its neighborhood and may use its local dataset for this purpose.
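The ensemble-distillation step at the heart of [LKSJ20] can be sketched as a KL-divergence objective between the averaged, temperature-softened predictions of the teacher models and the softened predictions of a student model. The code below is a minimal numpy illustration with our own names, not the paper's implementation; in practice this loss would be minimized over the student's parameters on unlabeled (or synthetic) data.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits_list, student_logits, T=2.0):
    """Ensemble-distillation objective (illustrative sketch).

    Averages the teachers' temperature-softened predictions, then returns
    the mean KL divergence from them to the student's softened predictions,
    scaled by T^2 as is customary in distillation.
    """
    p_teacher = np.mean([softmax(l / T) for l in teacher_logits_list], axis=0)
    p_student = softmax(student_logits / T)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl) * T * T)
```

The loss is zero when the student already matches the teachers' average prediction and grows as the predictions diverge, which is what lets a small client model absorb knowledge from larger neighboring models without accessing their training data.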
[DDT21] Enmao Diao, Jie Ding, and Vahid Tarokh. HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients. In International Conference on Learning Representations (ICLR), 2021.
[HLA+21] Samuel Horváth, Stefanos Laskaridis, Mario Almeida, Ilias Leontiadis, Stylianos Venieris, and Nicholas Lane. FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout. In Advances in Neural Information Processing Systems, volume 34, pages 12876–12889. Curran Associates, Inc., 2021.
[KMA+21] P. Kairouz, et al. Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1-2), pp. 1-210, 2021.
[LKSJ20] Tao Lin, Lingjing Kong, Sebastian U. Stich, and Martin Jaggi. Ensemble distillation for robust model fusion in federated learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, pages 2351–2363, Red Hook, NY, USA, December 2020. Curran Associates Inc.
[LKT+20] T. Li, A. Kumar Sahu, A. Talwalkar, and V. Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37 (3), 2020.
[MMR+17] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Aguera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Artificial Intelligence and Statistics, PMLR, 2017.
[RHL20] Rieke, N., Hancox, J., Li, W. et al. The future of digital health with federated learning. npj Digit. Med. 3, 119, 2020.
[ZHZ21] Zhuangdi Zhu, Junyuan Hong, and Jiayu Zhou. Data-free knowledge distillation for heterogeneous federated learning. In International Conference on Machine Learning, pages 12878–12889. PMLR, 2021.
The candidate should have a solid mathematical background (in particular in optimization) and, in general, be keen on using mathematics to model real problems and gain insights. They should also be knowledgeable about machine learning, have good programming skills, and have previous experience with PyTorch or TensorFlow.
We expect the candidate to be fluent in English.
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Contribution to mutual insurance (subject to conditions)
Monthly gross salary: €2051 (years 1 & 2) and €2158 (year 3)
- Theme/Domain:
Networks and telecommunications
Systems & networks (BAP E)
- Town/city: Paris
- Inria Centre: Centre Inria d'Université Côte d'Azur
- Desired start date: 2023-10-01
- Contract duration: 3 years
- Application deadline: 2024-01-28
Warning: Applications must be submitted online via the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions for applying
Defence and security:
This position may be assigned to a restricted-access area (ZRR), as defined in decree no. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Access authorization to such a zone is granted by the head of the institution, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are open to people with disabilities.
About Inria
Inria is the national research institute for digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally shared with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talents across more than forty professions. 900 research and innovation support staff help scientific and entrepreneurial projects emerge and grow, with worldwide impact. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society, and the economy.