PhD Position F/M Distributed Training of Heterogeneous Architectures

The offer description below is in English.

Contract type: fixed-term contract (CDD)

Required degree: Master's degree (Bac + 5) or equivalent

Position: PhD student

About the research centre or functional department

Inria is the French national research institute for digital science and technology, promoting scientific excellence and technology transfer. Inria employs 2,400 people organised in research project teams, usually in collaboration with its academic partners.
This agility allows its scientists, from the best universities in the world, to meet the challenges of computer science and mathematics, either through multidisciplinarity or with industrial partners.
A precursor to the creation of Deep Tech companies, Inria has also supported the creation of more than 150 start-ups from its research teams. Inria effectively faces the challenges of the digital transformation of science, society and the economy.

Context and assets of the position

This PhD thesis is part of the Inria-Nokia Bell Labs research initiative on Federated Learning for Cellular Networks and is closely related to FedMalin, the Inria research initiative on Federated Learning.

The PhD candidate will join the Nokia Bell Labs research team in Massy, France, and will also be a member of the Inria project-team NEO.
The Machine Learning & Systems team, part of the AI Research Lab at Nokia Bell Labs, is composed of computer scientists and data engineers who develop AI-based systems and algorithms bridging the gap between the promise of limitless capabilities of AI and the constraints imposed by real computing and communication systems.
NEO is positioned at the intersection of Operations Research and Network Science. Using the tools of Stochastic Operations Research, the team members model situations arising in several application domains, involving networking in one way or another.

The research activity will be supervised by:
* Chung Shue Chen
* Fabio Pianese
* Giovanni Neglia

Assignment

The increasing size of data generated by smartphones and IoT devices has motivated the development of Federated Learning (FL) [LKT+20, KMA+21], a framework for on-device collaborative training of machine learning models. FL algorithms like FedAvg [MMR+17] allow clients to train a common global model without sharing their personal data. FL reduces data collection costs and can help to mitigate data privacy issues, making it possible to train models on large datasets that would otherwise be inaccessible. FL is currently used by many big tech companies (e.g., Google, Apple, Facebook) for learning on their users' data, but the research community also envisions promising applications in learning across large data silos, such as hospitals that cannot share their patients' data [RHL20].
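As a concrete illustration, the FedAvg update can be sketched in a few lines: each client runs local gradient steps on its own data, and the server averages the returned models weighted by local dataset size. The following is a minimal numpy sketch with a toy least-squares model; the loss, hyperparameters, and function names are illustrative assumptions, not the code of the cited papers.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """A client's local training: a few gradient steps on its own data
    (here, plain gradient descent on a least-squares loss)."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fed_avg_round(w_global, clients):
    """One FedAvg round: clients train locally on private data; the server
    only sees the resulting models and averages them by dataset size."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_sgd(w_global.copy(), X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=sizes)

# Two clients whose private datasets follow the same linear model w* = [1, 2].
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0])
clients = [(X, X @ w_true) for X in (rng.normal(size=(n, 2)) for n in (50, 100))]

w = np.zeros(2)
for _ in range(20):
    w = fed_avg_round(w, clients)
print(np.round(w, 2))  # → [1. 2.]: the global model is learned without pooling raw data
```

Note the weighting by dataset size: in FedAvg, clients with more local data contribute proportionally more to the aggregated model.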

Most existing algorithms for federated learning train the same model architecture for each user/device (personalized FL algorithms allow only the value of model parameters to be different). In this task, we will explicitly consider that heterogeneous devices may not be able to run the same model (because of computational, memory, or battery constraints) and will propose new algorithms to train heterogeneous architectures jointly. A key challenge, in this case, is to design meaningful ways of sharing information across heterogeneous model architectures.

We will consider two different approaches. The first relies on model subsampling: there is a single neural network architecture. However, while powerful clients store and use the complete model, less powerful clients only store a subset of the model matching their memory and computation capabilities. This approach has been proposed in [DDT21, HLA+21] for convolutional networks, where each client may use a different number of channels. We will explore to what extent this strategy can be extended to different neural network architectures. We will also address open issues about training with subsampling. For example, results in [HLA+21] suggest that, at some training iterations, more powerful clients should behave as less powerful clients and update only a subset of the architecture. We are missing a solid explanation for these observations, as well as quantitative guidelines on how often clients should train a model smaller than the largest one they can train. The answer likely depends on the distribution of clients' capabilities and their dataset sizes.
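To make the subsampling idea concrete, the following toy sketch (in the spirit of [DDT21, HLA+21], but not their actual code) lets each client keep only the leading fraction of a layer's output units, and the server averages each weight over the clients whose sub-model contains it. The layer shape, the capability ratios, and the "+i" stand-in for local training are illustrative assumptions.

```python
import numpy as np

def submodel(W, p):
    """The sub-network a client with capability p can run: the leading
    p-fraction of the layer's output units (rows)."""
    k = max(1, int(round(p * W.shape[0])))
    return W[:k].copy()

def aggregate(W_global, client_updates):
    """Average each weight over the clients whose sub-model contains it."""
    acc = np.zeros_like(W_global)
    cnt = np.zeros_like(W_global)
    for U in client_updates:
        acc[: U.shape[0]] += U
        cnt[: U.shape[0]] += 1
    W_new = W_global.copy()
    mask = cnt > 0
    W_new[mask] = acc[mask] / cnt[mask]
    return W_new

W = np.zeros((4, 3))        # one 4-unit layer with 3 inputs
caps = [1.0, 0.5, 0.5]      # one full-width client, two half-width clients
updates = []
for i, p in enumerate(caps):
    U = submodel(W, p)
    U += float(i + 1)       # stand-in for the change produced by local training
    updates.append(U)
W = aggregate(W, updates)
print(W[:, 0])  # → [2. 2. 1. 1.]: shared rows average all three clients, the rest only the full one
```

The sketch makes the open question above tangible: the top rows are updated by every client while the bottom rows are touched only by the powerful one, so the effective learning dynamics differ across coordinates depending on the capability distribution.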

The second approach we will consider is based on knowledge transfer techniques. For example, reference [LKSJ20] proposes that a central node (the FL aggregator) maintains a model for every possible architecture in the system and uses ensemble distillation to transfer information across the models. While the aggregator is, in general, much more powerful than the FL clients, it may still be the system bottleneck if clients exhibit a large degree of heterogeneity: in that case, a large number of different models need to be stored in memory and jointly trained at the aggregator. Another limitation of [LKSJ20] is the need for an unlabeled dataset at the aggregator. The approach in [ZHZ21] replaces the unlabeled dataset with a synthetic one produced by a generative model. Still, this generative model must be jointly learned and may potentially leak private information, as clients need to reveal their label distribution. We plan to address the limits of the existing approaches by proposing fully decentralized training algorithms, where each client can transfer information from its model to the models of less powerful clients in its neighborhood and may use its local dataset for this purpose.
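The knowledge-transfer mechanism itself can be illustrated with a minimal distillation step between two linear classifiers standing in for client models: the "student" fits its own softened outputs to the "teacher's" soft predictions on a shared batch. The shapes, temperature, and optimization settings are illustrative assumptions, not the setup of [LKSJ20] or [ZHZ21].

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with temperature T (higher T = softer distribution)."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distill(W_student, X, teacher_probs, lr=2.0, steps=500, T=2.0):
    """Gradient descent on the cross-entropy between the teacher's softened
    predictions and the student's softened predictions (both linear here)."""
    for _ in range(steps):
        p = softmax(X @ W_student, T)
        grad = X.T @ (p - teacher_probs) / (T * len(X))
        W_student = W_student - lr * grad
    return W_student

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))        # a shared (possibly unlabeled) batch
W_teacher = rng.normal(size=(5, 3))  # teacher: a 3-class linear classifier
targets = softmax(X @ W_teacher, T=2.0)

W_student = distill(np.zeros((5, 3)), X, targets)
agree = np.mean(
    (X @ W_student).argmax(axis=1) == (X @ W_teacher).argmax(axis=1)
)
print(agree)  # fraction of inputs where the student reproduces the teacher's decision
```

In the decentralized setting envisioned above, the same step would run between neighboring clients, with each client's local dataset playing the role of the shared batch `X`.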

[DDT21] Enmao Diao, Jie Ding, and Vahid Tarokh. HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients. In International Conference on Learning Representations, 2021.
[HLA+21] Samuel Horváth, Stefanos Laskaridis, Mario Almeida, Ilias Leontiadis, Stylianos Venieris, and Nicholas Lane. FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout. In Advances in Neural Information Processing Systems, volume 34, pages 12876–12889. Curran Associates, Inc., 2021.
[KMA+21] P. Kairouz, et al. Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1-2), pp. 1-210, 2021.
[LKSJ20] Tao Lin, Lingjing Kong, Sebastian U. Stich, and Martin Jaggi. Ensemble distillation for robust model fusion in federated learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, pages 2351–2363, Red Hook, NY, USA, December 2020. Curran Associates Inc.
[LKT+20] T. Li, A. Kumar Sahu, A. Talwalkar, and V. Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37 (3), 2020.
[MMR+17] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, PMLR, 2017.
[RHL20] Rieke, N., Hancox, J., Li, W. et al. The future of digital health with federated learning. npj Digit. Med. 3, 119, 2020.
[ZHZ21] Zhuangdi Zhu, Junyuan Hong, and Jiayu Zhou. Data-free knowledge distillation for heterogeneous federated learning. In International Conference on Machine Learning, pages 12878–12889. PMLR, 2021.

Main activities



The candidate should have a solid mathematical background (in particular in optimization) and, more generally, be keen on using mathematics to model real problems and derive insights. They should also be knowledgeable in machine learning, have good programming skills, and have previous experience with PyTorch or TensorFlow.

We expect the candidate to be fluent in English.


Benefits
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Contribution to mutual insurance (subject to conditions)


Monthly gross salary: €2,051 (years 1 and 2) and €2,158 (year 3)