PhD Position F/M Towards a programmable autonomic platform for decentralized learning

The job description below is in English

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree (Bac + 5) or equivalent

Position: PhD student

Desired experience level: Recent graduate

About the centre or functional department

The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.

Context and assets of the position

The Internet of Things, with its myriad devices constantly producing data at the Edge of the Internet, has become a huge source of data to be collected, processed and analysed in near real time. In particular, AI applications are natural consumers of these data [1]. With the emergence of Edge computing [2], moving the learning process behind these applications closer to the Edge, where data are produced, is appealing, but it raises multiple challenges.

One appealing approach is Federated Learning (FL) [3], in which each device trains a model on locally produced data and then sends it to a centralized server in charge of merging the locally obtained models into a single global model. The global model is then sent back to the devices, which resume learning from a model that reflects data collected everywhere. Compared with a purely centralized learning approach, Federated Learning does not move local data to the server, which enables privacy by design.
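The round structure described above can be illustrated with a minimal sketch. It is not the implementation from [3]: the toy model (a line y = w*x + b), the helper names `local_update`, `federated_average` and `federated_round`, the learning rate and the data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Model = List[float]            # hypothetical toy model: [w, b] for y = w*x + b
Sample = Tuple[float, float]   # one (x, y) observation produced on a device


def local_update(model: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.05) -> Model:
    """Device side: one pass of per-sample gradient descent on local data."""
    w, b = model
    for x, y in data:
        err = (w * x + b) - y      # prediction error on this sample
        w -= lr * 2 * err * x      # gradient of err**2 with respect to w
        b -= lr * 2 * err          # gradient of err**2 with respect to b
    return [w, b]


def federated_average(models: Sequence[Model], sizes: Sequence[int]) -> Model:
    """Server side: average local models, weighted by local dataset size."""
    total = sum(sizes)
    return [sum(m[i] * n for m, n in zip(models, sizes)) / total
            for i in range(len(models[0]))]


def federated_round(global_model: Model,
                    device_data: Sequence[Sequence[Sample]]) -> Model:
    """One round: send the global model out, train locally, merge centrally."""
    local_models = [local_update(global_model, data) for data in device_data]
    return federated_average(local_models, [len(d) for d in device_data])


if __name__ == "__main__":
    # Two devices whose private data follow y = 2x; only models are exchanged.
    devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
    model = [0.0, 0.0]
    for _ in range(1000):
        model = federated_round(model, devices)
    print(model)  # should approach w = 2, b = 0 on this toy data
```

Note that every round passes through `federated_average`, which stands in for the central server: the devices never see each other's data, but they cannot make joint progress without that single merging point.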

Yet, FL still relies on a centralized server to coordinate the devices and merge the models. Without it, no aggregation is possible. Moreover, FL suffers from the traditional limitations of centralized approaches: limited scalability and resilience. While adapting learning to more decentralized platforms has been shown to be promising [4], learning efficiently over decentralized platforms remains a wide-open problem.



[1] R. Singh and S. S. Gill, "Edge AI: a survey", in: Internet of Things and Cyber-Physical Systems 3 (2023), pp. 71–92, doi:10.1016/j.iotcps.2023.02.004

[2] Weisong Shi et al., "Edge computing: Vision and challenges", in: IEEE internet of things journal 3.5 (2016), pp. 637–646.

[3] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. 2016. Communication-Efficient Learning of Deep Networks from Decentralized Data. https://doi.org/10.48550/arXiv.1602.05629

[4] Hegedűs, I., Danner, G., & Jelasity, M. Gossip learning as a decentralized alternative to federated learning. In Distributed Applications and Interoperable Systems: 19th IFIP WG 6.1 International Conference, DAIS 2019, Kongens Lyngby, Denmark, June 17–21, 2019, Proceedings 19 (pp. 74-90). Springer International Publishing.

Assignment

Unlike the FL approach, where the centralized component is vital for the system to function at all, we plan to adopt the recently proposed alternative in which devices learn locally and opportunistically exchange their models so as to build a more complete model. Finding the right trade-off between the accuracy of the models within an area (composed of one or several interrelated nodes) and the network traffic generated will be the main driver for the solution.
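As a rough illustration of learning without a central server, the sketch below lets nodes exchange models pairwise and merge them by plain averaging. It is loosely modeled on gossip-style exchange and is not the algorithm that the thesis will design: the toy model, the uniform random choice of peers, the averaging rule in `merge` and all function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Model = List[float]            # hypothetical toy model: [w, b] for y = w*x + b
Sample = Tuple[float, float]   # one (x, y) observation held by a node


def train_locally(model: Sequence[float], data: Sequence[Sample],
                  lr: float = 0.05) -> Model:
    """One pass of per-sample gradient descent on the node's own data."""
    w, b = model
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * 2 * err * x
        b -= lr * 2 * err
    return [w, b]


def merge(own: Sequence[float], received: Sequence[float]) -> Model:
    """Combine two models by element-wise averaging (one possible rule)."""
    return [(a + b) / 2 for a, b in zip(own, received)]


def gossip_step(models: List[Model], datasets: Sequence[Sequence[Sample]],
                rng: random.Random) -> Tuple[int, int]:
    """One opportunistic exchange: a random sender pushes its model to a
    random, distinct receiver, which merges it and then trains locally."""
    sender, receiver = rng.sample(range(len(models)), 2)
    merged = merge(models[receiver], models[sender])
    models[receiver] = train_locally(merged, datasets[receiver])
    return sender, receiver


if __name__ == "__main__":
    rng = random.Random(0)
    # Three nodes with different local samples of the same relation y = 2x.
    datasets = [[(1.0, 2.0), (2.0, 4.0)],
                [(2.0, 4.0), (3.0, 6.0)],
                [(1.0, 2.0), (3.0, 6.0)]]
    models = [[0.0, 0.0] for _ in datasets]
    messages = 0
    for _ in range(5000):
        gossip_step(models, datasets, rng)
        messages += 1          # one model transfer per step: a traffic proxy
    print(messages, models)
```

Counting one message per `gossip_step` gives a crude handle on the accuracy versus traffic trade-off: fewer exchanges mean less network load, but each node's model then reflects less of the data held elsewhere.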

Once deployed, the runtime system must cope with the changing conditions of the platform, such as device disconnections and network bottlenecks. More generally, adaptation is needed to adjust the parameters, hyperparameters and methods used to combine local models, so as to ensure the best trade-off between model accuracy and resource usage.

A related problem is the ease with which a programmer can deploy, monitor and adapt the platform and its parameters at runtime. This relates to the notion of programmability. Developing abstractions for the easy deployment and control of AI programs over decentralized platforms is an important step towards wider adoption of the approach. This will constitute the third aspect to be studied during the thesis.

The work will be experimentally validated on a real large-scale platform such as Grid'5000 [5]. To ease such a deployment, the E2Clab framework [6,7] will be used. E2Clab implements a rigorous methodology that provides guidelines for moving from real-life application workflows to representative settings of the underlying physical infrastructure, in order to accurately reproduce the application's relevant behaviors and therefore understand end-to-end performance.



[5] Daniel Balouek et al. Adding Virtualization Capabilities to the Grid’5000 Testbed, in: Cloud Computing and Services Science, ed. by Ivan I. Ivanov et al., vol. 367, Communications in Computer and Information Science, Springer International Publishing, 2013, pp. 3–20, isbn: 978-3-319-04518-4.

[6] Daniel Rosendo, Pedro Silva, et al. E2Clab: Exploring the Computing Continuum through Repeatable, Replicable and Reproducible Edge-to-Cloud Experiments. Cluster 2020 - IEEE International Conference on Cluster Computing, Sep 2020, Kobe, Japan.

[7] The E2Clab project: https://team.inria.fr/kerdata/e2clab/

Main activities

The recruited PhD student's activities will include:

  • Analysis and synthesis of the state of the art
  • Design of distributed algorithms
  • Proposal of programming abstractions
  • Development and deployment at large scale of a runtime proof of concept
  • Report and scientific article writing

Skills

Qualifications:

- Good communication and writing skills
- Strong programming / scripting skills
- Knowledge and/or experience in one or more of the following areas: distributed systems, adaptive systems, Cloud, Edge, Stream Processing, decentralized learning

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

Monthly gross salary of 2200 euros