2022-05178 - PhD Position F/M Distributed Machine Learning in Ubiquitous Environments using Location-dependent Models
The job description below is in English

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree (Bac + 5) or equivalent

Position: Ph.D. student

Desired level of experience: Recent graduate

About the research centre or functional department

The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.

Context and assets of the position

Context & Funding

The proposed Ph.D. will take place within the Fed-Malin Inria Challenge project. Fed-Malin aims to address the methodological challenges of moving ML operations from the comfortable cloud nest to the wild Internet. Most existing research considers the “Google/Apple setting” with a large set of relatively homogeneous smartphones and the cloud. By contrast, we will consider various scenarios, including entities with significant computation resources (e.g., companies, hospitals), edge servers deployed by telecommunications operators, and (potentially heterogeneous) IoT devices with or without AI edge accelerators.

Fed-Malin will shed light on the design principles of distributed systems for ML. It will help to better configure the existing ones, as well as to conceive the next generation. More efficient distributed learning systems can reduce the cost barriers to accessing AI technology and mitigate the present concentration of power in the few technology giants that can afford massive computing power. They may also operate under lower energy budgets and thus potentially contribute to making AI more sustainable.

Supervision and Location

The Ph.D. will be co-supervised by the Inria teams WIDE (the World Is DistributEd, in Rennes) and Spirals (Self-adaptation for distributed services and large software systems, in Lille). The successful candidate might be located in either Lille or Rennes (France).

Complete Ph.D. topic description available here.

Assignment

Ph.D. Objectives

In many applications, machine learning models are intrinsically tailored to a given geographical area. This is the case, for example, of smart building management [4], crowdsensing for environmental monitoring [9], and smart wireless transmission techniques [10]. These models benefit from continuous data streams generated by sensors and/or mobile devices. The goal of this Ph.D. is to design, deploy and characterize decentralized learning algorithms and frameworks that can preserve the privacy of their users while delivering location-dependent services.


Research Questions & Work Plan


More specifically, our objective in this Ph.D. project is to investigate how decentralized machine learning (DML) can be effectively deployed on mobile devices to design and implement location-dependent applications accessible to end-users in the field while preserving their privacy. This objective calls for a combination of novel algorithms, protocols, and middleware solutions. We foresee that achieving it requires addressing three challenges:

  1. How to store unbounded data streams on constrained mobile devices? DML enforces the local processing of data, but, depending on the sensors, the volume of the data streams produced may quickly exceed the storage capacity of mobile devices. We therefore intend to leverage our past work on temporal graph storage techniques [6] to store compact representations of data streams and support reasoning over long histories of sensor data. Intended duration: 12 months;
  2. How to exchange relevant model samples among nearby devices? By connecting nearby devices, DML algorithms can periodically exchange locally learned models and knowledge. Yet, for privacy reasons, these local models cannot be shared blindly. Instead, the exchange of partial models of shared interest should be preferred. This requires the design of a new privacy-preserving protocol to identify common interests (e.g., shared locations) and then extract the associated partial model that can be shared among connected parties. To this end, we plan to leverage our work on privacy-preserving decentralized averaging [11, 1, 3] to compute aggregate gradients without releasing sensitive data. We also plan to build on recent work on the convergence of stochastic gradient methods in the presence of data streams generated by Markov processes [5, 14] (note that these papers ignore spatial correlations). Intended duration: 12 months;
  3. How to program DML algorithms for the masses? Finally, to avoid the design and implementation of ad hoc algorithms, we aim to design a middleware framework that can support the execution of a large family of DML algorithms. This programming framework will also include a mobile testing environment supporting nearby communications in order to tune the hyper-parameters of such models [8]. Intended duration: 12 months.
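
For candidates unfamiliar with decentralized averaging, the idea behind the second challenge can be illustrated with a minimal sketch of plain pairwise gossip averaging. This is a generic textbook technique, not the protocol of [11, 1, 3], and it includes no privacy mechanism. The function name `gossip_average`, the sample values, and the assumption that any two devices can communicate are all hypothetical.

```python
import random


def gossip_average(values, rounds=2000, seed=0):
    """Plain pairwise gossip (illustrative only, no privacy protection).

    At each round, two devices are picked at random and both replace
    their local value with the mean of the two. The sum over all devices
    is preserved, so every local value drifts toward the global average
    without any central server.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        # Assumption: any pair of devices can talk to each other.
        i, j = rng.sample(range(len(x)), 2)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x


# Hypothetical local values held by four devices (e.g., one gradient coordinate).
local_values = [1.0, 5.0, 9.0, 13.0]
estimates = gossip_average(local_values)
```

In a location-dependent setting, the pairs would presumably be restricted to nearby devices, and a privacy-preserving variant would need to protect the exchanged values; both aspects are left out of this sketch.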

To demonstrate and assess the contributions to the above challenges, we intend to consider two case studies in the area of mobile crowdsourcing software systems:

  • Spatio-temporal clustering of air quality measurements will aim to support a decentralized inference of particulate-matter cartography by leveraging a network of air quality sensors connected to personal smartphones. While in situ measurements may disclose points of interest, we aim to compute custom maps of particulate matter concentrations in order to recommend appropriate low-pollution locations to end-users;
  • User-centric clustering of user trajectories will consider the more general case of individual trajectory processing to extract shared paths explored by a crowd of users and possibly predict future user mobility. These shared paths and trajectory predictions can in turn be used for task planning (in a crowdsensing scenario) and resource allocation (for instance, in a multi-access edge computing (MEC) framework). However, user trajectories cannot be shared without disclosing routines or sensitive places. Thus, we aim to build on the vicinity of end-users in order to infer anonymized mobility models. A particular approach we will explore is to determine this vicinity via standard localization methods or channel charting based solely on radio measurements from personal smartphones [13].

The above crowdsensing case studies will draw upon the expertise gathered within the Spirals team on crowdsensing platforms, particularly through the APISENSE [9] online platform. The Ph.D. might also involve experiments on specialized testbeds, such as FIT IoT-LAB, which can be used to collect IoT data and experiment with actual ML and federated learning (FL) implementations.

Main activities

Perform scholarly research in algorithms and distributed systems. This includes but is not limited to:

  • Performing bibliographic investigation,
  • Writing up surveys on the state of the art related to the Ph.D.'s topic,
  • Designing new algorithms, protocols, and systems to support the Ph.D.'s objectives,
  • Writing up, revising, submitting, and presenting scholarly articles reporting on the Ph.D.'s work.

Skills

  • Skilled in at least one common programming language
  • Possessing a working knowledge of system-level development (e.g. Linux)
  • Good writing skills
  • Ability to research, assimilate, and synthesize new information.

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Possibility of teleworking (90 days per year) and flexible organization of working hours
  • Partial payment of insurance costs

Remuneration

Monthly gross salary of 1982 euros for the first and second years and 2085 euros for the third year