2022-05054 - PhD Position F/M Stateful and distributed data stream processing
The job description below is in English

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree (Bac + 5) or equivalent

Other valued degree: Master's degree in distributed systems and/or cloud computing

Role: PhD student

About the centre or functional department

The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.

Context and advantages of the position

The data produced by the Internet of Things are often generated at the edge of the Internet in the form of data streams. Instead of always transferring them to the cloud for processing, a more efficient alternative uses edge/fog computing technologies to process the data close to where they were created. This reduces the usage of long-distance networks and improves response times. Data stream processing frameworks such as Apache Flink are well suited to processing these data streams in real time. However, using these frameworks in geo-distributed environments such as edge/fog platforms raises significant challenges.

Our team has solved the problem of modeling the performance and efficiently managing the resources of simple “stateless” data processing operators such as data filtering and aggregation by a single key [1, 2]. However, other “stateful” data processing operators are more complex because their parallelization requires additional synchronization between the operator instances. The goal of this thesis is to investigate the performance of these stateful operators in a geo-distributed environment and, if possible, to propose improved implementations of these operators to facilitate their use in this type of environment.
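
To make the distinction concrete, here is a minimal Python sketch of why parallelizing a stateful operator needs synchronization while a stateless one does not. It is only an illustration under stated assumptions: the function `stateless_filter`, the class `DistinctCounter`, and the sample partitions are hypothetical names chosen for this example, and the code does not use Apache Flink or reproduce the team's models.

```python
from typing import Iterable, List, Set


def stateless_filter(events: Iterable[int], threshold: int) -> List[int]:
    """Stateless operator: each event is kept or dropped on its own."""
    return [e for e in events if e > threshold]


class DistinctCounter:
    """Stateful operator instance (hypothetical): remembers the values it has seen."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()

    def process(self, events: Iterable[int]) -> None:
        self.seen.update(events)

    def merge(self, other: "DistinctCounter") -> None:
        # Synchronization step: instances must exchange state to agree on a result.
        self.seen |= other.seen


# Two operator instances, each receiving one partition of the stream.
partition_a = [1, 5, 7, 5]
partition_b = [7, 9, 2]

# Stateless case: per-instance outputs can simply be concatenated.
filtered = stateless_filter(partition_a, 4) + stateless_filter(partition_b, 4)

# Stateful case: adding up local results is wrong, because both instances saw 7.
inst_a, inst_b = DistinctCounter(), DistinctCounter()
inst_a.process(partition_a)
inst_b.process(partition_b)
naive_total = len(inst_a.seen) + len(inst_b.seen)

# Only after exchanging state does the count become exact.
inst_a.merge(inst_b)
exact_total = len(inst_a.seen)
```

In a geo-distributed deployment the `merge` step would cross a wide-area link, which is where the extra cost of stateful operators is expected to appear.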

Assignment

After an initial literature study on geo-distributed data stream processing, the PhD candidate will conduct an experimental study to classify and model the performance of the most popular stateful operators, in particular their scalability in a geo-distributed environment. Depending on the results of this study, the next phase will propose improved implementations of these operators. One interesting possibility is the use of weakly consistent replication techniques (CRDTs, Conflict-free Replicated Data Types, also called Commutative Replicated Data Types) to obtain improved performance at the cost of a controlled reduction in the precision of the results.
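
As an illustration of the CRDT idea mentioned above, the following Python sketch implements a grow-only counter (G-Counter), one of the simplest state-based CRDTs. It is a minimal example under stated assumptions, not the approach the thesis will necessarily take: the class name `GCounter` and the replica names `edge` and `cloud` are hypothetical. Each replica counts locally without coordination, so a read may temporarily be stale, and replicas converge once they merge their states.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever updates its own slot, so no locking is needed.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter value is the sum of all slots known locally.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum: commutative, associative, and idempotent,
        # so merges can arrive in any order and any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Hypothetical replicas at two sites of a geo-distributed deployment.
edge = GCounter("edge")
cloud = GCounter("cloud")
edge.increment(3)
cloud.increment(2)

# Before synchronization, the edge replica sees only its own events.
stale_view = edge.value()

# After exchanging state, both replicas converge to the same total.
edge.merge(cloud)
cloud.merge(edge)
converged = (edge.value(), cloud.value())
```

The value read before the merge is a lower bound on the true count, which is one simple form of the trade-off between precision and coordination cost described above.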

 

References:

[1] Hamidreza Arkian, Guillaume Pierre, Johan Tordsson, Erik Elmroth. An Experiment-Driven Performance Model of Stream Processing Operators in Fog Computing Environments. SAC 2020 - ACM/SIGAPP Symposium On Applied Computing, Mar 2020. https://hal.inria.fr/hal-02394396

[2] Hamidreza Arkian, Guillaume Pierre, Johan Tordsson, Erik Elmroth. Model-based Stream Processing Auto-scaling in Geo-Distributed Environments. ICCCN 2021 - 30th International Conference on Computer Communications and Networks, Jul 2021. https://hal.inria.fr/hal-03206689

 

Main activities

  • Integrate the experimental stream processing testbed into a geo-distributed edge/fog computing environment
  • Perform experiments to quantify the performance and scalability of stateful data stream processing operators
  • Propose improved implementations for the most popular stateful data stream processing operators
  • Write and publish scientific articles to report on the main results

Skills

  • A Master's degree in distributed systems and/or cloud computing.

  • Excellent programming skills in Linux environments.

  • Excellent communication and writing skills.

  • Good command of English. Note that knowledge of French is not required for this position.

  • Knowledge of the following technologies is not mandatory but will be considered a plus:

    • Cloud resource scheduling

    • Distributed container systems: Kubernetes, Docker Swarm

    • Single-board computers such as Raspberry Pi

    • Python and shell scripting

    • Revision control systems: git, svn

    • Linux distributions: Debian, Ubuntu

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off under the RTT (reduction of working time) scheme
  • Possibility of teleworking (90 days per year) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage
  • Partial payment of insurance costs

Remuneration

Monthly gross salary:

  • 1982 euros for the first and second years
  • 2085 euros for the third year