PhD Position F/M Handling dynamic constraints and deadlines in distributed software reconfiguration - Application to power transmission

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Role : PhD Position

About the research centre or Inria department

The Centre Inria de l’Université Grenoble Alpes brings together almost 600 people in 24 research teams and 9 research support departments.

Staff is present on three campuses in Grenoble, in close collaboration with other research and higher education institutions (Université Grenoble Alpes, CNRS, CEA, INRAE, …), but also with key economic players in the area.

The Centre Inria de l’Université Grenoble Alpes is active in the fields of high-performance computing, verification and embedded systems, modeling of the environment at multiple levels, and data science and artificial intelligence. The center is a top-level scientific institute with an extensive network of international collaborations in Europe and the rest of the world.

Context

Co-advised by:
- Sophie Cerf (Ctrl-A, Grenoble), Inria
- Eric Rutten (Ctrl-A, Grenoble), Inria
- Hélène Coullon (Stack, Nantes), IMT Atlantique


This PhD is part of a collaboration between two Inria teams, Ctrl-A and Stack, funded by the Taranis project of the PEPR Cloud.

Assignment

Orchestration is the process of dynamically and automatically managing computing resources, services, and applications in the Cloud/Edge so as to satisfy end users' expectations (a desired state).
Autonomic management and orchestration of distributed systems rely on feedback control loops [IBM03] that react to perceived variations (events or values in the system and its environment) by deciding on reconfigurations (hardware and/or software). These reconfigurations are then carried out through basic actions of the system's API (e.g., creation, update, deletion).
Feedback loops can manage objectives of various natures (e.g., self-optimization, self-configuration, self-protection). They deal with different dimensions (quantitative, temporal, logical, etc.) and rely on diverse decision techniques (control theory, scheduling, constraint solving, learning, etc.). The general challenge is to design autonomic managers that can handle this complexity.
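
To make the loop concrete, here is a minimal Python sketch of a Monitor-Analyze-Plan-Execute loop over shared Knowledge (the MAPE-K structure described in [IBM03]). The managed system, its `create_replica`/`delete_replica` API, and the capacity-based scaling rule are hypothetical, chosen only to show how observations are turned into basic reconfiguration actions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ManagedSystem:
    """Toy managed system (hypothetical): a pool of identical service replicas."""
    replicas: int = 1
    load: float = 0.0  # requests/s, imposed by the environment

    # Basic actions exposed by the system's API (creation, deletion)
    def create_replica(self) -> None:
        self.replicas += 1

    def delete_replica(self) -> None:
        self.replicas -= 1


@dataclass
class AutonomicManager:
    capacity_per_replica: float = 100.0  # assumed requests/s one replica can serve
    knowledge: list = field(default_factory=list)  # K: history of observations

    def monitor(self, system: ManagedSystem) -> dict:
        obs = {"load": system.load, "replicas": system.replicas}
        self.knowledge.append(obs)
        return obs

    def analyze(self, obs: dict) -> int:
        # Gap between the desired state (enough replicas, at least one) and the current one
        desired = max(1, math.ceil(obs["load"] / self.capacity_per_replica))
        return desired - obs["replicas"]

    def plan(self, gap: int) -> list:
        # Translate the gap into a list of basic API actions
        return ["create"] * gap if gap > 0 else ["delete"] * (-gap)

    def execute(self, system: ManagedSystem, actions: list) -> None:
        for action in actions:
            if action == "create":
                system.create_replica()
            else:
                system.delete_replica()

    def step(self, system: ManagedSystem) -> None:
        # One iteration of the Monitor-Analyze-Plan-Execute loop
        self.execute(system, self.plan(self.analyze(self.monitor(system))))


# Usage: the environment changes the load; the manager reconfigures after each change
system, manager = ManagedSystem(), AutonomicManager()
history = []
for load in [50, 250, 420, 90]:
    system.load = load
    manager.step(system)
    history.append(system.replicas)
print(history)  # [1, 3, 5, 1]
```

Keeping the four phases as separate methods mirrors the loop structure, so any single phase (e.g., the analysis rule) could be swapped for a controller or a solver without touching the others.
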

This research topic addresses reconfiguration [Coullon24] under temporal constraints, where both the decision and the execution [Chardet21][Philippe24] of the reconfiguration must be completed within a limited duration. Handling such temporal constraints allows an adaptation to be proactive rather than reactive, and thus to benefit from predictions of upcoming events (e.g., failures, workload, mobility, intermittence).
We will consider constraint programming techniques [Hermenier13], including anytime algorithms [Allouche15], as well as control theory [Rutten18, Pagano24], applied to orchestration.
A use case of particular interest is the flexibility of electricity consumption proposed by RTE (France's electricity transmission system operator), on which the Ctrl-A team conducts research within the Tasting project of the PEPR TASE.
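
An anytime algorithm always holds a valid solution and keeps improving it for as long as it is allowed to run, which is what makes it usable under a deadline. The Python sketch below illustrates the idea with a plain depth-first branch and bound on a 0/1 knapsack (choosing which services to keep running within a reduced power budget). It is not the weighted-CSP method of [Allouche15]; the function name, service values, and power figures are hypothetical, and weights are assumed strictly positive.

```python
import time


def anytime_knapsack(values, weights, capacity, time_budget_s):
    """Anytime depth-first branch and bound for the 0/1 knapsack problem.

    Returns (best_value, chosen_items, completed): the best solution found when
    the search ends or the time budget expires, and whether optimality was proven.
    Assumes strictly positive weights.
    """
    deadline = time.monotonic() + time_budget_s
    # Explore items by decreasing value density so the bound below is tight
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)

    # Initial incumbent: greedy fill, available immediately
    best_value, best_items, used = 0, [], 0
    for i in order:
        if used + weights[i] <= capacity:
            used += weights[i]
            best_value += values[i]
            best_items.append(i)

    def upper_bound(k, value, load):
        # Fractional (linear relaxation) bound over the remaining items order[k:]
        for i in order[k:]:
            if load + weights[i] <= capacity:
                load += weights[i]
                value += values[i]
            else:
                return value + values[i] * (capacity - load) / weights[i]
        return value

    stack = [(0, 0, 0, [])]  # (depth, value, load, chosen items)
    while stack:
        if time.monotonic() >= deadline:
            return best_value, sorted(best_items), False  # interrupted: best so far
        k, value, load, chosen = stack.pop()
        if value > best_value:
            best_value, best_items = value, chosen  # improved incumbent
        if k == len(order) or upper_bound(k, value, load) <= best_value:
            continue  # leaf reached, or branch cannot beat the incumbent
        i = order[k]
        stack.append((k + 1, value, load, chosen))  # branch: skip item i
        if load + weights[i] <= capacity:
            # branch: take item i (pushed last, so explored first)
            stack.append((k + 1, value + values[i], load + weights[i], chosen + [i]))
    return best_value, sorted(best_items), True


# Hypothetical services: utility values, power draws (kW), and a reduced power budget
values, power, budget = [60, 100, 120], [10, 20, 30], 50
print(anytime_knapsack(values, power, budget, time_budget_s=1.0))  # (220, [1, 2], True)
print(anytime_knapsack(values, power, budget, time_budget_s=0.0))  # (160, [0, 1], False)
```

With a generous budget the search completes and proves optimality; with a zero budget it still returns the greedy incumbent immediately, which is the property a deadline-aware orchestrator would rely on.
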

Main activities

The approach consists of exploring novel models and implementations for orchestrating reconfigurations with temporal constraints.

Our use case from RTE motivates such constraints: the mix of renewable energy sources and nuclear power plants makes the balance between electricity supply and demand harder to guarantee.
To this end, RTE sets up flexibility contracts with its clients: it offers discounted electricity rates to large industrial consumers and, in exchange, can ask them to reduce their power consumption when the grid faces disturbances (e.g., due to weather conditions). These power reductions, and the subsequent return to normal consumption, follow different kinds of scenarios, with time windows ranging from very long (days) to very short (seconds).
RTE therefore needs to increase its number of flexible customers.
Cloud providers are customers of particular interest because of their huge energy consumption, their growing number, and their flexibility potential.

In this work, we will study plausible flexibility scenarios for the energy consumption of data centers, and how dynamic software and system reconfiguration can be used to handle these scenarios successfully.


In Cloud computing, three classes of constraint satisfaction and optimization problems are usually solved to operate a large set of servers and applications and to adapt them over time: configuration problems (choosing the set of services to deploy), knapsack or bin-packing problems (placing services on servers or VMs) [Hermenier13], and scheduling problems [Cadorel20] (when tasks have dependencies). With the new energy consumption constraints coming from RTE, these optimization problems (solved internally by Cloud providers) will have to be adapted. In simple cases, this amounts to adding a new constraint to existing models; in complex cases, the new constraint can conflict with others (e.g., quality of service, safety), which requires the internal models themselves to be adapted dynamically.
This need for dynamic model adaptation makes it worthwhile to integrate control theory into the management loop.
Furthermore, RTE will request that the new constraints be applied by a given deadline. Both the decision on the new target state and its application must therefore be completed before this deadline. An important scientific challenge is to make a decision whose application can meet the deadline, given that the decision itself also consumes part of the available time.
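
As a rough illustration of the simple case, the Python sketch below places services on identical servers with the first-fit-decreasing heuristic and models the RTE request as one extra constraint: a cap on the number of powered-on servers. BtrPlace [Hermenier13] solves placement with constraint programming instead; the heuristic, the `max_servers` parameter, the service names, and the demand figures here are all assumptions. Because first-fit decreasing is a heuristic, a `None` result means it found no placement, not that none exists.

```python
from typing import Dict, List, Optional


def first_fit_decreasing(demands: Dict[str, int], server_capacity: int,
                         max_servers: Optional[int] = None) -> Optional[List[List[str]]]:
    """Place services on identical servers with the first-fit-decreasing heuristic.

    max_servers is the extra energy constraint (hypothetical): at most that many
    servers may stay powered on. Returns one list of service names per server,
    or None if the heuristic finds no placement satisfying all constraints.
    """
    servers: List[List[str]] = []  # services hosted on each powered-on server
    free: List[int] = []           # remaining capacity of each server
    for name in sorted(demands, key=demands.get, reverse=True):  # largest demand first
        demand = demands[name]
        if demand > server_capacity:
            return None  # the service fits on no server
        for s, room in enumerate(free):
            if demand <= room:  # first open server with enough room
                servers[s].append(name)
                free[s] -= demand
                break
        else:  # no open server fits: power on a new one, if the cap allows it
            if max_servers is not None and len(servers) >= max_servers:
                return None  # the energy cap conflicts with keeping every service
            servers.append([name])
            free.append(server_capacity - demand)
    return servers


# Hypothetical CPU demands (% of one server)
demands = {"web": 50, "db": 70, "cache": 30, "batch": 40}
print(first_fit_decreasing(demands, 100))                 # [['db', 'cache'], ['web', 'batch']]
print(first_fit_decreasing(demands, 100, max_servers=2))  # same placement: the cap is absorbed
print(first_fit_decreasing(demands, 100, max_servers=1))  # None: the cap conflicts with QoS
```

A cap of two servers is simply absorbed by the existing placement, whereas a cap of one conflicts with keeping every service running (the total demand exceeds one server). That is the kind of conflict that calls for adapting the model itself, for instance by degrading or suspending some services.
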
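
To give a flavour of control in the management loop, the Python sketch below uses a simple integral controller to drive a data center's power consumption toward a setpoint requested by the grid operator. The first-order power model (`idle_kw`, `span_kw`, response factor `alpha`) and the actuator `u` (fraction of capacity kept active, in [0, 1]) are hypothetical assumptions, not a model taken from [Rutten18] or [Pagano24]; the gain `ki` was chosen so that this particular toy loop is stable.

```python
def regulate_power(setpoint_kw, steps=60, ki=0.001, alpha=0.5,
                   idle_kw=100.0, span_kw=400.0, initial_kw=500.0, u=1.0):
    """Integral control of a toy data-center power model (all parameters hypothetical).

    Plant: at each step, power moves a fraction `alpha` toward idle_kw + span_kw * u,
    where u in [0, 1] is the fraction of capacity kept active.
    Returns the list of (power, u) pairs over time.
    """
    power, trace = initial_kw, []
    for _ in range(steps):
        error = setpoint_kw - power                        # deviation from the requested level
        u = min(1.0, max(0.0, u + ki * error))             # integral action, clamped to the actuator range
        power += alpha * (idle_kw + span_kw * u - power)   # first-order plant response
        trace.append((power, u))
    return trace


trace = regulate_power(setpoint_kw=300.0)
final_power, final_u = trace[-1]
print(round(final_power, 1), round(final_u, 3))  # 300.0 0.5
```

The integral term removes the steady-state error without knowing the plant's exact gain, which is one reason control-theoretic loops can tolerate models that change over time; a real deployment would need identified models and a proper stability analysis.
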
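
One way to make the deadline check explicit: a candidate target state is acceptable only if the time already spent deciding plus the execution time of its reconfiguration plan fits before the deadline. The Python sketch below estimates execution time as the critical path of a plan whose actions run in parallel whenever their dependencies allow it. The parallel-execution assumption, the known action durations, and all action and candidate names are hypothetical.

```python
from graphlib import TopologicalSorter


def makespan(durations, deps):
    """Execution time of a reconfiguration plan when independent actions run in parallel.

    durations: action -> estimated duration (s)
    deps: action -> set of actions that must finish before it starts
    Returns the length of the critical (longest) path.
    """
    graph = {action: deps.get(action, set()) for action in durations}
    finish = {}
    for action in TopologicalSorter(graph).static_order():  # predecessors come first
        start = max((finish[p] for p in graph[action]), default=0.0)
        finish[action] = start + durations[action]
    return max(finish.values(), default=0.0)


def pick_target(candidates, decision_time, deadline):
    """Return the first (most preferred) candidate whose decision + execution meets the deadline."""
    for name, durations, deps in candidates:
        if decision_time + makespan(durations, deps) <= deadline:
            return name
    return None  # no candidate can be applied in time


# Hypothetical candidate target states, ordered by preference (energy saved)
consolidate = ("consolidate",
               {"stop_app": 10, "migrate_db": 40, "restart_app": 10, "power_off": 20},
               {"migrate_db": {"stop_app"}, "restart_app": {"migrate_db"}, "power_off": {"migrate_db"}})
throttle = ("throttle", {"cap_server1": 5, "cap_server2": 5}, {})

print(makespan(*consolidate[1:]))                                          # 70.0
print(pick_target([consolidate, throttle], decision_time=5, deadline=60))  # throttle
print(pick_target([consolidate, throttle], decision_time=5, deadline=80))  # consolidate
```

With a 60 s deadline, the preferred consolidation (5 s of decision plus a 70 s critical path) cannot finish in time, so the cheaper throttling plan is chosen; with 80 s, consolidation fits. Real durations are uncertain, so a practical manager would likely need safety margins or probabilistic estimates.
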

Experimental validation will target the SLICES-FR platform (https://slices-fr.eu/), a research infrastructure covering the whole IoT/network/edge/cloud continuum, with particular attention paid to reproducibility.
Experiments may also rely on simulation, e.g., with the Batsim environment (https://batsim.frama.io/).

References

[IBM03] J. O. Kephart and D. M. Chess, “The vision of autonomic computing,” Computer, vol. 36, no. 1, pp. 41-50, Jan. 2003.

[Hermenier13] F. Hermenier, J. Lawall and G. Muller, "BtrPlace: A Flexible Consolidation Manager for Highly Available Applications," in IEEE Transactions on Dependable and Secure Computing, vol. 10, no. 5, pp. 273-286, Sept.-Oct. 2013, doi: 10.1109/TDSC.2013.5.

[Allouche15] David Allouche, Simon de Givry, George Katsirelos, Thomas Schiex, Matthias Zytnicki: Anytime Hybrid Best-First Search with Tree Decomposition for Weighted CSP. CP 2015: 12-29


[Rutten18] Eric Rutten, Nicolas Marchand, Daniel Simon. Feedback Control as MAPE-K loop in Autonomic Computing. Software Engineering for Self-Adaptive Systems III. Assurances, LNCS vol. 9640, Springer, pp. 349-373, 2018.

[Cadorel20] Emile Cadorel, Hélène Coullon, Jean-Marc Menaud. Online Multi-User Workflow Scheduling Algorithm for Fairness and Energy Optimization. CCGrid2020 : 20th International Symposium on Cluster, Cloud and Internet Computing, Nov 2020, Melbourne, Australia. ⟨10.1109/CCGrid49817.2020.00-36⟩. ⟨hal-02551733⟩

[Chardet21] Maverick Chardet, Hélène Coullon, Simon Robillard. Toward Safe and Efficient Reconfiguration with Concerto. Science of Computer Programming, 2021, 203, pp.1-31. ⟨10.1016/j.scico.2020.102582⟩. ⟨hal-03103714⟩

[Coullon24] Hélène Coullon, Ludovic Henrio, Frédéric Loulergue, Simon Robillard. Component-Based Distributed Software Reconfiguration: a Verification-Oriented Survey. ACM Computing Surveys, 2024, 56 (1), pp.1-37. ⟨10.1145/3595376⟩

[Philippe24] Jolan Philippe, Antoine Omond, Hélène Coullon, Charles Prud'Homme, Issam Raïs. Fast Choreography of Cross-DevOps Reconfiguration with Ballet: A Multi-Site OpenStack Case Study. SANER 2024: IEEE International Conference on Software Analysis, Evolution and Reengineering, Mar 2024, Rovaniemi, Finland. pp.1-11, ⟨10.1109/SANER60148.2024.00007⟩. ⟨hal-04457484⟩

[Pagano24] Rosa Pagano, Sophie Cerf, Bogdan Robu, Quentin Guilloteau, Raphaël Bleuse, Eric Rutten. Making Control in High Performance Computing for Overload Avoidance Adaptive in Time and Job Size. CCTA 2024 – 8th IEEE Conference on Control Technology and Applications, Aug 2024, Newcastle Upon Tyne, United Kingdom.

Skills

The PhD candidate must have:
- An MSc degree in Computer Science (or equivalent, e.g., an engineering school degree).
- Excellent skills in programming and software engineering.
- Knowledge of Cloud infrastructures, autonomic computing, and control techniques.
- Good organizational and communication skills.
- Interpersonal qualities: curiosity, autonomy, and the ability to work with others.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

Base salary of 2,200 euros gross per month.