Contract type : Public service fixed-term contract
Level of qualifications required : Graduate degree or equivalent
Fonction : PhD Position
About the research centre or Inria department
Grenoble Rhône-Alpes Research Center groups together a few less than 800 people in 35 research teams and 9 research support departments.
Staff is localized on 5 campuses in Grenoble and Lyon, in close collaboration with labs, research and higher education institutions in Grenoble and Lyon, but also with the economic players in these areas.
Present in the fields of software, high-performance computing, Internet of things, image and data, but also simulation in oceanography and biology, it participates at the best level of international scientific achievements and collaborations in both Europe and the rest of the world.
ENS Lyon, Lyon, France
Supervisor: Yves Robert, Professor, email@example.com
Co-supervisor: Loris Marchal, CRNS Researcher, firstname.lastname@example.org
This PhD thesis focuses on real-time workflow applications that execute on high-performance computing platforms. Consecutive instances of the problem are periodically input to the platform, and execute concurrently, under deadline constraints. The optimization problem is to maximize the throughput of the platform while enforcing the response time constraints, and it has received considerable attention in the literature. The thesis revisits this problem on large-scale platforms where tasks are subject to transient errors. These errors must be detected and corrected via re-execution, which may increase response-time and decrease throughput. The thesis will investigate replication techniques to protect the tasks and guarantee some reliability threshold, under a multi-objective optimization approach, where energy consumption (a critical resource in real-time systems) is also accounted for.
Research topics : Algorithms, scheduling, parallel applications, resilience, transient errors, performance evaluation, throughput, response time, energy consumption.
Thesis description : A real-time system is responsible for performing logically correct computations while meeting predefined deadlines. Failing to meet these deadlines often results in catastrophic system failures. Transient faults triggered by cosmic rays or electromagnetic interference have to be tolerated when executing the tasks, which calls for sophisticated replication strategies. However, replication has a high cost in terms of consumed energy, which is a critical resource in embedded real-time systems. Altogether, enforcing all task deadlines while tolerating transient faults and without exceeding a given energy budget is a challenging endeavor.
The thesis will target real-time workflows, i.e., pipelined instances of the same set of tasks whose dependence graph is a given DAG (Directed Acyclic Graph). Instance number i is released at time i.P and must complete by time (i+1)P, where P is the global period of the problem. The workflow instances are mapped onto a parallel platform with N identical processors.
Each task is subject to transient failures, with some probability that is typically evaluated as a function of the task worst-case execution time. A reliability threshold is enforced, either task by task, or globally, which calls for replicating each task with a different number of replicas. However, when the execution of one replica succeeds, the other replicas are cancelled.
The energy consumed by each replica of each task depends upon the frequency chosen for its execution. The lower the frequency, the less energy consumed, but the longer the execution time; a low frequency may prevent the workflow deadline to be matched. Also, if the execution of the replicas of a given task do not overlap, energy consumption will be smaller whenever the first execution succeeds.
Altogether, this calls to designing schedules that achieve several trade-offs. The problem is to find:
(i) a number of replicas for each task;
(ii) a frequency for each task replica; and
(iii) a mapping and scheduling of all these replicas onto the processors.
The constraints are to match the reliability threshold and the deadlines, and the objective is to minimize the expectation of the consumed energy.
First research direction : The first research direction is to address the simpler case of independent tasks with the same period. A preliminary study has been conducted in , where some heuristics have been introduced. The thesis will assess whether the problem of scheduling the task replicas once the mapping and frequencies are given is NP-complete or not, and will design optimal algorithms (if the problem is polynomial) or efficient heuristics (if is is NP-complete). Then the thesis will investigate approximation algorithms to solve the general problem for independent tasks.
Second research direction : The next step will be to consider independent tasks with different periods. The scheduling of the tasks becomes less flexible, and the EDF policy (Earliest Deadline First) is usually enforced to guarantee that all deadlines are matched. The thesis will design algorithms and evaluate their efficiency for this problem.
Third research direction : The thesis will eventually tackle dependence graphs, first of a particular form (chains, fork-joins) and then will address the general problem.
General methodology : The work will be conducted along three main lines:
- Synthesis of relevant literature
- Design and analysis of new and multi-criteria algorithms
- Evaluation through simulations and using benchmark workflows
- Basic background in computer science, probability theory,
- interest in algorithm design and complexity.
(a) General references :
- T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms. The MIT Press, 2001
- M. Mitzenmacher, E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005
- J. Leung editor, Handbook of Scheduling: Algorithms, Models and Performance Analysis, CRC Press 2004
- H. Casanova, A. Legrand and Y. Robert, Parallel Algorithms, CRC Press 2008
(b) Reference on real-time workflows:
- M. A. Haque, H. Aydin, D. Zhu, On Reliability Management of Energy-Aware Real-Time Systems Through Task Replication, IEEE Trans. Parallel and Distributed Systems 28, 3 (2017), pp 813-825.
Research towards a PhD thesis in computer science
Technical slls and level required : see above
Languages : fluent English (French not needed)
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
1st and 2nd year : 1 982 euros brut /month
3rd year : 2 085 euros brut / month
- Theme/Domain :
Distributed and High Performance Computing
Scientific computing (BAP E)
- Town/city : Lyon
- Inria Center : CRI Grenoble - Rhône-Alpes
- Starting date : 2019-10-01
- Duration of contract : 3 years
- Deadline to apply : 2019-04-28
The keys to success
Basic background in computer science, probability theory.
Interest in algorithm design and complexity.
Inria, the French national research institute for the digital sciences, promotes scientific excellence and technology transfer to maximise its impact. It employs 2,400 people. Its 200 agile project teams, generally with academic partners, involve more than 3,000 scientists in meeting the challenges of computer science and mathematics, often at the interface of other disciplines. Inria works with many companies and has assisted in the creation of over 160 startups. It strives to meet the challenges of the digital transformation of science, society and the economy.
Instruction to apply
The campaign is not open to local students who have not done any significan mobility.
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST).Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.