2018-00363 - PhD Position : "Design-space exploration of fault-tolerant multicores"
The job description below is in English

Required degree level: Master's degree (Bac+5) or equivalent

Other valued degree: Master's

Position: PhD student

Desired experience level: Recent graduate

About the research centre or functional department

The Cairn project-team researches new architectures, algorithms and design methods for flexible and energy-efficient domain-specific systems-on-chip (SoCs). As the performance and energy-efficiency requirements of SoCs continuously increase, they become difficult to fulfil using programmable processor solutions alone. To address this issue, we advocate the use of reconfigurable hardware, i.e. hardware structures whose organization may change before or even during execution. Such reconfigurable SoCs offer high performance at a low energy cost while preserving a high level of flexibility. The group studies these SoCs from three angles: (i) the invention and design of new reconfigurable platforms, with an emphasis on flexible arithmetic operator design, dynamic reconfiguration management and low-power consumption; (ii) the development of their corresponding design flows (compilation and synthesis tools) to enable their automatic design from high-level specifications; (iii) the interaction between algorithms and architectures, especially for our main application domains (wireless communications, wireless sensor networks and digital security). The team was created in 2008 and is a “reconfiguration” of the former R2D2 research team from Irisa.

Context and assets of the position

The consumer market has shifted towards multicore architectures, since the clock speed of single processors could not be increased further due to power consumption and heat dissipation limits [4]. Compared with single-core processors, multicores reduce Space, Weight and Power (SWaP) and provide massive computing capabilities, while integrating diverse applications on the same platform [1]. However, the reduction of transistor size with technologies at 28nm and below has made multicores increasingly sensitive to environmental effects [2], such as ionizing, particle and high-energy electromagnetic radiation, extreme weather conditions, high temperature peaks and electromagnetic interference. Such stimuli disturb the normal functionality of the system and create faults during its operation [3]. Reliability has therefore become an essential aspect of multicore architectures. Several fault-tolerant approaches have been proposed in the literature to improve system reliability. However, no general solution provides the required reliability at low cost for all the problems under study. The most promising fault-tolerant method depends on the faults that actually occur during execution, the application and the platform of each problem under study.

Assignment

Assignments: 3-year PhD Thesis

This PhD focuses on fault-tolerant multicore architectures and has two main goals: 1) to gain insight into the impact of faults on multicore architectures, in order to model the impact of single (SEU, SET) and multiple (MBU) errors at different levels of abstraction, and 2) to design and develop a novel method to explore the design space of the promising set of fault-tolerant techniques.

Main activities

During the first part of this thesis, we will study the impact of faults on the basic components of a multicore architecture, i.e. the memory, the core and the interconnection. The study will use a shared-memory multicore built from RISC-V cores, specified at the C level through high-level synthesis and designed with a 28nm technology. To achieve this, we need to develop models that describe the faulty behaviors of these components by raising the abstraction of the existing fault models from the gate level up to the architecture level.
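
As a rough illustration of what an architecture-level fault model might look like, the sketch below flips bits in a stored word: a single-bit flip stands in for an SEU and a burst of adjacent flips for an MBU. The function names (`inject_seu`, `inject_mbu`, `hamming_distance`), the 32-bit word width and the assumption that an MBU hits adjacent bits are illustrative choices only, not the models that the thesis will develop.

```python
import random


def inject_seu(word: int, width: int, rng: random.Random) -> int:
    """Flip one uniformly chosen bit of a `width`-bit word (assumed SEU model)."""
    bit = rng.randrange(width)
    return word ^ (1 << bit)


def inject_mbu(word: int, width: int, n_bits: int, rng: random.Random) -> int:
    """Flip `n_bits` adjacent bits (assumed MBU model: one local burst)."""
    if not 1 <= n_bits <= width:
        raise ValueError("n_bits must be between 1 and width")
    start = rng.randrange(width - n_bits + 1)
    mask = ((1 << n_bits) - 1) << start
    return word ^ mask


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


# Hypothetical usage: corrupt a 32-bit memory word and count the flipped bits.
rng = random.Random(2018)  # fixed seed so an injection campaign is repeatable
golden = 0xDEADBEEF
seu_word = inject_seu(golden, 32, rng)
mbu_word = inject_mbu(golden, 32, 3, rng)
print(hamming_distance(golden, seu_word), hamming_distance(golden, mbu_word))
```

Whatever the seed, the faulty words differ from the original in exactly one and three bit positions respectively. A gate-level model would instead have to decide which flip-flop or net is struck and whether the upset is latched at all.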

During the second part, we will define the set of relevant fault-tolerant techniques within our domain and classify these methods into a binary classification scheme. Each class will be characterized with respect to the reliability that it can offer and the overhead that it imposes on the design (performance, area, energy). The different possible fault scenarios, based on the abstract models developed during the first part, will be mapped to the corresponding fault-tolerant classes. In the next step, we will focus on defining a novel design space exploration methodology and designing the corresponding tools in order to efficiently explore the different fault tolerance design options. The methodology will be based on pruning methods over the binary classification and on optimization strategies. The result of the proposed methodology is the set of the most promising fault-tolerant approaches, under given fault scenarios and platform characteristics, that reduce the system cost while providing reliability and real-time guarantees. A RISC-V multicore architecture will be used to evaluate the proposed methodology.
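
To make the idea of pruning a design space concrete, here is a minimal sketch that uses Pareto dominance, a standard technique that we substitute for the methodology still to be defined in the thesis. The `Candidate` type, the two metrics (`coverage`, `overhead`) and all numeric values are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    coverage: float  # assumed metric: fraction of faults tolerated (higher is better)
    overhead: float  # assumed metric: cost relative to the baseline (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both metrics and better on one."""
    no_worse = a.coverage >= b.coverage and a.overhead <= b.overhead
    better = a.coverage > b.coverage or a.overhead < b.overhead
    return no_worse and better


def pareto_prune(cands: List[Candidate]) -> List[Candidate]:
    """Discard every candidate that is dominated by some other candidate."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def cheapest_meeting(cands: List[Candidate], min_coverage: float) -> Optional[Candidate]:
    """Lowest-overhead non-dominated candidate that reaches the coverage target."""
    feasible = [c for c in pareto_prune(cands) if c.coverage >= min_coverage]
    return min(feasible, key=lambda c: c.overhead) if feasible else None


# Placeholder design points; the names and numbers are made up for illustration.
space = [
    Candidate("tech_a", coverage=0.50, overhead=1.10),
    Candidate("tech_b", coverage=0.90, overhead=2.00),
    Candidate("tech_c", coverage=0.85, overhead=2.50),  # dominated by tech_b
    Candidate("tech_d", coverage=0.99, overhead=3.20),
]
front = pareto_prune(space)
choice = cheapest_meeting(space, min_coverage=0.80)
```

With these placeholder values, `tech_c` is pruned because `tech_b` tolerates more faults at a lower overhead, and `tech_b` is the cheapest option that reaches 80% coverage. A realistic exploration would track performance, area and energy separately and add real-time constraints.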

This thesis is funded by a project involving INRIA, ONERA, and Temento Systems.

References

  1. Lemonnier, P. Millet, G. Marchesan Almeida, et al. “Towards future adaptive multiprocessor systems-on-chip: an innovative approach for flexible architectures,” in IC-SAMOS, 2012.
  2. Gizopoulos, M. Psarakis, S.V. Adve, et al. “Architectures for Online Error Detection and Recovery in Multicore Processors,” in DATE, 2011.
  3. P. Siewiorek and P. Narasimhan, “Fault-tolerant architectures for space and avionics applications,” technical report, Carnegie Mellon University, 2008.
  4. Di Carlo, P. Prinetto, D. Rolfo, and P. Trotta, “A fault injection methodology and infrastructure for fast single event upsets emulation on Xilinx SRAM-based FPGAs,” in 2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT). IEEE, 2014, pp. 159–164.
  5. Jafri, S. J. Piestrak, O. Sentieys, and S. Pillement, “Design of the coarse-grained reconfigurable architecture DART with on-line error detection,” Microprocessors and Microsystems, vol. 38, pp. 124–136, Mar. 2014.
  6. Jafri, S. J. Piestrak, O. Sentieys, and S. Pillement, “Design of a fault-tolerant coarse-grained reconfigurable architecture: A case study,” in Proc. of the 11th IEEE International Symposium on Quality Electronic Design (ISQED 2010), (San Diego, CA, USA), 6 pages, IEEE, Mar. 2010.
  7. Gatti, “Development and certification of avionics platforms on multi-core processors,” in Tutorial Mixed-Criticality Systems: Design and Certification Challenges, ESWeek, (Montreal, Canada), 2013.
  8. The RISC-V Instruction Set Architecture, http://riscv.org, 2016.
  9. Psiakis, A. Kritikakou and O. Sentieys, NEDA: NOP Exploitation with Dependency Awareness for Reliable VLIW Processors, IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 3-5, 2017.
  10. Psiakis, A. Kritikakou and O. Sentieys, Run-Time Instruction Replication for Permanent and Soft Error Mitigation in VLIW Processors, 15th IEEE Int. NEW Circuits and Systems Conference (NEWCAS), 2017.

 

Skills

The student is expected to develop techniques for design space exploration of computer architectures and fault tolerance. We also expect to have prototype implementations of the developed techniques on FPGA and ASIC. The designs will primarily be done through High-Level Synthesis tools.

Desired skills include:

  • Computer architecture, hardware design, VLSI circuit design.
  • Basic knowledge of compilers and fault tolerance.
  • Familiarity with the C/C++ language or other languages.
  • Familiarity with FPGA design and/or High-Level Synthesis.

Most importantly, we seek highly motivated and active students.

Benefits

  • Subsidised catering service
  • Partially-reimbursed public transport
  • Social security
  • Paid leave
  • Flexible working hours
  • Sports facilities

Remuneration

Monthly gross salary of 1982 euros for the first and second years and 2085 euros for the third year.