Embedded Machine Learning Programming (with potential PhD continuation)

Contract type: Internship

Level of qualifications required: Master's or equivalent

Function: Research internship

Context

This internship takes place in the context of a collaboration with the ASTRA joint Inria/Valeo team and with Google DeepMind, on the modeling and efficient implementation of complex Machine Learning applications on embedded platforms.

Scientific context:

Conventional machine learning (ML) frameworks offer a tensor-centric view of the design and implementation of deep neural networks. But ML models do not stand by themselves as pure tensor functions. ML applications typically interact with an environment and often operate on streams of data collected and processed over time. For instance, the reactive control of a self-driving car consumes streams of data coming from sensors and produces streams of commands for actuators. Training algorithms themselves embed a model into a reactive loop, itself decomposed into epochs and (mini-)batches that allow the efficient scheduling of computations and I/O, parameter updates, etc. The same applies to reinforcement learning (RL) agents.

Returning to the automated driving example, stateful behavior is essential to taking into account previously-inferred facts, such as speed limits or whether the current lane is a left turn, long after the acquisition of sensor inputs. Other examples of ML components embedded into stateful reactive feedback loops include predictive maintenance, model-predictive control, and digital twins. ML models themselves involve stateful constructs in the form of recurrent neural network (RNN) layers.

When generating optimized code, even matrix products and convolutions in feedforward networks can be folded over time, using (stateful) buffering to reduce the memory footprint. In distributed settings, the efficient implementation of large models involves pipelined communications and computations, which amounts to locally recovering a streaming execution pattern.
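To make the stream view concrete, here is a minimal sketch in JAX (illustrative only: this is not MLR syntax, and names such as rnn_cell and run_over_stream are ours) of a recurrent layer seen as a stateful function folded over a stream, with jax.lax.scan playing the role of the reactive loop over time:

    import jax
    import jax.numpy as jnp

    def rnn_cell(h, x, W_h, W_x):
        # One reactive step: consume one stream item x, update the state h,
        # and emit one output for the current instant.
        h_new = jnp.tanh(h @ W_h + x @ W_x)
        return h_new, h_new

    def run_over_stream(xs, h0, W_h, W_x):
        # The stateful cell is folded over the time axis of xs; the same
        # computation can equally be read as a scan over a (time, features)
        # tensor, which is exactly the time/space duality discussed above.
        step = lambda h, x: rnn_cell(h, x, W_h, W_x)
        _, ys = jax.lax.scan(step, h0, xs)
        return ys

    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    W_h = jax.random.normal(k1, (4, 4))
    W_x = jax.random.normal(k2, (3, 4))
    xs = jax.random.normal(k3, (10, 3))   # 10 time steps, 3 features each
    ys = run_over_stream(xs, jnp.zeros(4), W_h, W_x)
    print(ys.shape)                       # (10, 4): one output per instant

In a truly reactive setting, the same cell would run online, one instant at a time, with h as the only persistent state.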

Considering this broad range of scenarios, we observe that existing ML frameworks inadequately capture reactive aspects, raising barriers between differentiable models and the associated control, optimization, and input/output code. These barriers widen the gap between ML research and system capabilities, particularly in the area of control automation, where embedded ML engineering relies on undisclosed, ad-hoc implementations.

In previous work, we proposed a reactive language named MLR, which integrates ML-specific constructs (such as bidirectional recurrences and tensorial operations) and mechanisms (such as automatic differentiation). We have also shown that, for applications without bidirectional recurrences, reactiveness does not penalize performance.

Assignment

The objective of this internship, and of the potential PhD follow-up, is to advance on either or both of the MLR language design and MLR compilation fronts.

  • On the language design (syntax and semantics) side, of particular interest is the introduction of iterators allowing the seamless conversion of iterations performed in time, on streams, into iterations performed in space, on tensors. Such transformations are needed both at a high level, e.g. to introduce a "batch" dimension into a computation, and at a low level, e.g. to specify how a large tensorial operation is decomposed for execution on hardware (see the first sketch after this list).
  • On the compilation side, the key difficulty is the handling of bidirectional recurrences. Classical reactive formalisms such as Lustre can be compiled into very efficient, statically-scheduled code running in constant memory, without buffering. By comparison, the ML-specific bidirectional recurrences implicitly require buffering and dynamic scheduling (like the tape-based methods used during training; see the second sketch after this list). Replacing this implicit buffering with explicit, efficient, and bounded buffering under a mostly-static schedule has the potential to significantly improve the performance and predictability of the generated code.
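As a hedged illustration of the time/space conversion targeted by the iterators (again in JAX rather than MLR; f, in_time, and in_space are illustrative names), the same per-element function can be iterated in time with jax.lax.scan or mapped in space over a materialized batch axis with jax.vmap:

    import jax
    import jax.numpy as jnp

    def f(x):
        # Any per-element computation; tanh stands in for a layer.
        return jnp.tanh(x) * 2.0

    def in_time(xs):
        # Iteration in time: one stream element per reactive instant.
        step = lambda carry, x: (carry, f(x))
        _, ys = jax.lax.scan(step, None, xs)
        return ys

    # Iteration in space: the stream is materialized as a tensor axis
    # (a "batch" dimension) and processed in one vectorized call.
    in_space = jax.vmap(f)

    xs = jnp.arange(8.0)
    assert jnp.allclose(in_time(xs), in_space(xs))

An MLR iterator construct would aim to make this conversion explicit, and applicable in both directions, at the language level.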
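The buffering issue raised by bidirectional recurrences can likewise be sketched (illustrative JAX, not the MLR compiler): an anti-causal recurrence cannot run online, because its first output depends on the last input, so the whole stream must be buffered before a reverse scan can start:

    import jax
    import jax.numpy as jnp

    def backward_smooth(xs):
        # Anti-causal recurrence y_t = x_t + 0.5 * y_{t+1}: the stream must
        # be fully materialized as the tensor xs before evaluation begins.
        def step(carry, x):
            y = x + 0.5 * carry
            return y, y
        _, ys = jax.lax.scan(step, 0.0, xs, reverse=True)
        return ys

    print(backward_smooth(jnp.arange(4.0)))

Bounding such buffers and scheduling them mostly statically is precisely the compilation challenge described above.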

In both cases, the internship will start with the analysis and MLR modeling of a complex ML application that will serve as the main use case: a Reinforcement Learning-based Autonomous Driving (AD) application from the automotive domain.

The internship will involve regular interactions with:

  • Google DeepMind for the language design and compilation work.
  • Our automotive partners (the ASTRA team and Valeo) for the evaluation of MLR on the AD use case.

Contact: More information on the internship offer can be obtained by contacting dumitru.potop@inria.fr

Main activities

  • State of the art analysis
  • Use case modeling and evaluation
  • Proposal of language extensions and compilation methods
  • Contributing to the writing of a research paper

Skills

Languages: Proficiency in either French or English is required.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage