
PhD Position F/M Hardware-guided compression & fine-tuning of Transformer-based models

Contract type: Fixed-term contract

Level of qualifications required: Graduate degree or equivalent

Function: PhD Position

About the research centre or Inria department

The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher-education institutions, laboratories of excellence, a technological research institute, etc.

Context

Supervisors: Silviu-Ioan Filip (1) and Olivier Sentieys (1)

(1) Univ Rennes, Inria

Place: Campus de Beaulieu, Rennes, France

Contacts: olivier.sentieys@inria.fr, silviu.filip@inria.fr

Assignment

This PhD thesis will be funded through the HOLIGRAIL project.

Main activities

Transformer-based large language models (LLMs) have attracted significant attention recently with the arrival of models such as ChatGPT and GPT-4. While they show impressive potential in a variety of tasks (notably in vision and natural language processing), most modern Transformer-based architectures come with significant storage, computational, and energy costs (up to hundreds of billions of parameters and hundreds of GB of required memory/disk space), making it challenging to fine-tune, deploy, and use them in many practical settings without a powerful hardware infrastructure. For instance, even state-of-the-art compact model families such as LLaMA [6] start at 7B parameters (LLaMA-7B), which still amounts to at least 3.5 GB of storage even when using 4-bit integer quantization.
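As a quick sanity check of that storage figure, here is an illustrative back-of-the-envelope calculation (our own addition, assuming 7×10⁹ parameters stored at 4 bits each):

    # Back-of-the-envelope storage estimate for LLaMA-7B at 4-bit precision.
    params = 7e9               # approximate parameter count of LLaMA-7B
    bits_per_param = 4         # 4-bit integer quantization
    gigabytes = params * bits_per_param / 8 / 1e9
    print(f"{gigabytes:.1f} GB")   # -> 3.5 GB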


The PhD candidate will be tasked with investigating extreme compression methods for Transformer-based models, combining pruning and quantization methodologies (such as post-training quantization [2] and quantization-aware training [3]). We will look at hybrid-format mixed-precision approaches (e.g., mixing integer and floating-point data [1]), taking hardware constraints into account (e.g., available compute units, supported formats, and memory). The goal is to efficiently explore the large design space of available quantization formats and propose compressed models that are optimized for low latency and energy consumption. This will entail extending the low-precision simulation tools we have been developing in our team (mptorch, built on top of PyTorch [4,5]), as well as working towards FPGA/ASIC hardware accelerator prototypes for small- to medium-sized Transformer-based models, developed together with other members of the HOLIGRAIL project.
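To make these ideas concrete, the following minimal PyTorch sketch illustrates simulated (fake) quantization with a straight-through estimator, one of the basic building blocks behind quantization-aware training [3]. It is a generic illustration written for this posting; the class name and details are our own assumptions, not the mptorch API or the project's actual codebase.

    import torch

    # Minimal sketch of fake quantization with a straight-through
    # estimator (STE). Illustrative only; not the mptorch API.
    class FakeQuantize(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, num_bits=4):
            # Symmetric per-tensor integer quantization.
            qmax = 2 ** (num_bits - 1) - 1
            scale = x.abs().max().clamp(min=1e-8) / qmax
            return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

        @staticmethod
        def backward(ctx, grad_output):
            # STE: pass gradients through the rounding step unchanged.
            return grad_output, None

    w = torch.randn(256, 256, requires_grad=True)
    w_q = FakeQuantize.apply(w)   # quantized weights used in the forward pass
    loss = w_q.pow(2).sum()
    loss.backward()               # gradients flow back to the full-precision w

In actual quantization-aware training, such a fake-quantization step is applied to weights (and often activations) during the forward pass, so the network learns to be robust to the rounding error introduced by the target low-precision format.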


Context: The successful candidate will be a member of the TARAN team, based at the Inria research centre at Rennes University and the IRISA laboratory in Rennes, France. The thesis is part of the upcoming PEPR HOLIGRAIL project, itself part of the larger PEPR programme in Artificial Intelligence. HOLIGRAIL brings together researchers working on machine learning, computer arithmetic, hardware acceleration, and compiler optimization for embedded systems and deep learning applications, from the University of Rennes, Inria, CEA List, INSA Lyon, and Grenoble-INP. It is a large and competitive project that will fund more than 20 people, ranging from PhD students to postdoctoral fellows.

 

References:

[1] Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. arXiv preprint arXiv:2208.07339, 2022.


[2] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. OPTQ: Accurate Quantization for Generative Pre-Trained Transformers. In The Eleventh International Conference on Learning Representations, 2023.


[3] Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, and Vikas Chandra. LLM-QAT: Data-Free Quantization Aware Training for Large Language Models. arXiv preprint arXiv:2305.17888, 2023.

[4] Mariko Tatsumi, Silviu-Ioan Filip, Caroline White, Olivier Sentieys, and Guy Lemieux. Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training. In 2022 International Conference on Field-Programmable Technology (ICFPT), pages 1–9. IEEE, 2022.


[5] Mariko Tatsumi, Yuxiang Xie, Caroline White, Silviu-Ioan Filip, Olivier Sentieys, and Guy Lemieux. MPTorch and MPArchimedes: Open Source Frameworks to Explore Custom Mixed-Precision Operations for DNN Training on Edge Devices. In 2nd Research Open Automatic Design for Neural Networks (ROAD4NN) Workshop, co-located with the IEEE/ACM Design Automation Conference (DAC), December 2021.

[6] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971, 2023.

 

Skills

When: The desired starting date is October 1st, 2023, or as soon as possible thereafter.

Who: The successful candidate should be highly motivated and creative. The position requires a strong background in computer arithmetic and computer architecture, together with knowledge of deep learning models (ideally LLMs) and techniques. Strong proficiency in Python and familiarity with popular deep learning frameworks such as PyTorch or TensorFlow are also required.


Application: Informal inquiries are strongly encouraged, and interested candidates may contact us for additional details and information. Applications are accepted until the position is filled. The formal application should be sent by email to Silviu Filip (silviu.filip@inria.fr) and Olivier Sentieys (olivier.sentieys@inria.fr) and should include:

  • motivation letter
  • CV
  • transcripts for the courses undertaken in the last two years of study
  • references and recommendation letters
  • links to publications or MSc thesis if relevant
  • contact information of two references (title, name, organization, e-mail)

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Possibility of teleworking (90 days per year) and flexible organization of working hours
  • Partial payment of insurance costs

Remuneration

Monthly gross salary of 2,082 euros for the first and second years and 2,190 euros for the third year.