2020-02799 - PhD Position F/M Search engine for genomic sequencing data
Contract type: Fixed-term contract

Required degree level: Master's degree (five years of higher education) or equivalent

Position: PhD student

About the research centre or functional department

The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.

Context and advantages of the position

The recruited PhD student will work in the GenScale team at Inria Rennes, France. She/he will work in close collaboration with the SeqDigger partners (Pasteur Institute Paris, CEA Genoscope, MIO) and external collaborators (EBI, Bielefeld University, …). Short stays at EBI, Cambridge, are expected.

Assignment


Background on the proposed research subject:
We are currently witnessing a deep knowledge revolution due to the availability of exponentially expanding DNA sequence databases. This is made possible by the continuous acceleration of DNA sequencing throughput. Sequencing data is accumulating faster than Moore's Law, bringing fundamental new insights, conjectures, and understanding, with impacts in medicine, agronomy and ecology. Today, the Sequence Read Archive, a raw data archive, stores more than 10^16 nucleotides in the form of short sequences (<1000 bp), the "reads", which are fragments from generally unknown genomic locations.

There is currently no way to query this treasure of information. Today, it would be unthinkable to access the Internet without powerful search engines. However, this is precisely the current situation for raw read archives, where precious data sleep undisturbed in rarely-opened drawers. In this project we aim for a breakthrough in scalability that lets users query sequencing data directly, on the fly, in order to tap into the largest underexploited resource in life sciences.

Within the broader SeqDigger ANR project, we propose to design new indexing schemes that scale to very large DNA collections (assembled or not) and allow sequences of interest to be queried in real time. The recruited PhD student will explore existing methods, mainly based on Bloom filters, and will propose new algorithmic solutions.
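To make the Bloom filter idea concrete, here is a minimal sketch, not the SeqDigger method and not any existing tool's API. It indexes the k-mers (length-k substrings) of reads in a Bloom filter and reports which fraction of a query's k-mers appear to be present. The names `kmers`, `KmerBloomFilter`, `add_sequence` and `query`, the parameter values, and the choice of BLAKE2b with double hashing are all assumptions made for illustration. Real tools typically also handle reverse complements and much larger filters, which this sketch ignores.

```python
import hashlib


def kmers(sequence, k):
    """Return every length-k substring (k-mer) of a DNA sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class KmerBloomFilter:
    """Illustrative Bloom filter over k-mers (hypothetical class, not a real API)."""

    def __init__(self, num_bits, num_hashes, k):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.k = k
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, all zeros

    def _positions(self, kmer):
        # Double hashing: derive num_hashes bit positions from one 128-bit digest.
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_kmer(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True means "possibly present" (false positives can occur);
        # False means "definitely absent" (no false negatives).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))

    def add_sequence(self, sequence):
        for kmer in kmers(sequence, self.k):
            self.add_kmer(kmer)

    def query(self, sequence):
        """Fraction of the query's k-mers that the filter reports as present."""
        words = kmers(sequence, self.k)
        if not words:  # query shorter than k: nothing to look up
            return 0.0
        return sum(1 for kmer in words if kmer in self) / len(words)


# Index one toy read, then query it with a short sequence of interest.
index = KmerBloomFilter(num_bits=1 << 16, num_hashes=3, k=5)
index.add_sequence("ACGTACGTGGA")
print(index.query("ACGTACG"))
```

Because a Bloom filter never produces false negatives, a query made only of indexed k-mers always scores 1.0. A sequence that was never indexed can still score above 0 through false positives, which is why the filter size and number of hash functions matter at scale.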

Collaboration:
The recruited person will work with SeqDigger ANR members on co-development, testing, validation and deployment.

Responsibilities:
The person recruited is responsible for

  • literature review
  • testing and analysis of state-of-the-art tools
  • modeling and development of new indexing schemes
  • validation
  • writing of reports and articles



Main activities

See "Responsibilities" above.


Candidates must have a strong interest and expertise in algorithmics, data structures and C++ implementation.

Knowledge of genomics and biology will be highly appreciated but is not a prerequisite.


  • Subsidized meals
  • Partial reimbursement of public transport costs


Monthly gross salary: 1982 euros for the first and second years and 2085 euros for the third year.