PhD Position F/M - Robust storage on DNA
Contract type: Fixed-term contract (CDD)
Required degree level: Master's degree (Bac + 5) or equivalent
Position: PhD student
Desired experience level: Recent graduate
About the centre or functional department
The Inria Rennes - Bretagne Atlantique Centre is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.
Assignment
Supervisors:
- Aline ROUMY (aline.roumy@inria.fr)
- Thomas MAUGEY (thomas.maugey@inria.fr)
Goal: The goal of the project is to develop an algorithm that enables robust data storage on DNA.
Context: Data volume growth has led to a projected data storage requirement of 175 ZB by 2025 [1]. However, actual data storage capacity currently falls short of this forecast. One potential solution is DNA storage, as it offers several advantages, including high data density, long retention, and low energy cost [2]. Indeed, in terms of data density, DNA can store about bytes per , enabling the storage of all data generated throughout human history within a 30 cm-sided cube [3]. Regarding retention, DNA can endure for centuries, in contrast to contemporary storage media, which typically last for decades [3]. Additionally, DNA storage is energy-efficient, since DNA can be stored at ambient temperature, provided it is kept away from light and humidity.
Challenges and envisaged approach: Nonetheless, making DNA an efficient storage solution involves overcoming numerous challenges. These include:
(i) Data Transformation: convert data into a quaternary code (ACGT).
(ii) DNA Synthesis: write data, essentially synthesizing DNA.
(iii) DNA Sequencing: extract the quaternary code from DNA, i.e., sequencing DNA.
(iv) Data Retrieval: transform back the read quaternary code into the original data.
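As a minimal illustration of steps (i) and (iv), the sketch below maps bytes to nucleotides and back using a fixed table of two bits per base. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names `bytes_to_dna` and `dna_to_bytes` are assumptions chosen for illustration; the posting does not specify the transcoding, and practical codes also have to respect biochemical constraints such as avoiding long runs of the same base.

```python
# Illustrative 2-bit-per-nucleotide mapping (an assumption, not the project's code).
BASES = "ACGT"

def bytes_to_dna(data: bytes) -> str:
    """Step (i): encode each byte as four nucleotides, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

def dna_to_bytes(strand: str) -> bytes:
    """Step (iv): decode an error-free strand back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # Shift in two bits per nucleotide.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

strand = bytes_to_dna(b"Hi")
print(strand)                # CAGACGGC
print(dna_to_bytes(strand))  # b'Hi'
```

This decoder assumes the strand comes back unchanged; it has no protection against the sequencing errors of step (iii).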
The project addresses the first and fourth challenges by developing compression algorithms that are robust to the sequencing errors introduced in step (iii). Indeed, efficient DNA storage relies heavily on rapid sequencing methods, which introduce errors. For instance, nanopore sequencing, developed by Oxford Nanopore Technologies (ONT), achieves real-time analysis at the price of increased error rates. The main difficulty comes from the type of errors: nanopore sequencing introduces not only conventional substitution errors but also less conventional deletion and insertion errors [4-5]. A deletion differs from an erasure, for which it is known which part is missing (e.g., lost packets on the internet can be identified from packet headers). For deletions, neither the existence nor the position of the missing part is known, which complicates the correction of this type of error.
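To make the difference between an erasure and a deletion concrete, the sketch below contrasts the two on a short strand and adds a toy channel with independent substitution, insertion, and deletion events. The error probabilities, the independence assumption, and the names `ids_channel`, `erase`, and `delete` are illustrative assumptions; real nanopore error profiles are more structured [4-5].

```python
import random

BASES = "ACGT"

def ids_channel(strand, p_sub, p_ins, p_del, rng):
    """Toy channel: each base is independently deleted or substituted,
    and a random base may be inserted after any surviving base."""
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base vanishes without leaving a trace
        if r < p_del + p_sub:
            # substitution: replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion of a random base
    return "".join(out)

def erase(strand, pos):
    """Erasure: the symbol is lost, but its position is known (marked '?')."""
    return strand[:pos] + "?" + strand[pos + 1:]

def delete(strand, pos):
    """Deletion: the symbol is lost and all later symbols shift left."""
    return strand[:pos] + strand[pos + 1:]

original = "ACGTACGTAC"
print(erase(original, 3))   # ACG?ACGTAC
print(delete(original, 3))  # ACGACGTAC
print(ids_channel(original, p_sub=0.05, p_ins=0.05, p_del=0.05,
                  rng=random.Random(0)))
```

After the erasure, every surviving symbol is still at its original position; after the deletion, the receiver sees only a shorter strand and cannot tell where the loss occurred.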
The project will propose novel ways to structure the compressed DNA stream in order to make it robust to nanopore sequencing errors. For instance, we will exploit the similarities between sequencing and network transmission to develop robust compression, based on ideas from packet scheduling for noisy networks (for example, Dynamic Adaptive Streaming over HTTP). However, network transmission and nanopore sequencing also differ. One of the main differences is the random position of the extracted DNA segment. To address this issue, we will build upon ideas of random extraction in compressed streams [7], as well as on the compressive sensing framework [6].
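The role of stream structure when segments are read at random positions can be illustrated with a toy example: if every segment carries its own index, segments received in arbitrary order can still be put back in place. The sketch below is only an illustration under assumed names (`split_with_index`, `reassemble`) and a hypothetical fixed-width decimal index; it is not the method of [6] or [7], nor necessarily the approach the project will take.

```python
import random

INDEX_WIDTH = 4  # hypothetical fixed-width decimal index, for illustration only

def split_with_index(stream: str, size: int) -> list[str]:
    """Cut the stream into segments and prefix each with its position index."""
    return [
        f"{i:0{INDEX_WIDTH}d}{stream[start:start + size]}"
        for i, start in enumerate(range(0, len(stream), size))
    ]

def reassemble(segments: list[str]) -> str:
    """Reorder segments by their index prefix, then strip the prefixes."""
    ordered = sorted(segments, key=lambda seg: int(seg[:INDEX_WIDTH]))
    return "".join(seg[INDEX_WIDTH:] for seg in ordered)

stream = "ACGTTGCAGGATCCAT"
segments = split_with_index(stream, size=4)
random.Random(0).shuffle(segments)     # segments arrive in arbitrary order
print(reassemble(segments) == stream)  # True
```

The index costs extra symbols per segment, and this toy version assumes the index itself is read without error.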
Bibliography
[1] David Reinsel, John Gantz, and John Rydning. The digitization of the world from edge to core. Framingham: International Data Corporation, 16:1–28, 2018.
[2] Luis Ceze, Jeff Nivala, and Karin Strauss. Molecular digital data storage using DNA. Nature Reviews Genetics, 20(8):456–466, 2019.
[3] Victor Zhirnov, Reza M Zadegan, Gurtej S Sandhu, George M Church, and William L Hughes. Nucleic acid memory. Nature Materials, 15(4):366–370, 2016.
[4] Delahaye, Clara, and Jacques Nicolas. “Nanopore MinION Long Read Sequencer: An Overview of Its Error Landscape,” November 23, 2020. https://hal.inria.fr/hal-03123133.
[5] ———. “Sequencing DNA with Nanopores: Troubles and Biases.” PLoS ONE, October 1, 2021, 1. https://doi.org/10.1371/journal.pone.0257521.
[6] Huo, Dongming, Xuehua Zhu, Guangzhen Dai, Huicheng Yang, Xin Zhou, and Minghui Feng. “Novel Image Compression–Encryption Hybrid Scheme Based on DNA Encoding and Compressive Sensing.” Applied Physics B 126, no. 3, 2020.
[7] T. Maugey, A. Roumy, E. Dupraz, and M. Kieffer. “Incremental coding for extractable compression in the context of Massive Random Access.” IEEE Transactions on Signal and Information Processing over Networks, 2020.
Skills
Candidate profile: The candidate should have
- a strong background in image/signal processing, optimization, and programming;
- ideally, some knowledge of source coding and information theory.
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
Remuneration
Monthly gross salary of 2,100 euros
General information
- Theme/Domain: Vision, perception and multimedia interpretation; Information systems (BAP E)
- City: Rennes
- Inria Centre: Centre Inria de l'Université de Rennes
- Desired start date: 2024-11-01
- Contract duration: 3 years
- Application deadline: 2024-07-01
Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions to apply
Please submit online: your resume, cover letter, and, if available, letters of recommendation.
Defence security:
This position is likely to be located in a restricted area (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 relating to the protection of the nation's scientific and technical potential (PPST). Authorisation to access such an area is granted by the head of the institution, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position located in a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: SIROCCO
- PhD supervisor: Roumy Aline / aline.roumy@inria.fr
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects that have an impact on the world to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society, and the economy.