PhD Position F/M Semantically-enriched queries and analysis of metagenomic datasets
Contract type : Fixed-term contract
Level of qualifications required : Graduate degree or equivalent
Function : PhD Position
Level of experience : Recently graduated
About the research centre or Inria department
The Inria Centre at Rennes University is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.
Context
Genomic data enable critical advances in medicine, ecology, ocean monitoring, and agronomy. A major limitation is that it is currently impossible to query these data in their entirety (petabytes of sequences).
The OmicFinder project (https://project.inria.fr/omicfinder/) will provide a search engine able to remove this bottleneck. The central algorithmic idea of a genomic search engine is to index and query small exact words (hundreds of billions over millions of datasets), as well as the associated metadata. The project brings together Inria teams working on string algorithms, ontologies, computing architectures, and data distribution. They will bring algorithmic advances including frugal computation, clever index distribution, and smart ontology-based filtering of questions and answers.
The core idea of OmicFinder is to build an index of the small exact words present in millions of datasets, so that a query on this index returns the list of datasets that have at least one sequence containing the queried word. This corresponds to the syntactic aspect of query resolution.
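As an illustration of this syntactic aspect, here is a minimal sketch of an inverted index from exact words to datasets. It is not the OmicFinder implementation: the word length `K`, the function names, and the toy datasets are all assumptions made for the example, and a real index over petabytes of sequences would rely on far more compact data structures.

```python
from collections import defaultdict

K = 5  # hypothetical word length, chosen small for readability


def kmers(sequence, k=K):
    """Yield every exact word of length k found in the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_index(datasets, k=K):
    """Map each word to the set of dataset identifiers that contain it."""
    index = defaultdict(set)
    for dataset_id, sequences in datasets.items():
        for sequence in sequences:
            for word in kmers(sequence, k):
                index[word].add(dataset_id)
    return index


def query(index, word):
    """Return the sorted identifiers of datasets with a sequence containing the word."""
    return sorted(index.get(word, set()))


# Toy data (hypothetical identifiers and sequences).
datasets = {
    "gut_sample_1": ["ACGTACGTGA", "TTGACCA"],
    "ocean_sample_1": ["GGACGTAAC"],
    "gut_sample_2": ["CCCACGTAT"],
}
index = build_index(datasets)
print(query(index, "ACGTA"))  # word shared by all three toy datasets
print(query(index, "ACGTG"))  # word present in gut_sample_1 only
```

The query only answers "which datasets contain this word"; it knows nothing about what the datasets are about, which is the gap the semantic layer is meant to fill.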
Assignment
The expected benefits of semantically enriching queries are twofold.
Smart queries. First, this will allow users to specify *a priori* relevance criteria that will reduce noise and improve performance. For example, it will allow a user to specify an interest in the human gut microbiome, so that datasets containing sequences that match the word but were obtained during a Tara oceanic expedition can be ignored. Even better, OmicFinder will not even send this query to the Tara repository, avoiding unnecessary computations. Note that we want to support multiple levels of granularity, in order to focus for instance on the mammal gut microbiome, or on the omnivorous mammal gut microbiome.
Smart answers. Second, it will allow the OmicFinder query engine to provide an *a posteriori* characterization of the datasets, similar to classical enrichment analyses. Typically, one could compare the frequencies of annotations in the datasets returned by the query with the frequencies of the same annotations among the whole set of datasets, or among the datasets that match the semantic criteria. For example, one could find that the datasets returned by a query for a particular word, restricted to datasets related to the human gut microbiome, are enriched in liver-related diseases compared with the datasets related to the human gut microbiome in general.
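The filtering described above can be sketched with a toy is-a hierarchy. Everything in this example is hypothetical (the terms, the `PARENT` table, the repository and dataset names); in the project, the hierarchy would presumably come from ontologies rather than from a hand-written dictionary.

```python
# Hypothetical is-a hierarchy: each term points to its more general parent.
PARENT = {
    "human gut microbiome": "mammal omnivorous gut microbiome",
    "mammal omnivorous gut microbiome": "mammal gut microbiome",
    "mammal gut microbiome": "gut microbiome",
    "ocean water microbiome": "aquatic microbiome",
}

# Hypothetical repositories, each mapping dataset identifiers to one annotation.
REPOSITORIES = {
    "gut_repo": {"d1": "human gut microbiome", "d2": "mammal gut microbiome"},
    "ocean_repo": {"d3": "ocean water microbiome"},
}


def ancestors_or_self(term):
    """Yield the term itself, then each of its increasingly general ancestors."""
    while term is not None:
        yield term
        term = PARENT.get(term)


def matches(annotation, criterion):
    """True if the annotation is the criterion or one of its specializations."""
    return criterion in ancestors_or_self(annotation)


def relevant_repositories(criterion):
    """Repositories holding at least one matching dataset; others are never queried."""
    return sorted(
        repo for repo, annotated in REPOSITORIES.items()
        if any(matches(a, criterion) for a in annotated.values())
    )


def relevant_datasets(criterion):
    """Datasets whose annotation satisfies the criterion."""
    return sorted(
        dataset for annotated in REPOSITORIES.values()
        for dataset, a in annotated.items() if matches(a, criterion)
    )


print(relevant_repositories("human gut microbiome"))
print(relevant_datasets("mammal gut microbiome"))
```

Choosing a more general criterion (here "mammal gut microbiome" instead of "human gut microbiome") widens the set of matching datasets, which is one simple way to picture the levels of granularity.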
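One common way to perform such a frequency comparison is a one-sided hypergeometric test, sketched below. The choice of test, the function names, and the counts are assumptions made for illustration; the posting does not specify which statistical method the project will use.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k annotated datasets among n returned ones,
    when K of the N background datasets carry the annotation (hypergeometric tail)."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def fold_enrichment(k, n, K, N):
    """Ratio of the annotation frequency in the results to its background frequency."""
    return (k / n) / (K / N)


# Hypothetical counts: 100 background datasets (e.g. human gut microbiome),
# 10 of them annotated with a liver-related disease; the query returns
# 10 datasets, 5 of which carry that annotation.
N, K, n, k = 100, 10, 10, 5
print(fold_enrichment(k, n, K, N))
print(enrichment_p_value(k, n, K, N))
```

The background (`N`, `K`) can be either the whole set of datasets or only those matching the semantic criteria, which corresponds to the two comparisons mentioned above. In practice, testing many annotations at once would also call for a multiple-testing correction.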
Main activities
The expected contributions of this PhD thesis are:
- the creation of a semantic index of the datasets based on FAIR principles. This will require retrieving the metadata from the main dataset repositories and representing them in a unified schema, based on Semantic Web technologies such as RDF, RDFS+OWL and Bioschemas.
- a comparison of the trade-offs between centralized and decentralized storage of the semantic annotations in terms of implementation simplicity, performance impact, and scalability.
- the capability for users to express semantically rich queries. This will rely on SPARQL for representing the queries, but will also require an adequate user interface.
- the capability to describe and characterize the query results.
Skills
Technical skills and level required : Programming (Python or Java)
Languages : French or English
Other valued skills : Semantic Web
Benefits package
- Subsidized meals
- Partial reimbursement of public transport costs
- Possibility of teleworking (90 days per year) and flexible organization of working hours
- Partial payment of insurance costs
Remuneration
Monthly gross salary: 2100€ during the first two years and 2200€ during the third year.
General Information
- Theme/Domain : Computational Biology
  Information system (BAP E)
- Town/city : Rennes
- Inria Center : Centre Inria de l'Université de Rennes
- Starting date : 2024-10-01
- Duration of contract : 3 years
- Deadline to apply : 2024-11-03
Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.
Instruction to apply
Please submit online : your resume, cover letter and, if available, letters of recommendation
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria Team : DYLISS
- PhD Supervisor : Dameron Olivier / olivier.dameron@irisa.fr
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.