PhD Position F/M PhD in applied mathematics: Stochastic modeling and statistics for quantifying and predicting the evolution of tumor heterogeneity in chronic lymphocytic leukemia
Contract type: Fixed-term contract (CDD)
Required degree level: Master's degree (Bac + 5) or equivalent
Position: PhD student
Context and advantages of the position
Thesis context
The thesis will take place in the Probability and Statistics team of the Institut Élie Cartan de Lorraine (IECL) in Nancy and in the SIMBA team (Statistical Inference and Modeling for Biological Applications) of Inria Nancy. The PhD student will take part in discussions with staff at the Strasbourg University Hospital on medical and data aspects throughout the PhD project. During the thesis, the PhD student will have the opportunity to discover the world of mathematical research through the life of a dynamic mathematics laboratory, and to attend seminars and working groups in probability and statistics.
Supervision
The thesis will be supervised by Nicolas Champagnat, Coralie Fritsch and Ulysse Herbach (IECL and Inria Nancy - Grand Est) for the mathematical part and by Laurent Vallat (CHRU Strasbourg and University of Strasbourg) for the medical part.
Contacts
nicolas.champagnat@inria.fr, coralie.fritsch@inria.fr, ulysse.herbach@inria.fr
Full PhD subject: https://nchampagnat.perso.math.cnrs.fr/PhD_subject_Predi-CLL_ITMO_2024.pdf
Assignment
The development of targeted therapies has allowed considerable progress in the treatment of many cancers, but their efficacy depends on intra-tumor heterogeneity. In lymphomas and leukemias, the identification of gene alterations by high-throughput sequencing allows the characterization of this heterogeneity. In healthy B cells, the maturation process provides a unique sequence of DNA, called VDJ genes, encoding the immune repertoire of the antigen receptor (BCR) by combining 3 immunoglobulin chains V, D and J. In contrast, in hemopathies, every B cell in the initial leukemic clone (i.e., the population of tumor cells with the same genome) has the same antigen receptor, encoded by a specific VDJ gene sequence. The occurrence of additional mutations in VDJ genes may be responsible for the emergence of subclones with increased antigen receptor reactivity, further complicating the clonal heterogeneity of these hemopathies. Leukemic B cells therefore have two levels of heterogeneity: the heterogeneity of cancer genes (a feature shared by any cancer) and the heterogeneity of VDJ genes (a feature specific to leukemia). However, these two levels of clonal heterogeneity and their co-evolution remain poorly characterized and are not considered in the management of these cancers today.
Project description
We propose to develop a mathematical model for the evolution of the two levels of clonal heterogeneity in leukemia, making it possible to characterize their evolution from temporal bulk sequencing data of VDJ and cancer gene mutations using a Bayesian approach. We will test how well the inferred model predicts clonal evolution.
Main activities
Tasks
In this PhD project, we propose to tackle the problem of clonal reconstruction, first from data collected at a single time (already available), and second from longitudinal data. Data will be collected throughout the duration of the PhD thesis.
The main problem consists in reconstructing the phylogenetic tree of mutations and the frequency dynamics of each clone. The originality lies in the heterogeneity of the data: we will have the full VDJ mutation profile of each clone with its frequency, and the variants of each cancer gene with their allele frequencies. From the mathematical modeling perspective, VDJ data share common features with single-cell data, since full sequences can be reconstructed using tools like MiXCR (https://mixcr.com). Existing packages for clonal heterogeneity analysis include B-SCITE (Malikic et al., 2019) and ddClone (Salehi et al., 2017). They can handle both types of data (bulk and single-cell) and could in principle be used here. However, CLL has specific features that these methods do not accommodate.
The PhD student will first construct a probabilistic model accounting for all the data. This model will contain the phylogenetic tree as a latent variable, where each node in the tree corresponds to a VDJ mutation, a cancer gene mutation, or a chromosomal alteration, and where each mutation occurs only once in the tree. The observations will then be obtained, following the classical rules of the infinitely many sites model, as linear combinations of the frequencies of the clones in the sample (which are further latent variables), possibly with some noise.
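To make the "linear combinations" concrete, here is a minimal Python sketch, not the project's actual model. It assumes a hypothetical encoding in which the tree is a parent vector (node 0 is the root, each node carries one mutation) and clone `i` carries every mutation on the path from the root to node `i`. Under the infinitely many sites assumption, the noise-free prevalence of a mutation is then the total frequency of the clones located in the subtree below it. The names `ancestry_matrix` and `mutation_prevalence` and the toy numbers are illustrative only.

```python
def ancestry_matrix(parent):
    """A[j][i] = 1 if mutation j lies on the path from the root to clone i."""
    n = len(parent)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        node = i
        while node != -1:          # walk from clone i up to the root
            A[node][i] = 1
            node = parent[node]
    return A


def mutation_prevalence(parent, clone_freq):
    """Noise-free observations: prevalence = A @ clone_freq."""
    A = ancestry_matrix(parent)
    return [sum(a * f for a, f in zip(row, clone_freq)) for row in A]


# Toy tree (hypothetical): 0 is the root, 1 and 2 are children of 0,
# and 3 is a child of 1. parent[i] == -1 marks the root.
parent = [-1, 0, 0, 1]
clone_freq = [0.1, 0.2, 0.3, 0.4]   # latent clone frequencies, summing to 1

prevalence = mutation_prevalence(parent, clone_freq)
```

In this toy case the root mutation is carried by every clone, while the mutation at node 1 is carried by clones 1 and 3 only, so its prevalence is the sum of those two frequencies. A noise model (for example, on sequencing read counts) would be layered on top of these values.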
Treating latent variables as parameters, we could use maximum likelihood estimation, but the maximization is difficult in practice because of the very large number of possible trees. We will test stochastic search algorithms (genetic algorithms, Metropolis-Hastings and other MCMC methods), but we expect better results from a Bayesian approach, combined with a variational method to maximize the posterior.
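As an illustration of why tree search is the hard part, the following is a minimal Metropolis-Hastings sketch over tree topologies. It is a toy under strong assumptions and not the method the project will develop: the clone frequencies are treated as known, the noise is Gaussian with a fixed `sigma`, the prior over trees is uniform, and the tree is encoded as a parent vector with node 0 as the root. The proposal prunes a non-root node and reattaches it outside its own subtree, which is a symmetric move, so the acceptance ratio reduces to a likelihood ratio. All function names are hypothetical.

```python
import math
import random


def prevalence(parent, clone_freq):
    """Mutation prevalence = total frequency of the clones in its subtree."""
    prev = [0.0] * len(parent)
    for clone, freq in enumerate(clone_freq):
        node = clone
        while node != -1:                      # walk up to the root
            prev[node] += freq
            node = parent[node]
    return prev


def log_likelihood(parent, clone_freq, observed, sigma=0.05):
    """Gaussian noise model (an assumption), up to an additive constant."""
    pred = prevalence(parent, clone_freq)
    return -sum((o - p) ** 2 for o, p in zip(observed, pred)) / (2 * sigma ** 2)


def subtree(parent, v):
    """Set of nodes in the subtree rooted at v (v included)."""
    nodes = {v}
    changed = True
    while changed:
        changed = False
        for child, par in enumerate(parent):
            if par in nodes and child not in nodes:
                nodes.add(child)
                changed = True
    return nodes


def propose(parent, rng):
    """Prune a random non-root node and reattach it outside its subtree."""
    v = rng.randrange(1, len(parent))
    below = subtree(parent, v)
    candidates = [u for u in range(len(parent)) if u not in below]
    new_parent = list(parent)
    new_parent[v] = rng.choice(candidates)
    return new_parent


def metropolis_hastings(init, clone_freq, observed, n_steps=2000, seed=0):
    rng = random.Random(seed)
    current = list(init)
    current_ll = log_likelihood(current, clone_freq, observed)
    best, best_ll = current, current_ll
    for _ in range(n_steps):
        cand = propose(current, rng)
        cand_ll = log_likelihood(cand, clone_freq, observed)
        # Symmetric proposal: accept with probability min(1, L(cand)/L(current)).
        if rng.random() < math.exp(min(0.0, cand_ll - current_ll)):
            current, current_ll = cand, cand_ll
            if current_ll > best_ll:
                best, best_ll = current, current_ll
    return best, best_ll


# Toy data (hypothetical): noise-free observations generated from a known tree.
true_parent = [-1, 0, 0, 1]
clone_freq = [0.1, 0.2, 0.3, 0.4]
observed = prevalence(true_parent, clone_freq)

init = [-1, 0, 0, 0]                           # start from a star-shaped tree
init_ll = log_likelihood(init, clone_freq, observed)
best, best_ll = metropolis_hastings(init, clone_freq, observed)
```

With only four nodes the tree space is tiny, so a short chain explores it easily. The number of labeled trees grows super-exponentially with the number of mutations, which is what makes exhaustive maximization impractical and motivates the variational Bayesian approach mentioned above.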
Second, the PhD student will validate the method on data simulated from our model, then using the benchmark simulation tool proposed by Foglierini et al. (2020), adapted to our double heterogeneity context, and finally by comparison with single-cell sequencing data from 3D in vitro cultures of proliferating cells, which will be collected throughout the project. Predictive performance will also be tested.
Finally, we will try to detect whether groups of patients have similar mutational patterns (such as phylogenetic tree topology), which could correspond to a similar tumorigenesis, a similar stage of progression, or a similar response to treatments. This is a clustering problem that can be addressed by model-free artificial intelligence tools (such as latent Dirichlet allocation: Pritchard et al., 2000; Falush et al., 2003), or using models like those developed by Beerenwinkel et al. (2004, 2005). This will allow us to build a predictive model of treatment efficacy given the clonal heterogeneity of a patient, which clinicians can use in a context of personalized medicine.
Skills
The candidate should have skills in statistics and/or stochastic modeling. R, Python or Matlab programming skills are also required. An affinity or experience with medical applications will be highly appreciated.
Keywords: Applied probability, stochastic modeling, statistical modeling for medicine, variational Bayesian methods, clonal heterogeneity, chronic lymphocytic leukemia
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
€2,100 gross per month in the 1st year
General information
- Theme/Domain: Modeling and Control for Life Sciences
- Town/city: Vandœuvre-lès-Nancy
- Inria Center: Centre Inria de l'Université de Lorraine
- Desired starting date: 2024-10-01
- Duration of contract: 3 years
- Deadline to apply: 2024-05-28
Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions to apply
Defence security:
This position is likely to be assigned to a restricted-access area (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorization to access such an area is granted by the head of the institution, following a favorable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavorable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: SIMBA
- PhD supervisor: Herbach Ulysse / ulysse.herbach@inria.fr
The keys to success
Bibliography
- Malikic, S. et al. (2019). Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data. Nature communications, 10(1), 1-12. https://doi.org/10.1038/s41467-019-10737-5
- Salehi, S. et al. (2017). ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data. Genome biology, 18(1), 1-18. https://doi.org/10.1186/s13059-017-1169-3
- Foglierini, M. et al. (2020). AncesTree: An interactive immunoglobulin lineage tree visualizer. PLoS computational biology, 16(7), e1007731. https://doi.org/10.1371/journal.pcbi.1007731
- Pritchard, J. K. et al. (2000). Inference of population structure using multilocus genotype data. Genetics, 155(2), 945-959. https://doi.org/10.1093/genetics/155.2.945
- Falush, D. et al. (2003). Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164(4), 1567-1587. https://doi.org/10.1093/genetics/164.4.1567
- Beerenwinkel, N. et al. (2004). Learning multiple evolutionary pathways from cross-sectional data. In Proceedings of the eighth annual international conference on Research in computational molecular biology (pp. 36-44). https://doi.org/10.1145/974614.974620
- Beerenwinkel, N. et al. (2005). Mtreemix: a software package for learning and using mixture models of mutagenetic trees. Bioinformatics, 21(9), 2106-2107. https://doi.org/10.1093/bioinformatics/bti274
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talent in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with worldwide impact to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.