2018-00835 - Post-doctoral Position/ Implementing a meta-mining framework for the exploration of complex data
The job description below is in English

Required degree level: PhD or equivalent

Position: Post-doctoral researcher

Context and advantages of the position

Context and positioning:
This project aims to study knowledge discovery systems at a foundational level to understand how the knowledge discovery process should be carried out in view of the data and the mining methods available. There is usually a variety of algorithms to choose from and some criteria to guide our choice. However, there is no clear strategy for combining them that takes into account the relationship between datasets and methods at work. This strategic information should be extracted from data, analysed and evaluated, used for descriptive purposes and reused to guide the strategic combination of mining methods.

Case studies are found in the biomedical sciences, in particular in the “omics” area (genomics, transcriptomics, lipidomics, etc.), where data sets are complex and highly heterogeneous. With biomedical data, one problem could be to identify homogeneous groups of patients with certain diseases, and thus to contribute to better diagnosis and more efficient treatments by identifying the key drivers of the disease. A challenge is then how to choose the knowledge discovery approaches (symbolic/numerical, unsupervised/supervised) to employ, and how to combine them for mining this complex and heterogeneous data.

Assigned mission

Main objectives and Assignments:
Inspired by the frameworks of meta-learning [1,2] through knowledge-based mining and Exploratory Data Analysis (EDA) [3], we aim to define an operational and reusable framework for hybrid exploratory knowledge discovery. More precisely, the hired post-doctoral candidate is expected to implement an interactive workflow integrating the following modules:
Module I - Cluster analysis. This module aims to identify plausible and meaningful clusters (classes) of individuals from heterogeneous, unlabelled data that may be incomplete and come from different sources. We will propose a novel approach based on unsupervised random forests (URF) [4], a method that was recently improved [5] and that outperforms existing methods in running time while giving similar or better clusterings on numerical data. However, the adaptation to heterogeneous data remains a challenging problem.
Module II - Strategic combination of mining methods. This module is implemented within a meta-learning framework for the strategic selection [2] and combination [6] of mining methods, based on characteristics of the datasets, the areas (features) of expertise of the different methods considered, and the corresponding performance measures. We will consider both numerical and symbolic methods, the latter including pattern mining and formal concept analysis.
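
For orientation only, the selection step of Module II can be sketched as a nearest-neighbour lookup over dataset characteristics (meta-features). This is a minimal illustration under stated assumptions, not the framework to be developed: the function names, the three meta-features, the method labels and every value in META_BASE are hypothetical and invented for the example.

```python
from math import dist, log10


def meta_features(rows):
    """Describe a dataset given as a list of equal-length tuples (None = missing)."""
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [value for row in rows for value in row]
    missing = sum(value is None for value in cells)
    # A column counts as categorical if it holds at least one string value.
    categorical_cols = sum(
        any(isinstance(row[j], str) for row in rows) for j in range(n_cols)
    )
    return (log10(n_rows), categorical_cols / n_cols, missing / len(cells))


# Hypothetical meta-knowledge base: meta-features of past datasets paired with
# the method assumed to have performed best on them. All values are invented.
META_BASE = [
    ((2.0, 0.9, 0.05), "formal_concept_analysis"),
    ((4.0, 0.0, 0.00), "k_means"),
    ((3.0, 0.5, 0.20), "unsupervised_random_forest"),
]


def recommend(rows, meta_base=META_BASE):
    """Return the method attached to the closest past dataset (Euclidean distance)."""
    target = meta_features(rows)
    _, method = min(meta_base, key=lambda entry: dist(entry[0], target))
    return method


# Toy patient records: (age, blood group, biomarker level), with missing values.
patients = [
    (63, "A", None),
    (41, "B", 1.7),
    (58, "A", 2.1),
    (35, None, 0.9),
]
print(recommend(patients))
```

A real meta-learning system would learn this mapping from measured performance on many datasets and would combine methods as well as select one; the lookup above only shows the shape of the idea.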

Bibliography:

[1] P. Brazdil, C. Giraud-Carrier, C. Soares, and R. Vilalta, editors. Meta-learning: Applications to Data Mining. Springer, 2009.
[2] K. A. Smith-Miles. Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Computing Surveys, 41(1):6, 2008.
[3] P. F. Velleman and D. C. Hoaglin. Applications, Basics, and Computing of Exploratory Data Analysis. The Internet-First University Press, 2004.
[4] T. Shi and S. Horvath. Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1):118-138, 2006.
[5] K. Dalleau, M. Smail-Tabbone, and M. Couceiro. Unsupervised Extremely Randomized Trees. To appear, PAKDD 2018.
https://hal.inria.fr/hal-01667317
[6] M. Wozniak, M. Graña, and E. Corchado. A survey of multiple classifier systems as hybrid systems. Information Fusion, 16:3-17, 2014.

Main activities

  • Propose theoretical solutions for bridging the gap between numerical and symbolic computing
  • Develop programs/applications/interfaces
  • Write and publish research papers
  • Write reports

 

Skills

The ideal candidate holds a PhD in computer science and/or applied mathematics, is familiar with knowledge discovery techniques (preferably, both symbolic and numerical) and/or decision making tools.
He/she should be acquainted with programming languages, preferably Python, as part of the work entails implementing the frameworks developed in the course of this post-doctoral work.

He/she should have good English skills.

Benefits

  • Subsidised catering service
  • Partially-reimbursed public transport
  • Social security
  • Paid leave

Remuneration

Salary: 2653€ gross/month