Master Internship - Revisiting PCA with norm-ratio sparsity penalties

Contract type : Internship

Level of qualifications required : Master's or equivalent

Function : Internship Research

Level of experience : Recently graduated

About the research centre or Inria department

The Inria Saclay-Île-de-France Research Centre was established in 2008. It has developed as part of the Saclay site in partnership with Paris-Saclay University and with the Institut Polytechnique de Paris.

The centre has 40 project teams, 32 of which operate jointly with Paris-Saclay University and the Institut Polytechnique de Paris. Its activities involve over 600 people (scientists and research and innovation support staff) of 44 different nationalities.

 

Context

In the context of the ERC MAJORIS, and in collaboration with the company IFPEN, the aim of this internship is to investigate sparse principal component analysis (PCA) with norm-ratio sparsifying penalties.

Subject: 

Principal component analysis (PCA) is a workhorse of linear dimensionality reduction [Jol02]. It is widely applied in exploratory data analysis, visualization, and data preprocessing.
Principal components are usually linear combinations of all input variables. For high-dimensional data, they may therefore involve input variables that contribute very little to the interpretation. Finding the few directions in space that best explain the observations is desirable. Sparse PCA overcomes this drawback by adding sparsity constraints, so that each linear combination involves only a few input variables [CR24,ZX18]. One such approach, akin to the lasso, uses an absolute-value (l1-type) norm penalty as regularization. In [MBPS10], this matrix factorization problem is formulated as:


minimize_{\alpha} || X - D \alpha ||^2_F + \lambda|| \alpha ||_{1,1}


where X = [x_1,...,x_n] is the matrix of data vectors; D is a square matrix from a suitable basis set; ||.||_F denotes the Frobenius norm; ||.||_{1,1} denotes the sum of the magnitudes of the matrix coefficients; and \lambda is a positive penalty weight.
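
For a fixed D, the problem above is a lasso-type convex program in \alpha. As a minimal sketch only (it is not the online algorithm of [MBPS10]), the NumPy code below applies proximal gradient iterations (ISTA) to it. The function names, the fixed iteration count and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximity operator of tau * ||.||_{1,1} (entrywise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def objective(X, D, alpha, lam):
    """Value of ||X - D alpha||_F^2 + lam * ||alpha||_{1,1}."""
    return np.linalg.norm(X - D @ alpha, "fro") ** 2 + lam * np.abs(alpha).sum()


def sparse_code_ista(X, D, lam, n_iter=500):
    """Proximal gradient (ISTA) iterations on alpha, with D kept fixed."""
    # Lipschitz constant of the gradient of the quadratic data-fit term.
    lipschitz = 2.0 * np.linalg.norm(D, 2) ** 2
    alpha = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ alpha - X)
        alpha = soft_threshold(alpha - grad / lipschitz, lam / lipschitz)
    return alpha


# Synthetic example (illustrative data, not from the internship subject).
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((6, 6)))      # orthonormal basis
alpha_true = np.zeros((6, 20))
alpha_true[rng.integers(0, 6, size=20), np.arange(20)] = 1.0
X = D @ alpha_true + 0.01 * rng.standard_normal((6, 20))

alpha_hat = sparse_code_ista(X, D, lam=0.2)
print("objective:", objective(X, D, alpha_hat, 0.2))
print("non-zero coefficients:", np.count_nonzero(alpha_hat))
```

When D is square and orthonormal, as in this example, the minimizer has the closed form of a soft-thresholding of D^T X at level \lambda/2, which offers a quick sanity check of the iterations.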

A penalty such as ||.||_{1,1} is 1-homogeneous: scaling the matrix by a factor t > 0 multiplies the penalty by t. It may thus only weakly emulate the sheer count of non-zero entries of a matrix, which is scale-invariant, or 0-homogeneous.
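
The difference in homogeneity can be checked numerically. The short sketch below compares, on a toy matrix chosen for illustration, the ||.||_{1,1} penalty, the count of non-zero entries, and a plain (non-smoothed) l1/l2 norm ratio under rescaling; the helper names are assumptions.

```python
import numpy as np


def l11_norm(a):
    """Sum of the magnitudes of all entries (1-homogeneous)."""
    return float(np.abs(a).sum())


def count_nonzero(a):
    """Number of non-zero entries (scale-invariant, 0-homogeneous)."""
    return int(np.count_nonzero(a))


def l1_over_l2(a):
    """Plain l1/l2 norm ratio (scale-invariant for non-zero a)."""
    return float(np.abs(a).sum() / np.sqrt((a ** 2).sum()))


A = np.array([[3.0, 0.0], [0.0, -4.0]])
for t in (1.0, 10.0):
    # Only the first quantity changes with the scale factor t.
    print(t, l11_norm(t * A), count_nonzero(t * A), l1_over_l2(t * A))
```

For a vector with k non-zero entries of equal magnitude, the l1/l2 ratio equals sqrt(k), so its square recovers the count exactly. This is one sense in which norm ratios behave as continuous sparsity measures.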


Recently, the SOOT/SPOQ family of penalties has been developed in our research group as smooth approximations of the scale-invariant lp/lq norm ratios. Such ratios have long been used as stopping criteria, penalties or "continuous" sparsity count estimators [HR09]. SOOT/SPOQ penalties have been used successfully for the restoration, deconvolution and source separation of sparse signals [CCDP20,RPD+15].
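
To give a feel for what a smoothed norm ratio can look like, the sketch below implements a generic smoothed logarithmic l1/l2 surrogate. It is an illustrative assumption in the spirit of the smoothed l1/l2 regularization of [RPD+15], not a verified transcription of the SOOT or SPOQ definitions; the smoothing parameters alpha, beta, eta and all function names are hypothetical, and the exact penalties should be taken from [RPD+15,CCDP20].

```python
import numpy as np


def smoothed_l1(a, alpha):
    """Smooth surrogate of the l1 norm (differentiable at zero)."""
    return float(np.sum(np.sqrt(a ** 2 + alpha ** 2) - alpha))


def smoothed_l2(a, eta):
    """Smooth surrogate of the l2 norm (strictly positive, even at zero)."""
    return float(np.sqrt(np.sum(a ** 2) + eta ** 2))


def smoothed_log_ratio(a, alpha=1e-3, beta=1e-3, eta=1e-3):
    """Log of a smoothed l1/l2 ratio; beta keeps the log finite at zero."""
    return float(np.log((smoothed_l1(a, alpha) + beta) / smoothed_l2(a, eta)))


a = np.array([3.0, 0.0, 0.0, -4.0])
# Unlike the plain ratio, the surrogate is well defined at the zero vector.
print(smoothed_log_ratio(np.zeros(4)))
# For entries large compared with the smoothing parameters, the surrogate
# is nearly scale-invariant and close to the log of the plain l1/l2 ratio.
print(smoothed_log_ratio(100 * a), smoothed_log_ratio(1000 * a), np.log(7 / 5))
```

The smoothing trades exact scale invariance for differentiability, which is what makes gradient-based optimization schemes applicable.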

The goal of the internship is to investigate the solution of sparse PCA models in which the standard l1 norm is replaced by such norm ratios. Convergence analysis of the proposed optimization algorithm, implementation, and validation on public benchmarks will be conducted.

[CCDP20] Afef Cherni, Emilie Chouzenoux, Laurent Duval, and Jean-Christophe Pesquet. SPOQ ℓp-over-ℓq regularization for sparse signal recovery applied to mass spectrometry. IEEE Trans. Signal Process., 68:6070–6084, 2020.
[CR24] Fan Chen and Karl Rohe. A new basis for sparse principal component analysis. J. Comp. Graph. Stat., 33(2):421–434, 2024.
[HR09] N. Hurley and S. Rickard. Comparing measures of sparsity. IEEE Trans. Inform. Theory, 55(10):4723–4741, Oct. 2009.
[Jol02] I. T. Jolliffe. Principal component analysis. Springer Series in Statistics, 2nd edition, 2002.
[MBPS10] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res., 11:19–60, 2010.
[RPD+15] A. Repetti, M. Q. Pham, L. Duval, E. Chouzenoux, and J.-C. Pesquet. Euclid in a taxicab: Sparse blind deconvolution with smoothed ℓ1/ℓ2 regularization. IEEE Signal Process. Lett., 22(5):539–543, May 2015.
[ZCD23] Paul Zheng, Emilie Chouzenoux, and Laurent Duval. PENDANTSS: PEnalized Norm-ratios Disentangling Additive Noise, Trend and Sparse Spikes. IEEE Signal Process. Lett., 30:215–219, 2023.
[ZX18] Hui Zou and Lingzhou Xue. A selective overview of sparse principal component analysis. Proc. IEEE, 106(8):1311–1320, August 2018.

Assignment

Missions: The goals of this internship are to:
• investigate potential derivations using SOOT/SPOQ penalties,
• implement the algorithmic workflow in a scientific toolkit (e.g., scikit-learn),
• benchmark it against competing methods.

Environment: The intern will be supervised by Emilie Chouzenoux (Head of OPIS team, Inria Saclay) and Laurent Duval (Research Engineer, IFPEN, Rueil Malmaison). The intern will join the Inria Saclay team OPIS (https://opis-inria.eu/) and will be based at the Centre de la Vision Numérique, on the CentraleSupélec campus, Saclay, France. He/she will enjoy an international and creative environment with frequent research seminars and reading groups. Computer equipment expenses will be covered within the limits of the applicable scale.

Organization: This offer is intended for Master 1, Master 2 or engineering students. The start and end dates are flexible, with a minimum duration of 4 months.

Main activities

Bibliographical study

Programming in Python environment

Benchmark on public datasets

Scientific meetings

Writing of scientific reports

Skills

Languages : The candidate must be fluent in English and/or French.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

Internship stipend (gratification)