On unsupervised learning in high dimension.

Authors
  • SEBBAR Mehdi
  • DALALYAN Arnak S.
  • TSYBAKOV Alexandre B.
  • RIVOIRARD Vincent
  • MARTEAU Clément
  • MEZIANI Katia
  • ROLET Philippe
Publication date
2017
Publication type
Thesis
Summary In this thesis, we address two topics: high-dimensional clustering on the one hand, and mixture density estimation on the other. The first chapter is an introduction to clustering. We present different methods and focus on one of the main models of our work, the Gaussian mixture. We also discuss the problems inherent to high-dimensional estimation and the difficulty of estimating the number of clusters. We briefly outline here the concepts discussed in this manuscript. Consider a mixture of K Gaussians in R^p. A common approach to estimating the parameters of the mixture is the maximum likelihood estimator. Since this problem is not convex, the convergence of classical methods cannot be guaranteed. However, by exploiting the biconvexity of the negative log-likelihood, one can use the iterative 'Expectation-Maximization' (EM) procedure. Unfortunately, this method is not well suited to the challenges posed by high dimensionality, and it requires the number of clusters to be known.

Chapter 2 presents three methods that we developed to try to address these problems. The work presented here was not pursued in depth, for various reasons. The first method, 'graphical lasso on Gaussian mixtures', consists in estimating the inverse covariance matrices under the assumption that they are sparse. We adapt the graphical lasso method of [Friedman et al., 2007] to each component of a mixture and evaluate this method experimentally. The other two methods address the problem of estimating the number of clusters in the mixture. The first is a penalized estimation of the posterior probability matrix, whose (i,j) entry is the probability that the i-th observation belongs to the j-th cluster. Unfortunately, this method proved computationally too expensive. The second method consists in penalizing the weight vector in order to make it sparse; it shows promising results. Both methods of Chapter 2 are illustrated by the sketches after this summary.

In Chapter 3, we study the maximum likelihood estimator of the density of n i.i.d. observations, under the hypothesis that it is well approximated by a mixture of several given densities. We are interested in the performance of the estimator with respect to the Kullback-Leibler loss, and we establish risk bounds in the form of sharp oracle inequalities, both in probability and in expectation. These bounds show that, in the convex aggregation problem, the maximum likelihood estimator attains the rate ((log K)/n)^{1/2}, which is optimal up to a logarithmic factor, when the number of components is larger than n^{1/2}. More importantly, under the additional assumption that the Gram matrix of the dictionary components satisfies the compatibility condition, the oracle inequalities yield the optimal rate in the sparse scenario: if the weight vector is (nearly) D-sparse, we obtain the rate (D log K)/n. Beyond these oracle inequalities, we introduce the notion of (nearly) D-sparse aggregation and establish the corresponding lower bounds for this type of aggregation.

Finally, in Chapter 4, we propose an algorithm that performs the Kullback-Leibler aggregation of dictionary components studied in Chapter 3. We compare its performance with different methods, and we then propose a method to build the density dictionary and study it numerically.
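To make the first method of Chapter 2 concrete, here is a minimal sketch in Python of a two-stage approximation of the 'graphical lasso on Gaussian mixtures' idea: fit a Gaussian mixture by EM, then sparsify each component's precision (inverse covariance) matrix with the graphical lasso. The thesis adapts the graphical lasso inside the mixture estimation itself; this two-stage shortcut, the synthetic data, and the regularization value alpha=0.1 are assumptions made only for illustration.

    # Illustrative sketch only: two-stage "graphical lasso on a
    # Gaussian mixture". The thesis integrates the graphical lasso
    # into the mixture estimation; here we fit by EM first, then
    # sparsify each component's precision matrix separately.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.covariance import graphical_lasso

    rng = np.random.default_rng(0)
    X = np.concatenate([
        rng.normal(-2.0, 1.0, size=(200, 10)),   # cluster 1
        rng.normal(+2.0, 1.0, size=(200, 10)),   # cluster 2
    ])

    # Step 1: EM estimation of a K-component Gaussian mixture.
    gmm = GaussianMixture(n_components=2, covariance_type="full",
                          random_state=0).fit(X)

    # Step 2: sparse precision matrix for each component
    # (alpha is an arbitrary regularization level for the example).
    for k in range(gmm.n_components):
        _, prec_k = graphical_lasso(gmm.covariances_[k], alpha=0.1)
        n_zeros = int(np.sum(np.abs(prec_k) < 1e-8))
        print(f"component {k}: {n_zeros} near-zero precision entries")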
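The 'penalize the weight vector' idea for selecting the number of clusters can likewise be conveyed with off-the-shelf tools. The sketch below uses scikit-learn's BayesianGaussianMixture, whose Dirichlet prior on the weights drives superfluous components toward zero weight. Note that this is a different penalization from the one studied in the thesis; the prior strength and the weight threshold are arbitrary choices for the example.

    # Illustrative sketch: estimating the number of clusters by
    # shrinking the mixture weights. Uses a Dirichlet prior, NOT
    # the penalty developed in the thesis; shown only to convey
    # the "sparse weight vector" idea.
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(1)
    X = np.concatenate([
        rng.normal(-3.0, 1.0, size=(150, 5)),
        rng.normal(0.0, 1.0, size=(150, 5)),
        rng.normal(+3.0, 1.0, size=(150, 5)),
    ])

    # Deliberately overshoot the number of components; the prior
    # pushes the weights of unneeded components toward zero.
    bgm = BayesianGaussianMixture(
        n_components=10,
        weight_concentration_prior=1e-3,  # strong shrinkage (assumed value)
        max_iter=500,
        random_state=1,
    ).fit(X)

    # Components with non-negligible weight give an estimate of K.
    print("estimated number of clusters:", int(np.sum(bgm.weights_ > 1e-2)))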
This thesis was carried out within the framework of a CIFRE agreement with the company ARTEFACT.