Asynchronous optimization for machine learning.

Authors
  • LEBLOND Rémi
  • BACH Francis
  • LACOSTE-JULIEN Simon
  • VERT Jean-Philippe
  • DUCHI John C.
  • DAUMÉ III Hal
  • GRAMFORT Alexandre
Publication date
2018
Publication type
Thesis
Summary
The combined explosion of computational power and of the amount of available data has made algorithms the new limiting factor in machine learning. The objective of this thesis is therefore to introduce new methods capable of taking advantage of large amounts of data and of computational resources. We present two independent contributions.

First, we develop fast optimization algorithms adapted to modern parallel computing architectures in order to handle massive amounts of data. We introduce an analytical framework for asynchronous parallel algorithms that allows us to write simple and correct convergence proofs. We demonstrate its usefulness by analyzing the convergence and speedup properties of two new algorithms. Asaga is a sparse asynchronous parallel variant of Saga, a variance-reduced algorithm with a fast linear convergence rate for smooth and strongly convex objectives. Under the right conditions, Asaga is linearly faster than Saga, even in the absence of sparsity. ProxAsaga extends Asaga to the more general setting where the regularization term is not smooth, and it also obtains a linear speedup. We carry out extensive experiments comparing our algorithms to the state of the art.

Second, we present new methods for structured prediction. We focus on recurrent neural networks (RNNs), whose traditional training algorithm, based on maximum likelihood estimation (MLE), has several limitations: the associated cost function ignores the information contained in structured evaluation metrics, and it causes discrepancies between training and prediction. We therefore propose SeaRNN, a new training algorithm for RNNs inspired by the "learning to search" approach. SeaRNN relies on a state-space exploration to define global-local cost functions that are closer to the evaluation metric than the MLE objective. Models trained with SeaRNN outperform those trained via MLE on three challenging tasks, including machine translation. Finally, we study the behavior of these models and perform a detailed comparison of our new approach to related work.
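To give a concrete sense of the variance-reduced method that Asaga parallelizes, the following is a minimal, sequential sketch of the Saga update rule on a toy least-squares problem. It is illustrative only: the objective, the synthetic data, the step size, and all variable names are assumptions for the sketch, not code or settings from the thesis (which additionally handles asynchronous, sparse updates and a proximal step for non-smooth regularizers).

```python
# Minimal sequential sketch of the Saga update rule (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

def grad_i(x, i):
    # Gradient of the i-th squared-loss term (1/2)(a_i^T x - b_i)^2.
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
memory = np.array([grad_i(x, i) for i in range(n)])  # per-sample gradient table
avg = memory.mean(axis=0)                            # running average of the table
step = 0.1 / np.linalg.norm(A, 2) ** 2               # conservative step size (assumed)

for _ in range(20 * n):
    i = rng.integers(n)
    g = grad_i(x, i)
    # Variance-reduced update: new gradient, minus stored gradient, plus table average.
    x -= step * (g - memory[i] + avg)
    avg += (g - memory[i]) / n
    memory[i] = g

print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

In the asynchronous setting studied in the thesis, several workers run this loop concurrently on shared memory without locks; the analytical framework mentioned above is what makes the convergence and speedup analysis of that lock-free execution tractable.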