Asynchronous optimization for machine learning.

Authors
  • LEBLOND Rémi
  • BACH Francis
  • LACOSTE-JULIEN Simon
  • VERT Jean-Philippe
  • DUCHI John C.
  • DAUMÉ III Hal
  • GRAMFORT Alexandre
Publication date
2018
Publication type
Thesis
Summary
The combined explosion of computational power and of the amount of available data has made algorithms the new limiting factor in machine learning. The objective of this thesis is therefore to introduce new methods capable of taking advantage of large amounts of data and of computational resources. We present two independent contributions.

First, we develop fast optimization algorithms adapted to modern parallel computing architectures in order to handle massive amounts of data. We introduce an analytical framework for asynchronous parallel algorithms that allows us to write simple and correct convergence proofs. We demonstrate its usefulness by analyzing the convergence and speedup properties of two new algorithms. Asaga is a sparse asynchronous parallel variant of Saga, a variance-reduced algorithm with a fast linear convergence rate for smooth and strongly convex objectives. Under the right conditions, Asaga is linearly faster than Saga, even in the absence of sparsity. ProxAsaga extends Asaga to the more general setting where the regularization term is not smooth, and it also obtains a linear speedup. We carry out extensive experiments comparing our algorithms to the state of the art.

Second, we present new methods for structured prediction. We focus on recurrent neural networks (RNNs), whose traditional training algorithm, based on maximum likelihood estimation (MLE), has several limitations: the associated cost function ignores the information contained in structured evaluation metrics, and it causes discrepancies between training and prediction. We therefore propose SeaRNN, a new training algorithm for RNNs inspired by the "learning to search" approach. SeaRNN relies on a state-space exploration to define global-local cost functions that are closer to the evaluation metric than the MLE objective. Models trained with SeaRNN outperform those trained via MLE on three challenging tasks, including machine translation. Finally, we study the behavior of these models and perform a detailed comparison of our new approach to related work.
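To give a concrete sense of the variance-reduced method that Asaga parallelizes, the following is a minimal, sequential sketch of the Saga update rule on a toy least-squares problem. It is illustrative only: the objective, the synthetic data, the step size, and all variable names are assumptions for the sketch, not code or settings from the thesis (which additionally handles asynchronous, sparse updates and a proximal step for non-smooth regularizers).

```python
# Minimal sequential sketch of the Saga update rule (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

def grad_i(x, i):
    # Gradient of the i-th squared-loss term (1/2)(a_i^T x - b_i)^2.
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
memory = np.array([grad_i(x, i) for i in range(n)])  # per-sample gradient table
avg = memory.mean(axis=0)                            # running average of the table
step = 0.1 / np.linalg.norm(A, 2) ** 2               # conservative step size (assumed)

for _ in range(20 * n):
    i = rng.integers(n)
    g = grad_i(x, i)
    # Variance-reduced update: new gradient, minus stored gradient, plus table average.
    x -= step * (g - memory[i] + avg)
    avg += (g - memory[i]) / n
    memory[i] = g

print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

In the asynchronous setting studied in the thesis, several workers run this loop concurrently on shared memory without locks; the analytical framework mentioned above is what makes the convergence and speedup analysis of that lock-free execution tractable.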