Machine learning based on Hawkes processes and stochastic optimization.

Authors
Publication date
2019
Publication type
Thesis
Summary
The common thread of this thesis is the study of Hawkes processes. These point processes capture the cross-causality that can occur between several series of events: concretely, they quantify the influence that the events of one series have on the future events of all the other series. In the context of social networks, for example, they describe how likely a user's action, such as a tweet, is to trigger reactions from other users.

The first chapter is a brief introduction to point processes, followed by a deeper look at Hawkes processes and, in particular, the properties of the most commonly used exponential-kernel parameterization (a standard form of this intensity is written out after this summary).

In the next chapter, we introduce an adaptive penalty to model, with Hawkes processes, the propagation of information in social networks. This penalty takes into account prior knowledge of the characteristics of these networks, such as sparse interactions between users or community structure, and reflects it in the estimated model. Our technique uses weighted penalties whose weights are determined by a fine-grained analysis of the generalization error.

Next, we discuss convex optimization and the progress made with first-order stochastic methods with variance reduction. The fourth chapter is dedicated to adapting these techniques to optimize the goodness-of-fit term most commonly used with Hawkes processes, the log-likelihood, which does not satisfy the gradient-Lipschitz assumption these methods usually require. We therefore work under a different regularity assumption and obtain a linear convergence rate for a lagged version of Stochastic Dual Coordinate Ascent (SDCA) that improves on the state of the art. Moreover, such functions involve many linear constraints that classical first-order algorithms frequently violate, whereas in the dual these constraints are much easier to satisfy. The robustness of our algorithm is therefore closer to that of second-order methods, which are prohibitively expensive in high dimension.

Finally, the last chapter presents a new statistical learning library for Python 3 with a particular focus on temporal models. Called tick, this library relies on a C++ implementation and state-of-the-art optimization algorithms to perform very fast estimation in a multi-core environment. Published on GitHub, it has been used throughout this thesis to run the experiments.
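As context for the exponential-kernel parameterization discussed in the first chapter, the conditional intensity of a D-dimensional Hawkes process is commonly written as follows (the notation here is the usual one from the literature, not necessarily the one adopted in the thesis):

\lambda_i(t) = \mu_i + \sum_{j=1}^{D} \sum_{t_k^j < t} \alpha_{ij} \, \beta_{ij} \, e^{-\beta_{ij} (t - t_k^j)},

where \mu_i is the baseline intensity of node i, t_k^j are the past events of node j, \alpha_{ij} measures how strongly events of node j excite node i, and \beta_{ij} is the decay rate of the exponential kernel.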
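The adaptive penalty of the second chapter can be pictured, schematically, as a weighted variant of the usual sparsity-inducing penalties: instead of a single regularization level, each interaction coefficient receives its own weight,

\min_{\mu, \alpha} \; -\ell(\mu, \alpha) + \sum_{i,j} w_{ij} \, |\alpha_{ij}|,

where \ell is the log-likelihood and the weights w_{ij} are set from an analysis of the generalization error. The exact choice of weights and the treatment of community structure are specific to the thesis and are not reproduced here.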
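To give a concrete idea of how the tick library described in the last chapter is typically used, here is a minimal sketch of simulating and fitting an exponential-kernel Hawkes process with the tick.hawkes module. SimuHawkesExpKernels and HawkesExpKern are assumed to be used as in the library's public API; the parameter values, the penalty choice and the end time are purely illustrative and do not come from the thesis.

import numpy as np
from tick.hawkes import SimuHawkesExpKernels, HawkesExpKern

baseline = np.array([0.2, 0.1])      # exogenous event rates mu_i
adjacency = np.array([[0.4, 0.0],    # alpha_ij: influence of node j on node i
                      [0.3, 0.2]])
decays = 2.0                         # shared exponential decay beta

# Simulate a 2-dimensional Hawkes process with exponential kernels
simu = SimuHawkesExpKernels(adjacency=adjacency, decays=decays,
                            baseline=baseline, end_time=10000,
                            verbose=False, seed=42)
simu.simulate()

# Fit an exponential-kernel Hawkes model with an l1 (sparsity) penalty
learner = HawkesExpKern(decays=decays, penalty='l1', C=1000)
learner.fit(simu.timestamps)

print(learner.baseline)   # estimated mu_i
print(learner.adjacency)  # estimated alpha_ij

The estimated baseline and adjacency can then be compared with the simulated ground truth, which is the kind of experiment the library makes fast in a multi-core environment.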