Statistical modeling and analysis of Internet latency traffic data.

Authors
Publication date
2020
Publication type
Thesis
Summary
The speed of information exchange on the Internet is measured by latency: the time elapsed between the sending of the first bit of a request and the reception of the first bit of the response. In this thesis, carried out in collaboration with Citrix, we study and model latency data in the context of Internet traffic optimization. Citrix collects data through two different channels, generating latency measures suspected to share common properties.

In a first step, we address a distributional fitting problem in which the covariates and the responses are probability measures that are images of one another under a deterministic transport, and the observables are independent samples drawn from these laws. We propose an estimator of this transport and establish its convergence properties. We show that our estimator can be used to match the distributions of the latency measures generated by the two channels.

In a second step, we propose a modeling strategy to predict the process obtained by computing the moving median of the latency measures over regular partitions of the interval [0, T] with mesh size D > 0. We show that the conditional mean of this process, which plays a major role in Internet traffic optimization, is well described by a Fourier series decomposition, and that its conditional variance is organized in clusters, which we model using an ARMA Seasonal-GARCH process, i.e., an ARMA-GARCH process with added deterministic seasonal terms. The predictive performance of this model is compared to the reference models used in the industry. We also introduce a new measure, based on an entropy criterion, of the amount of residual information not captured by the model.

We then address the problem of fault detection in the Internet network.
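The distribution-matching step described in the first contribution can be illustrated with a minimal sketch. The thesis proposes its own transport estimator with convergence guarantees; the code below instead uses the classical one-dimensional monotone transport T = F_Y⁻¹ ∘ F_X, estimated by empirical quantile matching, as a simplified stand-in. The data and function names are illustrative, not from the thesis.

```python
import numpy as np

def estimate_transport(x_sample, y_sample):
    """Estimate the 1-D monotone transport T = F_Y^{-1} o F_X from samples.

    Classical quantile matching: the empirical CDF of X composed with the
    empirical quantile function of Y. A simplified stand-in, not the
    estimator studied in the thesis.
    """
    xs = np.sort(np.asarray(x_sample, dtype=float))
    ys = np.sort(np.asarray(y_sample, dtype=float))
    probs = (np.arange(len(ys)) + 0.5) / len(ys)  # quantile levels of Y

    def transport(x):
        # Empirical CDF of X at x, then empirical quantile of Y at that level.
        u = np.searchsorted(xs, x, side="right") / len(xs)
        return np.interp(u, probs, ys)

    return transport

# Sanity check: X ~ N(0, 1) and Y ~ N(3, 4) are linked by T(x) = 2x + 3.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)
y = rng.normal(3.0, 2.0, 5000)
T = estimate_transport(x, y)
print(T(0.0), T(1.0))  # approximately 3.0 and 5.0
```

In one dimension, matching empirical quantiles yields the monotone (optimal) transport between the two sample distributions, which is why it is a natural baseline for aligning the latency measures of the two collection channels.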
We propose an algorithm for detecting changes in the distribution of a stream of latency data, based on the comparison of two sliding windows using a certain weighted Wasserstein distance.

Finally, we describe how to select the training data of predictive algorithms so as to reduce their size and limit the computational cost without impacting accuracy.
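The two-window change-detection scheme can be sketched as follows. The thesis uses a certain weighted Wasserstein distance; the sketch below substitutes the plain 1-D Wasserstein-1 distance (for equal-size samples, the mean absolute difference of sorted values), and its window size, threshold, and data are illustrative assumptions.

```python
from collections import deque

import numpy as np

def w1(a, b):
    """Wasserstein-1 distance between two equal-size empirical samples:
    the mean absolute difference of the sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def detect_changes(stream, window=200, threshold=1.0):
    """Flag indices where the distance between two adjacent sliding windows
    ('past' vs 'recent') exceeds a threshold.

    Simplified stand-in: plain W1 instead of the thesis's weighted
    Wasserstein distance; window and threshold are illustrative.
    """
    past = deque(maxlen=window)
    recent = deque(maxlen=window)
    alarms = []
    for i, x in enumerate(stream):
        if len(recent) == window:
            past.append(recent[0])  # value about to be evicted from 'recent'
        recent.append(x)
        if len(past) == window:
            if w1(np.fromiter(past, float), np.fromiter(recent, float)) > threshold:
                alarms.append(i)
    return alarms

# Simulated latency stream with a distribution shift at index 1000.
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(10.0, 1.0, 1000),
                         rng.normal(13.0, 1.0, 1000)])
alarms = detect_changes(stream)
print(alarms[0])  # first alarm shortly after index 1000
```

An alarm fires once the recent window contains enough post-change observations for the inter-window distance to exceed the threshold, so the detection delay is on the order of (threshold / shift size) × window.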