Analysis of Bayesian and frequentist strategies for sequential resource allocation.

Authors
Publication date: 2014
Publication type: Thesis
Summary

In this thesis, we study sequential resource allocation strategies. The statistical model adopted in this framework is the stochastic multi-armed bandit: each time an agent draws an arm of the bandit, it receives as a reward a sample drawn from the probability distribution associated with that arm. We are interested in two different bandit problems: maximizing the sum of rewards, and identifying the best arms (where the agent tries to identify the arm(s) with the highest mean reward, without suffering any loss when it draws a "bad" arm). For both objectives, we focus on proposing arm-sampling strategies, also called bandit algorithms, that can be described as optimal.

Maximizing the rewards is equivalent to minimizing a quantity called regret. Thanks to the asymptotic lower bound on the regret of any uniformly efficient strategy established by Lai and Robbins, an algorithm can be called asymptotically optimal if its regret matches this lower bound. In this thesis, we provide a finite-time analysis of two Bayesian-inspired algorithms, Bayes-UCB and Thompson Sampling, in bandit models with binary rewards, that is, a non-asymptotic upper bound on their regret. This upper bound allows us to establish the asymptotic optimality of both algorithms.

In the context of best-arm identification, one instead seeks to determine the total number of samples of the arms needed to identify the best arm(s) with high probability, without the constraint of maximizing the sum of observations. We define two complexity terms for best-arm identification, corresponding to the two settings considered in the literature: a fixed budget and a fixed confidence level. We propose new lower bounds on these complexities, and we analyze new algorithms, some of which attain the lower bounds in particular cases of two-armed bandit models and can thus be called optimal.
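As a point of reference for the regret lower bound mentioned in the summary, the sketch below computes the Lai and Robbins constant for a Bernoulli bandit: the regret of any uniformly efficient strategy satisfies R_T >= (sum over suboptimal arms of (mu* - mu_a) / kl(mu_a, mu*) - o(1)) log T, where kl is the binary relative entropy. The bandit instance used here is purely illustrative.

```python
import numpy as np

def kl_bernoulli(p, q):
    """Binary relative entropy kl(p, q) = p*log(p/q) + (1-p)*log((1-p)/(1-q))."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def lai_robbins_constant(means):
    """Sum over suboptimal arms of (mu* - mu_a) / kl(mu_a, mu*)."""
    best = max(means)
    return sum((best - m) / kl_bernoulli(m, best) for m in means if m < best)

# Illustrative Bernoulli instance: the regret of any uniformly efficient
# strategy grows at least like lai_robbins_constant(means) * log(T).
print(lai_robbins_constant([0.5, 0.45, 0.4]))
```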
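For binary rewards with uniform Beta(1, 1) priors, the two Bayesian-inspired algorithms analyzed in the thesis can be summarized as follows: Thompson Sampling plays the arm with the largest sample drawn from its posterior, while Bayes-UCB plays the arm with the largest posterior quantile. The sketch below is a minimal illustration, assuming the simple quantile level 1 - 1/t for Bayes-UCB (the thesis analyzes a slightly different level with an extra logarithmic term); the bandit instance and horizon are arbitrary.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

def thompson_sampling(means, horizon):
    """Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors."""
    k = len(means)
    successes, failures = np.zeros(k), np.zeros(k)
    regret, best = 0.0, max(means)
    for _ in range(horizon):
        # Draw one sample from each posterior, play the arm with the largest sample.
        theta = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(theta))
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
    return regret

def bayes_ucb(means, horizon):
    """Bayes-UCB: play the arm with the largest posterior quantile of order 1 - 1/t."""
    k = len(means)
    successes, failures = np.zeros(k), np.zeros(k)
    regret, best = 0.0, max(means)
    for t in range(1, horizon + 1):
        indices = beta.ppf(1.0 - 1.0 / t, successes + 1, failures + 1)
        arm = int(np.argmax(indices))
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
    return regret

means = [0.5, 0.45, 0.4]
print(thompson_sampling(means, 5000), bayes_ucb(means, 5000))
```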
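To illustrate the fixed-confidence setting for best-arm identification, here is a minimal successive-elimination sketch using Hoeffding confidence intervals. It is not one of the KL-based algorithms studied in the thesis, and the deviation constants are illustrative rather than tuned; it simply stops when only one arm remains plausible at confidence level 1 - delta and reports the total number of samples used.

```python
import numpy as np

rng = np.random.default_rng(1)

def successive_elimination(means, delta):
    """Fixed-confidence best-arm identification by eliminating arms whose
    Hoeffding upper confidence bound falls below the best lower bound."""
    k = len(means)
    active = list(range(k))
    counts, sums = np.zeros(k), np.zeros(k)
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:
            sums[a] += rng.random() < means[a]
            counts[a] += 1
        # Illustrative deviation radius meant to hold for all arms and rounds
        # simultaneously with probability at least 1 - delta.
        radius = np.sqrt(np.log(4 * k * t ** 2 / delta) / (2 * t))
        emp = sums / np.maximum(counts, 1)
        best_lower = max(emp[a] for a in active) - radius
        active = [a for a in active if emp[a] + radius >= best_lower]
    return active[0], int(counts.sum())

arm, samples = successive_elimination([0.6, 0.5, 0.45], delta=0.05)
print(arm, samples)
```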