KAUFMANN Emilie

Affiliations
  • 2015 - 2020
    Sequential learning
  • 2020 - 2021
    Centrale Lille Institut
  • 2015 - 2020
    Centre de recherche en informatique, signal et automatique de Lille
  • 2017 - 2018
    Manouba University
  • 2012 - 2014
    Télécom ParisTech
  • 2012 - 2016
    Laboratoire traitement et communication de l'information
  • 2013 - 2014
    Informatique, télécommunications et électronique de Paris
  • 2012 - 2013
    Laboratoire traitement du signal et de l'image
  • A kernel-based approach to non-stationary reinforcement learning in metric spaces.

    Omar DOMINGUES, Pierre MENARD, Matteo PIROTTA, Emilie KAUFMANN, Michal VALKO
    International Conference on Artificial Intelligence and Statistics | 2021
    In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, analyze its regret, and validate it experimentally.
  • Episodic reinforcement learning in finite MDPs: Minimax lower bounds revisited.

    Omar DOMINGUES, Pierre MENARD, Emilie KAUFMANN, Michal VALKO
    Algorithmic Learning Theory | 2021
    In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega\left((H^3 SA/\varepsilon^2)\log(1/\delta)\right)$ on the sample complexity of an $(\varepsilon, \delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3 SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
  • Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models.

    Aurelien GARIVIER, Emilie KAUFMANN
    Sequential Analysis | 2021
    No summary available.
  • Top-m identification for linear bandits.

    Clemence REDA, Emilie KAUFMANN, Andree DELAHAYE DURIEZ
    Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021 | 2021
    Motivated by an application to drug repurposing, we propose the first algorithms to tackle the identification of the m ≥ 1 arms with largest means in a linear bandit model, in the fixed-confidence setting. These algorithms belong to the generic family of Gap-Index Focused Algorithms (GIFA) that we introduce for Top-m identification in linear bandits. We propose a unified analysis of these algorithms, which shows how the use of features might decrease the sample complexity. We further validate these algorithms empirically on simulated data and on a simple drug repurposing task.
  • Fast active learning for pure exploration in reinforcement learning.

    Pierre MENARD, Omar DOMINGUES, Anders JONSSON, Emilie KAUFMANN, Edouard LEURENT, Michal VALKO
    2020
    Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback can be completely absent in the beginning, and the agents may first choose to devote all their effort to exploring efficiently. Exploration remains a challenge: it has been addressed with many hand-tuned heuristics of varying generality on one side, and a few theoretically backed exploration strategies on the other. Many of them are incarnated by intrinsic motivation and in particular exploration bonuses. A common rule of thumb for exploration bonuses is to use a $1/\sqrt{n}$ bonus that is added to the empirical estimates of the reward, where $n$ is the number of times this particular state (or state-action pair) has been visited. We show that, surprisingly, for the pure-exploration objective of reward-free exploration, bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon H. Furthermore, we show that with an improved analysis of the stopping time, we can improve by a factor H the sample complexity in the best-policy identification setting, which is another pure-exploration objective, where the environment provides rewards but the agent is not penalized for its behavior during the exploration phase.
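    A toy illustration (not the authors' code; the horizon scaling and confidence term are placeholder choices) contrasting the classical $1/\sqrt{n}$ bonus with the $1/n$ bonus discussed above:

```python
import numpy as np

def exploration_bonus(visit_count, horizon, delta=0.1, scaling="sqrt"):
    """Toy exploration bonus for a state-action pair visited `visit_count` times.

    scaling="sqrt" gives the classical 1/sqrt(n) bonus used with UCB-style
    value iteration; scaling="inv" gives the faster-shrinking 1/n bonus that
    is sufficient for pure-exploration objectives. The log term is a generic
    confidence level, not the paper's exact constant.
    """
    n = max(visit_count, 1)
    log_term = np.log(1.0 / delta)
    if scaling == "sqrt":
        return horizon * np.sqrt(log_term / n)
    return horizon * log_term / n

# Bonuses shrink much faster under the 1/n scaling as visits accumulate.
for n in (1, 10, 100, 1000):
    print(n, exploration_bonus(n, horizon=10, scaling="sqrt"),
          exploration_bonus(n, horizon=10, scaling="inv"))
```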
  • Sub-sampling for Efficient Non-Parametric Bandit Exploration.

    Dorian BAUDRY, Emilie KAUFMANN, Odalric ambrym MAILLARD
    NeurIPS 2020 | 2020
    In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling, which requires specifying a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA), which combines the sub-sampling idea first used by the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models.
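    A minimal sketch of the sub-sampling duel idea behind this family of algorithms, in the spirit of the earlier BESA procedure rather than the exact RB-SDA rule (which sub-samples a random block of consecutive observations); names and the toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_duel(leader_rewards, challenger_rewards):
    """Return True if the challenger wins a duel against the leader.

    The leader's history is sub-sampled (without replacement) down to the
    challenger's sample size, and empirical means are compared. RB-SDA uses
    a random *block* of consecutive observations instead of this uniform
    sub-sample; this simplified version follows the earlier BESA idea.
    """
    m = len(challenger_rewards)
    sub = rng.choice(leader_rewards, size=m, replace=False)
    return np.mean(challenger_rewards) >= np.mean(sub)

# Toy duel: arm 0 (mean 0.5) has 100 samples, arm 1 (mean 0.6) has 20 samples.
leader = rng.binomial(1, 0.5, size=100)
challenger = rng.binomial(1, 0.6, size=20)
print("challenger wins duel:", subsample_duel(leader, challenger))
```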
  • Machine learning applications in drug development.

    Clemence REDA, Emilie KAUFMANN, Andree DELAHAYE DURIEZ
    Computational and Structural Biotechnology Journal | 2020
    No summary available.
  • Adaptive Reward-Free Exploration.

    Emilie KAUFMANN, Pierre MENARD, Omar DARWICHE DOMINGUES, Anders JONSSON, Edouard LEURENT, Michal VALKO
    2020
    Reward-free exploration is a reinforcement learning setting recently studied by Jin et al., who address it by running several algorithms with regret guarantees in parallel. In our work, we instead propose a more adaptive approach for reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm can be seen as a variant of an algorithm of Fiechter from 1994 [11], originally proposed for a different objective that we call best-policy identification. We prove that RF-UCRL needs $O\left((SAH^4/\varepsilon^2)\log(1/\delta)\right)$ episodes to output, with probability $1-\delta$, an $\varepsilon$-approximation of the optimal policy for any reward function. We empirically compare it to oracle strategies using a generative model.
  • Planning in Markov Decision Processes with Gap-Dependent Sample Complexity.

    Anders JONSSON, Emilie KAUFMANN, Pierre MENARD, Omar DOMINGUES, Edouard LEURENT, Michal VALKO
    2020
    We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, that are mostly theoretical.
  • On Multi-Armed Bandit Designs for Dose-Finding Trials.

    Maryam AZIZ, Emilie KAUFMANN, Marie karelle RIVIERE
    2020
    We study the problem of finding the optimal dosage in early stage clinical trials through the multi-armed bandit lens. We advocate the use of the Thompson Sampling principle, a flexible algorithm that can accommodate different types of monotonicity assumptions on the toxicity and efficacy of the doses. For the simplest version of Thompson Sampling, based on a uniform prior distribution for each dose, we provide finite-time upper bounds on the number of sub-optimal dose selections, which is unprecedented for dose-finding algorithms. Through a large simulation study, we then show that variants of Thompson Sampling based on more sophisticated prior distributions outperform state-of-the-art dose identification algorithms in different types of dose-finding studies that occur in phase I or phase I/II trials.
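    For concreteness, a textbook Thompson Sampling loop with independent Beta(1, 1) priors for Bernoulli outcomes; this is the simplest version referred to above, not the dose-finding variants with monotonicity assumptions and safety constraints studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def thompson_sampling(true_means, n_rounds=2000):
    """Basic Thompson Sampling for Bernoulli arms with Beta(1, 1) priors."""
    k = len(true_means)
    successes = np.zeros(k)   # posterior of arm a is Beta(1 + successes, 1 + failures)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        theta = rng.beta(1 + successes, 1 + failures)  # one posterior draw per arm
        arm = int(np.argmax(theta))                    # play the arm that looks best
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# With means standing in for dose efficacies, most pulls concentrate on the best dose.
print(thompson_sampling([0.2, 0.35, 0.5, 0.45]))
```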
  • Regret bounds for kernel-based reinforcement learning.

    Omar DOMINGUES, Pierre MENARD, Matteo PIROTTA, Emilie KAUFMANN, Michal VALKO
    2020
    We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. Unlike existing approaches with regret guarantees, it does not use any kind of partitioning of the state-action space. For problems with K episodes and horizon H, we provide a regret bound of $O\big(H^3 K^{\max(\frac{1}{2}, \frac{2d}{2d+1})}\big)$, where $d$ is the covering dimension of the joint state-action space. We empirically validate Kernel-UCBVI on discrete and continuous MDPs.
  • A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players.

    Etienne BOURSIER, Emilie KAUFMANN, Abbas MEHRABIAN, Vianney PERCHET
    2019
    We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new, efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations. We give a finite-time analysis of our algorithm, bounding its regret by $O((\log T)^{1+\kappa})$ for any fixed $\kappa > 0$. If the optimal assignment of players to arms is unique, we further show that it attains the optimal $O(\log T)$ regret, solving an open question raised at NeurIPS 2018.
  • Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models.

    Aurelien GARIVIER, Emilie KAUFMANN
    2019
    In this paper, we study sequential testing problems with overlapping hypotheses. We first focus on the simple problem of assessing whether the mean $\mu$ of a Gaussian distribution is $\geq -\varepsilon$ or $\leq \varepsilon$; if $\mu \in (-\varepsilon, \varepsilon)$, both answers are considered to be correct. Then, we consider PAC-best arm identification in a bandit model: given K probability distributions on $\mathbb{R}$ with means $\mu_1, \ldots, \mu_K$, we derive the asymptotic complexity of identifying, with risk at most $\delta$, an index $I \in \{1, \ldots, K\}$ such that $\mu_I \geq \max_i \mu_i - \varepsilon$. We provide non-asymptotic bounds on the error of a parallel Generalized Likelihood Ratio Test, which can also be used for more general testing problems. We further propose lower bounds on the number of observations needed to identify a correct hypothesis. Those lower bounds rely on information-theoretic arguments, and specifically on two versions of a change of measure lemma (a high-level form, and a low-level form) whose relative merits are discussed.
  • Non-asymptotic analysis of a sequential breakpoint detection test and application to non-stationary bandits.

    Lilian BESSON, Emilie KAUFMANN
    GRETSI 2019 - XXVIIème Colloque francophone de traitement du signal et des images | 2019
    We study a test for sequential break detection based on the generalized likelihood ratio (GLR) and expressed as a function of the binary relative entropy. It is applied to the detection of breaks on the mean of a bounded distribution, and we obtain a non-asymptotic control of its false alarm probability and its detection delay. We explain its use for sequential decision making by proposing the GLR-klUCB bandit strategy, which is efficient in piecewise stationary bandit models.
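    A sketch of a Bernoulli GLR change-point statistic expressed through the binary relative entropy, as described above; the threshold used in the example is a placeholder, not the calibrated one from the paper:

```python
import numpy as np

def kl_bern(x, y, eps=1e-12):
    """Binary relative entropy kl(x, y) between Bernoulli means x and y."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def glr_change_detected(rewards, threshold):
    """Bernoulli GLR test: scan all split points s of the observed sequence and
    compare the weighted divergences of the two segment means to the pooled mean."""
    n = len(rewards)
    mu = np.mean(rewards)
    for s in range(1, n):
        mu1, mu2 = np.mean(rewards[:s]), np.mean(rewards[s:])
        stat = s * kl_bern(mu1, mu) + (n - s) * kl_bern(mu2, mu)
        if stat > threshold:
            return True
    return False

# Mean jumps from 0.2 to 0.8 halfway through: the statistic exceeds a modest threshold.
rng = np.random.default_rng(2)
data = np.concatenate([rng.binomial(1, 0.2, 100), rng.binomial(1, 0.8, 100)])
print(glr_change_detected(data, threshold=10.0))
```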
  • Fixed-confidence guarantees for Bayesian best-arm identification.

    2019
    We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards, solving one of the open questions raised by Russo (2016). We also provide new posterior convergence results for TTTS under two models that are commonly used in practice: bandits with Gaussian and Bernoulli rewards and conjugate priors.
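    A sketch of the Top-Two Thompson Sampling selection rule for Bernoulli arms (stopping rule and the T3C variant omitted; beta = 0.5 is a common default, and the resampling cap is an implementation convenience):

```python
import numpy as np

rng = np.random.default_rng(3)

def ttts_select(successes, failures, beta=0.5, max_resample=1000):
    """Top-Two Thompson Sampling arm selection for Bernoulli arms.

    Draw one posterior sample per arm and take its argmax (the 'leader').
    With probability beta play the leader; otherwise redraw the posterior
    until the argmax differs, and play that 'challenger'.
    """
    theta = rng.beta(1 + successes, 1 + failures)
    leader = int(np.argmax(theta))
    if rng.random() < beta:
        return leader
    for _ in range(max_resample):
        theta = rng.beta(1 + successes, 1 + failures)
        challenger = int(np.argmax(theta))
        if challenger != leader:
            return challenger
    return leader  # fallback if the posterior is very concentrated

# Example: after some observations, the rule mostly alternates between arms 1 and 2.
s = np.array([3.0, 30.0, 28.0])
f = np.array([7.0, 20.0, 22.0])
print([ttts_select(s, f) for _ in range(10)])
```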
  • A simple dynamic bandit algorithm for hyper-parameter tuning.

    Xuedong SHANG, Emilie KAUFMANN, Michal VALKO
    Workshop on Automated Machine Learning at International Conference on Machine Learning | 2019
    Hyper-parameter tuning is a major part of modern machine learning systems. The tuning itself can be seen as a sequential resource allocation problem. As such, methods for multi-armed bandits have been already applied. In this paper, we view hyper-parameter optimization as an instance of best-arm identification in infinitely many-armed bandits. We propose D-TTTS, a new adaptive algorithm inspired by Thompson sampling, which dynamically balances between refining the estimate of the quality of hyper-parameter configurations previously explored and adding new hyper-parameter configurations to the pool of candidates. The algorithm is easy to implement and shows competitive performance compared to state-of-the-art algorithms for hyper-parameter tuning.
  • Asymptotically optimal algorithms for budgeted multiple play bandits.

    Alexander LUEDTKE, Emilie KAUFMANN, Antoine CHAMBAZ
    Machine Learning | 2019
    We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rate and leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
  • Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling.

    Cindy TRINH, Emilie KAUFMANN, Claire VERNADE, Richard COMBES
    2019
    Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization problems over rank-one matrices of arms. The initially proposed algorithms are proved to have logarithmic regret, but do not match the existing lower bound for this problem. We close this gap by first proving that rank-one bandits are a particular instance of unimodal bandits, and then providing a new analysis of Unimodal Thompson Sampling (UTS), initially proposed by Paladino et al. (2017). We prove an asymptotically optimal regret bound on the frequentist regret of UTS and we support our claims with simulations showing the significant improvement of our method compared to the state-of-the-art.
  • General parallel optimization without a metric.

    Xuedong SHANG, Emilie KAUFMANN, Michal VALKO
    Algorithmic Learning Theory | 2019
    Hierarchical bandits are an approach for global optimization of extremely irregular functions. This paper provides new elements regarding POO, an adaptive meta-algorithm that does not require the knowledge of local smoothness of the target function. We first highlight the fact that the subroutine algorithm used in POO should have a small regret under the assumption of local smoothness with respect to the chosen partitioning, and it is not known whether this assumption is satisfied by the standard subroutine HOO. In this work, we establish such a regret guarantee for HCT, another hierarchical optimistic optimization algorithm that needs to know the smoothness. This confirms the validity of POO. We show that POO can be used with HCT as a subroutine, with a regret upper bound that matches that of the best-known algorithms using the knowledge of smoothness, up to a $\sqrt{\log n}$ factor. On top of that, we propose a general wrapper, called GPO, that can cope with algorithms that only have simple regret guarantees. Finally, we complement our findings with experiments on difficult functions.
  • The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm for Piece-Wise Non-Stationary Bandits.

    Lilian BESSON, Emilie KAUFMANN
    2019
    We propose a new algorithm for the piece-wise i.i.d. non-stationary bandit problem with bounded rewards. Our proposal, GLR-klUCB, combines an efficient bandit algorithm, klUCB, with an efficient, parameter-free, change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. We analyze two variants of our strategy, based on local restarts and global restarts, and show that their regret is upper-bounded by $O(\Upsilon_T \sqrt{T \log(T)})$ if the number of change-points $\Upsilon_T$ is unknown, and by $O(\sqrt{\Upsilon_T T \log(T)})$ if $\Upsilon_T$ is known. This improves the current state-of-the-art bounds, as our algorithm needs no tuning based on knowledge of the problem complexity other than $\Upsilon_T$. We present numerical experiments showing that GLR-klUCB outperforms passively and actively adaptive algorithms from the literature, and highlight the benefit of using local restarts.
  • A spectral algorithm with additive clustering for the recovery of overlapping communities in networks.

    Emilie KAUFMANN, Thomas BONALD, Marc LELARGE
    Theoretical Computer Science | 2018
    This paper presents a novel spectral algorithm with additive clustering, designed to identify overlapping communities in networks. The algorithm is based on geometric properties of the spectrum of the expected adjacency matrix in a random graph model that we call stochastic blockmodel with overlap (SBMO). An adaptive version of the algorithm, that does not require the knowledge of the number of hidden communities, is proved to be consistent under the SBMO when the degrees in the graph are (slightly more than) logarithmic. The algorithm is shown to perform well on simulated data and on real-world graphs with known overlapping communities.
  • Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence.

    Maryam AZIZ, Jesse ANDERTON, Emilie KAUFMANN, Javed ASLAM
    ALT 2018 - Algorithmic Learning Theory | 2018
    We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity which is within a log factor of our lower bound; and (4) discuss whether our $\log^2(1/\delta)$ dependence is inescapable for ``two-phase'' (select arms first, identify the best later) algorithms in the infinite setting. This work permits the application of bandit models to a broader class of problems where fewer assumptions hold.
  • Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals.

    Emilie KAUFMANN, Wouter KOOLEN
    2018
    This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and may take into account several arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and by multiplying those martingales. Our deviation inequalities allow us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms.
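    To make the link with confidence intervals concrete, a standard way to invert a KL-divergence deviation bound into an upper confidence bound for a Bernoulli mean; the confidence level passed in the example is a generic $\log(1/\delta)$ placeholder, not the paper's mixture-martingale threshold:

```python
import numpy as np

def kl_bern(x, y, eps=1e-12):
    """Binary relative entropy kl(x, y) between Bernoulli means x and y."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def kl_ucb(mean_hat, n, level):
    """Largest q >= mean_hat with n * kl(mean_hat, q) <= level, found by bisection."""
    lo, hi = mean_hat, 1.0
    for _ in range(60):  # bisection on [mean_hat, 1]
        mid = (lo + hi) / 2.0
        if n * kl_bern(mean_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

# Example: 30 successes out of 100 draws, with a generic log(1/delta)-style level.
print(kl_ucb(0.30, 100, level=np.log(1 / 0.05)))
```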
  • Adaptive black-box optimization got easier: HCT only needs local smoothness.

    Xuedong SHANG, Emilie KAUFMANN, Michal VALKO
    European Workshop on Reinforcement Learning | 2018
    Hierarchical bandits are an approach for global optimization of extremely irregular functions. This paper provides new elements regarding POO, an adaptive meta-algorithm that does not require the knowledge of local smoothness of the target function. We first highlight the fact that the subroutine algorithm used in POO should have a small regret under the assumption of local smoothness with respect to the chosen partitioning, and it is not known whether this assumption is satisfied by the standard subroutine HOO. In this work, we establish such a regret guarantee for HCT, another hierarchical optimistic optimization algorithm that needs to know the smoothness. This confirms the validity of POO. We show that POO can be used with HCT as a subroutine, with a regret upper bound that matches that of the best-known algorithms using the knowledge of smoothness, up to a $\sqrt{\log n}$ factor.
  • Multi-Player Bandits Revisited.

    Lilian BESSON, Emilie KAUFMANN
    Algorithmic Learning Theory | 2018
    Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assumes that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.
  • Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings.

    Remi BONNEFOI, Lilian BESSON, Christophe MOY, Emilie KAUFMANN, Jacques PALICOT
    Cognitive Radio Oriented Wireless Networks | 2018
    Setting up the future Internet of Things (IoT) networks will require supporting more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, to handle the decentralized decision-making of Spectrum Access, applied to IoT networks, as well as the learning performance with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices in such networks, even when all end-devices are intelligent and dynamically change channels. In the studied scenario, stochastic MAB learning provides up to a 16% gain in terms of successful transmission probability, and has near-optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
  • Aggregation of multi-armed bandits learning algorithms for opportunistic spectrum access.

    Lilian BESSON, Emilie KAUFMANN, Christophe MOY
    2018 IEEE Wireless Communications and Networking Conference (WCNC) | 2018
    Multi-armed bandit algorithms have been recently studied and evaluated for Cognitive Radio (CR), especially in the context of Opportunistic Spectrum Access (OSA). Several solutions have been explored based on various models, but it is hard to predict exactly which would be the best for real-world conditions at every instant. Hence, expert aggregation algorithms can be useful to select on the fly the best algorithm for a specific situation. Aggregation algorithms, such as Exp4 dating back to 2002, have never been used for OSA learning, and we show that Exp4 appears empirically sub-efficient when applied to simple stochastic problems. In this article, we present an improved variant, called Aggregator. For synthetic OSA problems modeled as Multi-Armed Bandit (MAB) problems, simulation results are presented to demonstrate its empirical efficiency. We combine classical algorithms, such as Thompson sampling, Upper-Confidence Bounds algorithms (UCB and variants), and Bayesian or Kullback-Leibler UCB. Our algorithm offers good performance compared to state-of-the-art algorithms (Exp4, CORRAL or LearnExp), and appears as a robust approach for selecting on the fly the best algorithm for any stochastic MAB problem, making it better suited to real-world radio settings than any tuning-based approach.
  • Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling.

    Emilie KAUFMANN, Wouter KOOLEN, Aurelien GARIVIER
    Advances in Neural Information Processing Systems (NIPS) | 2018
    Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling. Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
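    A sketch of the Murphy Sampling idea for Gaussian arms with known unit variance: draw from the posterior conditioned on the event that the minimum mean lies below the threshold, here by plain rejection sampling (the paper's implementation and stopping rule are more refined):

```python
import numpy as np

rng = np.random.default_rng(4)

def murphy_sample_arm(sums, counts, gamma, max_tries=10_000):
    """Murphy Sampling step for Gaussian arms with unit variance and flat priors.

    The posterior of arm a is N(sums[a]/counts[a], 1/counts[a]). We draw joint
    posterior samples and keep the first one in which min_a theta_a < gamma
    (the conditioning event), then pull the argmin of that sample.
    """
    means = sums / counts
    stds = 1.0 / np.sqrt(counts)
    for _ in range(max_tries):
        theta = rng.normal(means, stds)
        if theta.min() < gamma:            # condition on "some mean is below gamma"
            return int(np.argmin(theta))
    return int(np.argmin(means))           # fallback if rejection keeps failing

# Example with three arms and threshold gamma = 0.
sums, counts = np.array([2.0, -1.0, 5.0]), np.array([10.0, 10.0, 10.0])
print(murphy_sample_arm(sums, counts, gamma=0.0))
```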
  • Learning for large-scale parallel platform control.

    Valentin REIS, Denis TRYSTRAM, Jerome LELONG, Arnaud LEGRAND, Emilie KAUFMANN, Kim thang NGUYEN, Alfredo GOLDMAN, Michela TAUFER
    2018
    Providing the computing infrastructures needed to solve the complex problems of modern society is a strategic challenge. Organizations traditionally respond to this challenge by setting up large parallel and distributed computing infrastructures. Vendors of High Performance Computing systems are driven by competition to produce ever more computing and storage power, leading to specific and sophisticated "Petascale" platforms, and soon to "Exascale" machines. These systems are centrally managed with the help of job management software solutions and dedicated resources. A special problem that these software solutions address is the scheduling problem, where the resource manager must choose when, and on which resources, to execute which computational task. This thesis provides solutions to this problem. All platforms are different. Indeed, their infrastructure, the behavior of their users and the objectives of the host organization vary. We therefore argue that scheduling policies must adapt to the behavior of the systems. In this paper, we present several ways to achieve this adaptability. Through an experimental approach, we study several tradeoffs between the complexity of the approach, the potential gain, and the risks taken.
  • What Doubling Tricks Can and Can't Do for Multi-Armed Bandits.

    Lilian BESSON, Emilie KAUFMANN
    2018
    An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the "Doubling Trick". In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences of growing horizons (geometric and exponential) to generalize previously known results that certain doubling tricks can be used to conserve certain regret bounds. In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in $R_T = O(\sqrt{T})$ but cannot conserve (distribution-dependent) bounds in $R_T = O(\log T)$. We give insights as to why exponential doubling tricks may be better, as they conserve bounds in $R_T = O(\log T)$, and are close to conserving bounds in $R_T = O(\sqrt{T})$.
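    A minimal sketch of a geometric doubling-trick wrapper: restart a horizon-dependent algorithm on successive horizons multiplied by a constant factor; the inner "algorithm" below is a placeholder, and the exact restart schedule in the paper may differ:

```python
import numpy as np

rng = np.random.default_rng(5)

def run_with_doubling(make_algorithm, total_rounds, t0=100, factor=2):
    """Geometric doubling trick: restart a non-anytime algorithm with horizons
    t0, t0*factor, t0*factor**2, ... until total_rounds pulls have been made.
    (The paper's 'exponential' sequences grow even faster between restarts.)"""
    history = []
    horizon = t0
    played = 0
    while played < total_rounds:
        block = min(horizon, total_rounds - played)
        algo = make_algorithm(horizon)      # fresh instance tuned for this horizon
        history.extend(algo(block))
        played += block
        horizon *= factor
    return history

# Placeholder inner "algorithm": pulls a random arm for each round of its block.
def make_dummy(horizon):
    return lambda n_rounds: [int(rng.integers(0, 3)) for _ in range(n_rounds)]

print(len(run_with_doubling(make_dummy, total_rounds=1000)))
```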
  • Corrupt Bandits for Preserving Local Privacy.

    Pratik GAJANE, Tanguy URVOY, Emilie KAUFMANN
    ALT 2018 - Algorithmic Learning Theory | 2018
    We study a variant of the stochastic multi-armed bandit (MAB) problem in which the rewards are corrupted. In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on observing a transformation of these rewards through a stochastic corruption process with known parameters. We provide a lower bound on the expected regret of any bandit algorithm in this corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian algorithm, TS-CF, and give upper bounds on their regret. We also provide the appropriate corruption parameters to guarantee a desired level of local privacy and analyze how this impacts the regret. Finally, we present some experimental results that confirm our analysis.
  • Monte-Carlo Tree Search by Best Arm Identification.

    Emilie KAUFMANN, Wouter KOOLEN
    NIPS 2017 - 31st Annual Conference on Neural Information Processing Systems | 2017
    Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems. We study the game tree search problem, where the goal is to quickly identify the optimal move in a given game tree by sequentially sampling its stochastic payoffs. We develop new algorithms for trees of arbitrary depth, that operate by summarizing all deeper levels of the tree into confidence intervals at depth one, and applying a best arm identification procedure at the root. We prove new sample complexity guarantees with a refined dependence on the problem instance. We show experimentally that our algorithms outperform existing elimination-based algorithms and match previous special-purpose methods for depth-two trees.
  • On Bayesian index policies for sequential resource allocation.

    Emilie KAUFMANN
    Annals of Statistics | 2017
    This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight on what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or the Finite Horizon Gittins indices provide a justification for the kl-UCB+ and kl-UCB-H+ algorithms, whose asymptotic optimality is also established.
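    A sketch of the Bayes-UCB index for Bernoulli rewards, computed as a high quantile of the Beta posterior; the quantile level $1 - 1/(t \log^c t)$ follows the usual presentation of the algorithm, with $c$ a small constant:

```python
import numpy as np
from scipy.stats import beta

def bayes_ucb_indices(successes, failures, t, c=0):
    """Bayes-UCB indices for Bernoulli arms with Beta(1, 1) priors.

    Each index is the quantile of order 1 - 1/(t * log(t)**c) of the
    Beta(1 + successes, 1 + failures) posterior of the arm.
    """
    level = 1.0 - 1.0 / (t * max(np.log(t), 1.0) ** c)
    return beta.ppf(level, 1 + np.asarray(successes), 1 + np.asarray(failures))

# Example: at t = 100, the under-sampled arm 0 gets a large index despite a low mean.
print(bayes_ucb_indices(successes=[1, 40], failures=[2, 55], t=100))
```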
  • A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks.

    Emilie KAUFMANN, Thomas BONALD, Marc LELARGE
    Theoretical Computer Science | 2017
    This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks. The algorithm is based on geometric properties of the spectrum of the expected adjacency matrix in a random graph model that we call stochastic blockmodel with overlap (SBMO). An adaptive version of the algorithm, that does not require the knowledge of the number of hidden communities, is proved to be consistent under the SBMO when the degrees in the graph are (slightly more than) logarithmic. The algorithm is shown to perform well on simulated data and on real-world graphs with known overlapping communities.
  • Learning the distribution with largest mean: two bandit frameworks.

    Emilie KAUFMANN, Aurelien GARIVIER
    ESAIM: Proceedings and Surveys | 2017
    Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature: they can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process. For both of them (regret minimization and best arm identification) we present recent, asymptotically optimal algorithms. We compare the behaviors of the sampling rule of each algorithm as well as the complexity terms associated to each problem.
  • A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks.

    Emilie KAUFMANN, Thomas BONALD, Marc LELARGE
    Algorithmic Learning Theory | 2016
    This paper presents a novel spectral algorithm with additive clustering designed to identify overlapping communities in networks. The algorithm is based on geometric properties of the spectrum of the expected adjacency matrix in a random graph model that we call stochastic blockmodel with overlap (SBMO). An adaptive version of the algorithm, that does not require the knowledge of the number of hidden communities, is proved to be consistent under the SBMO when the degrees in the graph are (slightly more than) logarithmic. The algorithm is shown to perform well on simulated data and on real-world graphs with known overlapping communities.
  • On the Complexity of Best Arm Identification in Multi-Armed Bandit Models.

    Emilie KAUFMANN, Olivier CAPPE, Aurelien GARIVIER
    Journal of Machine Learning Research | 2016
    The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret minimization is now well known, our aim is to contribute to a better understanding of the performance in terms of identifying the m best arms. We introduce generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings. In the fixed-confidence setting, we provide the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m is larger than 1 under general assumptions. In the specific case of two-armed bandits, we derive refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matching algorithms for Gaussian and Bernoulli bandit models. These results show in particular that the complexity of the fixed-budget setting may be smaller than the complexity of the fixed-confidence setting, contradicting the familiar behavior observed when testing fully specified alternatives. In addition, we also provide improved sequential stopping rules that have guaranteed error probabilities and shorter average running times. The proofs rely on two technical results that are of independent interest: a deviation lemma for self-normalized sums (Lemma 19) and a novel change of measure inequality for bandit models (Lemma 1).
  • Maximin Action Identification: A New Bandit Framework for Games.

    Aurelien GARIVIER, Emilie KAUFMANN, Wouter KOOLEN
    29th Annual Conference on Learning Theory (COLT) | 2016
    We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.
  • On Explore-Then-Commit Strategies.

    Aurelien GARIVIER, Emilie KAUFMANN, Tor LATTIMORE
    NIPS | 2016
    We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore we provide empirical evidence that the theory also holds in practice and discuss extensions to the non-Gaussian and multiple-armed cases.
  • Optimal Best Arm Identification with Fixed Confidence.

    Aurelien GARIVIER, Emilie KAUFMANN
    29th Annual Conference on Learning Theory (COLT) | 2016
    We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis.
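    To illustrate the stopping side in the simplest case, the Chernoff-type (generalized likelihood ratio) stopping statistic for Gaussian arms with known unit variance; the threshold mentioned in the comment is a generic placeholder, and the tracking sampling rule is not shown:

```python
import numpy as np

def glr_stopping_statistic(means, counts):
    """Gaussian (unit-variance) GLR statistic for best-arm identification.

    For the empirical best arm a* and each challenger b, the statistic is
    Z_{a*,b} = (N_a* N_b / (N_a* + N_b)) * (mu_a* - mu_b)^2 / 2;
    one stops when min_b Z_{a*,b} exceeds a threshold beta(t, delta).
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    best = int(np.argmax(means))
    stats = []
    for b in range(len(means)):
        if b == best:
            continue
        n_ab = counts[best] * counts[b] / (counts[best] + counts[b])
        stats.append(n_ab * (means[best] - means[b]) ** 2 / 2.0)
    return min(stats)

# Stop (declare the empirical best arm) once the statistic exceeds a threshold,
# e.g. something of the order of log((1 + log t) / delta).
print(glr_stopping_statistic(means=[0.4, 0.9, 0.5], counts=[50, 60, 40]))
```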
  • On the Complexity of A/B Testing.

    Emilie KAUFMANN, Olivier CAPPE, Aurelien GARIVIER
    Conference on Learning Theory | 2014
    A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing that improve over the results currently available both in the fixed-confidence (or delta-PAC) and fixed-budget settings. When the distributions of the outcomes are Gaussian, we prove that the complexity of the fixed-confidence and fixed-budget settings are equivalent, and that uniform sampling of both alternatives is optimal only in the case of equal variances. In the common variance case, we also provide a stopping rule that terminates faster than existing fixed-confidence algorithms. In the case of Bernoulli distributions, we show that the complexity of the fixed-budget setting is smaller than that of the fixed-confidence setting and that uniform sampling of both alternatives, though not optimal, is advisable in practice when combined with an appropriate stopping criterion.
  • Analysis of bayesian and frequentist strategies for sequential resource allocation.

    Emilie KAUFMANN
    2014
    In this thesis, we study strategies for sequential resource allocation, under the so-called stochastic multi-armed bandit model. In this model, when an agent draws an arm, he receives as a reward a realization from a probability distribution associated to the arm. In this document, we consider two different bandit problems. In the reward maximization objective, the agent aims at maximizing the sum of rewards obtained during his interaction with the bandit, whereas in the best arm identification objective, his goal is to find the set of m best arms (i.e. arms with highest mean reward), without suffering a loss when drawing 'bad' arms. For these two objectives, we propose strategies, also called bandit algorithms, that are optimal (or close to optimal), in a sense made precise below. Maximizing the sum of rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient algorithm given by Lai and Robbins, one can define asymptotically optimal algorithms as algorithms whose regret reaches this lower bound. In this thesis, we propose, for two Bayesian algorithms, Bayes-UCB and Thompson Sampling, a finite-time analysis, that is, a non-asymptotic upper bound on their regret, in the particular case of bandits with binary rewards. This upper bound allows us to establish the asymptotic optimality of both algorithms. In the best arm identification framework, a possible goal is to determine the number of samples of the arms needed to identify, with high probability, the set of m best arms. We define a notion of complexity for best arm identification in two different settings considered in the literature: the fixed-budget and fixed-confidence settings. We provide new lower bounds on these complexity terms and we analyse new algorithms, some of which reach the lower bound in particular cases of two-armed bandit models and are therefore optimal.
  • Analysis of Bayesian and frequentist strategies for sequential resource allocation.

    Emilie KAUFMANN, Olivier CAPPE, Aurelien GARIVIER, Jean michel MARIN, Thomas BONALD, Gerard BIAU, Nicolo CESA BIANCHI, Olivier CATONI, Remi MUNOS
    2014
    In this thesis, we study sequential resource allocation strategies. The statistical model adopted in this framework is the stochastic multi-armed bandit. In this model, when an agent draws an arm of the bandit, it receives as a reward a realization of a probability distribution associated with the arm. We are interested in two different bandit problems: maximizing the sum of rewards and identifying the best arms (where the agent tries to identify the arm(s) leading to the best average reward, without suffering any loss when it draws a "bad" arm). We focus on proposing arm-drawing strategies, also known as bandit algorithms, for these two objectives, that can be described as optimal. Maximizing the rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient strategy established by Lai and Robbins, we can define the notion of an asymptotically optimal algorithm as one whose regret reaches this lower bound. In this thesis, we propose, for two Bayesian-inspired algorithms, Bayes-UCB and Thompson Sampling, a finite-time analysis in the framework of bandit models with binary rewards, i.e. a non-asymptotic upper bound on their regret. This bound allows us to establish the asymptotic optimality of both algorithms. In the context of the identification of the best arms, one can seek to determine the total number of samples of the arms needed to identify, with high probability, the best arm(s), without the constraint of maximizing the sum of the observations. We define two complexity terms for the identification of the best arms in two settings considered in the literature, which correspond to a fixed budget or a fixed confidence level. We propose new lower bounds on these complexities, and we analyze new algorithms, some of which reach the lower bounds in particular cases of two-armed bandit models, and can thus be called optimal.
  • Information Complexity in Bandit Subset Selection.

    Emilie KAUFMANN, Shivaram KALYANAKRISHNAN
    Conference on Learning Theory | 2013
    We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset of a specified size. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the ``Chernoff information'' between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between strategies based on uniform and adaptive sampling for pure-exploration problems, finding evidence in favor of the latter.
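    Since the Chernoff information plays a central role in the bounds above, one way to compute it between two Bernoulli distributions: it equals kl(z*, p) = kl(z*, q) at the unique crossing point z*, located below by bisection (a generic numerical sketch, not the paper's code):

```python
import numpy as np

def kl_bern(x, y, eps=1e-12):
    """Binary relative entropy kl(x, y) between Bernoulli means x and y."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def chernoff_information(p, q):
    """Chernoff information between Bernoulli(p) and Bernoulli(q).

    It equals kl(z*, p) = kl(z*, q) where z* lies between p and q; the gap
    kl(z, a) - kl(z, b) is monotone in z, so a bisection locates z*.
    """
    a, b = min(p, q), max(p, q)   # the quantity is symmetric in (p, q)
    lo, hi = a, b
    for _ in range(60):
        z = (lo + hi) / 2.0
        if kl_bern(z, a) < kl_bern(z, b):
            lo = z                # z is still too close to a; move toward b
        else:
            hi = z
    z = (lo + hi) / 2.0
    return kl_bern(z, a)

print(chernoff_information(0.3, 0.6))
```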
  • Thompson Sampling for one-dimensional exponential family bandits.

    Nathaniel KORDA, Emilie KAUFMANN, Remi MUNOS
    NIPS 2013 - Neural Information Processing Systems Conference | 2013
    Thompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for one-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (through the Jeffreys prior) available in an exponential family. This allows us to give a finite-time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families.
  • Thompson sampling for one-dimensional exponential family bandits.

    Nathaniel KORDA, Emilie KAUFMANN, Remi MUNOS
    Advances in Neural Information Processing Systems | 2013
    Thompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for 1-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (and thus Jeffreys prior) available in an exponential family. This allows us to give a finite-time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families.
Affiliations are detected from the signatures of publications identified in scanR. An author can therefore appear to be affiliated with several structures or supervisors according to these signatures. The dates displayed correspond only to the dates of the publications found. For more information, see https://scanr.enseignementsup-recherche.gouv.fr