Multi-Armed Bandits and Reinforcement Learning

Lukas Zierahn (Ph.D. Student)

The multi-armed bandit (MAB) is a decision-making framework with a rich application space, ranging from medical trials to recommendation systems. It is characterised by repeatedly taking an action and observing the reward of only the action taken. MAB algorithms enjoy strong theoretical guarantees while also being computationally efficient. Adapting MAB to new settings is the first goal of this PhD project. A natural extension of the bandit framework is reinforcement learning, in which actions can also influence future states. However, unlike MAB, reinforcement learning exhibits a large gap between theory and practice: algorithms with strong mathematical guarantees usually cannot be used in practice, while practical algorithms usually lack satisfactory mathematical justification. Narrowing this gap, partly by drawing on the knowledge gained from working with MAB, is the second goal of this PhD project.

Primary Host: Nicolò Cesa-Bianchi (Università degli Studi di Milano)
Exchange Host: Gergely Neu (Universitat Pompeu Fabra)
PhD Duration: 01 November 2021 - 31 October 2024
Exchange Duration: 21 February 2023 - 05 July 2023 - Ongoing