PAC-Bayes Analysis and Derived Algorithms
Omar Rivasplata (Ph.D. Student)
The main focus of my current research is on generalization bounds, specifically the kind called PAC-Bayes bounds, in relation to the phenomenon referred to as "generalization" in machine learning: After an algorithm has been trained based on a finite list of examples, how will the trained algorithm do on unseen examples? In statistical learning theory this question is addressed by giving generalization bounds, which in the context of supervised classification take the form of high-probability upper bounds on the gap between the probability of error and its empirical estimate. Empirical error rates (e.g. training set error rate, test set error rate) can be evaluated based on available data, of course, but the real question is what can we say about the error on "unseen" (i.e. currently unavailable) data. I am interested in the generalization problem in connection with machine learning with guarantees. An interesting research direction exploits the connection between generalization and algorithmic stability, which is widely acknowledged, credit is due to O. Bousquet and A. Elisseeff for their 2002 paper "Stability and Generalization" that revived the discussion and motivated further research. There are different ways to formalize and quantify the idea of algorithmic stability, but what they all have in common is being some measure of sensitivity (of an output) with respect to small changes (in the input). For instance if replacing or removing one training example has only a small effect on the algorithm's learned function or its error rate, then we describe it as a stable algorithm. My paper "PAC-Bayes bounds for stable algorithms with instance-dependent priors" (with co-authors), published in NeurIPS 2018, considered stability-based generalization bounds under a notion of "hypothesis stability" that quantifies changes (measured by a norm distance) in the weight vector learned by an algorithm with respect to replacing one example in the training set. Stability lead to concentration of the weight vector output by the algorithm, while in turn concentration lead to a tighter generalization bound. Another kind of stability shows up in the literature on privacy-preserving methods, where the starting point is a definition quantifying changes in the distribution of a randomizing algorithm with respect to replacing one data point. This form of distributional stability was used in my paper "PAC-Bayes Analysis Beyond the Usual Bounds" (with co-authors) presented at the NeurIPS 2019 Workshop on Machine Learning with Guarantees. In fact, that workshop paper also used stability of the loss functions in the proof of one of its main contributions. My future research will include deriving PAC-Bayes (and other kinds of) generalization bounds for various learning problems/algorithms and data-generating processes.
|Primary Host:||John Shawe-Taylor (University College London)|
|Exchange Host:||Csaba Szepesvari (DeepMind & University of Alberta)|
|PhD Duration:||27 November 2017 - 26 November 2020|
|Exchange Duration:||01 April 2020 - 30 June 2020|