Advances in Probabilistic Deep Learning and Their Applications
Erik Daxberger (Ph.D. Student)
Deep learning and probabilistic modeling are two machine learning paradigms with complementary benefits. Probabilistic deep learning aims to unify the two, with the potential to offer compelling theoretical properties and practical functional benefits across a variety of problems. This thesis provides contributions to the methodology and application of probabilistic deep learning. In particular, we develop new methods to address four different application domains.
The first application is out-of-distribution detection. Neural networks tend to make unreliable predictions when the data distribution changes after training. To address this, we propose a new probabilistic deep learning method based on a Bayesian variational autoencoder, where a full distribution is inferred over the model parameters, rather than just a point estimate. We then use information-theoretic measures to detect out-of-distribution inputs with this model.
The second application is data-efficient optimization. Many science and engineering problems require optimizing a costly black-box function over a high-dimensional, structured space. To tackle this, we develop a new probabilistic deep learning method that efficiently optimizes the function in the low-dimensional, continuous latent space of a variational autoencoder. We propose to periodically retrain the model to keep the latent manifold useful for optimization.
The third application is neural network calibration. Neural networks tend to be poorly calibrated on inputs not seen during training. To avoid overconfidence, models must be able to quantify their uncertainty. To this end, we develop a new probabilistic deep learning method that performs Bayesian inference over just a subset of a neural network’s parameters. We propose a way to choose such subnetworks to faithfully preserve the model’s predictive uncertainty.
The fourth application is continual deep learning. Neural networks often catastrophically forget previously learned tasks when trained on new tasks. To enable models to learn across task sequences, we introduce a new probabilistic deep learning method that unifies two popular continual learning approaches: Bayesian weight regularization and experience replay. Our method explicitly aims to approximate the model obtained from batch-training on all tasks jointly.
Overall, the goals of this thesis are twofold. Firstly, we aim to develop new methods at the intersection of probabilistic modeling and deep learning that combine their respective advantages. Secondly, we aim to demonstrate the practical potential of those probabilistic deep learning methods by applying them to advance the diverse application areas mentioned before.
Primary Host: José Miguel Hernández-Lobato (University of Cambridge)
Exchange Host: Bernhard Schölkopf (ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems)
PhD Duration: 01 February 2019 - 30 June 2023
Exchange Duration: 01 November 2021 - 30 June 2023 - Ongoing