Yukti Makhija
Despite the triumphs of representation learning, achieving reliable out-of-distribution (OOD) generalization and robust reasoning remains a significant challenge. Increasingly, researchers are incorporating causal principles as a strategy to address these difficulties. To leverage causal principles with deep learning, it is important to first disentangle the underlying factors of variation from low-level observations, thereby uncovering the causal representations that generate the data.
Previous work in unsupervised disentanglement has introduced a simple class of decoders known as additive decoders, which resemble architectures used in object-centric representation learning. These decoders can disentangle latent variables while making only weak assumptions about their distribution, and they offer a framework for understanding creativity and for formally studying extrapolation in modern generative models. Notably, additive decoders have been shown to generate novel images outside the support of the training data by recombining disentangled factors in new ways. However, a key limitation of this approach is that it assumes the latent factors are continuous, which is restrictive in many practical scenarios. Additive decoders also struggle to model complex scenes involving occlusion. More recently, these methods have been extended to more expressive and flexible additive energy models, which enable compositional generalization with discrete factors.
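To make the architectural idea concrete, here is a minimal sketch of an additive decoder in PyTorch: the latent vector is partitioned into blocks, each block is decoded independently, and the outputs are summed. The class name, block structure, and dimensions are illustrative assumptions for this sketch, not the exact construction from the cited work.

```python
# Minimal sketch of an additive decoder (names and dimensions are hypothetical).
# Assumption: the latent vector z is partitioned into disjoint blocks
# z_1, ..., z_K, and the decoded output is the sum of per-block decoders:
#     f(z) = sum_k f_k(z_k)
# It is this additive structure that the additive-decoder literature exploits
# to disentangle latents and recombine them for extrapolation.
import torch
import torch.nn as nn


class AdditiveDecoder(nn.Module):
    def __init__(self, block_dims, out_dim, hidden=256):
        super().__init__()
        self.block_dims = block_dims  # e.g. [2, 2] -> two latent blocks
        # One independent sub-decoder per latent block.
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d, hidden),
                nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )
            for d in block_dims
        )

    def forward(self, z):
        # Split z into its blocks and sum the per-block reconstructions.
        parts = torch.split(z, self.block_dims, dim=-1)
        return sum(f_k(z_k) for f_k, z_k in zip(self.blocks, parts))


# Usage: because blocks are decoded independently, swapping a block between
# two samples yields a latent combination never seen during training, the
# kind of recombination behind the extrapolation results described above.
decoder = AdditiveDecoder(block_dims=[2, 2], out_dim=784)
z = torch.randn(8, 4)
x_hat = decoder(z)  # shape: (8, 784)
```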
Through this PhD thesis, I aim to study additive energy models in depth, focusing on identifying appropriate assumptions, such as settings where labeled attributes are absent, under which theoretical guarantees can be established, and on providing empirical evidence of their ability to extrapolate to novel combinations of factors.