Explainable Multimodal Representation Learning
Karsten Roth (Ph.D. Student)
Human perception, and also human interaction, is inherently multimodal: visual and auditory information are processed together, and the emerging knowledge is communicated through some form of language. In this project, our aim is to learn multimodal representations using artificial neural networks. Our application domains are downstream computer vision and machine learning tasks such as visual question answering, multimodal retrieval, and translating between modalities for out-of-distribution data synthesis. Our proposed framework will go beyond making reliable predictions towards effectively communicating its reasoning process, tailored to users with varying levels of understanding of the world, in order to reveal how the system arrives at a decision.
Primary Host: Zeynep Akata (University of Tübingen)
Exchange Host: Oriol Vinyals (Google DeepMind)
PhD Duration: 15 May 2021 - 31 December 2024
Exchange Duration: 15 May 2024 - 31 December 2024 (ongoing)