Karsten Roth
PhD
University of Tübingen
Explainable Multimodal Representation Learning

Human perception and interaction are inherently multimodal: visual and audio information are processed together, and the resulting knowledge is communicated through some form of language. In this project, we aim to learn multimodal representations using artificial neural networks. Our application domains are downstream computer vision and machine learning tasks such as visual question answering, multimodal retrieval, and translation between modalities for out-of-distribution data synthesis. Our proposed framework will go beyond making reliable predictions: it will communicate its thought process in a way tailored to users with varying levels of understanding of the world, revealing how the system arrives at a decision.

Track:
Academic Track
PhD Duration:
May 15th, 2021 - December 31st, 2024
First Exchange:
May 15th, 2024 - December 31st, 2024