Nina Shvetsova
PhD
Goethe University Frankfurt
Multimodal Learning for Image and Video Understanding

This PhD project is dedicated to advancing video and image understanding through self- and weakly-supervised machine learning and learning from multimodal data. Multimodal data, such as video accompanied by audio, user comments, or other modalities, inherently contain self-supervised learning signals: the co-occurrence of multiple modalities can be used to learn meaningful representations across them. By uncovering the inherent structures and patterns within the data, the research aims to discover underlying relationships between modalities and to extract relevant semantic information from them. Moreover, different modalities, such as vision and sound or textual narration, also carry complementary signals about the world. The research therefore aims to enhance the understanding and representation of images and videos by leveraging the complementary information present in other modalities. Fusing multiple modalities enables a more comprehensive and context-aware understanding of visual content. The outcomes of this research will contribute to the advancement of multimodal learning for video and image understanding.
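As a minimal sketch of the co-occurrence signal described above (not the project's actual method), the example below pulls paired video and text embeddings together with a symmetric contrastive (InfoNCE) objective while pushing non-matching pairs in the batch apart. The use of PyTorch, the embedding sizes, and the function name are illustrative assumptions.

# Minimal sketch: co-occurring (video, text) pairs as a self-supervised signal.
# Matching pairs are aligned in a shared embedding space; mismatched pairs
# within the batch serve as negatives. All sizes and names are illustrative.
import torch
import torch.nn.functional as F

def symmetric_infonce(video_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of co-occurring (video, text) pairs."""
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = video_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # i-th video matches i-th text
    loss_v2t = F.cross_entropy(logits, targets)       # video -> text direction
    loss_t2v = F.cross_entropy(logits.t(), targets)   # text -> video direction
    return 0.5 * (loss_v2t + loss_t2v)

# Toy usage with random "encoder outputs" for 8 clip/caption pairs.
video_emb = torch.randn(8, 256)
text_emb = torch.randn(8, 256)
print(symmetric_infonce(video_emb, text_emb))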

Track: Academic Track
PhD Duration: May 1st, 2021 - April 30th, 2025
First Exchange: July 1st, 2023 - September 30th, 2023
Second Exchange: July 1st, 2024 - September 30th, 2024