Multimodal Learning for Image and Video Understanding
Nina Shvetsova (Ph.D. Student)
This PhD project is dedicated to advancing video and image understanding through self- and weakly-supervised machine learning and learning from multimodal data. Multimodal data, such as video accompanied by audio, user comments, or other modalities, inherently contains self-supervised learning signals: the co-occurrence of multiple modalities can be used to learn meaningful representations across them. By uncovering the inherent structures and patterns within the data, the research aims to discover underlying relationships between modalities and to extract the relevant semantic information from them. Moreover, different modalities, such as vision and sound or textual narration, also carry complementary signals about the world. The research therefore aims to enhance the overall understanding and representation of images and videos by leveraging the complementary information present in other modalities. Fusing multiple modalities enables a more comprehensive and context-aware understanding of visual content. The outcomes of this research will contribute to the advancement of multimodal learning for video and image understanding.
Primary Host: | Hilde Kühne (University of Tübingen) |
Exchange Host: | Christian Rupprecht (University of Oxford) |
PhD Duration: | 01 May 2021 - 30 April 2025 |
Exchange Duration: | 01 July 2023 - 30 September 2023; 01 July 2024 - 30 September 2024 |