Multimodal video learning
Yunhua Zhang (Ph.D. Student)
Video streams consist of multiple modalities, e.g., RGB frames, optical flow, and audio. Their natural correspondence provides rich semantic information for effective multi-modal perception and learning. In this project, our aim is to understand video content by analyzing multiple modalities and deciding which modality to trust in different scenarios, especially under harsh visual conditions. To this end, we will develop various cross-modal interaction modules tailored to specific tasks, such as action recognition, repetition counting, and activity localization.
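The idea of deciding which modality to trust can be sketched as a gated late-fusion step: each modality's features are combined with weights derived from a per-modality reliability score, so an unreliable modality (e.g., RGB under low light) contributes less. This is a minimal illustrative sketch, not the project's actual module; the `gated_fusion` function and the reliability scores are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_fusion(features, reliability_scores):
    """Fuse per-modality feature vectors into one vector.

    features: list of equally sized 1-D feature vectors, one per modality.
    reliability_scores: hypothetical per-modality trust scores; a softmax
    over them yields the fusion weights, so low-scoring modalities are
    down-weighted rather than discarded.
    """
    weights = softmax(np.asarray(reliability_scores, dtype=float))
    stacked = np.stack([np.asarray(f, dtype=float) for f in features])
    return weights @ stacked  # weighted sum over the modality axis

# Example: three modalities (RGB, flow, audio) with 4-dim features each.
rgb, flow, audio = np.ones(4), 2 * np.ones(4), 3 * np.ones(4)
# When audio is far more reliable, the fused vector is dominated by audio.
fused = gated_fusion([rgb, flow, audio], reliability_scores=[0.1, 0.1, 5.0])
```

With equal reliability scores the result reduces to a plain average of the modality features; in practice such weights would be predicted by a learned gating network rather than supplied by hand.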
Primary Advisor: Cees Snoek (University of Amsterdam)
Industry Advisor: Xiantong Zhen (University of Amsterdam & Inception Institute of Artificial Intelligence)
PhD Duration: 01 October 2019 - 30 September 2023