Yunhua Zhang
PhD
University of Amsterdam (UvA)
Multi-modal video learning

Video streams consist of multiple modalities, e.g., RGB frames, optical flow, and audio. Their natural correspondence provides rich semantic information for effective multi-modal perception and learning. In this project, our aim is to understand video content by analyzing multiple modalities and deciding which modality to trust in different scenarios, especially under harsh visual conditions. To this end, various cross-modal interaction modules will be developed for specific tasks, such as action recognition, repetition counting, and activity localization.
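The idea of deciding which modality to trust can be illustrated with a minimal confidence-weighted fusion sketch. This is not the project's actual method; the per-modality confidence scores and the softmax weighting are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax: turns raw confidence scores into weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features, scores):
    """Confidence-weighted fusion of per-modality feature vectors.

    features: equal-length vectors, one per modality (e.g. RGB, flow, audio)
    scores:   scalar confidence per modality (hypothetical quality estimates)
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Toy example: under harsh visual conditions the RGB confidence drops,
# so the fused representation is dominated by audio.
rgb_feat   = [1.0, 0.0]
audio_feat = [0.0, 1.0]
fused = fuse([rgb_feat, audio_feat], [-2.0, 2.0])  # low RGB, high audio confidence
```

In a learned system the confidence scores would themselves be predicted from the input (e.g., by a small network per modality), so the weighting adapts per scenario rather than being fixed.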

Track:
Industry Track
PhD Duration:
October 1st, 2019 - September 1st, 2023