Christina Sartzetaki
Humans are inherently effective and efficient at processing continuous streams of visual information and distilling them into semantic understanding, which they use to make decisions and interact with the world around them. Video-AI has come a long way, but current models are often computationally heavy, lack robustness to noise, and generalize poorly across domain shifts. At the same time, much remains uncharted about the mechanisms underlying the brain's neural computations when exposed to dynamic natural input. This project explores what insight can be gained into these questions by measuring, and then increasing, the alignment of state-of-the-art video-AI model representations with representations in the human brain (e.g., fMRI) and behavior (e.g., similarity ratings) during video watching. It additionally investigates the reverse direction: drawing inspiration from human neural processes to design human-aligned video architectures, targeting improvements such as efficiency, generalizability, and robustness.
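To make "measuring alignment" concrete, the sketch below shows one standard approach, representational similarity analysis (RSA), in which a model and the brain are compared through the geometry of their stimulus representations. All data shapes, variable names, and the Spearman-based comparison are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal RSA sketch: compare a video model's representations to fMRI
# responses via their representational dissimilarity matrices (RDMs).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features: np.ndarray) -> np.ndarray:
    """Condensed RDM: pairwise correlation distance between the
    representations of each stimulus (shape: n_videos x n_features)."""
    return pdist(features, metric="correlation")

# Hypothetical data: one feature vector per video from the model,
# and one voxel-response vector per video from an fMRI visual ROI.
n_videos = 100
model_feats = np.random.randn(n_videos, 768)   # e.g., pooled video-model embeddings
brain_resps = np.random.randn(n_videos, 5000)  # e.g., voxel responses per video

# Alignment score: rank correlation between the two condensed RDMs.
score, _ = spearmanr(rdm(model_feats), rdm(brain_resps))
print(f"model-brain RSA alignment: {score:.3f}")
```

The same RDM comparison applies unchanged to behavioral data: a dissimilarity matrix built from human similarity ratings can stand in for the brain RDM, and the resulting score can also serve as a differentiable-surrogate target when the goal shifts from measuring alignment to increasing it.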