PhD Position on Causal Multimodal Foundation Models
Recent breakthroughs in Artificial Intelligence have produced a first generation of foundation models that transfer across tasks, domains, and modalities. These advances have opened up a powerful new paradigm: solving complex, domain-specific problems with generalist models that can be efficiently fine-tuned for diverse applications. The promise of these models, however, is limited by two key challenges: (i) they struggle to generalize robustly to contexts and domains unseen during training, and (ii) they have limited capacity for reasoning and adapting over multimodal, spatio-temporal data streams.
This PhD project addresses these limitations through causal exploration: agents actively seek out informative interventions to learn the underlying causal structure of their environment. Rather than relying solely on passively collected data, the agent will hypothesize causal relationships, run experiments, and test the outcomes in order to construct more robust and transferable world models. This active, structured approach to learning is key to collecting fine-tuning data for flexible multimodal generalist foundation models (MGFMs) that can generalize to novel tasks and adapt autonomously, through exploration, to previously unseen settings.
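To make the idea concrete, the sketch below shows a toy version of this hypothesize-intervene-update loop: an agent maintains a belief over candidate causal graphs, picks the intervention expected to be most informative, observes the outcome, and updates its belief. All specifics here (two binary variables, the two candidate graphs X -> Y and Y -> X, the 0.9/0.1 mechanism strengths, the expected-information-gain criterion) are illustrative assumptions, not part of the project description.

```python
# Minimal sketch of causal exploration via interventions (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

def sample_world(do_x=None, do_y=None):
    """Simulate one trial of the hidden ground-truth mechanism X -> Y under an intervention."""
    x = rng.integers(2) if do_x is None else do_x
    if do_y is None:
        y = int(rng.random() < (0.9 if x == 1 else 0.1))  # Y copies X with prob. 0.9
    else:
        y = do_y  # do(Y) severs the X -> Y edge; X keeps its own marginal
    return int(x), int(y)

def likelihood(h, intervention, outcome):
    """P(outcome | hypothesis h, intervention) under the agent's assumed mechanisms."""
    name, _ = intervention
    x, y = outcome
    if h == 0:  # H0: X -> Y
        if name == "do_y":
            return 0.5                  # X is unaffected by do(Y); uniform marginal
        return 0.9 if y == x else 0.1   # Y follows X with prob. 0.9
    else:       # H1: Y -> X
        if name == "do_x":
            return 0.5                  # Y is unaffected by do(X)
        return 0.9 if x == y else 0.1   # X follows Y with prob. 0.9

def candidate_outcomes(intervention):
    name, value = intervention
    return [(value, 0), (value, 1)] if name == "do_x" else [(0, value), (1, value)]

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_posterior_entropy(intervention, belief):
    """Score an intervention by the belief entropy expected after observing its outcome."""
    total = 0.0
    for outcome in candidate_outcomes(intervention):
        lik = np.array([likelihood(h, intervention, outcome) for h in (0, 1)])
        marginal = float((belief * lik).sum())
        if marginal > 0:
            total += marginal * entropy(belief * lik / marginal)
    return total

interventions = [("do_x", 1), ("do_y", 1)]
belief = np.array([0.5, 0.5])  # prior over {X -> Y, Y -> X}

for _ in range(20):
    # Actively pick the intervention expected to be most informative about the structure.
    chosen = min(interventions, key=lambda i: expected_posterior_entropy(i, belief))
    name, value = chosen
    outcome = sample_world(do_x=value if name == "do_x" else None,
                           do_y=value if name == "do_y" else None)
    # Bayesian update of the belief over causal structures given the intervention outcome.
    lik = np.array([likelihood(h, chosen, outcome) for h in (0, 1)])
    belief = belief * lik / (belief * lik).sum()

print("posterior over {X->Y, Y->X}:", belief)  # should concentrate on the true graph X -> Y
```

In the project itself, the same loop would operate over multimodal, spatio-temporal observations rather than two binary variables, and the resulting interaction data would serve as fine-tuning material for the MGFM.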