Cagatay Alici

PhD
Technical University of Munich (TUM)
Learning Neuro-Symbolic World Models from Multimodal Clinical Data with Structured Semantic Representations

Modern machine learning systems operating in complex, real-world environments are limited not only by data scarcity, but also by the lack of structured, interpretable representations that support reasoning, prediction, and generalization. This PhD project aims to develop neuro-symbolic world models that learn from multimodal clinical data, combining neural representation learning with explicit semantic structure to model complex spatial and temporal processes. Rather than treating data generation as an end in itself, the project investigates how realistic synthetic and semi-synthetic data can be used strategically to support the learning, evaluation, and robustness of world models trained on clinical data.

The central hypothesis is that incorporating hierarchical semantic representations, such as scene graphs encoding entities and their relations, leads to more robust and interpretable world models than purely pixel-driven approaches. The project will study how multimodal inputs, such as volumetric data and visual observations, can be embedded into a shared latent space that supports prediction, consistency checking, and structured reasoning over time. Controlled data synthesis will be used to introduce known transformations and perturbations, enabling quantitative analysis of model behavior and generalization beyond limited clinical samples.
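
As a rough illustration of the kind of structure described above, the following minimal Python sketch represents a scene graph as a set of entities and (subject, predicate, object) relations, then compares two time steps after a known, controlled change. It is not taken from the project: the class names, the helper `relation_changes`, and the entity and predicate labels are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One directed edge of the scene graph (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Entities plus the relations that hold between them at one time step."""
    entities: set[str] = field(default_factory=set)
    relations: set[Relation] = field(default_factory=set)

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        # Adding a relation implicitly registers both entities.
        self.entities.update((subject, obj))
        self.relations.add(Relation(subject, predicate, obj))


def relation_changes(before: SceneGraph, after: SceneGraph):
    """Return (removed, added) relations between two time steps."""
    return before.relations - after.relations, after.relations - before.relations


# Hypothetical scene at time t (labels are illustrative assumptions).
g_t = SceneGraph()
g_t.add_relation("instrument", "near", "organ")
g_t.add_relation("organ", "inside", "body_cavity")

# Same scene after a controlled, known perturbation: one relation changes.
g_t1 = SceneGraph()
g_t1.add_relation("instrument", "touching", "organ")
g_t1.add_relation("organ", "inside", "body_cavity")

removed, added = relation_changes(g_t, g_t1)
```

Because the perturbation is known in advance, the expected difference between the two graphs is also known, which is what would make a quantitative check of a model's predicted graph possible in a setup like this.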

Methodologically, the research will explore representation learning, multimodal generative modeling, graph-based reasoning, and contrastive learning objectives, with an emphasis on learning dynamics under semantic constraints. The intended outcome is a principled framework for learning structured world models grounded in real clinical data, with realistic synthetic data used as a tool for validation and scaling. Beyond the clinical setting, the proposed methods are intended to contribute to core machine learning research on neuro-symbolic modeling, multimodal world models, and data-efficient learning, aligning closely with the scientific goals of the ELLIS program.

The proposed research targets publication in leading machine-learning venues, with particular relevance to NeurIPS and ICML through its contributions to structured world models, neuro-symbolic learning, and multimodal representation learning. The vision-centric and graph-based aspects of the work naturally align with major computer vision conferences such as CVPR, ICCV, and ECCV. In addition, the strong grounding in

Track: Academic Track
PhD Duration: January 1st, 2026 - September 30th, 2029