Joshua Wendland
Automated decision-making systems based on reinforcement learning (RL) and planning are increasingly used in complex, high-stakes environments, yet their trustworthiness, interpretability, and theoretical soundness remain limited. This project aims to address these challenges by developing scalable and principled learning methods that unite the theoretical guarantees of planning with the flexibility and efficiency of reinforcement learning.
A first research direction focuses on learning and acting under missing or incomplete data in sequential decision-making. In contrast to classical partially observable settings, we consider decision processes where individual state features may become unavailable dynamically. The project will study how different forms of missingness affect optimal policy learning and develop algorithms capable of adapting to such uncertainty.
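As a minimal illustrative sketch of this setting (the function name and masking scheme are hypothetical, not part of the project), dynamic missingness can be simulated by independently dropping individual state features at observation time:

```python
import random

def mask_features(state, missing_prob=0.2, rng=None):
    """Return a copy of `state` where each feature is independently
    replaced by None with probability `missing_prob`, simulating
    sensor readings that become unavailable at run time."""
    rng = rng or random.Random(0)
    return [None if rng.random() < missing_prob else x for x in state]

# A policy consuming such observations must decide how to act despite
# the masked entries, e.g. by imputing them or marginalising over them.
observation = mask_features([0.5, 1.2, -0.3, 2.0], missing_prob=0.5,
                            rng=random.Random(1))
```

Unlike a fixed partially observable setting, which features are missing here changes from step to step, which is the source of the added difficulty.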
A second line of work explores how causal reasoning can improve both the interpretability and reliability of policy learning. By deriving structural causal models from the environment's transition dynamics, the project aims to use the causal structure as a search heuristic to guide planning.
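One way such a heuristic could look, sketched here under assumed names and an assumed toy causal graph (neither is from the project), is to estimate the cost to a goal by counting the goal's causal ancestors that the current state has not yet satisfied:

```python
# Hypothetical causal graph over state variables: parent -> children.
causal_graph = {
    "power": ["door", "light"],
    "key":   ["door"],
    "door":  ["inside"],
}

def ancestors(graph, var):
    """All causal ancestors of `var` under the parent -> children graph."""
    parents = {}
    for p, children in graph.items():
        for c in children:
            parents.setdefault(c, []).append(p)
    seen, stack = set(), list(parents.get(var, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen

def heuristic(state, goals):
    """Count unmet goal variables plus their unsatisfied causal
    ancestors, as a rough estimate of remaining search effort."""
    unmet = {g for g in goals if not state.get(g, False)}
    relevant = set(unmet)
    for g in unmet:
        relevant |= ancestors(causal_graph, g)
    return sum(1 for v in relevant if not state.get(v, False))
```

The intuition is that variables causally upstream of an unmet goal may still need to change, so states with fewer unsatisfied ancestors are explored first.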
Finally, the project investigates explainable policy modulation by identifying latent activation vectors in neural policies that correspond to interpretable behavioural concepts. Adjusting these latent directions allows for controlled and explainable variations in agent behaviour without retraining.
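The modulation step can be sketched as follows, assuming a concept direction has already been identified in the policy's hidden activations (the function name and vectors here are illustrative only):

```python
import math

def steer(hidden, concept_vec, alpha):
    """Shift a hidden-layer activation along a unit-normalised concept
    direction; `alpha` sets the strength and sign of the behavioural
    change, with alpha = 0 leaving the policy unchanged."""
    norm = math.sqrt(sum(c * c for c in concept_vec))
    return [h + alpha * c / norm for h, c in zip(hidden, concept_vec)]

# E.g. nudge an activation along a (hypothetical) "cautiousness" direction.
steered = steer([0.0, 0.0, 0.0], [3.0, 0.0, 0.0], alpha=2.0)
```

Because the shift is a simple vector addition at inference time, the behavioural variation is controllable and reversible without any retraining.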
Together, these research directions aim to build a foundation for trustworthy and scalable policies, combining formal guarantees from planning with data-driven flexibility in RL to enable safer and more transparent decision-making systems.