Gloria Desideri

PhD
University of Technology Nuremberg (UTN)
Learning and Refining Temporal Abstractions in Non-Episodic Reinforcement Learning

In many real-world applications of reinforcement learning (RL), agents must contend with non-stationary dynamics and prohibitively large state and action spaces. Continual learning provides a framework for addressing these challenges by enabling agents to acquire new skills incrementally, retain previously learned abilities, and adapt quickly across a sequence of changing tasks. A common strategy within this framework is to decompose complex problems into smaller subtasks, as in the options framework, but automatically discovering a useful decomposition remains a major open problem. A core issue is determining when two states are similar enough that the same option should apply to both. In non-stationary environments, a fixed similarity metric can degrade: regions of the state space that once behaved the same under one subgoal may later diverge. Such representation drift can make options obsolete, so representations must adapt while preserving past knowledge.

One way to handle this drift is the representation-driven option discovery (ROD) cycle, in which the options and the representation are improved jointly through repeated refinement. This approach fits naturally into continual learning and provides a foundation that can be extended in several directions. One direction is to learn a better world representation through auxiliary tasks, which can take the form of General Value Functions (GVFs). Their benefit for option learning has already been explored with a fixed state representation, and the options discovered were shown to be useful for both planning and model learning. Another direction is to learn compact latent representations of long-horizon dynamics: Latent-Space Collocation (LatCo) trains a latent dynamics model and plans by optimizing latent trajectories, and such low-dimensional states also enable efficient experience replay. Recent work has also highlighted the drawbacks of episodic training and argued for truly non-episodic continual reinforcement learning formulations, motivated by the fact that many real-world applications do not allow the environment to be reset to a specific state.

In this work, we aim to explore methods for enhancing automatic skill discovery by leveraging alternative world representations, such as GVFs and latent state-space models, in non-stationary, non-episodic continual reinforcement learning. We will investigate how to integrate GVFs with temporal abstractions as a higher-level planning strategy that helps agents navigate large state spaces. We will first provide the mathematical formalism of the problem, the GVF learning process, and possible integration strategies. We will then evaluate on continual-RL benchmarks such as JellyBean World and AgarCL, with emphasis on task discovery, skill reuse, dynamics understanding, and incremental learning. Applications include drug discovery, video games, and water-plant management, providing tools for decision-making under continual change.
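
For reference, the options framework models a temporally extended action as a triple: an initiation set I indicating where the option may start, an intra-option policy, and a termination condition. Below is a minimal illustrative sketch in Python; the names and the integer state/action types are assumptions for exposition, not part of the proposal.

```python
# Minimal sketch of an option (Sutton, Precup & Singh, 1999):
# a triple (initiation set, intra-option policy, termination condition).
# All names and the integer state/action types are illustrative.
from dataclasses import dataclass
from typing import Callable

State = int    # placeholder: in practice a (possibly learned) representation
Action = int

@dataclass
class Option:
    initiation: Callable[[State], bool]    # I(s): can the option start in s?
    policy: Callable[[State], Action]      # pi(s): behavior while the option runs
    termination: Callable[[State], float]  # beta(s): probability of ending in s
```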
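
The GVFs mentioned above follow the standard formalism: a GVF is specified by a policy, a state-dependent continuation function, and a cumulant (a pseudo-reward signal), and predicts the expected discounted sum of that cumulant. A sketch of the standard definition in LaTeX notation:

```latex
% Standard GVF definition: the expected discounted sum of the
% cumulant C under policy \pi, with state-dependent continuation \gamma.
v_{\pi,\gamma,C}(s) =
  \mathbb{E}_{\pi}\!\left[
    \sum_{k=0}^{\infty}
    \Bigl( \prod_{j=1}^{k} \gamma(S_{t+j}) \Bigr) C_{t+k+1}
    \;\middle|\; S_t = s
  \right]
```

Setting C to the environment reward and the continuation to a constant recovers the ordinary value function; other cumulants turn GVFs into the auxiliary predictions used here as a richer world representation.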
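
Such a GVF can be learned online with temporal-difference methods. The sketch below shows a tabular TD(0) update for a single GVF in a continuing (non-episodic) setting; the environment interface (`env.step` returning only the next state), the cumulant, and the continuation function are all illustrative assumptions.

```python
# Illustrative tabular TD(0) learner for one GVF in a continuing
# (non-episodic) environment: no resets inside the learning loop.
# env.step(a) is assumed to return only the next state.
import numpy as np

def learn_gvf(env, policy, cumulant, continuation, n_states,
              steps=10_000, alpha=0.1):
    """Estimate v(s) = E[ sum_k (prod_j gamma(S_{t+j})) C_{t+k+1} | S_t = s ]."""
    v = np.zeros(n_states)
    s = env.reset()  # called once; learning itself is reset-free
    for _ in range(steps):
        a = policy(s)
        s_next = env.step(a)
        c = cumulant(s, a, s_next)      # pseudo-reward C_{t+1}
        g = continuation(s_next)        # state-dependent "discount" gamma(S_{t+1})
        v[s] += alpha * (c + g * v[s_next] - v[s])  # TD(0) update
        s = s_next
    return v
```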

Track:
Academic Track
PhD Duration:
September 15th, 2025 - September 1st, 2029