Cognitive models for Meta-Reinforcement Learning
Julian Coda-Forno (Ph.D. Student)
The project aims to use models and inspiration from cognitive science to develop better Meta-RL agents. Meta-RL combines two disciplines within Machine Learning: meta-learning and Reinforcement Learning (RL). Meta-learning, an idea inspired by the psychology of "learning to learn" in humans, studies how an AI system can leverage learning from previous tasks on new, related tasks; it aims to tackle the sample-inefficiency of current AI, which remains too task-specific. Reinforcement Learning likewise has roots in psychology and animal-conditioning experiments, and is analogous to how the dopaminergic system operates in the brain. Its framework relies on an agent interacting with an environment, where actions produce both changes of state and rewards. Meta-RL therefore tries to develop agents that can learn in a given task or environment and transfer to a new one, leveraging this previous learning without having to re-initialize all or most of their parameters.

The first year of the project aimed to give the PhD candidate Julian Coda-Forno a deeper understanding of cognitive science models and to investigate how they could relate to Meta-RL. To this end, the first projects focused on analyzing Large Language Models (LLMs) through the lens of cognitive psychology. This gave the student experience with the cognitive science literature and its techniques, and a chance to observe limitations worth addressing in the future, in one of AI's current hot areas: LLMs. We investigated whether and how LLMs perform meta-in-context learning in two-armed bandit, function-learning, and classification tasks. We also examined other cognitive abilities of LLMs that are well established in cognitive psychology, such as meta-cognition, risk-taking behavior, model-based reasoning, and directed exploration.

Julian now aims to leverage the insights gained from LLMs and cognitive science. His goal is to extract the priors and inductive biases inherent in LLMs and incorporate them into Meta-RL agents, thereby enhancing the adaptability and efficiency of these agents across tasks and environments. Currently, he is looking into using the representations of LLMs to learn policy and value functions for RL agents; the sketches below illustrate the bandit evaluations and this current direction.
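To make the meta-in-context learning evaluations concrete, the following is a minimal sketch of a two-armed bandit probe in which the model's entire choice-and-reward history is kept in its context window. The `query_llm` helper is a hypothetical stand-in for an actual LLM API, and the prompt wording is an illustrative assumption, not the exact protocol used in the project.

```python
# Minimal sketch of a meta-in-context two-armed bandit probe.
# `query_llm` is a hypothetical stand-in for a real LLM API call,
# and the prompt format is illustrative, not the project's protocol.
import random

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in; replace with a real LLM API client."""
    raise NotImplementedError

def run_bandit_episode(reward_probs=(0.3, 0.7), n_trials=10) -> float:
    """Let the LLM play a two-armed bandit with its full history in context."""
    history, total_reward = [], 0.0
    for t in range(n_trials):
        prompt = (
            "You repeatedly choose between slot machine A and slot machine B.\n"
            + "".join(f"Trial {i}: chose {c}, reward {r}\n"
                      for i, (c, r) in enumerate(history, start=1))
            + f"Trial {t + 1}: which machine do you choose? Answer A or B."
        )
        reply = query_llm(prompt).strip().upper()
        arm = 0 if reply.startswith("A") else 1
        reward = 1.0 if random.random() < reward_probs[arm] else 0.0
        history.append(("A" if arm == 0 else "B", reward))
        total_reward += reward
    return total_reward
```

A drift of the model's choices toward the higher-reward arm within an episode indicates in-context learning; faster adaptation across successive episodes presented in the same context would indicate meta-in-context learning.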
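The current direction of learning policy and value functions on top of LLM representations can be sketched as follows, assuming PyTorch and the Hugging Face transformers library. The backbone name ("gpt2"), the state-to-text encoding, and the two linear heads are illustrative assumptions, not the project's actual architecture.

```python
# Sketch: freeze a pretrained language model, encode a text description
# of the RL state, and train small policy and value heads on top of its
# representation. All architectural choices here are illustrative.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class LLMActorCritic(nn.Module):
    def __init__(self, model_name: str = "gpt2", n_actions: int = 2):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.backbone = AutoModel.from_pretrained(model_name)
        for p in self.backbone.parameters():  # keep the LLM's priors frozen
            p.requires_grad = False
        hidden = self.backbone.config.hidden_size
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # state value

    def forward(self, state_text: str):
        tokens = self.tokenizer(state_text, return_tensors="pt")
        with torch.no_grad():
            out = self.backbone(**tokens)
        rep = out.last_hidden_state[:, -1, :]  # last-token representation
        return self.policy_head(rep), self.value_head(rep)
```

Freezing the backbone keeps the LLM's inductive biases intact while only the two small heads are trained; action probabilities then follow from a softmax over the policy logits.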
Primary Advisor: Eric Schulz (Max Planck Institute for Biological Cybernetics)
Industry Advisor: Jane X. Wang (DeepMind)
PhD Duration: 22 August 2022 - 22 August 2026