Yuval Shalev

PhD
University of Cambridge
Explicit Reasoning for Optimal Decision-Making in Large Language Models

Large Language Models (LLMs) have recently shown strong performance across a wide range of tasks, leveraging extensive pre-training on human-generated text to produce explicit, human-like reasoning. Their success on reasoning tasks, together with their ability to follow natural-language instructions, makes them a natural choice for developing decision-making applications (agents). In this setting, LLMs use their language abilities to tackle tasks that require decisions in interactive environments, generating tokens that are interpreted as actions.

However, despite their impressive reasoning capabilities, LLMs often struggle with optimal decision-making. Unlike reasoning, interacting with external environments demands abilities that extend beyond linguistic step-chaining. Optimal decision-making requires balancing exploration and exploitation, learning from one's own observations, and anticipating the long-term consequences of decisions. Recent studies show that even frontier LLMs fall short on these requirements, suggesting that such abilities may not naturally arise from large-scale training on human-generated text.

This PhD project investigates the core factors that prevent LLM agents from acting optimally in decision-making tasks and aims to develop methods that bridge the gap between their linguistic and decision-making abilities. The main approach is to design novel agents that use explicit text to represent their understanding of the environment. We posit that LLMs can use their explicit reasoning abilities to iteratively aggregate their own observations into a coherent, useful textual representation of the environment. This, in turn, may enable them to reason and act in novel environments by learning each environment's arbitrary rules in context.
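To make the approach concrete, below is a minimal sketch of such an agent loop. The `llm` callable, the prompt wording, and the `env.reset()`/`env.step()` interface are illustrative assumptions for this sketch, not the project's actual implementation.

```python
# A minimal, hypothetical sketch of the proposed idea: an agent that maintains
# an explicit, textual representation of its environment and refines it after
# every step. The `llm` callable and the environment interface are placeholder
# assumptions, not part of the project's codebase.

def run_episode(env, llm, max_steps=50):
    """Interact with `env`, keeping a text world model instead of raw logs."""
    world_model = "No knowledge of the environment yet."
    obs = env.reset()
    for _ in range(max_steps):
        # Choose an action by reasoning explicitly over the textual model.
        action = llm(
            f"Environment summary:\n{world_model}\n"
            f"Latest observation: {obs}\n"
            "Reason step by step, then output the next action."
        )
        obs, reward, done = env.step(action)
        # Fold the new observation into the summary rather than appending it
        # to an ever-growing transcript of raw observations.
        world_model = llm(
            f"Current summary:\n{world_model}\n"
            f"New evidence: action={action}, observation={obs}, reward={reward}\n"
            "Rewrite the summary so it stays concise but captures any newly "
            "inferred rules of the environment."
        )
        if done:
            break
    return world_model
```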

For example, while a long record of raw observations may make it difficult to determine how to act more effectively, a concise summary can highlight the essential information needed for improvement. We further argue that maintaining a useful textual model of an environment will allow LLMs to simulate their own actions and mentally explore alternative directions in order to plan how to act.
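A similarly hedged sketch of the planning idea follows: the agent "mentally" rolls out candidate actions against its own textual model before committing to one. The scoring prompt, the naive score parsing, and the `candidate_actions` list are all illustrative assumptions rather than the project's actual method.

```python
# Hypothetical sketch: simulate candidate actions against the textual world
# model and pick the one with the highest imagined outcome.

def plan_with_text_model(world_model, obs, candidate_actions, llm):
    """Score each candidate by asking the LLM to imagine its outcome."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        rollout = llm(
            f"Environment summary:\n{world_model}\n"
            f"Current observation: {obs}\n"
            f"Imagine taking action '{action}'. Describe the likely outcome "
            "and end with a numeric score from 0 (bad) to 10 (good)."
        )
        # Naively parse the trailing number as the imagined value of the action.
        try:
            score = float(rollout.strip().split()[-1])
        except (ValueError, IndexError):
            score = 0.0
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```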

Overall, the project aims to study and advance the potential of language models as decision makers, using text to support explainable, human-like behaviour of artificial general intelligence systems.

Track:
Industry Track
PhD Duration:
October 1st, 2025 - September 30th, 2029