Long-Horizon Planning and Reasoning for Intelligent Robotic Systems

Imen Mahdi (Ph.D. Student)

Vision-Language-Action (VLA) models have been proposed to enable robots to execute a variety of tasks by integrating perception, language understanding, and action planning. Large VLAs trained on a diverse set of tasks have been shown to generalize well to new tasks and environments, achieving high precision and efficiency. However, these tasks, such as opening a drawer, grasping an object, or reaching a target, are typically simple and do not require long-term planning. Long-term goals can be decomposed into sub-tasks that are more manageable and easier to solve. For example, grabbing a soda from the fridge can be decomposed into reaching the fridge, opening the door, and grabbing the soda. Such a goal can be addressed hierarchically as a sequence of primitive skills: given a language instruction and a visual input, the robot plans a sequence of sub-tasks that lead to the final goal. This PhD aims to explore methodologies for hierarchical planning and reasoning that enable robots to perform complex tasks while leveraging the advancements made in the field of VLAs.

Primary Host: Abhinav Valada (University of Freiburg)
Exchange Host: Cordelia Schmid (INRIA)
PhD Duration: 01 October 2024 - 30 September 2028
Exchange Duration: 01 September 2025 - 31 March 2026 (ongoing)