Yutong Hu

PhD
KU Leuven
Vision-Driven Robot Manipulation with Foundation Models

This PhD project investigates the integration of foundation models, such as large language models (LLMs) and vision-language models (VLMs), into robotic manipulation tasks. Leveraging the rich multimodal representations learned from large-scale text and image datasets, the research aims to bridge the gap between abstract model knowledge and physical robotic actions.

A key objective is to develop methodologies for grounding the high-level semantic outputs of foundation models in low-level control policies. The project will explore how these models can interpret natural language instructions and visual cues to guide real-time robot behavior. Emphasis will be placed on evaluating alignment between model-generated representations and actionable robotic goals. This research will involve designing benchmarking tasks that test the robot's ability to execute complex, open-ended commands in unstructured environments. Additional focus will be given to fine-tuning or adapting foundation models for real-world robotic settings, where data is limited and noisy. Techniques such as reinforcement learning, imitation learning, and few-shot prompting may be employed to improve performance.

The project aims to yield insights into both the potential and limitations of using foundation models for embodied AI. Ultimately, the work aspires to contribute novel frameworks for building more generalizable, intelligent robotic systems that seamlessly integrate perception, language, and action.

Track:
Academic Track
PhD Duration:
November 18th, 2024 - November 18th, 2028