Henri-Jacques Geiss
As children, we seem to pick up everyday physical concepts almost effortlessly - such as the effect of gravity or the ability of a glass to contain a liquid - and can transfer them to new scenarios for zero-shot task solving. Naturally, such qualities are also desired for robotic control. While recently presented end-to-end VLA models perform well on their target tasks and even show hints of emergent capabilities, they lack explicit inductive biases for an abstract, mechanistic understanding of the physical regularities of their environment, as well as for its intrinsically motivated exploration. With this work, we aim to bridge this gap and enable an embodied agent, placed in an unknown environment, to acquire an abstract understanding of the physical concepts that govern the effects of its actions and the interactions between objects. More precisely, we employ a theory from cognitive linguistics - Image Schemas - as the framework for these abstractions. Once the robot has an embodied understanding of concepts like SUPPORT or CONTAINMENT, it can use them for hierarchical, logic-based planning in long-horizon tasks as well as for knowledge transfer. For instance, given the task of placing several objects at a certain goal position in the shortest amount of time, the agent first searches its workspace for a container with which it can carry all objects to the target position at once.