Despoina Paschalidou
PhD
Stanford University
Learning Deep Models with Primitive-based Representations

My research so far seeks to answer a very simple question: how can we teach machines to see in 3D? In other words, what is the best representation for capturing the world such that a machine can robustly perceive it? Humans develop a common-sense understanding of the physical behavior of the world within the first year of their life. For example, we can identify 3D objects in a scene and infer their geometric and physical properties, even when only parts of these objects are visible. Over the years, researchers have tried to endow computers with similar visual capabilities, yet we are still far from real artificial intelligence. The advent of deep neural networks, coupled with breakthroughs in high-performance computing, has led to substantial gains in various perceptual tasks such as motion estimation and object detection. However, their applicability to more complicated tasks that involve higher-level reasoning, such as scene understanding, is not straightforward. To this end, the goal of this project is to investigate whether primitive-based representations offer a more interpretable alternative to existing representations. Primitive-based representations are inspired by the way the human visual system processes the vast amount of raw visual input: it has long been hypothesized that humans parse their surroundings into compact, parsimonious representations in which complex objects are decomposed into a small number of shape primitives, each of which can be described by a low-dimensional set of parameters.
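
To make the idea of a low-dimensional primitive description concrete, the sketch below shows one common choice of primitive, a superquadric, parameterized by roughly a dozen numbers (size, shape exponents, rotation, translation). This is an illustrative example only, not the project's actual code; the class and function names, the quaternion rotation convention, and the use of NumPy/SciPy are assumptions made for the sketch.

```python
import numpy as np
from scipy.spatial.transform import Rotation


class SuperquadricPrimitive:
    """One shape primitive described by ~12 parameters:
    size (a1, a2, a3), shape exponents (eps1, eps2),
    rotation (quaternion, 4 values) and translation (3 values)."""

    def __init__(self, size, shape, quaternion, translation):
        self.size = np.asarray(size, dtype=float)
        self.eps1, self.eps2 = shape
        self.rotation = Rotation.from_quat(quaternion)
        self.translation = np.asarray(translation, dtype=float)

    def inside(self, points):
        """Boolean mask: True where an (N, 3) array of world-space points
        lies inside the primitive (implicit surface value < 1)."""
        # Transform the points into the primitive's local coordinate frame.
        local = self.rotation.inv().apply(points - self.translation)
        # Superquadric inside-outside function.
        x, y, z = np.abs(local / self.size).T
        f = (x ** (2.0 / self.eps2) + y ** (2.0 / self.eps2)) ** (self.eps2 / self.eps1) \
            + z ** (2.0 / self.eps1)
        return f < 1.0


# A complex object is then approximated by a small set of such primitives:
# the "parsimonious representation" is simply the list of their parameter
# vectors, whose union covers the object's shape.
```

In a learned model, a neural network would predict these few parameters per primitive directly from an image or point cloud, which is what makes the resulting representation compact and interpretable compared to dense voxel grids or meshes.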

Track:
Academic Track
PhD Duration:
April 1st, 2017 - November 1st, 2021
First Exchange:
February 1st, 2019 - July 31st, 2020