Advancements in 3D Scene Understanding: From Few-shot Segmentation to Multi-modal Perception
Zhaochong An (Ph.D. Student)
Accurate 3D scene understanding allows machines to recognize objects, perceive spatial relationships, infer scene semantics, and predict object behavior. This capability is essential for applications such as autonomous navigation, augmented and virtual reality, and robotics. A major challenge in 3D deep learning is the scarcity of annotated datasets, caused by the high cost of data collection and annotation, which makes it difficult to train models that perform segmentation tasks effectively. My research therefore addresses this crucial issue: the scarcity of annotated 3D data. First, I will focus on few-shot 3D point cloud segmentation, enabling models to adapt to new scenes and concepts with minimal annotations. Second, I will explore the integration of multi-modal information for scene perception: just as humans perceive the real world through multiple senses, models should leverage vision, audio, and text modalities to achieve a comprehensive understanding of a scene. Given the recent success of large language models (LLMs), I also intend to investigate how to harness their generalization capabilities to improve scene perception.
Primary Host: Serge Belongie (University of Copenhagen & Cornell University)
Exchange Host: Philip H. S. Torr (University of Oxford)
PhD Duration: 01 October 2023 - 30 September 2026
Exchange Duration: 01 March 2024 - Ongoing