Orest Kupyn
The PhD project explores a novel paradigm of dataset synthesis: repurposing foundation generative models to produce training data for a range of vision tasks. Deep learning has transformed computer vision research and applications, and methodological progress now advances hand in hand with the availability of large annotated datasets. From the one million images of ImageNet to LAION-5B, the number of samples has grown by more than three orders of magnitude, and such large-scale datasets have proven readily transferable to a wide range of downstream vision tasks. Yet collecting and curating datasets at this scale remains a costly, labour-intensive undertaking. Labelling ImageNet required 49,000 annotators over the span of three years, while in domains such as 3D vision, obtaining large-scale ground-truth data from the real world is infeasible. This highlights the critical need for more efficient data generation methodologies. Synthetic data generation emerges as a promising solution: high-quality synthetic data can improve cost efficiency, accelerate dataset creation, increase data diversity, and mitigate the privacy concerns associated with real-world data collection.
Recent advances in diffusion models, particularly their capability to generate high-resolution, photorealistic images and videos, have opened unprecedented opportunities for dataset synthesis. The project examines the applicability and limitations of state-of-the-art latent diffusion models for dataset generation.
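To illustrate the paradigm, one common recipe pairs text prompts with free labels: each prompt encodes a class name, so every image a text-to-image model synthesizes arrives already annotated. The sketch below shows only the prompt-conditioning scaffold; the class names and templates are illustrative assumptions, not the project's actual setup, and the text-to-image call (a hypothetical `pipe(prompt)` from a latent diffusion library) appears only as a comment.

```python
from itertools import product

# Illustrative class labels and prompt templates (assumptions, not from the project).
CLASSES = ["golden retriever", "tabby cat", "red fox"]
TEMPLATES = [
    "a photo of a {}",
    "a close-up photograph of a {}, natural lighting",
    "a {} in an outdoor scene",
]

def build_prompt_label_pairs(classes, templates):
    """Cross every class with every template, yielding (prompt, class_index)
    pairs so that each synthesized image comes with a label for free."""
    return [(t.format(c), i) for (i, c), t in product(enumerate(classes), templates)]

pairs = build_prompt_label_pairs(CLASSES, TEMPLATES)

# A latent diffusion model would then turn each prompt into a training image, e.g.:
# for n, (prompt, label) in enumerate(pairs):
#     image = pipe(prompt).images[0]   # hypothetical text-to-image call
#     image.save(f"synthetic/{label}_{n}.png")
```

Varying the templates (and, in practice, sampling seeds and guidance scales) is what gives the synthetic dataset its diversity; the labelling cost stays zero regardless of how many images are drawn per prompt.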