Fast feed-forward video avatars with diffusion prior
David Svitov (Ph.D. Student)
Photo-realistic human avatars are a rapidly developing area of research. However, to the best of my knowledge, all current methods share two shortcomings: the lack of a human appearance prior and long training times. These drawbacks limit the applicability of photo-realistic human avatars, and they can be addressed by developing a method that produces personalized avatars using a diffusion prior and operates in feed-forward mode. In my PhD project, I plan to obtain single-view avatars from individual video frames and then combine them using information from several frames. To this end, I will use a pre-trained model to extract features from each view, and then train a neural network that merges these features into a multi-view avatar, using the diffusion prior as the loss function. My research will reduce the time needed to obtain a personalized animated avatar from a video to several seconds. This will expand the applicability of avatars in telepresence tasks and in the entertainment industry, for example in the metaverse or AR/VR applications. I believe avatar generation time and the difficulty of capturing suitable input video are the major barriers to broad adoption of this technology.
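The fusion-plus-diffusion-prior idea can be made concrete with a short sketch. The code below is illustrative, not the project's actual architecture: it assumes per-frame features from a frozen pre-trained encoder, a small transformer that fuses them into a single avatar code, and a score-distillation-sampling (SDS) style loss against a pretrained diffusion model. The names `MultiViewFusion`, `sds_loss`, and the `diffusion_score_fn` interface are all hypothetical.

```python
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """Fuses per-frame features from a frozen encoder into one avatar code."""
    def __init__(self, feat_dim=768, n_heads=8, n_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Learnable query token that aggregates information across views.
        self.query = nn.Parameter(torch.zeros(1, 1, feat_dim))

    def forward(self, view_feats):            # (B, n_views, feat_dim)
        q = self.query.expand(view_feats.size(0), -1, -1)
        tokens = torch.cat([q, view_feats], dim=1)
        return self.fuser(tokens)[:, 0]       # fused avatar code, (B, feat_dim)

def sds_loss(rendered, diffusion_score_fn, t_range=(0.02, 0.98)):
    """SDS-style surrogate loss: pull renders toward the diffusion prior.

    `diffusion_score_fn(x_noisy, t)` stands in for a frozen, pretrained
    diffusion model's noise prediction -- a hypothetical interface.
    Timestep weighting is omitted for brevity; the noise schedule is a toy
    linear one, not a real model's schedule.
    """
    b = rendered.size(0)
    t = torch.empty(b, device=rendered.device).uniform_(*t_range)
    noise = torch.randn_like(rendered)
    alpha = (1.0 - t).view(b, 1, 1, 1)
    x_noisy = alpha.sqrt() * rendered + (1 - alpha).sqrt() * noise
    with torch.no_grad():
        eps_pred = diffusion_score_fn(x_noisy, t)
    # The SDS gradient is (eps_pred - noise); detaching it means only the
    # rendering path (and hence the fusion network) receives gradients.
    return ((eps_pred - noise).detach() * rendered).mean()

# Toy usage with random data and a dummy frozen "diffusion model".
feats = torch.randn(2, 5, 768)                       # 5 encoded video frames
fused = MultiViewFusion()(feats)
render = torch.tanh(nn.Linear(768, 3 * 64 * 64)(fused)).view(2, 3, 64, 64)
loss = sds_loss(render, lambda x, t: torch.randn_like(x))
loss.backward()
```

In this sketch the diffusion model appears only inside the loss, so at inference time the avatar is produced by a single feed-forward pass through the encoder and fusion network, which is what makes the several-second generation target plausible.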
Primary Host: Alessio Del Bue (Istituto Italiano di Tecnologia)
Exchange Host: Lourdes Agapito (University College London)
PhD Duration: 01 November 2023 - 01 June 2026
Exchange Duration: 01 January 2025 - 01 June 2025 (ongoing)