Niladri Shekhar Dutt

PhD
Language driven whitebox workflows for image and 3D content creation

In computer vision and graphics, achieving efficient, interpretable, and human-guided automation of complex tasks remains a challenge. This thesis proposes to leverage the reasoning and contextual understanding of multimodal large language models (LLMs) to enhance procedural workflows in image and 3D content creation. By integrating LLMs with tools such as GIMP for procedural image editing, the approach aims to enable whitebox, interpretable operations that mimic professional-level artistry. The approach will then be extended to frameworks such as Blender for 3D modeling and to animation workflows, effectively expanding its capabilities to 4D. The project will also explore how LLMs can be fine-tuned to improve their capabilities in graphics workflows by breaking down complex tasks into a series of reasoning, execution, and critiquing steps.

Track: Academic Track