Donato Crisostomi
Large foundation models already encapsulate most of the knowledge necessary to solve diverse tasks, eliminating the need for costly retraining. My research explores post-training techniques that enhance model interoperability, focusing on model merging and latent space alignment.
Regarding the former, my research involves merging both models trained independently, with different initializations and datasets, and models stemming from the same pretrained base but fine-tuned on different tasks. In the first case, my work encompasses permutation-based matching techniques that enforce cycle consistency of the permutations, defining an implicit universal space that serves as a bridge between any pair of models. In the second, I analyze the geometric properties of task-specific update matrices (obtained by subtracting the common pretrained model from each independently fine-tuned model) and mitigate inter-task interference at the spectral level.
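The arithmetic behind these update matrices can be sketched in a few lines. This is a minimal, illustrative example (the function names and the scalar weights are hypothetical, not the actual method): each task vector is the difference between a fine-tuned model and the shared base, and a merged model adds a scaled sum of task vectors back onto the base.

```python
# Hypothetical sketch of task-vector merging; weights are toy scalars
# standing in for full parameter tensors.

def task_vector(pretrained, finetuned):
    """Task vector = fine-tuned weights minus the shared pretrained base."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def merge(pretrained, task_vectors, alpha=1.0):
    """Add the (scaled) sum of the task vectors back onto the base model."""
    merged = dict(pretrained)
    for tv in task_vectors:
        for k, v in tv.items():
            merged[k] += alpha * v
    return merged

base = {"w": 1.0}
ft_a = {"w": 1.5}  # fine-tuned on task A
ft_b = {"w": 0.8}  # fine-tuned on task B
merged = merge(base, [task_vector(base, ft_a), task_vector(base, ft_b)])
# merged["w"] is 1.0 + 0.5 - 0.2 = 1.3
```

Interference mitigation would operate on the spectra of the real update matrices rather than on scalars; this sketch only shows the subtract-then-add structure.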
From the representation perspective, I work on designing universal representations: representations shared by a large number of models that can be used interchangeably across them. This involves developing geometric transformations that map latent spaces either in a pairwise fashion or from a whole set of models into a unified space. These transformations can be optimized, learned, or even derived in closed form, for example by mapping data points to a relative space defined in terms of inter-sample distances.
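One closed-form construction of such a relative space can be sketched as follows: each sample is re-expressed through its cosine similarities to a fixed set of anchor samples. This is an illustrative sketch, not the exact formulation; because cosine similarity ignores isotropic rescaling, two embedding spaces that differ only by such a transformation yield the same relative representation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def relative_representation(x, anchors):
    """Represent x by its similarities to a shared set of anchor samples."""
    return [cosine(x, a) for a in anchors]

x = [1.0, 2.0]
anchors = [[1.0, 0.0], [0.0, 1.0]]
rel = relative_representation(x, anchors)

# Rescaling every embedding leaves the relative representation unchanged,
# since cosine similarity is invariant to isotropic scaling.
scaled_rel = relative_representation(
    [3.0 * v for v in x],
    [[3.0 * v for v in a] for a in anchors],
)
```

With anchors shared across models, these similarity profiles become a common coordinate system in which latents from different models can be compared directly.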
Though fundamentally theoretical, my work has immediate practical implications: from enabling users to generate state-of-the-art LLMs through efficient evolutionary merging to achieving extreme compression of deep vision models without significant performance degradation.
By addressing both weight-space and representation-space compatibility, my work contributes to a future where deep learning models can interact seamlessly, reducing redundant computation and fostering more cohesive AI systems.