Sajad Movahedi

PhD
ELLIS Institute Tübingen
Max Planck Institute for Intelligent Systems (MPI-IS)
Optimization Theory in the Era of Foundation Models

This project explores several aspects of the optimization and design of modern deep learning architectures (transformers, state-space models). These architectures have applications across many domains, including language modeling, vision, audio, time-series forecasting and classification, and multimodal analysis. It remains unclear how to properly design the backbone of such models, which often includes, on top of standard elements such as residual connections and normalization layers, more peculiar components such as gates and convolutions. Recent research has shown that some of these elements may not be crucial as inductive biases (e.g., convolutions in vision) but are instead fundamental for correct signal propagation. A well-known example among recently introduced blocks is the gated linear unit (GLU), which replaces the MLP in many large-scale models such as Llama 3.1.

Our research will cover the interaction between model design and training dynamics in foundation models, with the goal of demystifying which architectural choices are crucial for training speed and generalization, in combination with the choice of optimizer and its hyperparameters (notably, high regularization and high momentum). Looking forward, a crucial question concerns the generalization capabilities of adaptive methods in attention-based or SSM-based models: such architectures are trained almost exclusively with Adam, and for only a few epochs (often 0.5 or 1). On top of this, current models are underparameterized: the number of parameters is smaller than the number of training tokens available on the internet. In the near future, it will become increasingly important to understand the effects of underparameterization on generalization, and to revisit the impact of generalization boosters (sharpness-aware minimization, regularizers, noise injection) on performance.
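To make the GLU example concrete, here is a minimal sketch of the SwiGLU variant used in place of the standard MLP block in models like Llama 3.1. This is an illustrative pure-Python implementation under stated assumptions (row-major weight matrices, no bias terms, SiLU as the gating activation); the function and weight names are hypothetical, not taken from any specific codebase.

```python
import math

def silu(z):
    # SiLU / Swish activation: z * sigmoid(z)
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    # Row-major matrix-vector product: y_i = sum_j W[i][j] * x[j]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_block(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward block: W_down @ (silu(W_gate @ x) * (W_up @ x)).
    # The element-wise gate silu(W_gate @ x) is what distinguishes this
    # from a plain two-layer MLP with a single activation.
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

# Tiny usage example with 2x2 identity weights (hypothetical values):
I2 = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu_block([1.0, 2.0], I2, I2, I2)
```

With identity weights the block reduces to `silu(x) * x` element-wise, which makes the role of the multiplicative gate easy to see in isolation.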

Track:
Academic Track