Diyuan Wu
This PhD project aims to advance the theoretical understanding of deep learning through two primary research directions: (1) the study of feature learning, and (2) the investigation of modern generalization phenomena.
The first line of research focuses on understanding feature learning in deep neural networks. The student's ICML 2025 spotlight paper provides an end-to-end analysis of the neural collapse phenomenon in the mean-field regime, revealing its connections to the loss landscape and training dynamics. Building on this foundation, ongoing work explores how specific feature structures emerge across different problem setups and models, with particular emphasis on the dynamics of feature learning during training. A recent preprint (https://arxiv.org/abs/2505.17282) investigates the embedding structure of a one-layer attention model trained on a binary classification task and provides a theoretical analysis consistent with empirical findings. The long-term goal is to develop a rigorous understanding of how structured representations emerge in deep networks across various training regimes.
The second line of research investigates modern generalization behaviors in large-scale models, with a particular focus on large language models (LLMs). Current work aims to understand neural scaling laws and weak-to-strong generalization, and how these phenomena are influenced by model architecture, data distribution, and training dynamics. To this end, ongoing projects develop tractable analyses of generalization error and scaling laws in simplified settings, such as random feature models and linearized networks. These theoretical results are intended to interpret and predict empirical behaviors observed in real-world LLMs. The long-term objective is to build mathematical frameworks that explain the generalization capabilities of modern neural networks and inform the design and scaling of future LLMs.
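To make the simplified setting concrete, the following is a minimal sketch of a random feature regression experiment of the kind used in such tractable generalization analyses. All specifics here (a linear target function, ReLU random features, a small ridge penalty, the chosen dimensions) are illustrative assumptions, not the project's actual setup; the point is only that test error can be measured as the number of random features grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a linear teacher in d dimensions (illustrative choice).
d, n_train, n_test = 20, 500, 500
w_star = rng.normal(size=d) / np.sqrt(d)

X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_star
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_star

def random_feature_test_error(p, ridge=1e-3):
    """Fit ridge regression on p fixed random ReLU features; return test MSE."""
    W = rng.normal(size=(d, p)) / np.sqrt(d)   # frozen random first layer
    F_train = np.maximum(X_train @ W, 0.0)     # ReLU random features
    F_test = np.maximum(X_test @ W, 0.0)
    # Ridge-regularized least squares on the random features.
    a = np.linalg.solve(F_train.T @ F_train + ridge * np.eye(p),
                        F_train.T @ y_train)
    return float(np.mean((F_test @ a - y_test) ** 2))

# Sweep the number of features; in scaling-law analyses one studies how
# this error curve decays with model size and sample size.
errors = {p: random_feature_test_error(p) for p in (10, 100, 1000)}
```

In the theoretical analyses referenced above, the object of study is the precise asymptotic shape of this error curve as the number of features, samples, and input dimension grow together; the sketch only reproduces the experimental setting those asymptotics describe.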
Together, these two research directions aim to contribute to the broader goal of developing an insightful theory of deep learning, grounded in rigorous mathematical analysis and closely aligned with empirical observations.