Vicent Bürgin

PhD
Technical University of Munich (TUM)
Interpreting Neural Network Loss Landscape Properties and Parameter Symmetries

Neural network training loss landscapes are complex objects with many properties that are not yet well understood. A better understanding has the potential to inform training algorithms, enable model merging, support federated learning, and answer fundamental questions about the nature of the functions computed by neural networks. Of particular interest in this area are parameter symmetries, such as neuron permutations within a network layer, that leave the computed function unchanged. The presence of such symmetries shapes the loss landscape in significant ways: copies of each local minimum are repeated at many permuted locations in parameter space, and, in the case of continuous symmetries, points in parameter space stretch out into manifolds of parameters that all compute the same function. An open hypothesis by Entezari et al. even states that independently trained neural networks tend to converge to essentially the same solution, with an almost-zero loss barrier between them, once parameter symmetries are taken into account. Recent work by Lim et al. proposes W-asymmetric networks (W-MLPs), which remove symmetries from the architecture and exhibit surprising behavior, such as allowing the weights of independently trained networks to be interpolated without loss of predictive accuracy (a property known as linear mode connectivity).
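The permutation symmetry described above can be sketched in a few lines of NumPy. This is a minimal illustrative example (the network sizes and the ReLU activation are assumptions, not part of the project description): permuting the hidden neurons of a two-layer MLP, i.e. the rows of the first weight matrix together with the columns of the second, leaves the computed function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer MLP: x -> W2 @ relu(W1 @ x + b1) + b2
d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Permute the hidden neurons: reorder the rows of (W1, b1) and the
# columns of W2 with the same permutation.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

# The permuted parameters compute exactly the same function.
x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
```

Since there are d_hidden! such permutations per layer, every minimum of the training loss is repeated factorially many times across parameter space.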

Many fundamental questions in this area remain unanswered. In some settings, W-MLP landscapes behave almost like convex functions, but what are the limits of this observation? Can different low-loss basins correspond to distinct interpretable algorithmic mechanisms or generalization properties that the training process may converge to, and can this be used to steer training? What are the broader implications of parameter symmetries, how do they manifest in different settings, and can these findings be exploited for network design or optimization? The goal of this PhD project is to explore these open questions and make progress on them.
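A common way to quantify whether two solutions lie in the same low-loss basin is the loss barrier along the straight line between their parameter vectors. The sketch below uses one common convention (the maximum loss on the path minus the larger endpoint loss); conventions vary in the literature, and the toy quadratic loss is purely illustrative:

```python
import numpy as np

def loss_barrier(loss, theta_a, theta_b, n_points=21):
    """Max loss along the linear path minus the larger endpoint loss.

    For a convex loss this is never positive, since the loss on the
    segment is bounded by the losses at its endpoints.
    """
    alphas = np.linspace(0.0, 1.0, n_points)
    path_losses = [loss((1.0 - a) * theta_a + a * theta_b) for a in alphas]
    return max(path_losses) - max(loss(theta_a), loss(theta_b))

# Toy convex loss: the barrier between any two points vanishes.
loss = lambda theta: float(np.sum(theta ** 2))
theta_a = np.array([1.0, 0.0])
theta_b = np.array([0.0, 1.0])
assert loss_barrier(loss, theta_a, theta_b) <= 1e-12
```

Linear mode connectivity between two trained networks corresponds to this barrier being close to zero, possibly after aligning the networks with a permutation as in the symmetry example above.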

Track:
Academic Track
PhD Duration:
June 1st, 2025 - May 31st, 2029