Łukasz Staniszewski

PhD
Warsaw University of Technology (WUT)
Towards Interpretable and Controllable Generative Models

Generative models have achieved remarkable success in producing realistic synthetic content, including text, images, audio, and video. However, their outputs remain difficult to control and frequently diverge from human intent. Analyzing the computations inside neural networks offers new insight into the internal workings of foundation models. One such idea is the linear representation hypothesis, which suggests that high-level concepts are encoded as directions in the model's representation space. By contrasting the activations for opposite concepts, one can derive steering vectors that, when applied during generation, enable smooth and continuous modulation of the outputs. Building on this perspective, the proposed PhD research will pursue three complementary directions. First, I will design a unified evaluation framework for controllable generation. The introduced metric will jointly measure expressiveness, disentanglement, and linearity of control, three objectives that have not previously been assessed together. Second, leveraging the fact that training dynamics implicitly cause individual model components to specialize in representing particular attributes, I will design new activation-based control methods that compute steering vectors within specialized layers, achieving more precise and disentangled manipulation. Third, I will advance model interpretability by introducing a new approach to localizing model components based on gradient approximations, addressing the high computational cost of current techniques. Together, these contributions aim to bridge the gap between human intent and model behavior. By enabling transparent and controllable generation, this work aspires to support the development of safer, fairer, and more trustworthy generative systems.
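
The steering-vector idea in the abstract can be illustrated with a minimal sketch. It assumes a difference-of-means construction, which is one common way to contrast activations and not necessarily the method this research will use. The activations are synthetic random data standing in for real model activations, and the names `steering_vector` and `steer` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference of mean activations between two contrasting concept sets.

    Both inputs have shape (num_samples, hidden_dim).
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def steer(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a hidden state along the steering direction with strength alpha."""
    return hidden + alpha * vector


# Synthetic stand-in for activations (assumption: a real setup would record
# these from a chosen layer of a generative model).
rng = np.random.default_rng(0)
dim = 8
concept_direction = np.zeros(dim)
concept_direction[0] = 1.0  # the toy "concept" lives along the first axis

pos_acts = rng.normal(size=(64, dim)) + 2.0 * concept_direction
neg_acts = rng.normal(size=(64, dim)) - 2.0 * concept_direction

v = steering_vector(pos_acts, neg_acts)
hidden = rng.normal(size=dim)

# Varying alpha moves the hidden state continuously along the concept direction.
for alpha in (-1.0, 0.0, 1.0):
    steered = steer(hidden, v, alpha)
    print(alpha, float(steered @ v))
```

In this toy setting the projection of the steered state onto `v` changes linearly with `alpha`, which is the kind of smooth, continuous control the linear representation hypothesis predicts.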

Track:
Academic Track
PhD Duration:
October 1st, 2025 - October 31st, 2029