Andrei Panferov
A major obstacle to the practical deployment of LLMs is their size and computational cost. The proposed PhD topic concerns new ways of executing LLMs under tight computational and memory budgets, by investigating new algorithms and practical implementations that enable extremely high compression of LLMs. By this, we mean methods that reduce the stored information to around, or even below, one bit per parameter.
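To make the "one bit per parameter" target concrete, the following is a minimal sketch of 1-bit sign quantization with per-group scales; the function names, the group size of 128, and the fp16 scales are illustrative assumptions, not the project's actual method, and reaching the sub-1-bit regime would require further techniques beyond this sketch (e.g. sparsity or entropy coding).

```python
import numpy as np

def onebit_quantize(w, group_size=128):
    """Illustrative 1-bit quantization: store the sign of each weight,
    plus one fp16 scale per group of `group_size` weights."""
    w = w.reshape(-1, group_size)
    scale = np.mean(np.abs(w), axis=1, keepdims=True)  # per-group scale
    signs = np.sign(w)
    signs[signs == 0] = 1.0  # map exact zeros to +1 so every entry is ±1
    return signs.astype(np.int8), scale.astype(np.float16)

def dequantize(signs, scale):
    # Reconstruct approximate weights: sign * per-group scale
    return signs.astype(np.float32) * scale.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
signs, scale = onebit_quantize(w)
w_hat = dequantize(signs, scale).reshape(-1)

# Effective storage: 1 bit per sign + 16 scale bits shared by 128 weights
bits_per_param = 1 + 16 / 128
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"{bits_per_param:.3f} bits/parameter, relative L2 error {rel_err:.3f}")
```

The arithmetic in the last lines shows why per-group metadata matters: the scales add 16/128 = 0.125 bits per parameter on top of the 1-bit signs, and shrinking or eliminating that overhead is part of what pushing toward (and below) one bit per parameter entails.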
The project naturally splits into three major topics:
- Next-generation compression for already-trained models.
- Next-generation compression for training from scratch.
- Architectural (software and hardware) feasibility and implications.
In this context, the co-advising between Profs. Alistarh and Jaggi is a particularly good fit: Prof. Alistarh's group has significant experience with practical post-training compression techniques, while Prof. Jaggi's group has significant expertise in pre-training and optimization.