Filipp Zmushko
Modern neural network training faces increasing computational and memory demands, particularly for large-scale models. We focus on developing optimization algorithms specifically designed for quantization-aware training scenarios. We move beyond standard 16-bit computations to explore 8-bit, 4-bit, and sub-4-bit precision training, which can dramatically reduce memory requirements and computational costs. However, extreme quantization poses significant challenges for traditional optimization methods, requiring novel algorithmic adaptations. In addition, while conventional approaches maintain full precision for network parameters and optimizer states, we also investigate techniques to quantize these components, achieving further memory savings without compromising training effectiveness.
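To illustrate the kind of optimizer-state compression mentioned above, the following sketch shows block-wise absmax int8 quantization, a common scheme for storing optimizer buffers (e.g. Adam momentum) at low precision. This is a minimal illustrative example, not the method proposed here; the function names and the block size of 64 are assumptions for the sketch.

```python
import numpy as np

def quantize_blockwise(x, block_size=64):
    """Block-wise absmax int8 quantization: each block of `block_size`
    values shares one floating-point scale, limiting outlier impact."""
    x = x.reshape(-1, block_size)
    scales = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero
    q = np.clip(np.round(x / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_blockwise(q, scales):
    """Recover an approximate float32 tensor from int8 codes and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Hypothetical optimizer state (e.g. an Adam momentum buffer).
np.random.seed(0)
state = np.random.randn(256).astype(np.float32)
q, s = quantize_blockwise(state)
restored = dequantize_blockwise(q, s)
max_err = np.abs(state - restored).max()
```

Storing `q` (int8) plus one scale per 64 entries uses roughly a quarter of the memory of the float32 buffer, at the cost of a small, bounded per-block rounding error.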
In this context, co-advising by Profs. Alistarh and Cevher is an excellent opportunity: Prof. Alistarh's group has extensive experience with practical compression techniques, while Prof. Cevher's group has deep expertise in optimization.