Diganta Misra
Efficiently processing long sequences is a critical challenge in deep learning, particularly for language models that rely on understanding long-range dependencies in text. Traditional transformers face limitations due to the quadratic growth of memory and computation costs as sequences lengthen. State-space models (SSMs) offer a promising alternative with linear complexity, making foundation models scalable to longer sequences. However, these models demand extensive tuning and careful design, especially when handling diverse, large-scale datasets, and the effects of quantization, compression, and expansion on them remain underexplored. This project will focus on accelerating the pretraining and finetuning of efficient foundation models based on SSMs and refined linear-attention variants. Topics will include characterizing the compute/memory/performance tradeoffs of new foundation models and their reasoning capabilities beyond text. We will also work on the efficient deployment of such models and on issues such as robustness and quantization.
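To make the linear-complexity claim concrete, here is a minimal sketch of a sequential scan through a diagonal, already-discretized SSM. All names, shapes, and parameter values are illustrative assumptions, not taken from any specific model in the project; real SSM implementations (e.g. parallel scans or convolutional forms) are considerably more involved.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time scan of a diagonal state-space model.

    h_t = A * h_{t-1} + B * x_t   (elementwise; A holds per-state decays)
    y_t = C . h_t

    Cost is O(L * N) for sequence length L and state size N,
    versus O(L^2) for the pairwise score matrix of full attention.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A * h + B * x_t   # state update: one O(N) step per token
        y[t] = C @ h          # readout of the current state
    return y

# Toy run with illustrative values.
rng = np.random.default_rng(0)
L, N = 16, 4
A = np.full(N, 0.9)           # decays below 1 keep the recurrence stable
B = rng.standard_normal(N)
C = rng.standard_normal(N)
y = ssm_scan(rng.standard_normal(L), A, B, C)
print(y.shape)
```

Because each token touches only the fixed-size state `h`, doubling the sequence length doubles the work rather than quadrupling it, which is the property that makes SSM-based foundation models attractive for long contexts.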