Korbinian Pöppel
PhD
Hardware-Efficient Machine Learning Architectures and their Capabilities, Optimization and Scaling Behavior

While Transformers are omnipresent in current Machine Learning architectures, they scale quadratically with the number of inputs. This thesis develops more efficient architectures, combining strong ideas such as the LSTM with new techniques for parallelizability, hardware efficiency, and optimization for scaling to larger model sizes. We extended the LSTM in the xLSTM project to match Transformer performance in Language Modeling. We implement hardware-efficient CUDA kernels to probe the speed limits of existing and new LSTM variants. Finally, we study the new models' optimization specifics and scaling laws to make them suited for more training data and better performance. Specifically, we want to find scaling laws for optimal model and training hyperparameters as functions of training data and model size.
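To make the quadratic-versus-linear contrast concrete, the following is a minimal back-of-the-envelope sketch (an illustration, not material from the thesis): the function names, the width d = 512, and the constant factors are assumptions; only the asymptotic shapes, roughly O(n²·d) for self-attention versus O(n·d²) for a recurrent update, reflect the point made above.

```python
# Illustrative sketch (assumptions, not from the thesis): rough FLOP counts
# showing why self-attention cost grows quadratically with sequence length n,
# while a recurrent cell such as an LSTM/xLSTM scales linearly in n.

def attention_flops(n: int, d: int) -> int:
    # Score matrix Q K^T: n * n * d multiply-adds, plus the weighted
    # sum over the values: another n * n * d.
    return 2 * n * n * d

def recurrent_flops(n: int, d: int) -> int:
    # One gated state update of cost ~d*d per time step, n steps in total;
    # constant-size state, no n x n interaction matrix.
    return n * d * d

d = 512  # hypothetical model width
for n in (1_024, 4_096, 16_384):
    print(f"n={n:6d}  attention~{attention_flops(n, d):.2e}  recurrent~{recurrent_flops(n, d):.2e}")
```

For long sequences the attention term dominates quickly, which is why recurrent architectures with hardware-efficient kernels are attractive for scaling to longer inputs.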

Track:
Academic Track