Sidak Pal Singh
PhD
ETH Zurich
Max Planck Institute for Intelligent Systems (MPI-IS)
Intrinsic structural properties of neural networks and their effect on generalization

The main aim of this doctoral thesis is to better understand the properties and mechanisms underlying the success of deep neural networks. In particular, our emphasis is on identifying relevant structural properties of neural networks that are inherently at play, and then utilizing them to holistically investigate problems concerning the parameterization of neural networks: especially, the remarkable generalization behaviour in the over-parameterized regime [Neyshabur et al., 2017, Belkin et al., 2018] and the excessive redundancy observed empirically in the usual parameterization [Han et al., 2015, Li et al., 2018]. Previous works explaining generalization for over-parameterized models are restricted to the linear regression case [Bartlett et al., 2020, Hastie et al., 2019]; moving beyond this case fundamentally requires identifying suitable non-vacuous generalization bounds for a given neural network (unlike the compressed networks or families of networks considered in Arora et al. [2018] and Zhou et al. [2018], respectively). With regard to the observed parameter redundancy, most prior work [Li et al., 2018, Frankle and Carbin, 2018, Singh and Alistarh, 2020] is based on pruning techniques [LeCun et al., 1989]. Due to their empirical nature, the precise extent of this redundancy, the factors affecting it, and the reasons behind its origin remain unknown. Inspired by the significant degeneracy of neural network Hessian maps observed empirically [Sagun et al., 2016], our approach aims to ground this phenomenon theoretically, through a rigorous analysis of the rank and structure of the Hessian. This provides us with novel tools and insights, which we aim to use as a basis to (i) identify the precise phenomenology of generalization in the over-parameterized regime, (ii) explain the source and extent of the immense redundancy in the parameterization, and (iii) compare the benefit imparted by different architectures. Overall, we hope that the derived structural properties and tools not only provide an improved understanding of the mysteries surrounding neural networks, but also eventually take neural networks beyond their current deficiencies, e.g., with regard to inefficient parameterization, out-of-distribution generalization, and more.
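The Hessian degeneracy that motivates this line of work can be observed directly on a toy model. The following is a minimal, illustrative sketch (not code from the thesis): it builds a small over-parameterized MLP on random data, materializes the full Hessian of the loss with PyTorch's autograd, and reports its numerical rank, which typically falls far below the parameter count. The network sizes, data, and tolerance are arbitrary choices made here for illustration.

```python
import torch
import torch.nn as nn

# Illustrative only: a tiny over-parameterized MLP on random data, small enough
# that the full Hessian of the loss can be materialized and its rank inspected.
torch.manual_seed(0)
n, d, h = 20, 5, 50                          # samples, input dim, hidden width (arbitrary)
X, y = torch.randn(n, d), torch.randn(n, 1)

model = nn.Sequential(nn.Linear(d, h), nn.Tanh(), nn.Linear(h, 1))
params = list(model.parameters())
num_params = sum(p.numel() for p in params)

def loss_fn(W1, b1, W2, b2):
    # Recompute the forward pass from the parameters passed in, so that
    # torch.autograd.functional.hessian can differentiate w.r.t. them.
    out = torch.tanh(X @ W1.t() + b1)
    out = out @ W2.t() + b2
    return ((out - y) ** 2).mean()

H_blocks = torch.autograd.functional.hessian(loss_fn, tuple(p.detach() for p in params))

# Assemble the full (num_params x num_params) Hessian from its per-parameter blocks.
rows = []
for i in range(len(params)):
    cols = [H_blocks[i][j].reshape(params[i].numel(), params[j].numel())
            for j in range(len(params))]
    rows.append(torch.cat(cols, dim=1))
H = torch.cat(rows, dim=0)

rank = torch.linalg.matrix_rank(H, rtol=1e-6)
print(f"parameters: {num_params}, numerical Hessian rank: {rank.item()}")
```

Running this with more samples than hidden units, or vice versa, shows how the rank deficit varies with the degree of over-parameterization, which is the kind of empirical observation [Sagun et al., 2016] that the thesis seeks to characterize theoretically.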

Track:
Academic Track
PhD Duration:
September 1st, 2020 - August 31st, 2024
First Exchange:
September 1st, 2021 - August 31st, 2022