By removing as much as 90% or more of a deep neural network’s (DNN’s) parameters, a wide variety of pruning approaches not only allow for DNN compression but also increase generalization (model performance on test/unseen data). This observation conflicts, however, with emerging DNN generalization theory and empirical findings, which suggest that DNNs generalize better as their parameter counts rise, even when they are overparameterized (using more parameters than data points). Seeking to reconcile such modern findings and pruning-based generalization improvements, this thesis empirically studies the cause of improved generalization in pruned DNNs.
This dissertation begins by providing support for our hypothesis that pruning regularizes similarly to noise injection with a perhaps surprising result: pruning parameters more immediately important to the network leads to better generalization later, after the network has adapted to the pruning.
This study shows that this behavior is a manifestation of a more general phenomenon. Across a wide variety of experimental configurations and pruning algorithms, pruning’s benefit to generalization increases with pruning’s instability (defined as the drop in test accuracy immediately after pruning).
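As a rough illustration of the instability measure defined above, the sketch below computes it as the raw difference between test accuracy just before and just after a pruning step. The labels, predictions, and function names are hypothetical and chosen only for illustration; the dissertation may normalize or aggregate this quantity differently.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def pruning_instability(acc_before, acc_after):
    """Drop in test accuracy measured immediately after pruning,
    before further training lets the network adapt (assumed raw difference)."""
    return acc_before - acc_after


# Hypothetical test-set labels and model predictions, for illustration only.
labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
preds_before = [0, 1, 1, 0, 1, 0, 1, 0, 0, 0]  # predictions just before pruning
preds_after = [0, 1, 0, 0, 1, 1, 1, 0, 0, 0]   # predictions just after pruning

acc_before = accuracy(preds_before, labels)
acc_after = accuracy(preds_after, labels)
instability = pruning_instability(acc_before, acc_after)
print(f"instability = {instability:.2f}")
```

Under this reading, a larger value means a less stable pruning step, which the abstract associates with a larger eventual benefit to generalization.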
This dissertation studies the limits of this generalization-stability tradeoff and uses it to inform the derivation of a novel pruning algorithm that produces particularly unstable pruning and higher generalization. Such results suggest that accounting for this tradeoff would improve pruning algorithm design.
Finally, this dissertation empirically examines the consistency of several generalization theories with the generalization-stability tradeoff and pruning-based generalization improvements. Notably, we find that pruning less stably heightens measures of DNN flatness (robustness to data-sample and parameter changes) that are positively correlated with generalization, and pruning-based generalization improvements are maintained when pruning is modified to only remove parameters temporarily. Thus, by demonstrating a regularization mechanism in pruning that depends on changes to sharpness-related complexity rather than parameter-count complexity, this thesis elucidates the compatibility of pruning-based generalization improvements and high generalization in overparameterized DNNs, while also corroborating the relevance of flatness to DNN generalization.
Dr. Bartoldson is a graduate of Florida State University’s scientific computing program. This post is based on Dr. Bartoldson’s dissertation abstract. You can learn more about this project here.
The feature image is from Pexels.