Generalization bounds for Deep Neural Networks: Insight and Design

Deep-learning algorithms have dramatically improved the state of the art in many machine-learning problems, including computer vision, natural language processing, and audio recognition. However, there is still no satisfactory mathematical theory that adequately explains their success. Clearly, it is unacceptable to deploy such "black box" methods in any application for which performance guarantees are critical (e.g., traffic-safety applications).

DNNs consist of several hidden layers, each comprising many nodes. The nodes are where computation happens: the inputs to a node are weighted by coefficients that amplify or dampen them, and the result is passed through a nonlinear activation function. The coefficients (weights) are optimized, e.g., through stochastic gradient descent (SGD), during a training phase in which labelled inputs are fed to the network and the labels it produces are compared with the ground truth via a suitably chosen loss function. Which features of deep neural networks, then, allow them to learn "general rules" from training sets? What class of functions can they learn? How many resources (e.g., layers, coefficients) do they need?
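The computation just described can be sketched in a few lines. The following is a minimal illustration, not part of the project: a one-hidden-layer network trained by gradient descent on a toy regression task (the layer sizes, ReLU activation, squared-error loss, and full-batch updates are all assumptions made for brevity; SGD would use random mini-batches instead of the whole set).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled data: learn y = sum of the 4 input coordinates.
X = rng.normal(size=(64, 4))
y = X.sum(axis=1, keepdims=True)

# Weights of a 4 -> 8 -> 1 network (hypothetical sizes).
W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))

def forward(X):
    # Each node forms a weighted sum of its inputs, then applies
    # a nonlinear activation (ReLU here).
    h = np.maximum(X @ W1, 0.0)
    return h, h @ W2

lr = 0.05
losses = []
for step in range(500):
    h, pred = forward(X)
    err = pred - y                    # compare output with ground truth
    losses.append(float(np.mean(err ** 2)))  # squared-error loss
    # Gradients of the loss w.r.t. the weights (backpropagation).
    g_out = 2 * err / len(X)
    gW2 = h.T @ g_out
    gh = (g_out @ W2.T) * (h > 0)     # ReLU derivative
    gW1 = X.T @ gh
    W1 -= lr * gW1                    # gradient-descent weight update
    W2 -= lr * gW2
```

Running the loop drives the training loss down, which is exactly the sense in which the weights are "optimized"; the open question raised above is why the resulting network also performs well on inputs it has never seen.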

This project aims to increase our theoretical understanding of DNNs through the development of novel information-theoretic bounds on the attainable generalization error. We will also explore how such bounds can guide the practical design of DNNs.
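To make the notion of an information-theoretic generalization bound concrete, one representative example from the existing literature (due to Xu and Raginsky; shown for illustration only, not a result of this project) bounds the expected generalization gap via the mutual information between the training set and the learned weights:

```latex
% S: training set of n i.i.d. samples; W: weights output by the learning
% algorithm; L(W): population risk; \hat{L}_S(W): empirical risk on S.
% If the loss is \sigma-subgaussian, then
\mathbb{E}\bigl[L(W) - \hat{L}_S(W)\bigr]
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)}.
```

Bounds of this flavor are attractive for DNNs because the mutual information term $I(S;W)$ depends on the training algorithm (e.g., SGD) and not only on the size of the hypothesis class, which is what this project seeks to exploit.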

Published: Wed 26 Feb 2020.