Deep-learning
algorithms have dramatically improved the state of the art in many
machine-learning problems, including computer vision, natural language
processing, and audio
recognition. However, there is no satisfactory mathematical theory that
adequately explains their success. Clearly, it is unacceptable to utilize such "black box” methods
in any application for which performance guarantees are critical (e.g., traffic-safety applications).

DNNs consist of several hidden layers comprising many nodes. The nodes are where computations happen: the inputs to the nodes are weighted by coefficients that amplify or dampen that input and then the result is passed through a nonlinear activation function. The coefficients (weights) are optimized, e.g., through stochastic gradient descent (SGD), during a training phase where labelled inputs are provided to the network, and the labels produced by the network are compared with the ground truth by using a suitably chosen loss function. What features of deep neural networks then allow them to learn "general rules'' from training sets? What class of functions can they learn? How many resources (e.g., layers, coefficients) do they need?

This project is geared at increasing our theoretical understanding of DNNs through the development of novel information-theoretic bounds on the generalization error attainable. We will also explore how such bounds can guide the practical design of DNNs network.

DNNs consist of several hidden layers comprising many nodes. The nodes are where computations happen: the inputs to the nodes are weighted by coefficients that amplify or dampen that input and then the result is passed through a nonlinear activation function. The coefficients (weights) are optimized, e.g., through stochastic gradient descent (SGD), during a training phase where labelled inputs are provided to the network, and the labels produced by the network are compared with the ground truth by using a suitably chosen loss function. What features of deep neural networks then allow them to learn "general rules'' from training sets? What class of functions can they learn? How many resources (e.g., layers, coefficients) do they need?

This project is geared at increasing our theoretical understanding of DNNs through the development of novel information-theoretic bounds on the generalization error attainable. We will also explore how such bounds can guide the practical design of DNNs network.