algorithms have dramatically improved the state of the art in many
machine-learning problems, including computer vision, natural language
processing, and audio
recognition. However, there is no satisfactory mathematical theory that
adequately explains their success. Clearly, it is unacceptable to utilize such ``black box'' methods
in any application for which performance guarantees are critical (e.g., traffic-safety applications).
consist of several hidden layers comprising many nodes. The nodes are
where computations happen: the inputs to the nodes are weighted by
coefficients that amplify or dampen those inputs, and the result is then passed through a nonlinear activation function. The
coefficients (weights) are optimized, e.g., through stochastic gradient
descent (SGD), during a training
phase where labelled inputs are provided to the network, and the labels
produced by the network are compared with the ground truth using a
suitably chosen loss function. What
properties of deep neural networks then allow them to learn ``general rules'' from
training sets? What class of functions can they learn? How many
resources (e.g., layers, coefficients) do they need?
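
To make the preceding description concrete, the following minimal sketch (in Python with NumPy; all dimensions, hyperparameters, and the squared loss are illustrative choices, not prescribed by this project) trains a one-hidden-layer network: the inputs are weighted, passed through a nonlinear activation, and the weights are updated by gradient descent on a loss that compares the network's outputs with the ground-truth labels.

\begin{verbatim}
# Illustrative sketch only: a one-hidden-layer network trained by
# (full-batch) gradient descent; minibatch SGD would subsample X, y
# at each step. All sizes and hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                          # toy labelled inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # ground-truth labels

W1 = rng.normal(scale=0.5, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    h = np.tanh(X @ W1 + b1)        # weighted inputs + nonlinear activation
    p = sigmoid(h @ W2 + b2)        # labels produced by the network
    loss = np.mean((p - y) ** 2)    # loss function vs. ground truth

    # Gradients of the squared loss, followed by a gradient-descent update.
    dp  = 2 * (p - y) / len(X)
    dz2 = dp * p * (1 - p)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
\end{verbatim}

Even in this toy setting, the questions above are nontrivial: nothing in the training loop itself reveals which functions the trained network can represent, or how many nodes and layers it needs in order to do so.
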
This project is geared toward increasing our theoretical understanding of DNNs through
the development of novel information-theoretic bounds on the
attainable generalization error. We will also explore how such bounds can guide the practical design of DNNs.
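
As an illustration of the kind of bound we have in mind, a well-known information-theoretic result of Xu and Raginsky relates the expected generalization error of a learning algorithm to the mutual information between the training set $S$ of $n$ labelled examples and the learned weights $W$ (the notation here is ours, chosen only for illustration): if the loss is $\sigma$-subgaussian, then
\[
  \left| \mathrm{E}\!\left[ \mathrm{gen}(S,W) \right] \right| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S;W)},
\]
where $\mathrm{gen}(S,W)$ denotes the gap between the population risk and the empirical risk of the weights $W$ on the training set $S$. Bounds of this type formalize the intuition that a training algorithm whose weights retain little information about the particular training set cannot overfit it.
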