We have become used to computers that can be trained to perform tasks that seem to require intelligence, such as image recognition, speech recognition and natural language processing. To explain how this training is performed, we can compare it to how a child learns. For example, a child needs to see a certain number of cats in order to form the general concept 'cat'.
Deep neural networks are trained in a similar manner. We feed them examples, which are used to adjust the parameters of the network until it delivers correct answers. When the network provides correct answers even when faced with new examples, that is, examples that were not used in the training phase, we know that it has acquired some general knowledge.
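The idea of adjusting parameters on training examples and then checking the answers on new examples can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the model is a single artificial neuron rather than a deep network, the data are synthetic points in the plane, and all names (`make_examples`, `train`, `accuracy`) are hypothetical and not taken from the research described here.

```python
import math
import random

def make_examples(n, rng):
    """Toy data (an assumption): points in the plane, labelled 1 if x + y > 0."""
    examples = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        examples.append(((x, y), 1 if x + y > 0 else 0))
    return examples

def predict(params, point):
    """Output of a single neuron: a value between 0 and 1."""
    w1, w2, b = params
    z = w1 * point[0] + w2 * point[1] + b
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=50, lr=0.1):
    """Adjust the three parameters a little after every training example."""
    params = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for point, label in examples:
            error = predict(params, point) - label
            params[0] -= lr * error * point[0]
            params[1] -= lr * error * point[1]
            params[2] -= lr * error
    return params

def accuracy(params, examples):
    """Fraction of examples for which the model gives the correct answer."""
    correct = sum((predict(params, p) > 0.5) == (label == 1)
                  for p, label in examples)
    return correct / len(examples)

rng = random.Random(0)
train_set = make_examples(200, rng)   # examples used to adjust the parameters
test_set = make_examples(200, rng)    # new examples, never seen during training

params = train(train_set)
train_acc = accuracy(params, train_set)
test_acc = accuracy(params, test_set)
print(f"accuracy on training examples: {train_acc:.2f}")
print(f"accuracy on new examples:      {test_acc:.2f}")
```

If the accuracy on the new examples is close to the accuracy on the training examples, the model has generalized rather than merely memorized its training data.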
Deep neural networks have achieved sensational results, but there is one fundamental problem that concerns researchers and experts. We see that they work, but we do not fully understand why. A common criticism is that deep learning algorithms are used as "a black box" – which is unacceptable for all applications that require guaranteed performance, such as traffic safety applications.
“Right now, we lack the tools to describe why deep neural networks perform so well,” says Giuseppe Durisi, professor of Information Theory.
Here is one of the mysteries about deep neural networks. According to established results in learning theory, we would expect deep neural networks to perform poorly when trained with the amount of data that is typically used. In practice, however, they perform well.
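To give a sense of what such established results look like, one classical form of a generalization bound from learning theory is shown below. It is included only as an illustration and is not necessarily the result the researchers have in mind; the notation is an assumption introduced here. The bound holds with probability at least 1 - δ and is meaningful only when d is much smaller than n.

```latex
% Assumed notation (not from the article):
%   R(h)          true error of the trained model h on new examples
%   \hat{R}_n(h)  error of h on the n training examples
%   d             VC dimension, a measure of the complexity of the model class
%   \delta        allowed probability that the bound fails
R(h) \;\le\; \hat{R}_n(h)
  + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

The second term shrinks as the number of training examples n grows and increases with the complexity d. For models whose complexity is large compared with the amount of training data, a bound of this kind gives no useful guarantee.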
“It is even the case that if you make the network more complex, which according to established knowledge would impair its ability to generalize, the performance will sometimes improve.”
There is no theoretically grounded explanation for why this occurs, but Giuseppe Durisi offers a speculation based on another analogy to human learning.
“In order to reach a deeper understanding and thus the ability to generalize based on a large number of examples, we are required to overlook, or forget, a certain number of details that are not important. Somehow, deep neural networks learn which part of the data is worth memorizing and which part can be ignored.”
Many research groups around the world are now working hard to come up with a theory explaining how and why deep neural networks work. In connection with a major international conference in July this year, a competition was announced to see which research team can come up with theoretical bounds able to predict the performance of deep neural networks.
Tools from many different research fields can be used to establish such a theory. Giuseppe Durisi hopes that information theory can be the right one.
“Yes, information theory is my area of expertise, but it remains to be seen if we will succeed. That is how research works – and it is really exciting to apply the theory I am familiar with to address the completely novel challenge of understanding deep neural networks. It will keep us busy for a while.”
Giuseppe Durisi has several research projects under way and collaborates with colleagues in other fields. Within the Chalmers AI Research Centre, he collaborates with Fredrik Hellström, Fredrik Kahl and Christopher Zach, and in a WASP project, Giuseppe Durisi and Rebecka Jörnsten from Mathematical Sciences have recently recruited a doctoral student, Selma Tabakovic, who will work on this problem.
When Giuseppe Durisi reflects on the future, he sees that a greater understanding of deep learning can bring additional benefits besides providing guaranteed performance in safety-critical systems.
“With a theoretical understanding of how deep learning works, we could build smaller, more compact, and energy-efficient networks that may be suitable for applications such as the Internet of Things. It would help make such technology more sustainable.”
INNER: information theory of deep neural networks
Fredrik Hellström, Giuseppe Durisi and Fredrik Kahl
Chalmers AI Research Centre (CHAIR)
Generalization bounds of Deep Neural Networks: Insight and Design
Selma Tabakovic, Rebecka Jörnsten and Giuseppe Durisi
Wallenberg AI, Autonomous Systems and Software Program (WASP)
A deep neural network is a computer program that learns on its own. It is called "neural network" because its structure is inspired by the neural network that forms the human brain. Deep learning is a machine learning method, and part of what we call artificial intelligence.
Illustration above: A deep neural network is fed with training data (in this case images), and the learning algorithms interpret the images through a number of layers. With each layer, the degree of abstraction increases. Once the network has learned to identify combinations of patterns in the image, the system is able to distinguish a dog from a cat even in completely new images that were not included in the training material.