Semantic segmentation of images is a crucial component in the development of autonomous driving, enabling vehicles to become aware of their surroundings. Supervised training of state-of-the-art semantic segmentation networks relies on large annotated datasets. This thesis investigates data augmentation techniques that provide new information without the need for more hand-annotated data. Two types of data augmentation are evaluated: 1) simple image transformations and 2) deep-learning-based generative models. All augmentation techniques are evaluated by the performance of a semantic segmentation network (ENet) trained on an augmented Cityscapes set and validated on a non-augmented set.
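The abstract does not list which simple image transformations were used; as an illustrative sketch (horizontal flipping and random cropping are assumptions), the key point is that any geometric transform must be applied jointly to the image and its per-pixel label map so the annotation stays aligned:

```python
import numpy as np

def augment_pair(image, label_map, rng):
    """Apply the same random geometric transform to an image and its
    per-pixel label map. Flip and crop are illustrative choices, not
    necessarily the transforms used in the thesis."""
    if rng.random() < 0.5:                 # random horizontal flip
        image = image[:, ::-1]
        label_map = label_map[:, ::-1]
    h, w = label_map.shape
    ch, cw = h // 2, w // 2                # crop to half size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return (image[top:top + ch, left:left + cw],
            label_map[top:top + ch, left:left + cw])

rng = np.random.default_rng(0)
img = np.zeros((4, 6, 3))                  # H x W x channels
lbl = np.arange(24).reshape(4, 6)          # H x W class ids
aug_img, aug_lbl = augment_pair(img, lbl, rng)
print(aug_img.shape, aug_lbl.shape)        # -> (2, 3, 3) (2, 3)
```

Because the label map is cropped and flipped with the image, the augmented pair remains a valid training example for the segmentation network.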
Two existing generative models, based on generative adversarial networks (GANs) and variational autoencoders (VAEs), are investigated: one unsupervised, for domain translation, and one supervised, for generating images from semantic maps. Both models are extended with a feedback loss based on the semantic segmentation of the generated images; this loss is found to be impactful for the unsupervised model. In particular, we find it acts as a regulariser, limiting the occurrence of artefacts in the generated images. The unsupervised model is further extended to support multi-domain translations, and we find that these translations are possible but heavily dependent on the properties of the dataset.
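A feedback loss of this kind can be sketched as follows: the generated image is segmented, and a pixel-wise cross-entropy between the segmentation prediction and the semantic map the image was meant to depict is added to the generator objective. The plain-NumPy formulation and function name below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def feedback_loss(seg_logits, semantic_map):
    """Pixel-wise cross-entropy between the segmentation of a generated
    image (seg_logits: H x W x C) and the target semantic map
    (semantic_map: H x W integer class ids)."""
    # numerically stable log-softmax over the class axis
    z = seg_logits - seg_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w, _ = seg_logits.shape
    # negative log-likelihood of the target class at every pixel
    nll = -log_probs[np.arange(h)[:, None],
                     np.arange(w)[None, :],
                     semantic_map]
    return nll.mean()

# a prediction that agrees with the map everywhere gives near-zero loss
target = np.array([[0, 1], [2, 0]])
logits = np.full((2, 2, 3), -10.0)
for i in range(2):
    for j in range(2):
        logits[i, j, target[i, j]] = 10.0
loss = feedback_loss(logits, target)
print(loss)  # close to 0
```

Penalising generated images whose segmentation disagrees with the intended semantic map discourages semantically implausible content, which is consistent with the regularising, artefact-limiting effect reported above.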
In terms of semantic segmentation, augmentation with the supervised model, trained on real pairs of images and semantic maps, is shown to increase performance. Augmenting with images generated from synthetic semantic maps outperforms both adding unprocessed synthetic data and using real data only. Our findings highlight the potential of deep generative models, not only as intricate image generators, but as useful augmentation tools.
Examiner: Lennart Svensson
Student project presentation
EDIT room (room 3364), Hörsalsvägen 11
14 June, 2018, 09:00–10:00