Synthetic Data Generation Techniques for Automotive Machine Learning
Overview
Date:
Starts 14 June 2023, 10:00
Ends 14 June 2023, 11:00
Location:
MV:L14, Chalmers tvärgata 3
Language:
English
Abstract: Seat belts drastically reduce the risk of injury or death, provided they are worn correctly. This thesis emanates from Volvo Cars’ aspiration to tackle this risk using the growing potential of machine learning (ML). The work builds on an earlier thesis at Volvo Cars, in which a semantic segmentation model was developed for identifying and segmenting the seat belt in an image of a car occupant. Applying this segmentation approach depends on the tedious and costly process of collecting and annotating data. The thesis explores the concept of using synthetic data, i.e., data that is generated by software and annotated in silico, as a substitute for, and a complement to, previously collected real-world data. Specifically, the thesis explores different methods for generating and applying synthetic data, and which aspects improve its quality with regard to the prediction accuracy of the segmentation model. As a measure of prediction accuracy, the mean intersection over union (IoU) over a test set consisting of real-world images is used. Several segmentation models with different architectures are evaluated to find the best-performing network. The thesis also explores domain randomization, which aims to narrow the domain gap between the synthetic and real data; multiple label annotations, to investigate whether identifying other objects improves segmentation of the seat belt; and guided backpropagation (GBP), to explain predictions made by the segmentation model.
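As an illustration of the accuracy measure mentioned above, the following is a minimal sketch of a mean IoU computation for binary seat belt masks. It is not the thesis's evaluation code: the function names `iou` and `mean_iou`, the representation of masks as nested lists of 0/1, the per-image averaging, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (equally sized 2D lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel labelled seat belt in both masks
            if p or t:
                union += 1  # pixel labelled seat belt in at least one mask
    # Assumption: if neither mask contains any seat belt pixels, treat it as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds, targets):
    """Average the per-image IoU over a test set (assumed per-image averaging)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Two tiny 2x2 "images": the first prediction has one false-positive pixel,
# the second matches its ground truth exactly.
predictions = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
ground_truth = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
print(mean_iou(predictions, ground_truth))
```

In this toy case the first image has an intersection of 1 pixel and a union of 2 pixels (IoU 0.5) and the second is a perfect match (IoU 1.0), so the mean is 0.75. In practice the masks would come from thresholding the segmentation model's output on real-world test images.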
This thesis shows that, when real-world data is scarce, introducing synthetic data can improve prediction accuracy. For the segmentation model with the highest overall performance, which uses a Unet++ decoder and a ResNet-34 encoder, the results show a mean IoU of 0.76 when trained on only real data, and 0.79 when trained on real and synthetic data. The same model obtains a mean IoU of 0.73 when trained on only synthetic data. It is also shown that when the model is trained to identify objects that often interact with the seat belt, its prediction accuracy on the seat belt is slightly improved. The largest improvement was found when the model was trained to also identify the occupant’s shirt, where the mean IoU improved from 0.63 to 0.66.
The thesis has identified ways to make synthetic data more appropriate for training the seat belt segmentation model. Together with the positive results from the multiple label training, this demonstrates that there is considerable potential to further develop the application of synthetic data, both in this use case specifically and in image segmentation in general. One obvious approach would be to use a more powerful graphics engine to make the synthetic data even more realistic.
Examiner: Serik Sagitov