Examiner: Lars Hammarstrand
Safer path-planning decisions rely on a better understanding of the surrounding environment. The perception system collects measurements from the environment using sensors such as cameras, lidars and radars. Based on the collected data, object detection algorithms predict the location and class of surrounding objects.
Detection methods can rely on a single sensor or on multiple sensors. To fully exploit the complementary characteristics of different sensors, their information can be combined, a technique known as sensor fusion. Sensor fusion can achieve more reliable results than any single sensor alone thanks to this complementary information. It is well established that trained deep neural networks can achieve accurate object detection.
The main objective of this thesis is to investigate how the lidar-only deep model CenterPoint can be improved by also considering camera information. A common way to extract object class information from a camera image is semantic segmentation, which partitions the pixels according to semantic labels. The segmentation scores for relevant objects should therefore help object detection. Hence, this thesis focuses on the following research questions: 1) Can the CenterPoint algorithm be improved by including semantic information from a camera? If so, by how much? 2) Are there situations where fusing with the segmentation information degrades the result? 3) What causes these differences?
We propose a fusion strategy called Painted CenterPoint, inspired by the PointPainting fusion algorithm. After projecting the lidar point cloud onto the image, each point is painted with the segmentation scores of the pixel it lands on. CenterPoint is then applied to the painted point clouds to produce the final detections. Since the choice of segmentation network affects how much painting benefits the lidar detector across metrics and scenarios, we compare three segmentation methods: DeepLabV3, DeepLabV3+ and Hierarchical Multi-scale Attention (HMA).

We train the fusion models on the KITTI training set to detect and classify the Vehicle, Pedestrian and Cyclist classes. The models are evaluated with the KITTI metrics and also cross-evaluated on nuScenes to test their robustness. The final results indicate that CenterPoint can indeed be improved by the painting strategy. Painted CenterPoint with DeepLabV3 gives the most unstable results, while Painted CenterPoint with HMA gives the largest improvements and the highest precision on KITTI; the performance of DeepLabV3+ lies somewhere in between.

Keywords: Deep Learning, Image Segmentation, Object Detection, Sensor Fusion.
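The painting step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the point/score array shapes, the function name `paint_points`, and the use of a single 3x4 lidar-to-image projection matrix are assumptions for the sketch.

```python
import numpy as np

def paint_points(points, seg_scores, proj):
    """Append per-pixel segmentation scores to lidar points.

    points:     (N, 4) lidar points [x, y, z, intensity].
    seg_scores: (H, W, C) per-pixel class scores from a segmentation network.
    proj:       (3, 4) projection matrix mapping lidar coordinates to pixels.
    """
    H, W, C = seg_scores.shape
    # Project homogeneous lidar coordinates into the image plane.
    homo = np.hstack([points[:, :3], np.ones((len(points), 1))])
    uvw = homo @ proj.T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    # Keep only points in front of the camera that land inside the image.
    mask = (uvw[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    painted = np.zeros((len(points), C))
    painted[mask] = seg_scores[v[mask], u[mask]]
    # Each painted point becomes [x, y, z, intensity, score_1, ..., score_C].
    return np.hstack([points, painted])
```

Points that project outside the image (or behind the camera) simply keep zero scores, so the lidar detector can still use them; this mirrors the design choice in PointPainting of augmenting rather than filtering the point cloud.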