Supervisor and examiner: Lennart Svensson, Department of Electrical Engineering
One of the main challenges in making autonomous vehicles a reality is to provide them with an accurate representation of their surroundings. Multi-object tracking is a perception task that enables such a representation by tracking the states of an unknown number of objects using noisy measurements. For kinematic state tracking, state-of-the-art performance has thus far been achieved by model-based Bayesian filters. In theory, these have the potential to provide Bayes-optimal estimates. In practice, however, these methods not only rely on models but must also resort to approximations to remain computationally tractable in complex scenarios, which limits their performance.
In contrast, model-free methods based on deep learning have the potential to learn the optimal filter from data, thus providing an attractive alternative to model-based methods. However, to the best of our knowledge, such approaches have never been compared to the high-performing Bayesian filters, especially in a setting where accurate models are available. This thesis proposes new deep learning methods based on the Transformer architecture to perform the multi-object tracking task. The performance of these methods is compared to that of state-of-the-art model-based methods in a setting where the correct model is assumed to be given. While this gives an advantage to the model-based methods, it also allows us to train the deep learning models on an unlimited amount of data. The Transformer models are shown to outperform the Bayesian filters on complex tasks, while performing on par with them in simpler scenarios. Thus, we demonstrate the applicability of the Transformer in yet another field and show the potential of data-driven approaches in a domain dominated by model-based approaches.