Doctoral thesis

Yaroslava Lochman, Signal Processing and Biomedical Engineering

Geometry and Learning in 3D Computer Vision

Overview

This thesis studies and improves the accuracy, reliability, and efficiency of 3D vision pipelines. We leverage techniques from geometry, optimization, and deep learning, and we examine when combining them benefits the overall success of a 3D reconstruction system and when it does not. In modern computer vision, deep neural networks are often used as black boxes, not only for perception but also for solving geometric problems. Their performance depends heavily on the amount and quality of the data, and the results can sometimes be surprisingly poor. Classic geometric models and optimization techniques in 3D vision are much better understood. While they are still preferred in many applications, their learning-based counterparts show substantial improvements over traditional methods on certain challenging tasks.

The thesis is structured around three problems: (1) camera calibration, (2) rotation averaging, and (3) motion segmentation. For each of these problems, we analyze the weak points and failure modes of existing methods and propose either new algorithms that leverage standard techniques from geometry and optimization, or hybrid learning pipelines that aim to retain the interpretability of geometric models while benefiting from the expressivity and adaptability of deep neural networks.

Our contributions include: (i) a versatile pipeline for calibrating central cameras with various lens configurations that relies on simple techniques and solvers and proves to be very stable, (ii) a semidefinite program for anisotropic rotation averaging that leverages the readily available uncertainties of the relative estimates and relies on a new convex relaxation, leading to improved reconstruction accuracy, (iii) a fast block-coordinate descent solver for anisotropic rotation averaging that achieves similar reconstruction accuracy while significantly reducing the runtime, (iv) robustification pipelines for anisotropic rotation averaging that tolerate gross outliers in the data, and (v) a metric learning approach that addresses the challenging chicken-and-egg problem of motion segmentation via clustering in the space of trajectory feature representations, with inference taking a fraction of a second.