Doctoral thesis

Arman Rahbar

Knowledge Transfer and Active Learning for Representation Learning in the Absence of Sufficient Supervision

Overview

In many machine learning applications, labels or feature values may be unavailable or expensive to obtain. This motivates the need to develop methods for learning data representations under limited supervision, where supervision refers to either labels or feature values. In this thesis, we study this important problem from two distinct perspectives.

First, we explore knowledge transfer, which improves representation learning in the target domain by leveraging existing sources of knowledge. These include (i) kernel and neural embeddings, which we use to guide learning in wide neural networks based on insights from neural network theory, and (ii) labeled data from a different source domain. For the latter, we propose an optimal transport framework that enables the use of source-domain labels when target labels are unavailable.
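To make the optimal transport component concrete, the following is a minimal sketch (not the thesis's exact framework) of entropy-regularized optimal transport computed with Sinkhorn iterations; the function name `sinkhorn` and the toy point clouds are illustrative assumptions. Once a transport plan is available, source labels can be propagated to target points in proportion to the transported mass.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.5, n_iters=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    a, b : source/target marginal weights (each sums to 1)
    C    : pairwise cost matrix between source and target points
    Returns a transport plan P whose marginals match a and b.
    """
    K = np.exp(-C / reg)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # scale to match column marginals
        u = a / (K @ v)                # scale to match row marginals
    return u[:, None] * K * v[None, :]

# Toy example: transport mass between two small point clouds.
rng = np.random.default_rng(0)
xs, xt = rng.normal(size=(4, 2)), rng.normal(size=(5, 2))
C = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(-1)   # squared distances
a, b = np.full(4, 1 / 4), np.full(5, 1 / 5)
P = sinkhorn(a, b, C)
# Rows of P sum (approximately) to a, and columns to b.
```

The regularization strength `reg` trades off fidelity to the unregularized transport problem against speed of convergence; smaller values give sharper plans but need more iterations.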

Second, we study active feature acquisition, where the goal is to select a cost-efficient subset of features for each data point under a budget constraint. We review recent work in this area and then focus on the online setting, where data arrives as a stream and no training data is available in advance. We introduce a general framework for this setting that uses methods from online learning and bandits, such as Thompson sampling, to learn efficiently from streaming data. We present two formulations of the problem: one based on a partially observable Markov decision process and one based on a combinatorial multi-armed bandit. The latter allows us to generalize our framework to stochastic feature costs. These formulations enable us to theoretically analyze the performance of our online framework. Our experiments show that the proposed framework is competitive in terms of both feature acquisition cost and prediction accuracy.
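As a rough illustration of the bandit machinery involved (a generic Beta-Bernoulli Thompson sampling sketch, not the thesis's actual acquisition framework), the example below treats each candidate feature as an arm whose unknown utility is a Bernoulli parameter; the `true_utility` values and the single-arm-per-round setup are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each "arm" is a feature we may acquire, and its
# unknown utility (chance that acquiring it helps the prediction) is a
# Bernoulli parameter we must learn from streaming feedback.
true_utility = np.array([0.2, 0.5, 0.8])
n_arms = len(true_utility)

# Beta(1, 1) priors over each arm's utility.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

for t in range(2000):
    theta = rng.beta(alpha, beta)    # sample one plausible utility per arm
    arm = int(np.argmax(theta))      # acquire the most promising feature
    reward = float(rng.random() < true_utility[arm])
    alpha[arm] += reward             # conjugate posterior update
    beta[arm] += 1.0 - reward

# After enough rounds the posterior concentrates on the best arm.
best_arm = int(np.argmax(alpha / (alpha + beta)))
```

The combinatorial variant in the thesis differs in that a subset of arms (features) is selected per round subject to a budget; the sketch above only shows the core sample-act-update loop that such methods share.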