- Date: Starts 5 April 2023, 14:00; ends 5 April 2023, 16:00
Understanding and Evaluating Policies for Sequential Decision-Making
Sequential decision-making is a critical component of many complex systems, such as finance, healthcare, and robotics. The long-term goal of a decision-making process is to optimize the policy under which decisions are made. In safety-critical domains, the search for an optimal policy must be based on observational data, as new decision-making strategies need to be carefully evaluated before they can be tested in practice. In this thesis, we highlight the importance of understanding sequential decision-making at different stages of this procedure. For example, to assess which policies can be evaluated with the available data, we need to understand the policy that actually generated the data. And once we are given a policy to evaluate, we need to understand how it differs from current practice.
First, we focus on the evaluation process, where a target policy is evaluated using off-policy data collected under a different so-called behavior policy. This problem, commonly referred to as off-policy evaluation, is often solved with importance sampling (IS) techniques. Despite their popularity, IS-based methods suffer from high variance and are hard to diagnose. To address these issues, we propose estimating the behavior policy using prototype learning. Using the learned prototypes, we describe differences between target and behavior policies, allowing for better assessment of the IS estimates.
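As a rough illustration of the importance sampling idea mentioned above, the following Python sketch reweights observed returns by the ratio of target to behavior action probabilities. It is a generic textbook-style estimator, not the method proposed in the thesis: the function names, the trajectory format, and the effective sample size diagnostic are all assumptions made for this example, and the behavior policy is assumed to be known here, whereas the thesis estimates it with prototype learning.

```python
import numpy as np


def trajectory_weights(trajectories, target_policy, behavior_policy):
    """Return the cumulative importance weight and total reward per trajectory.

    Assumed format (hypothetical): each trajectory is a list of
    (state, action, reward) tuples, and each policy is a function
    policy(action, state) returning the probability of that action.
    """
    weights, returns = [], []
    for trajectory in trajectories:
        ratio, total_reward = 1.0, 0.0
        for state, action, reward in trajectory:
            # Product of per-step probability ratios (target / behavior).
            ratio *= target_policy(action, state) / behavior_policy(action, state)
            total_reward += reward
        weights.append(ratio)
        returns.append(total_reward)
    return np.asarray(weights), np.asarray(returns)


def is_estimate(weights, returns, weighted=False):
    """Ordinary IS estimate, or the self-normalized (weighted) variant."""
    if weighted:
        return float(np.sum(weights * returns) / np.sum(weights))
    return float(np.mean(weights * returns))


def effective_sample_size(weights):
    """A common rough diagnostic: small values suggest high variance."""
    return float(np.sum(weights) ** 2 / np.sum(weights ** 2))


# Toy one-step example: the behavior policy picks uniformly between two
# actions, while the target policy always picks action 1.
def behavior(action, state):
    return 0.5


def target(action, state):
    return 1.0 if action == 1 else 0.0


data = [[(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 1, 1.0)], [(0, 0, 0.0)]]
w, g = trajectory_weights(data, target, behavior)
value = is_estimate(w, g)
value_weighted = is_estimate(w, g, weighted=True)
ess = effective_sample_size(w)
```

In this toy setting only the two trajectories that agree with the target policy receive non-zero weight, which hints at why the estimate becomes unreliable when the target and behavior policies differ substantially.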
Next, we take a clinical direction and study the sequential treatment of patients with rheumatoid arthritis (RA). The armamentarium of disease-modifying anti-rheumatic drugs (DMARDs) for RA patients has greatly expanded over the past decades. However, it is still unclear which treatments work best for individual patients. To examine how observational data can be used to evaluate new policies, we describe the most common DMARD treatment patterns in a large patient registry from the US. We find that the number of unique patterns is large, indicating substantial variation in clinical practice that can be exploited for evaluation purposes. However, additional assumptions may be required to arrive at statistically sound results.