Overview
- Date: Starts 13 January 2026, 09:00; ends 13 January 2026, 13:00
- Location: EA, Hörsalsvägen 11
- Opponent: Prof. Jan Halborg Jensen, University of Copenhagen, Denmark
- Thesis: Read the thesis
Drug design is an iterative process aimed at identifying suitable molecules for specific biological targets. Modern computer-aided drug design increasingly leverages machine learning to inform decision-making throughout this process. However, a key challenge remains: the interactive acquisition of new knowledge to improve machine learning models using relevant data. This thesis examines sequential decision-making problems in machine learning for optimizing data collection strategies in computer-aided drug design.
To experimentally test a molecule's properties, it must first be synthesized through a sequence of chemical reactions to obtain the desired product. Machine learning can identify and validate suitable chemical reactions by predicting reaction outcomes, but this approach requires sufficient data for each reaction type of interest. This thesis presents work that systematically investigates combinations of different aspects of active learning to improve predictive capabilities for determining whether a reaction will produce a sufficient amount of product. In practice, only a limited number of molecules can be synthesized per design cycle due to cost and time constraints, whereas current generative models can produce numerous molecular candidates. Therefore, another work in this thesis investigates how to optimally select which generated molecules to test, given a constrained experimental budget. We formulate this challenge as a multi-armed bandit problem and propose a novel algorithm to address it.
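The bandit formulation above can be illustrated with a minimal sketch. This is not the thesis's proposed algorithm; it uses the classical UCB1 index as a stand-in, treating each candidate molecule as an arm, each synthesis-and-test round as a pull, and a binary "sufficient product obtained" outcome as the reward. The function names and the simulated success probabilities are illustrative assumptions.

```python
import math
import random

def ucb1_select(counts, means, t, c=1.0):
    """Pick the arm maximizing the UCB1 index: mean + c*sqrt(2 ln t / n)."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # test every candidate once before exploiting
    indices = [mu + c * math.sqrt(2.0 * math.log(t) / n)
               for n, mu in zip(counts, means)]
    return max(range(len(indices)), key=indices.__getitem__)

def run_bandit(true_probs, budget, seed=0):
    """Sequentially allocate a fixed experimental budget over candidates."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts, means = [0] * k, [0.0] * k
    for t in range(1, budget + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means
```

In this simulation the index rule concentrates most of the budget on the candidate with the highest (unknown) success rate while still occasionally re-testing the others, which is the exploration/exploitation trade-off the abstract refers to.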
To generate novel molecules with desired predicted properties, previous research has successfully employed reinforcement learning to align generative model outputs to a specific biological target. This thesis examines additional perspectives on applying reinforcement learning to sequentially utilize and collect target-specific data. We present a systematic comparison of various reinforcement learning algorithms for generating drug molecules and investigate methods for effectively learning from generated samples. Moreover, designing a diverse set of promising molecules is crucial for a successful drug discovery pipeline. Therefore, we propose new methods to enhance chemical exploration by adaptively modifying the reward signal. We also introduce a mini-batch diversification framework for on-policy reinforcement learning and apply it to molecular generation, thereby improving chemical exploration during the generative process. Together, these contributions advance sequential decision-making in drug design by optimizing the acquisition of new data.
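The idea of adaptively modifying the reward signal to encourage chemical exploration can be sketched in a few lines. This is a generic count-based penalty, not the specific methods proposed in the thesis; the function name, the SMILES-string key, and the decay factor are illustrative assumptions.

```python
def diversity_adjusted_reward(smiles, base_reward, memory, decay=0.5):
    """Shrink the reward geometrically each time a molecule reappears,
    nudging the generative policy toward unexplored chemistry."""
    n_seen = memory.get(smiles, 0)
    memory[smiles] = n_seen + 1
    return base_reward * (decay ** n_seen)
```

A molecule scoring 1.0 would earn 1.0 on its first appearance, 0.5 on its second, 0.25 on its third, and so on, so repeatedly generating the same high-scoring compound stops paying off and the policy is pushed to diversify. In practice such penalties are often applied at the scaffold level rather than to exact structures.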
