Seminar
The event has passed

Calibration methods for generative models with application to protein structure modeling

AI for Science seminar with Brian Trippe, Stanford University.

Overview

Zoom password: ai4science

The on-site event will be followed by fika in the Analysen coffee area from 16:00 to 16:30.

Abstract:

Generative models have driven rapid progress in protein structure prediction and design, yet they remain miscalibrated: backbone generation models, for example, produce realistic samples but systematically underestimate the probabilities of important modes and lack sensitivity to destabilizing mutations.

This talk traces these limitations to a form of miscalibration, in which statistics of the generated samples deviate from their desired values. Existing fine-tuning methods for image and text generation do not address such inconsistencies. We frame calibration of generative models as a constrained optimization problem. Because the natural objective is intractable, we introduce two surrogate objectives and derive low-variance gradient estimators for stochastic optimization. The resulting procedures remove most of the calibration error across hundreds of simultaneous constraints and in models with up to nine billion parameters.

Lastly, we describe a preliminary application that assimilates thermodynamic measurements to calibrate a generative model of protein structure Boltzmann ensembles.

About the speaker:

Brian Trippe is an Assistant Professor in the Department of Statistics at Stanford. Previously, he was a visiting researcher at the Institute for Protein Design at the University of Washington, where he worked with David Baker, and a postdoctoral fellow at Columbia University.

He completed his Ph.D. in Computational and Systems Biology at MIT in 2022. His recent research has developed statistical machine learning methods to address challenges in biotechnology and medicine, with a focus on generative modeling and inference algorithms for protein engineering.

Structured learning

This theme focuses on how to use structure in data to build machine learning (ML) and artificial intelligence (AI) systems that are safer, more trustworthy, and generalize better. Structure includes the relationships among data points in time and space, and how predictions change when the data are transformed in specific ways, for example rotated or scaled. These topics are abstract and general, but they have a direct impact on the use of AI and ML in the sciences and in applications such as drug and materials design or medical imaging.

Rocio Mercado
  • Assistant Professor, Data Science and AI, Computer Science and Engineering
Simon Olsson
  • Associate Professor, Data Science and AI, Computer Science and Engineering