Isac Boström, Chalmers: Bayesian Inference for Models of Text Data
Overview
- Date: Starts 18 March 2026, 13:15; ends 18 March 2026, 14:00
- Location: MV:L14, Chalmers tvärgata 3
- Language: English
Abstract: Models of text data are increasingly applied to inference tasks in the social sciences to investigate a wide range of linguistic and cultural phenomena. Word embeddings, for example, are commonly used to study semantic change, political language, and social bias in large collections of text. However, these models are typically estimated by optimization, producing point estimates without principled uncertainty quantification.
In this talk, I present a Bayesian formulation of probabilistic word embedding models, focusing on skip-gram with negative sampling and briefly discussing continuous bag-of-words. I explain why the posterior distribution is non-identifiable under general linear transformations of the embedding space and introduce a simple and principled constraint that ensures a well-defined posterior. I then compare different approaches to posterior inference, including mean-field variational inference, Hamiltonian Monte Carlo, and Pólya-Gamma Gibbs sampling. By augmenting the likelihood with Pólya-Gamma latent variables, we obtain an efficient sampler that provides scalable and well-calibrated uncertainty quantification.
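A minimal NumPy sketch of the non-identifiability point. It assumes the standard skip-gram with negative sampling likelihood, in which each (word, context) pair contributes log σ(±u_w·v_c). Because this likelihood depends on the embeddings only through inner products, replacing the word embeddings U with UA and the context embeddings V with VA^{-T}, for any invertible matrix A, leaves it unchanged. The data, dimensions, and the function name `sgns_log_likelihood` are hypothetical. The sketch covers only the likelihood: the constraint introduced in the talk and the Pólya-Gamma sampler are not reproduced here.

```python
import numpy as np


def sgns_log_likelihood(U, V, pos_pairs, neg_pairs):
    """Skip-gram with negative sampling log-likelihood (illustrative sketch).

    U: word embeddings (vocab x d); V: context embeddings (vocab x d).
    pos_pairs / neg_pairs: integer arrays of (word, context) index pairs.
    Observed pairs contribute log sigmoid(u_w . v_c);
    negative samples contribute log sigmoid(-u_w . v_c).
    """
    def log_sigmoid(x):
        # Numerically stable log(1 / (1 + exp(-x)))
        return -np.logaddexp(0.0, -x)

    pos = np.einsum("ij,ij->i", U[pos_pairs[:, 0]], V[pos_pairs[:, 1]])
    neg = np.einsum("ij,ij->i", U[neg_pairs[:, 0]], V[neg_pairs[:, 1]])
    return log_sigmoid(pos).sum() + log_sigmoid(-neg).sum()


# Hypothetical toy data: random embeddings and random index pairs
rng = np.random.default_rng(0)
vocab, d = 50, 4
U = rng.normal(size=(vocab, d))
V = rng.normal(size=(vocab, d))
pos_pairs = rng.integers(0, vocab, size=(200, 2))
neg_pairs = rng.integers(0, vocab, size=(1000, 2))

# An arbitrary invertible d x d matrix (shifted by 3I so it is well conditioned)
A = rng.normal(size=(d, d)) + 3 * np.eye(d)
U_t = U @ A                     # U -> U A
V_t = V @ np.linalg.inv(A).T    # V -> V A^{-T}, so U_t V_t^T = U V^T

ll = sgns_log_likelihood(U, V, pos_pairs, neg_pairs)
ll_t = sgns_log_likelihood(U_t, V_t, pos_pairs, neg_pairs)
print(np.isclose(ll, ll_t))
```

The transformed embeddings differ from the originals, yet the log-likelihood is the same, so the data alone cannot distinguish between them. This is why some additional constraint is needed before the posterior over embeddings is well defined.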
I will also briefly discuss the structural topic model as a related example where Bayesian uncertainty plays a central role.
- Postdoc, Applied Mathematics and Statistics, Mathematical Sciences