New Algorithms for Efficient LDA Topic Reconstruction

Welcome to a seminar with Alessandro Panconesi, Computer Science, Sapienza University of Rome, Director of the Bertinoro Informatics Center (BiCi).

Abstract: Informally, topic reconstruction is the problem of automatically recovering the topics of a given corpus of documents. LDA (Latent Dirichlet Allocation) is a famous paradigm that has been proposed to tackle the problem.

We present a novel approach for LDA topic reconstruction. The main technical idea is to show that the distribution over documents generated by LDA can be transformed into a distribution for a much simpler generative model. In this model, documents are generated from the same set of topics but have a far simpler structure: each document is single-topic, and its topic is chosen uniformly at random.
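The following minimal Python sketch makes the contrast between the two generative models concrete. It assumes the textbook LDA process: a per-document topic mixture is drawn from a symmetric Dirichlet with parameter `alpha`, and each word then draws its own topic. Each topic is represented as a list of word probabilities over a vocabulary indexed `0..n-1`. The function names are hypothetical, and the code only illustrates the two models. It is not the reduction presented in the talk.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def lda_document(topics, alpha, length, rng):
    """LDA: the document gets its own topic mixture theta; every word
    first picks a topic from theta, then a word from that topic."""
    theta = sample_dirichlet(alpha, len(topics), rng)
    words = []
    for _ in range(length):
        t = rng.choices(range(len(topics)), weights=theta)[0]
        w = rng.choices(range(len(topics[t])), weights=topics[t])[0]
        words.append(w)
    return words

def single_topic_document(topics, length, rng):
    """Uniform single-topic model: one topic, chosen uniformly at random,
    generates every word of the document."""
    t = rng.randrange(len(topics))
    return rng.choices(range(len(topics[t])), weights=topics[t], k=length)

rng = random.Random(0)
topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(single_topic_document(topics, 8, rng))
print(lda_document(topics, 1.0, 8, rng))
```

With two topics on disjoint halves of a four-word vocabulary, every single-topic document uses only words {0, 1} or only {2, 3}, whereas LDA documents typically mix words from both halves.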

Furthermore, this reduction is approximation preserving: approximate distributions, which are the only ones we can hope to compute in practice, are mapped to approximate distributions in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way. Given the input corpus, compute an approximation of the document distribution generated by LDA, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single-topic world. This last step is a much simpler task than direct LDA reconstruction.

We show the viability of the approach by giving very simple algorithms for a generalization of two notable cases studied in the literature: $p$-separability and matrix-like topics.
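The following minimal sketch suggests why the uniform single-topic world is easier to work in. It makes several simplifying assumptions. Documents have two words, and the statistics are exact rather than approximate. $p$-separability is read in the anchor-word sense: each topic has a word of probability at least $p$ that has probability zero in every other topic. This reading of the definition is an assumption. Under it, conditioning on an anchor word pins down the topic, so the distribution of the second word given that anchor is exactly the anchor's topic. The anchors are taken as known here, although locating them is part of the real problem. The function names are hypothetical, and this is not the algorithm from the talk.

```python
from fractions import Fraction

def pair_distribution(topics):
    """Exact joint law P(w1, w2) of a two-word document in the uniform
    single-topic model: P(w1, w2) = (1/k) * sum over topics t of t[w1] * t[w2]."""
    k, n = len(topics), len(topics[0])
    return [[sum(t[a] * t[b] for t in topics) / k for b in range(n)]
            for a in range(n)]

def recover_topic(pairs, anchor):
    """Return P(w2 | w1 = anchor). If `anchor` is an anchor word of topic t
    (an assumption of this sketch), the factor t[anchor] cancels and the
    result equals topic t exactly."""
    row = pairs[anchor]
    total = sum(row)
    return [x / total for x in row]

# Toy example: word 0 anchors topic 0, word 3 anchors topic 1.
F = Fraction
topics = [[F(1, 2), F(1, 4), F(1, 4), F(0)],
          [F(0), F(1, 4), F(1, 4), F(1, 2)]]
pairs = pair_distribution(topics)
print(recover_topic(pairs, 0) == topics[0])  # True
print(recover_topic(pairs, 3) == topics[1])  # True
```

Conditioning on word 1, which both topics share, gives the even mixture [1/4, 1/4, 1/4, 1/4] instead of a topic, which is why anchors matter. With approximate statistics estimated from a corpus, the same ratio would yield only an approximate topic.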

Joint work with: Matteo Almanza, Flavio Chierichetti, Andrea Vattani

Category: Seminar
Location: Room EC, floor 3, EDIT building, Hörsalsvägen 11, Johanneberg
Time: 2018-11-14 13:15
End time: 2018-11-14 14:15

Published: Wed 31 Oct 2018. Modified: Tue 06 Nov 2018