Lecture
The event has passed.

Lecture by Simon Maskell, University of Liverpool


Overview


Simon Maskell from the University of Liverpool will visit Chalmers to give a short talk; see details below.

More information about Simon:
https://www.liverpool.ac.uk/electrical-engineering-and-electronics/staff/simon-maskell/

Content of the talk:
• Title: Shared and Distributed Memory SMC Samplers for Decision Trees
• Authors: Efthyvoulos Drousiotis, Alessandro Varsi, Simon Maskell and Paul Spirakis
• Abstract: Modern classification problems tackled using Decision Tree (DT) models often come with demanding constraints in terms of accuracy and scalability.

This is often hard to achieve due to the ever-increasing volume of data used for training and testing. Bayesian approaches to DTs using Markov Chain Monte Carlo (MCMC) methods have demonstrated great accuracy in a wide range of applications. However, the inherently sequential nature of MCMC makes it unsuitable for meeting both accuracy and scaling constraints. One could run multiple MCMC chains in an embarrassingly parallel fashion, but despite the improved runtime, this approach sacrifices accuracy in exchange for strong scaling.

Sequential Monte Carlo (SMC) samplers are another class of Bayesian inference methods that have the appealing property of being parallelizable without trading off accuracy. Nevertheless, finding an effective parallelization for the SMC sampler is difficult, due to the challenges in parallelizing its bottleneck, redistribution, in such a way that the workload is divided equally across the processing elements, especially when dealing with variable-size models such as DTs.

We discuss how to implement an SMC sampler for DTs on shared and distributed memory architectures, with an O(log₂ N) time complexity for parallel redistribution. Work is ongoing to extend the implementation to GPUs and distributed memory. However, we have quantified performance on a shared-memory CPU machine with 32 cores: the experimental results show that our proposed method scales up to a factor of 16 compared to its serial implementation, and provides accuracy comparable to MCMC while being 51 times faster. We also discuss results we have recently achieved in applications that include identifying students who will go on to exhibit suicidal ideation.
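To make the redistribution bottleneck concrete, the sketch below shows a standard serial systematic-resampling step, which maps N weighted particles to N equally weighted copies. This is only an illustrative sequential baseline (built on NumPy, with a hypothetical helper name `systematic_resample`), not the paper's O(log₂ N) parallel algorithm; the talk concerns how to perform this step in parallel with balanced workload.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Map N normalized weights to N parent indices (serial sketch).

    Illustrative baseline only; the talk's contribution is a parallel
    redistribution with O(log2 N) time complexity, not shown here.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    # One uniform draw offsets an evenly spaced grid of n points in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0  # guard against floating-point round-off
    # Each grid point selects the particle whose cumulative weight covers it;
    # the cumulative sum is what makes a naive implementation sequential.
    return np.searchsorted(cumsum, positions)

# Example: particle 2 carries most of the weight, so it is copied most often.
w = np.array([0.1, 0.1, 0.7, 0.1])
idx = systematic_resample(w, rng=np.random.default_rng(0))
```

After resampling, every surviving particle has equal weight, and dividing the copy workload evenly across processing elements is exactly the redistribution problem the abstract refers to.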

Welcome!

Simon Maskell and Lennart Svensson