Statistics Seminar: Generative Modeling with Trees and Recursive Partitions

March 4, 2024 - 3:00pm

Abstract:

Trees and recursive partitions are best known in supervised learning for predictive tasks such as regression and classification. Famous examples include CART and its ensemble variants, such as random forests and boosting. A natural question is whether such successes can be replicated in unsupervised problems, and in particular for the task of generative modeling. I will present a recent example of tree-based approaches to generative modeling, where the primary objective is to learn the underlying nature of complex multivariate, possibly high-dimensional distributions from unlabeled i.i.d. training data and to allow effective sampling from the trained distribution. The approach addresses density learning and generative modeling in multivariate sample spaces using an additive ensemble of tree-based density learners. The use of trees and partitions leads to highly efficient, statistically rigorous algorithms that scale approximately linearly in the sample size. The resulting model can be trained quickly, even in real time on a single computer, and achieves competitive performance compared to deep neural network-based approaches in some application contexts. The method is a counterpart of supervised tree boosting and preserves many desirable properties of its supervised cousin. It is also closely connected to normalizing flows, which sequentially transform a base distribution into a desired one, and it accordingly produces a generative model that can be sampled directly through sequential inverse transforms.
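To make the core idea concrete, here is a minimal one-dimensional sketch of a recursive-partition density learner: the sample space is split recursively at midpoints, each node stores the fraction of training mass falling in each child, and sampling proceeds by descending the tree in proportion to that mass. This is a toy illustration of the general principle, not the speaker's actual algorithm; all function names and parameters (`build_tree`, `max_depth`, `min_count`) are invented for this sketch.

```python
import random

def build_tree(data, lo=0.0, hi=1.0, depth=0, max_depth=6, min_count=10):
    """Recursively partition [lo, hi] at the midpoint.
    Each internal node stores the fraction of its points falling in the
    left half, yielding a piecewise-constant density estimate at the leaves."""
    n = len(data)
    if depth == max_depth or n < min_count:
        return {"lo": lo, "hi": hi, "leaf": True}
    mid = (lo + hi) / 2.0
    left = [x for x in data if x < mid]
    right = [x for x in data if x >= mid]
    return {
        "lo": lo, "hi": hi, "leaf": False,
        "p_left": len(left) / n,  # mass assigned to the left child
        "left": build_tree(left, lo, mid, depth + 1, max_depth, min_count),
        "right": build_tree(right, mid, hi, depth + 1, max_depth, min_count),
    }

def sample(node):
    """Draw one sample: descend the tree, choosing each child with
    probability proportional to its stored mass, then draw uniformly
    within the reached leaf's interval."""
    while not node["leaf"]:
        node = node["left"] if random.random() < node["p_left"] else node["right"]
    return random.uniform(node["lo"], node["hi"])
```

Because each point is routed down a tree of bounded depth, training cost grows roughly linearly in the sample size, which mirrors the scaling behavior described in the abstract. Sampling likewise costs only one root-to-leaf descent per draw, analogous to the sequential inverse transforms mentioned above.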

Speaker:

Li Ma (Duke University)

Location and Address

1811 Posvar Hall/Zoom: https://pitt.zoom.us/s/99769710291