Bayesian additive regression trees for probabilistic programming

Published in arXiv, 2022

Recommended citation: Quiroga, M., Garay, P. G., Alonso, J. M., Loyola, J. M., & Martin, O. A. (2022). Bayesian additive regression trees for probabilistic programming. arXiv preprint arXiv:2206.03619. https://arxiv.org/pdf/2206.03619.pdf

Bayesian additive regression trees (BART) is a non-parametric method to approximate functions. It is a black-box method based on a sum of many trees, where priors regularize inference, mainly by restricting each tree's learning capacity so that no individual tree can explain the data on its own; only the sum of trees can. We discuss BART in the context of probabilistic programming languages (PPLs), that is, we present BART as a primitive that can be used as a component of a probabilistic model rather than as a standalone model. Specifically, we introduce the Python library PyMC-BART, which extends PyMC, a library for probabilistic programming. We present a few examples of models that can be built using PyMC-BART, discuss recommendations for selecting hyperparameters, and close with limitations of our implementation and future directions for improvement.
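
The sum-of-trees idea in the abstract can be illustrated with a small, self-contained sketch. This is not the BART sampler and not the PyMC-BART API: it swaps BART's priors and posterior sampling for a deterministic stand-in, in which depth-1 trees (stumps) are fitted greedily to residuals and a shrinkage factor plays the role of the regularization that keeps each tree weak. All names here (fit_stump, fit_sum_of_stumps, shrink, m) are hypothetical and chosen only for illustration.

```python
import math

def fit_stump(xs, residuals):
    """Find the one-split tree (threshold, left mean, right mean) with least squared error."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, x):
    t, left_value, right_value = stump
    return left_value if x <= t else right_value

def fit_sum_of_stumps(xs, ys, m=200, shrink=0.1):
    """Fit m weak stumps; shrink < 1 stops any single stump from explaining the data."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(m):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stump = (t, shrink * lm, shrink * rm)  # deliberately weakened tree
        stumps.append(stump)
        preds = [p + predict_stump(stump, x) for p, x in zip(preds, xs)]
    return stumps

def predict(stumps, x):
    # The model's output is the sum over all trees, not any single tree.
    return sum(predict_stump(s, x) for s in stumps)

def mse(stumps, xs, ys):
    return sum((y - predict(stumps, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data: one period of a sine curve on 50 evenly spaced points.
xs = [2 * math.pi * i / 49 for i in range(50)]
ys = [math.sin(x) for x in xs]

stumps = fit_sum_of_stumps(xs, ys)
error_single_tree = mse(stumps[:1], xs, ys)
error_sum_of_trees = mse(stumps, xs, ys)
print(f"one tree: {error_single_tree:.3f}, sum of {len(stumps)} trees: {error_sum_of_trees:.3f}")
```

In this sketch a single weakened stump leaves most of the curve unexplained, while the sum of all stumps fits it much more closely. In BART itself, the weakness of each tree comes from priors over tree structure and leaf values, and the result is a posterior distribution over sums of trees rather than a single fitted ensemble.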