bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)

Nikolaus Umlauf, Nadja Klein, Thorsten Simon, Achim Zeileis

TL;DR

The paper addresses the need for flexible, distributional Bayesian regression capable of handling big data and providing full predictive distributions. It introduces bamlss as a modular Lego toolbox that decouples distributions, regression terms, and estimation engines, enabling plug-and-play combinations for Bayesian or frequentist inference. Built on the mgcv infrastructure for smooth terms, it supports a wide range of distribution families (GAMLSS-like) and provides workflows via bamlss.frame, optimizer, sampler, and post-processing to facilitate rapid prototyping and scalable inference. The framework is demonstrated through tutorials and applications (e.g., logit models, fused lasso, distributional regression with count data), illustrating improved modeling flexibility, stability, and interpretability for complex regression tasks.

Abstract

Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible model specifications are needed as big(ger) data become available, facilitated by more powerful computing infrastructure. (2) Full probabilistic modeling rather than predicting just means or expectations is crucial in many applications. (3) Interest in Bayesian inference has been increasing both as an appealing framework for regularizing or penalizing model estimation and as a natural alternative to classical frequentist inference. However, while there has been a lot of research in all three areas, also leading to associated software packages, a modular software implementation that allows all three aspects to be combined easily has not yet been available. To fill this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond). At the core of the package are algorithms for highly efficient Bayesian estimation and inference that can be applied to generalized additive models (GAMs) or generalized additive models for location, scale, and shape (GAMLSS), also known as distributional regression. However, its building blocks are designed as "Lego bricks" encompassing various distributions (exponential family, Cox, joint models, ...), regression terms (linear, splines, random effects, tensor products, spatial fields, ...), and estimators (MCMC, backfitting, gradient boosting, lasso, ...). It is demonstrated how these can be easily recombined to make classical models more flexible or create new custom models for specific modeling challenges.

Paper Structure

This paper contains 3 sections and 1 figure.

Figures (1)

  • Figure 1: Logit model, MCMC trace (left panel), auto-correlation for the intercept (middle panel), maximum auto-correlation for all parameters (right panel).