Branch and Bound to Assess Stability of Regression Coefficients in Uncertain Models

Brian Knaeble, R. Mitchell Hughes, George Rudolph, Mark A. Abramson, Daniel Razo

TL;DR

The paper addresses how to interpret a regression coefficient under model uncertainty in high-dimensional settings. It introduces a branch and bound algorithm that searches a large, discrete space of regularized, linear-in-parameters models for the minimum and maximum values of the adjusted slope coefficient. Using KOA confounding-interval bounds, the method returns lower and upper bounds that hold across all candidate models, summarizing how far the coefficient could vary without exhaustively checking all $2^p$ models. The authors prove a supporting mathematical result and demonstrate the method in four data-analytic trials and an NHANES-based example, where the bounds can be surprisingly tight and remain computable when brute-force search is impractical. The approach gives researchers a data-summarization tool for assessing coefficient stability, with possible extensions to more complex models and diagnostic enhancements. It is most relevant to large observational datasets where model-selection uncertainty is high and traditional standard errors may be inadequate.

Abstract

It can be difficult to interpret a coefficient of an uncertain model. A slope coefficient of a regression model may change as covariates are added to or removed from the model. In the context of high-dimensional data, there are too many model extensions to check. However, as we show here, it is possible to efficiently search, with a branch and bound algorithm, for maximum and minimum values of that adjusted slope coefficient over a discrete space of regularized regression models. Here we introduce our algorithm, along with supporting mathematical results, an example application, and a link to our computer code, to help researchers summarize high-dimensional data and assess the stability of regression coefficients in uncertain models.
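
To make the problem concrete, the sketch below shows the exhaustive search that the abstract describes as infeasible at scale. It is not the authors' branch and bound algorithm and does not use their bounds. It fits ordinary least squares (no regularization) on every subset of a small set of candidate covariates and records the smallest and largest adjusted slope. The data are simulated, and the names `adjusted_slope` and `slope_range` are hypothetical helpers introduced only for this illustration.

```python
import itertools
import numpy as np


def adjusted_slope(x, y, W):
    """OLS coefficient on x when y is regressed on an intercept, x, and the columns of W."""
    design = np.column_stack([np.ones(len(y)), x, W])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # position 0 is the intercept, position 1 is x


def slope_range(x, y, covariates):
    """Brute force: fit every covariate subset and return the extreme adjusted slopes."""
    p = covariates.shape[1]
    slopes = {}
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            slopes[subset] = adjusted_slope(x, y, covariates[:, list(subset)])
    lo_set = min(slopes, key=slopes.get)
    hi_set = max(slopes, key=slopes.get)
    return (lo_set, slopes[lo_set]), (hi_set, slopes[hi_set]), len(slopes)


# Simulated data (an assumption for illustration): covariate 0 confounds x and y.
rng = np.random.default_rng(0)
n, p = 200, 6
covariates = rng.normal(size=(n, p))
x = covariates[:, 0] + rng.normal(size=n)
y = 0.5 * x + covariates[:, 0] + rng.normal(size=n)

(lo_set, lo), (hi_set, hi), n_models = slope_range(x, y, covariates)
print(f"models fitted: {n_models}")
print(f"minimum adjusted slope {lo:.3f} with covariates {lo_set}")
print(f"maximum adjusted slope {hi:.3f} with covariates {hi_set}")
```

The number of fits doubles with each added candidate covariate, which is why this enumeration stops being practical in high-dimensional data and why a search that can discard whole groups of models without fitting them is attractive.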

Paper Structure (10 sections, 1 theorem, 11 equations, 1 figure, 3 tables, 1 algorithm)

Key Result

Proposition 3.1

With the definitions of the Methods section, we have the following identities:

Figures (1)

  • Figure 1: A scatterplot of SD and BMI.

Theorems & Definitions (2)

  • Proposition 3.1
  • proof