A GPU-accelerated Nonlinear Branch-and-Bound Framework for Sparse Linear Models
Xiang Meng, Ryan Lucas, Rahul Mazumder
TL;DR
This work tackles exact sparse linear regression with an $\ell_0$-$\ell_2$ penalty by formulating it as a mixed-integer program (MIP) with a perspective Big-M representation and solving it with a GPU-accelerated branch-and-bound (BnB) framework. The system combines an ADMM-based node relaxation, whose coordinate updates run fully in parallel, with a batched proximal-gradient upper-bound solver, so that multiple BnB nodes can be explored in parallel on a single GPU. By batching node solves and carefully managing memory, the approach achieves substantial runtime reductions relative to CPU-based methods and commercial solvers, including MOSEK, on both synthetic and real high-dimensional datasets. The results hold across problem scales and parameter settings, indicating that hardware-aware algorithm design yields practical improvements for certifying global optima in sparse regression problems.
Abstract
We study exact sparse linear regression with an $\ell_0$-$\ell_2$ penalty and develop a branch-and-bound (BnB) algorithm explicitly designed for GPU execution. Starting from a perspective reformulation, we derive an interval relaxation that can be solved by ADMM with closed-form, coordinate-wise updates. We structure these updates so that the main work at each BnB node reduces to batched matrix-vector operations with a shared data matrix, enabling fine-grained parallelism across coordinates and coarse-grained parallelism across many BnB nodes on a single GPU. Feasible solutions (upper bounds) are generated by a projected gradient method on the active support, implemented in a batched fashion so that many candidate supports are updated in parallel on the GPU. We discuss practical design choices, such as memory layout, batching strategies, and load balancing across nodes, that are crucial for obtaining good utilization on modern GPUs. On synthetic and real high-dimensional datasets, our GPU-based approach achieves clear runtime improvements over a CPU implementation of our method, an existing specialized BnB method, and commercial MIP solvers.
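The batched upper-bound step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the objective $\tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda_0\|\beta\|_0 + \lambda_2\|\beta\|_2^2$ with a box constraint $|\beta_i| \le M$, and the function name `batched_upper_bounds`, its parameters, the fixed step size, and the iteration count are all hypothetical choices made for illustration. Each column of `B` holds the coefficients for one candidate support, so every iteration updates all candidates with two matrix products against the shared data matrix `X`. The code runs on the CPU; it only shows the array-level structure that makes the computation batchable.

```python
import numpy as np

def batched_upper_bounds(X, y, masks, lam0, lam2, M, n_iters=500):
    """Projected gradient on K candidate supports at once (illustrative sketch).

    X: (n, p) data matrix shared by all candidates.
    y: (n,) response vector.
    masks: (p, K) boolean array; column k marks the support of candidate k.
    Returns the coefficients B (p, K) and one objective value per candidate.
    """
    # Lipschitz constant of the gradient of the smooth part of the objective.
    lipschitz = np.linalg.norm(X, 2) ** 2 + 2.0 * lam2
    step = 1.0 / lipschitz
    B = np.zeros(masks.shape)
    for _ in range(n_iters):
        R = X @ B - y[:, None]            # residuals for all candidates, (n, K)
        G = X.T @ R + 2.0 * lam2 * B      # gradients for all candidates, (p, K)
        # Projection: clip to the box [-M, M], then zero out off-support entries.
        B = np.clip(B - step * G, -M, M) * masks
    R = X @ B - y[:, None]
    objective = (0.5 * np.sum(R ** 2, axis=0)
                 + lam0 * np.count_nonzero(B, axis=0)
                 + lam2 * np.sum(B ** 2, axis=0))
    return B, objective
```

Because every candidate is feasible for the original problem, the smallest entry of `objective` is a valid upper bound on the optimal value. Stacking candidates as columns is the design choice that turns many small matrix-vector products into a single matrix-matrix product.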
