Interpretation of High-Dimensional Regression Coefficients by Comparison with Linearized Compressing Features

Joachim Schaeffer, Jinwook Rhyu, Robin Droop, Rolf Findeisen, Richard Braatz

Abstract

Linear regression is often deemed inherently interpretable; however, challenges arise for high-dimensional data. We focus on further understanding how linear regression approximates nonlinear responses from high-dimensional functional data, motivated by predicting cycle life for lithium-ion batteries. We develop a linearization method to derive feature coefficients, which we compare with the closest regression coefficients of the path of regression solutions. We showcase the methods on battery data case studies where a single nonlinear compressing feature, $g\colon \mathbb{R}^p \to \mathbb{R}$, is used to construct a synthetic response, $\mathbf{y} \in \mathbb{R}$. This unifying view of linear regression and compressing features for high-dimensional functional data helps to understand (1) how regression coefficients are shaped in the highly regularized domain and how they relate to linearized feature coefficients and (2) how the shape of regression coefficients changes as a function of regularization to approximate nonlinear responses by exploiting local structures.
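The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the feature is linearized with a first-order Taylor expansion around the training mean, the path of regression solutions is a ridge regression path over a hypothetical grid `lambdas`, "closest" is measured by Euclidean distance, and the data are synthetic smooth curves rather than the battery data.

```python
import numpy as np


def sum_of_squares(x):
    """Illustrative compressing feature g: R^p -> R (sum of squared entries)."""
    return np.sum(x ** 2, axis=-1)


def sum_of_squares_gradient(x):
    """Gradient of the sum-of-squares feature."""
    return 2.0 * x


def ridge_path(X, y, lambdas):
    """Closed-form ridge solutions on mean-centered data, one row per lambda."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    gram = Xc.T @ Xc
    eye = np.eye(X.shape[1])
    return np.array([np.linalg.solve(gram + lam * eye, Xc.T @ yc) for lam in lambdas])


def closest_on_path(betas, target):
    """Index and Euclidean distance of the path solution nearest to target."""
    dists = np.linalg.norm(betas - target, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])


# Synthetic functional data (assumption): a mean curve plus smooth perturbations.
rng = np.random.default_rng(0)
n, p = 40, 200
grid = np.linspace(0.0, 1.0, p)
basis = np.vstack([np.ones(p), grid, grid ** 2])
X = (1.0 + np.sin(2.0 * np.pi * grid)) + 0.05 * rng.standard_normal((n, 3)) @ basis

# Synthetic response built from the single nonlinear feature.
y = sum_of_squares(X)

# Linearized feature coefficients: gradient of g at the training mean (assumption).
x_mean = X.mean(axis=0)
beta_lin = sum_of_squares_gradient(x_mean)

# Ridge path and the solution on it that is closest to the linearized coefficients.
lambdas = np.logspace(-3, 3, 25)
betas = ridge_path(X, y, lambdas)
idx, dist = closest_on_path(betas, beta_lin)
print(f"closest lambda: {lambdas[idx]:.3g}, distance: {dist:.3g}")
```

For the sum-of-squares feature, the linearization error is exactly the squared distance of each curve from the expansion point, so the linearized coefficients describe the response well only when the curves stay close to the mean.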

Paper Structure

This paper contains 11 sections, 13 equations, 4 figures.

Figures (4)

  • Figure 1: Lithium-ion data from discharge cycles (Severson et al., 2019). a) Training, primary test, and secondary test data plotted as curves. b) Mean-centered training data curves with one outlier removed.
  • Figure 2: First case study with the sum-of-squares response. a) and with high regularization and sum-of-squares feature coefficients. b) regression coefficients obtained by cross-validation and sum-of-squares feature coefficients. c) regression coefficients obtained by cross-validation and sum-of-squares feature coefficients.
  • Figure 3: Second case study with the sinusoidal response. a) high regularization and sinusoidal feature coefficients. b) high regularization and sinusoidal feature coefficients. c) and regression coefficients obtained by cross-validation and sinusoidal feature coefficients.
  • Figure A.1: Lithium-ion data from discharge cycles (Severson et al., 2019). a) Training, primary test, and secondary test data plotted as curves. b) Mean-centered training data curves with one outlier removed. c) Z-scored training data curves with one outlier removed. d) Training data statistics. e) Pearson correlation coefficients of training data columns. f) Pearson correlation coefficients of training data rows.