LIT-LVM: Structured Regularization for Interaction Terms in Linear Predictors using Latent Variable Models
Mohammadreza Nemati, Zhipeng Huang, Kevin S. Xu
TL;DR
LIT-LVM addresses the problem of estimating interaction-term coefficients in linear predictors when the interaction coefficient matrix $\mathbf{\Theta}$ is large and potentially noisy. It imposes an approximate low-dimensional structure on $\mathbf{\Theta}$ by representing each feature with a latent vector (collected in $\mathbf{Z}$) under a flexible latent variable model (low rank or latent distance). Model parameters and latent positions are estimated jointly by minimizing a total loss $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{pred}} + \lambda_r \mathcal{L}_{\text{reg}} + \lambda_l \mathcal{L}_{\text{lvm}}$, where $\mathcal{L}_{\text{lvm}} = \|\boldsymbol{\epsilon}\|_F^2$ penalizes the deviation $\boldsymbol{\epsilon}$ of $\mathbf{\Theta}$ from the structure implied by $\mathbf{Z}$; updates use Adam together with proximal steps. The approach substantially improves predictive accuracy over elastic net with interactions, hierarchical lasso, and factorization machines across simulations, OpenML benchmarks, and a kidney transplant survival analysis, and it yields interpretable latent representations of features that capture donor-recipient compatibility. The method balances flexibility with structure: exact low-rank methods are approached as $\lambda_l\rightarrow\infty$, while a finite $\lambda_l$ provides robustness when the low-rank assumption holds only approximately. These results point to practical value for problems with many features (large $p$), where interaction terms are informative but difficult to estimate reliably, particularly in biomedical and recommender-type settings where latent structure among features can be exploited for both accuracy and interpretability.
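To make the three-part objective concrete, the following is a minimal NumPy sketch of how $\mathcal{L}_{\text{total}}$ could be evaluated. It is not the authors' implementation. The function name `lit_lvm_total_loss`, the squared-error form of $\mathcal{L}_{\text{pred}}$, the elastic-net form of $\mathcal{L}_{\text{reg}}$ with mixing weight `alpha`, and the low-rank form $\boldsymbol{\epsilon} = \mathbf{\Theta} - \mathbf{Z}\mathbf{Z}^\top$ restricted to pairs $j < k$ are all assumptions made for illustration. The Adam and proximal updates are not shown.

```python
import numpy as np

def lit_lvm_total_loss(X, y, beta0, beta, Theta, Z,
                       lam_r=0.1, lam_l=1.0, alpha=0.5):
    """Illustrative total loss: L_pred + lam_r * L_reg + lam_l * L_lvm."""
    p = X.shape[1]
    j, k = np.triu_indices(p, k=1)        # all feature pairs with j < k

    # Linear predictor with main effects and pairwise interaction terms.
    inter = X[:, j] * X[:, k]             # products x_j * x_k
    theta = Theta[j, k]                   # matching coefficients theta_jk
    pred = beta0 + X @ beta + inter @ theta

    # ASSUMPTION: squared-error prediction loss (regression setting).
    l_pred = np.mean((y - pred) ** 2)

    # ASSUMPTION: elastic-net penalty on main and interaction coefficients.
    w = np.concatenate([beta, theta])
    l_reg = alpha * np.sum(np.abs(w)) + 0.5 * (1.0 - alpha) * np.sum(w ** 2)

    # ASSUMPTION: low-rank model, epsilon_jk = theta_jk - z_j . z_k.
    eps = (Theta - Z @ Z.T)[j, k]
    l_lvm = np.sum(eps ** 2)              # squared Frobenius norm of epsilon

    return l_pred + lam_r * l_reg + lam_l * l_lvm
```

Under these assumptions, when $\mathbf{\Theta}$ matches $\mathbf{Z}\mathbf{Z}^\top$ exactly on the off-diagonal entries, the latent-variable term vanishes and the loss no longer depends on $\lambda_l$. Any deviation is penalized more heavily as $\lambda_l$ grows, which is the sense in which large $\lambda_l$ pushes the estimate toward an exact low-rank model.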
Abstract
Some of the simplest, yet most frequently used predictors in statistics and machine learning use weighted linear combinations of features. Such linear predictors can model non-linear relationships between features by adding interaction terms corresponding to the products of all pairs of features. We consider the problem of accurately estimating coefficients for interaction terms in linear predictors. We hypothesize that the coefficients for different interaction terms have an approximate low-dimensional structure and represent each feature by a latent vector in a low-dimensional space. This low-dimensional representation can be viewed as a structured regularization approach that further mitigates overfitting in high-dimensional settings beyond standard regularizers such as the lasso and elastic net. We demonstrate that our approach, called LIT-LVM, achieves superior prediction accuracy compared to the elastic net, hierarchical lasso, and factorization machines on a wide variety of simulated and real data, particularly when the number of interaction terms is high compared to the number of samples. LIT-LVM also provides low-dimensional latent representations for features that are useful for visualizing and analyzing their relationships.
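As a small illustration of the setting described in the abstract, the sketch below augments a feature matrix with the products of all pairs of features. The helper name `add_pairwise_interactions` is hypothetical, and the code assumes that "all pairs" means distinct pairs $j < k$, so squared terms are excluded. With $p$ features this adds $p(p-1)/2$ columns, so $p = 100$ already yields 4950 interaction terms, which can easily exceed the number of samples.

```python
import numpy as np

def add_pairwise_interactions(X):
    """Append the products x_j * x_k for all pairs j < k to X."""
    p = X.shape[1]
    j, k = np.triu_indices(p, k=1)   # index pairs (0,1), (0,2), ..., (p-2,p-1)
    return np.hstack([X, X[:, j] * X[:, k]])

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
X_aug = add_pairwise_interactions(X)
# Columns of X_aug: x0, x1, x2, x0*x1, x0*x2, x1*x2
```

A linear predictor fitted on `X_aug` has one coefficient per original feature plus one per interaction column; these interaction coefficients are the quantities whose estimation the abstract is concerned with.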
