LCEN: A Nonlinear, Interpretable Feature Selection and Machine Learning Algorithm
Pedro Seber, Richard D. Braatz
TL;DR
LCEN addresses the need for nonlinear, interpretable, and sparse feature selection. It combines a nonlinear feature expansion screened by LASSO, two clip steps, and an elastic-net (EN) fit to produce sparse, accurate models that can rediscover physical laws from data. Across artificial and real datasets, LCEN proved robust to noise, multicollinearity, and data scarcity. It often matched or surpassed dense nonlinear methods while remaining interpretable, and it ran faster than comparable thresholded EN approaches. The approach has practical value in critical domains and offers clear avenues for extension to classification and physics-guided modeling.
Abstract
Interpretable models can have advantages over black-box models, and interpretability is essential for applying machine learning in critical settings such as aviation or medicine. This article introduces the LASSO-Clip-EN (LCEN) algorithm for nonlinear, interpretable feature selection and machine learning modeling. Across a wide variety of artificial and empirical datasets, LCEN constructed sparse and frequently more accurate models than other methods, including sparse, nonlinear methods. LCEN was empirically observed to be robust against many issues typically present in datasets and modeling, including noise, multicollinearity, and data scarcity. As a feature selection algorithm, LCEN matched or surpassed the thresholded elastic net while being, on average, 10.3-fold faster in our experiments. LCEN for feature selection can also rediscover multiple physical laws from empirical data. As a machine learning algorithm, when tested on processes with no known physical laws, LCEN achieved better results than many other dense and sparse methods, and it was comparable to or better than artificial neural networks (ANNs) on multiple datasets.
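The LASSO, clip, and elastic-net stages that give LCEN its name can be illustrated with a small NumPy sketch. This is not the authors' implementation. Several choices here are hypothetical, made only for illustration:

- The expansion is restricted to polynomial monomials, whereas the original expansion may include other transformations.
- Coefficients are fitted by plain coordinate descent.
- Clipping is applied to standardized coefficients.
- The hyperparameters `alpha_lasso`, `alpha_en`, `l1_ratio`, and `cutoff` are fixed rather than tuned, for example by cross-validation.
- The function names `expand_features`, `elastic_net`, and `lcen_sketch` are invented for this sketch.

```python
import numpy as np
from itertools import combinations_with_replacement


def expand_features(X, degree=2):
    """Hypothetical expansion: every monomial of the original columns up to `degree`."""
    cols, names = [], []
    for deg in range(1, degree + 1):
        for combo in combinations_with_replacement(range(X.shape[1]), deg):
            cols.append(np.prod(X[:, list(combo)], axis=1))
            names.append("*".join(f"x{i}" for i in combo))
    return np.column_stack(cols), names


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net(X, y, alpha, l1_ratio, n_iter=1000, tol=1e-10):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + (alpha/2)*(1 - l1_ratio)*||b||^2.
    l1_ratio=1 gives the LASSO. Assumes y and the columns of X are centered."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - X @ beta, starting from beta = 0
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0:  # constant column carries no information
                continue
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return beta


def lcen_sketch(X, y, degree=2, alpha_lasso=0.01, alpha_en=0.01, l1_ratio=0.5, cutoff=0.05):
    """Illustrative LASSO -> clip -> EN -> clip pipeline with fixed hyperparameters."""
    Xe, names = expand_features(X, degree)
    mu, sd = Xe.mean(axis=0), Xe.std(axis=0)
    sd[sd == 0] = 1.0
    Xs, yc = (Xe - mu) / sd, y - y.mean()
    # Step 1: LASSO screens the whole expanded feature set.
    b = elastic_net(Xs, yc, alpha_lasso, l1_ratio=1.0)
    # Step 2 (clip): drop terms whose standardized coefficient is below the cutoff.
    keep = np.flatnonzero(np.abs(b) >= cutoff)
    # Step 3: elastic-net refit on the surviving terms only.
    b = elastic_net(Xs[:, keep], yc, alpha_en, l1_ratio)
    # Step 4 (clip): second pruning pass on the refitted coefficients.
    mask = np.abs(b) >= cutoff
    keep, b = keep[mask], b[mask]
    # Undo the standardization so coefficients apply to the raw expanded terms.
    coefs = b / sd[keep]
    intercept = y.mean() - mu[keep] @ coefs
    return dict(zip([names[k] for k in keep], coefs)), intercept


# Usage: synthetic data following y = 3*x0 + 2*x0*x1 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = 3 * X[:, 0] + 2 * X[:, 0] * X[:, 1] + rng.normal(0, 0.01, 200)
terms, b0 = lcen_sketch(X, y)
print(terms, b0)
```

On this synthetic data the sketch should keep only the `x0` and `x0*x1` terms, with coefficients close to 3 and 2 and slightly shrunk toward zero by the penalties. The sketch standardizes the expanded features before clipping so that a single cutoff is comparable across terms with very different scales, such as `x0` and `x0*x0`.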
