TorchSISSO: A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator for Efficient and Interpretable Model Discovery
Madhav Muthyala, Farshud Sorourifar, Joel A. Paulson
TL;DR
TorchSISSO delivers a GPU-accelerated Python implementation of the SISSO symbolic regression framework, addressing the installation barriers and speed limitations of the original FORTRAN code. By combining recursive feature expansion, mutual information screening, and a sparsifying operator, it achieves strong predictive accuracy with interpretable, sparse expressions across synthetic benchmarks, physics-based equations, and molecular descriptors. The approach demonstrates robust performance and substantial speedups, particularly on GPUs and in high-dimensional settings, enabling broader application in materials science, physics, and chemistry. The work thus provides a practical, scalable tool for discovering physically meaningful equations from data, with potential extensions to multi-objective optimization and automated hyperparameter tuning.
Abstract
Symbolic regression (SR) is a powerful machine learning approach that searches for both the structure and parameters of algebraic models, offering interpretable and compact representations of complex data. Unlike traditional regression methods, SR explores progressively more complex feature spaces, which can uncover simple models that generalize well, even from small datasets. Among SR algorithms, the Sure Independence Screening and Sparsifying Operator (SISSO) has proven particularly effective in the natural sciences, helping to rediscover fundamental physical laws as well as discover new interpretable equations for materials property modeling. However, its widespread adoption has been limited by performance inefficiencies and the challenges posed by its FORTRAN-based implementation, especially in modern computing environments. In this work, we introduce TorchSISSO, a native Python implementation built in the PyTorch framework. TorchSISSO offers GPU acceleration, easy integration, and extensibility, along with a significant speed-up and improved accuracy over the original implementation. We demonstrate that TorchSISSO matches or exceeds the performance of the original SISSO across a range of tasks, while dramatically reducing computational time and improving accessibility for broader scientific applications.
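To make the three ingredients named in the TL;DR concrete (feature expansion, screening, and a sparsifying operator), the following is a minimal, standard-library-only sketch of a SISSO-style search. It is an illustration under stated assumptions, not the TorchSISSO API: all function names (`expand`, `screen`, `fit_1d`, `sisso_1d`) are hypothetical, only a single expansion round is performed, absolute Pearson correlation stands in for the screening score, and the sparsifying step is reduced to choosing the best one-term linear model among the screened features.

```python
import itertools
import math

def expand(features):
    """One round of feature expansion (hypothetical operator set: square, *, /)."""
    out = dict(features)
    for name, col in features.items():
        out[f"({name})^2"] = [v * v for v in col]
    for (na, a), (nb, b) in itertools.combinations(features.items(), 2):
        out[f"({na}*{nb})"] = [x * y for x, y in zip(a, b)]
        if all(y != 0 for y in b):  # skip division when the denominator hits zero
            out[f"({na}/{nb})"] = [x / y for x, y in zip(a, b)]
    return out

def pearson(a, b):
    """Pearson correlation; returns 0.0 for constant columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def screen(features, y, k):
    """Keep the k feature names most correlated with y (assumed screening score)."""
    scores = {name: abs(pearson(col, y)) for name, col in features.items()}
    ranked = sorted((n for n in scores if scores[n] > 0), key=scores.get, reverse=True)
    return ranked[:k]

def fit_1d(col, y):
    """Least-squares fit y ~ c0 + c1 * col; returns (c0, c1, rmse)."""
    n = len(y)
    mx, my = sum(col) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in col)
    sxy = sum((x - mx) * (v - my) for x, v in zip(col, y))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    rmse = math.sqrt(sum((c0 + c1 * x - v) ** 2 for x, v in zip(col, y)) / n)
    return c0, c1, rmse

def sisso_1d(features, y, k=5):
    """Expand, screen, then pick the single screened feature with the lowest RMSE."""
    candidates = expand(features)
    kept = screen(candidates, y, k)
    best = min(kept, key=lambda name: fit_1d(candidates[name], y)[2])
    c0, c1, rmse = fit_1d(candidates[best], y)
    return best, c0, c1, rmse

# Toy data generated from y = 2 * m * a + 1 (made up for illustration).
m = [1.0, 2.0, 3.0, 4.0, 5.0]
a = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0 * mi * ai + 1.0 for mi, ai in zip(m, a)]

best, c0, c1, rmse = sisso_1d({"m": m, "a": a}, y, k=3)
print(best, c0, c1, rmse)
```

On this toy dataset the search selects the product feature `(m*a)`, with a slope close to 2 and an intercept close to 1. A fuller implementation would repeat the expansion recursively, screen against residuals at each dimension, and search over multi-term combinations of the screened features.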
