TorchSISSO: A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator for Efficient and Interpretable Model Discovery

Madhav Muthyala, Farshud Sorourifar, Joel A. Paulson

TL;DR

TorchSISSO delivers a Python, GPU-accelerated implementation of the SISSO symbolic regression framework, addressing installation barriers and speed limitations of the original FORTRAN code. By combining recursive feature expansion, mutual information screening, and a sparsifying operator, it achieves strong predictive accuracy with interpretable, sparse expressions across synthetic benchmarks, physics-based equations, and molecular descriptors. The approach demonstrates robust performance and substantial speedups, particularly on GPUs and in high-dimensional settings, enabling broader application in materials science, physics, and chemistry. The work thus provides a practical, scalable tool to discover physically meaningful equations from data, with potential extensions to multi-objective optimization and automated hyperparameter tuning.

Abstract

Symbolic regression (SR) is a powerful machine learning approach that searches for both the structure and parameters of algebraic models, offering interpretable and compact representations of complex data. Unlike traditional regression methods, SR explores progressively complex feature spaces, which can uncover simple models that generalize well, even from small datasets. Among SR algorithms, the Sure Independence Screening and Sparsifying Operator (SISSO) has proven particularly effective in the natural sciences, helping to rediscover fundamental physical laws as well as discover new interpretable equations for materials property modeling. However, its widespread adoption has been limited by performance inefficiencies and the challenges posed by its FORTRAN-based implementation, especially in modern computing environments. In this work, we introduce TorchSISSO, a native Python implementation built in the PyTorch framework. TorchSISSO supports GPU acceleration, is easy to integrate and extend, and offers a significant speed-up and improved accuracy over the original FORTRAN implementation. We demonstrate that TorchSISSO matches or exceeds the performance of the original SISSO across a range of tasks, while dramatically reducing computational time and improving accessibility for broader scientific applications.
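
To make the pipeline named above concrete (feature expansion, screening, and a sparsifying operator), the following is a minimal sketch of a SISSO-style loop. It is not the TorchSISSO API: every function name here (expand, abs_corr, fit, sisso_sketch) is hypothetical, the code uses only the Python standard library rather than PyTorch, it applies a single round of expansion with three operators, and it scores features by absolute Pearson correlation as a simple stand-in for the screening criterion. The synthetic data and target are also assumptions made for illustration.

```python
import itertools
import math
import random


def expand(primary):
    """One round of feature expansion: squares, pairwise products and ratios."""
    pool = dict(primary)
    for a in primary:
        pool[f"({a})^2"] = [v * v for v in primary[a]]
    for a, b in itertools.combinations(primary, 2):
        pool[f"({a}*{b})"] = [u * v for u, v in zip(primary[a], primary[b])]
        # Assumption: inputs are nonzero, so the ratio is always defined.
        pool[f"({a}/{b})"] = [u / v for u, v in zip(primary[a], primary[b])]
    return pool


def abs_corr(x, r):
    """Absolute Pearson correlation (stand-in screening score)."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((a - mx) ** 2 for a in x)
    srr = sum((b - mr) ** 2 for b in r)
    if sxx == 0.0 or srr == 0.0:
        return 0.0
    sxr = sum((a - mx) * (b - mr) for a, b in zip(x, r))
    return abs(sxr) / math.sqrt(sxx * srr)


def fit(columns, y):
    """Least squares with intercept via Gauss-Jordan on the normal equations.

    Returns (coefficients, predictions, rmse), or (None, None, inf) if singular.
    """
    A = [[1.0] * len(y)] + [list(c) for c in columns]
    m = len(A)
    M = [[sum(p * q for p, q in zip(A[i], A[j])) for j in range(m)]
         + [sum(p * t for p, t in zip(A[i], y))] for i in range(m)]
    for i in range(m):
        piv = max(range(i, m), key=lambda r: abs(M[r][i]))
        if abs(M[piv][i]) < 1e-12:
            return None, None, math.inf
        M[i], M[piv] = M[piv], M[i]
        for r in range(m):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [u - f * v for u, v in zip(M[r], M[i])]
    coef = [M[i][m] / M[i][i] for i in range(m)]
    pred = [sum(c * col[n] for c, col in zip(coef, A)) for n in range(len(y))]
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))
    return coef, pred, rmse


def sisso_sketch(primary, y, max_terms=2, k=3):
    """Alternate screening and exhaustive sparse fitting up to max_terms terms."""
    pool = expand(primary)
    subspace, residual, best = [], list(y), None
    for dim in range(1, max_terms + 1):
        # Screening: add the k unseen features most correlated with the residual.
        unseen = [name for name in pool if name not in subspace]
        unseen.sort(key=lambda name: abs_corr(pool[name], residual), reverse=True)
        subspace += unseen[:k]
        # Sparsifying operator: try every dim-term subset of the screened subspace.
        best = None
        for combo in itertools.combinations(subspace, dim):
            coef, pred, rmse = fit([pool[name] for name in combo], y)
            if coef is None:
                continue
            if best is None or rmse < best["rmse"]:
                best = {"terms": combo, "coef": coef, "pred": pred, "rmse": rmse}
        residual = [t - p for t, p in zip(y, best["pred"])]
    return best


# Synthetic example (assumed for illustration): y = 1 + 5*x1/x2 + 0.5*x3^2.
rng = random.Random(0)
n = 60
x1 = [rng.uniform(1.0, 2.0) for _ in range(n)]
x2 = [rng.uniform(1.0, 2.0) for _ in range(n)]
x3 = [rng.uniform(1.0, 2.0) for _ in range(n)]
y = [1.0 + 5.0 * a / b + 0.5 * c * c for a, b, c in zip(x1, x2, x3)]

model = sisso_sketch({"x1": x1, "x2": x2, "x3": x3}, y)
print(model["terms"], model["coef"], model["rmse"])
```

In this sketch, k sets how many features each screening pass adds, so the number of least-squares fits in the exhaustive search grows combinatorially with k and with the number of terms. That combinatorial step is the natural place for batched GPU linear algebra in a tensor-based implementation.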

Paper Structure

This paper contains 20 sections, 17 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Illustration of the major steps in the SISSO method (from the original SISSO reference).
  • Figure 2: Results for TorchSISSO and VS-SISSO on training (top) and testing (bottom) datasets for modeling specific energy of organic compounds.
  • Figure 3: (Left) Parity plot showing training and validation results for a SISSO model trained on data collected over a limited temperature range, with validation data spanning temperatures both near and far outside this range. (Right) Parity plot showing training and validation results for a SISSO model trained on data spanning the full temperature range, with an 80/20 train/validation split. This comparison illustrates the impact of training data distribution on model generalizability across a broader temperature spectrum.
  • Figure 4: Computational time versus parameter $k$ for running TorchSISSO and FORTRAN-SISSO on different hardware. Note that $k$ controls how many models must be fit according to Eqs. \ref{eq:sis-method}--\ref{eq:subspace-expansion}.