The Liouville Generator for Producing Integrable Expressions

Rashid Barket, Matthew England, Jürgen Gerhard

TL;DR

The Liouville Generator (LIOUVILLE) addresses the need for large, diverse, and reliably integrable expressions to benchmark computer algebra systems and train ML models for symbolic integration. It builds on Liouville's theorem and the Parallel Risch Algorithm to construct integrands with guaranteed integrability, while incorporating design choices that balance the length and complexity of integrands and integrals. The method blends advantages of prior approaches (e.g., BWD, RISCH) and introduces normalization and parallel handling to enhance realism, reduce redundancy, and accommodate special function extensions. The approach yields a scalable, theory-grounded data generator, with public code and data, poised to aid CAS benchmarking and machine-learning-driven symbolic integration tasks.

Abstract

There has been a growing need to devise processes that can create comprehensive datasets in the world of Computer Algebra, both for accurate benchmarking and for new intersections with machine learning technology. We present here a method to generate integrands that are guaranteed to be integrable, dubbed the LIOUVILLE method. It is based on Liouville's theorem and the Parallel Risch Algorithm for symbolic integration. We show that this data generation method retains the best qualities of previous data generation methods, while overcoming some of the issues built into that prior work. The LIOUVILLE generator is able to generate sufficiently complex and realistic integrands, and could be used for benchmarking or machine learning training tasks related to symbolic integration.

Paper Structure

This paper contains 20 sections, 2 theorems, 8 equations, 2 figures, and 2 algorithms.

Key Result

Theorem (Substitution Rule)

If $u = g(x)$ is a differentiable function whose range is an interval $I$, and $f$ is continuous on $I$, then

$$\int f(g(x))\, g'(x)\, dx = \int f(u)\, du.$$
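The Substitution Rule can be sanity-checked numerically in its definite-integral form, $\int_a^b f(g(x))\, g'(x)\, dx = \int_{g(a)}^{g(b)} f(u)\, du$. The sketch below is illustrative only and is not taken from the paper: the choices $f(u) = \cos u$, $g(x) = x^2$, the interval $[0, 1.5]$, and the helper `simpson` are all assumptions made for the example.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        # Interior points alternate between weight 4 (odd k) and 2 (even k).
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


# Illustrative (assumed) choices: f(u) = cos(u), g(x) = x**2, g'(x) = 2x.
f = math.cos
g = lambda x: x * x
g_prime = lambda x: 2 * x

a, b = 0.0, 1.5

# Left-hand side: integrate f(g(x)) * g'(x) over [a, b].
lhs = simpson(lambda x: f(g(x)) * g_prime(x), a, b)

# Right-hand side: integrate f(u) over [g(a), g(b)].
rhs = simpson(f, g(a), g(b))

print(lhs, rhs)
```

Both quadratures approximate the same quantity, so the two printed values should agree up to the quadrature error.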

Figures (2)

  • Figure 1: Comparing the size of the integrands vs. integrals for the FWD, BWD, and LIOUVILLE generators. The closer a point is to the dotted red line $y=x$, the more balanced a method is. Subfigure `fig:lengths_200` is a zoomed-in portion of subfigure `fig:lengths_600`.
  • Figure 2: The effect of normalisation when applied to the integrand, the integral, and both. Normalising both produced more points above the dotted line $y=x$ compared to only normalising one of the integrand or integral.

Theorems & Definitions (2)

  • Theorem: Substitution Rule
  • Theorem: Liouville's theorem
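Liouville's theorem says that an elementary integral, when one exists, can be written as $v_0 + \sum_i c_i \log(v_i)$, with constants $c_i$ and functions $v_0, v_i$ drawn from the same field as the integrand. Differentiating that form gives an integrand $v_0' + \sum_i c_i\, v_i'/v_i$ that is integrable by construction. The SymPy sketch below illustrates this idea only. It is not the authors' generator and does not implement the Parallel Risch Algorithm; the function name `liouville_pair` and the hand-picked ingredients are hypothetical.

```python
import sympy as sp

x = sp.Symbol("x")


def liouville_pair(v0, logs):
    """Return (integrand, integral) built from the Liouville form.

    v0   : expression in x (the non-logarithmic part of the integral)
    logs : list of (c_i, v_i) pairs, with c_i a constant and v_i an expression in x
    """
    # Integral: v0 + sum_i c_i * log(v_i)
    integral = v0 + sum(c * sp.log(v) for c, v in logs)
    # Integrand: v0' + sum_i c_i * v_i' / v_i, obtained term by term.
    integrand = sp.diff(v0, x) + sum(c * sp.diff(v, x) / v for c, v in logs)
    return integrand, integral


# Hypothetical hand-picked ingredients (a real generator would sample these).
v0 = x * sp.exp(x)
logs = [(sp.Rational(1, 2), x**2 + 1), (3, sp.exp(x) + x)]

integrand, integral = liouville_pair(v0, logs)

# Sanity check: the derivative of the integral should equal the integrand.
residual = sp.simplify(sp.diff(integral, x) - integrand)

print("integrand:", integrand)
print("integral: ", integral)
print("residual: ", residual)
```

Because the pair is constructed from the integral's shape rather than by calling an integrator, no integration step can fail. How the ingredients are sampled, and how the resulting expressions are normalised, are design choices this sketch leaves open.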