The Liouville Generator for Producing Integrable Expressions
Rashid Barket, Matthew England, Jürgen Gerhard
TL;DR
The Liouville Generator (LIOUVILLE) addresses the need for large, diverse, and reliably integrable expressions to benchmark computer algebra systems (CAS) and to train ML models for symbolic integration. It builds on Liouville's theorem and the Parallel Risch Algorithm to construct integrands that are guaranteed to be integrable, with design choices that balance the length and complexity of both integrands and integrals. The method combines advantages of prior approaches (e.g., BWD, RISCH) and introduces normalization and parallel handling to enhance realism, reduce redundancy, and accommodate special function extensions. The result is a scalable, theory-grounded data generator, with public code and data, intended to support CAS benchmarking and machine-learning-driven symbolic integration tasks.
Abstract
There is a growing need in computer algebra for processes that create comprehensive datasets, both for accurate benchmarking and for new intersections with machine learning technology. We present a method for generating integrands that are guaranteed to be integrable, dubbed the LIOUVILLE method. It is based on Liouville's theorem and the Parallel Risch Algorithm for symbolic integration. We show that this data generation method retains the best qualities of previous data generation methods while overcoming some of the issues inherent in that prior work. The LIOUVILLE generator produces sufficiently complex and realistic integrands and could be used for benchmarking or machine learning training tasks related to symbolic integration.
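To make the core idea concrete: Liouville's theorem says that an elementary integral, when it exists, has the form v_0 + c_1 log(v_1) + ... + c_n log(v_n), with the v_i in the field of the integrand and the c_i constants. Choosing those pieces first and differentiating therefore gives an integrand that is integrable by construction. The following is a minimal sketch of that principle only, not the authors' generator: it leaves out the Parallel Risch Algorithm and the normalization step, and every name in it (`random_field_element`, `liouville_pair`, the tuple-based expression encoding) is a hypothetical choice made for illustration.

```python
import math
import random

# Illustrative encoding (an assumption of this sketch): expressions are nested
# tuples such as ("x",), ("const", c), ("add", a, b), ("mul", a, b),
# ("div", a, b), ("exp", a), ("log", a).
X = ("x",)


def const(c):
    return ("const", float(c))


def diff(e):
    """Symbolic derivative of expression e with respect to x."""
    op = e[0]
    if op == "x":
        return const(1)
    if op == "const":
        return const(0)
    if op == "add":
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":  # product rule
        return ("add", ("mul", diff(e[1]), e[2]), ("mul", e[1], diff(e[2])))
    if op == "div":  # quotient rule: (a'b - ab') / b^2
        num = ("add", ("mul", diff(e[1]), e[2]),
               ("mul", const(-1), ("mul", e[1], diff(e[2]))))
        return ("div", num, ("mul", e[2], e[2]))
    if op == "exp":
        return ("mul", e, diff(e[1]))
    if op == "log":
        return ("div", diff(e[1]), e[1])
    raise ValueError(f"unknown operator: {op}")


def evaluate(e, x):
    """Numerically evaluate expression e at the point x."""
    op = e[0]
    if op == "x":
        return x
    if op == "const":
        return e[1]
    a = evaluate(e[1], x)
    if op == "exp":
        return math.exp(a)
    if op == "log":
        return math.log(a)
    b = evaluate(e[2], x)
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    if op == "div":
        return a / b
    raise ValueError(f"unknown operator: {op}")


def random_field_element(rng):
    """Return a + b*x + c*exp(x) with positive integer coefficients.

    Positive coefficients keep the value positive for x >= 0, so it is a
    safe argument for log in the numeric check.
    """
    a, b, c = (rng.randint(1, 5) for _ in range(3))
    return ("add", ("add", const(a), ("mul", const(b), X)),
            ("mul", const(c), ("exp", X)))


def liouville_pair(rng, n_logs=2):
    """Build an integral v0 + sum(c_i * log(v_i)) and differentiate it."""
    integral = random_field_element(rng)  # plays the role of v_0
    for _ in range(n_logs):
        c = const(rng.randint(-3, 3) or 1)  # nonzero constant c_i
        integral = ("add", integral,
                    ("mul", c, ("log", random_field_element(rng))))
    integrand = diff(integral)  # integrable by construction
    return integrand, integral
```

Calling `liouville_pair(random.Random(0))` returns an (integrand, integral) pair as expression trees. The integrand comes out unsimplified, which hints at why a practical generator needs a normalization step to keep integrands realistic and to limit redundancy.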
