Table of Contents
Fetching ...

The CAISAR Platform: Extending the Reach of Machine Learning Specification and Verification

Michele Alberti, François Bobot, Julien Girard-Satabin, Alban Grastien, Aymeric Varasse, Zakaria Chihani

TL;DR

The paper introduces CAISAR, an open-source platform that extends formal ML specification and verification by moving beyond local robustness through a Why3-based pipeline and a Neural Intermediate Representation (NIR) that can express properties across multiple models. It presents a high-level CAISAR specification language and an automated graph-editing approach to translate these specs into inputs for off-the-shelf provers, including VNN-LIB-compatible backends, enabling cross-verification among diverse tools. The authors demonstrate use cases like ACAS Xu with unnormalized inputs and multi-network compositions to illustrate expressiveness and practical utility, while also discussing limitations and tool-dependent performance. Overall, CAISAR aims to bridge the gap between high-level ML specifications and diverse provers, facilitating principled verification of richer semantic properties and supporting industrial adoption with reproducible artifacts and guidance for future enhancements.

Abstract

The formal specification and verification of machine learning programs saw remarkable progress in less than a decade, leading to a profusion of tools. However, diversity may lead to fragmentation, resulting in tools that are difficult to compare, except for very specific benchmarks. Furthermore, this progress is heavily geared towards the specification and verification of a certain class of property, that is, local robustness properties. But while provers are becoming more and more efficient at solving local robustness properties, even slightly more complex properties, involving multiple neural networks for example, cannot be expressed in the input languages of winners of the International Competition of Verification of Neural Networks VNN-Comp. In this tool paper, we present CAISAR, an open-source platform dedicated to machine learning specification and verification. We present its specification language, suitable for modelling complex properties on neural networks, support vector machines and boosted trees. We show on concrete use-cases how specifications written in this language are automatically translated to queries to state-of-the-art provers, notably by using automated graph editing techniques, making it possible to use their off-the-shelf versions. The artifact to reproduce the paper claims is available at the following DOI: https://doi.org/10.5281/zenodo.15209510

The CAISAR Platform: Extending the Reach of Machine Learning Specification and Verification

TL;DR

The paper introduces CAISAR, an open-source platform that extends formal ML specification and verification by moving beyond local robustness through a Why3-based pipeline and a Neural Intermediate Representation (NIR) that can express properties across multiple models. It presents a high-level CAISAR specification language and an automated graph-editing approach to translate these specs into inputs for off-the-shelf provers, including VNN-LIB-compatible backends, enabling cross-verification among diverse tools. The authors demonstrate use cases like ACAS Xu with unnormalized inputs and multi-network compositions to illustrate expressiveness and practical utility, while also discussing limitations and tool-dependent performance. Overall, CAISAR aims to bridge the gap between high-level ML specifications and diverse provers, facilitating principled verification of richer semantic properties and supporting industrial adoption with reproducible artifacts and guidance for future enhancements.

Abstract

The formal specification and verification of machine learning programs saw remarkable progress in less than a decade, leading to a profusion of tools. However, diversity may lead to fragmentation, resulting in tools that are difficult to compare, except for very specific benchmarks. Furthermore, this progress is heavily geared towards the specification and verification of a certain class of property, that is, local robustness properties. But while provers are becoming more and more efficient at solving local robustness properties, even slightly more complex properties, involving multiple neural networks for example, cannot be expressed in the input languages of winners of the International Competition of Verification of Neural Networks VNN-Comp. In this tool paper, we present CAISAR, an open-source platform dedicated to machine learning specification and verification. We present its specification language, suitable for modelling complex properties on neural networks, support vector machines and boosted trees. We show on concrete use-cases how specifications written in this language are automatically translated to queries to state-of-the-art provers, notably by using automated graph editing techniques, making it possible to use their off-the-shelf versions. The artifact to reproduce the paper claims is available at the following DOI: https://doi.org/10.5281/zenodo.15209510

Paper Structure

This paper contains 25 sections, 8 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: The overall functional architecture of CAISAR. Significant extensions to the original Why3 platform are outlined in shaded rectangles.
  • Figure 2: Grammar of CAISAR's high-level specification language. The nonterminal $\langle \mathit{built\textnormal{-}in} \rangle$ denotes WhyML functions having a special interpretation in CAISAR.
  • Figure 3: CAISAR specification about the output prediction of a composition of NNs. The specification begins by declaring a verification goal named sequencing. Lines 2-4 introduce two NNs and the dataset by loading their ONNX model and CSV files, followed by line 5 which sets the perturbation $\epsilon$ of the first input to $0.01$. Line 6 ensures the specification will be checked against all samples in the dataset. Lines 7-11 define a perturbed vector that has the same length of the original vector and have the same bounded values. Lines 12-13 describe the computation: the input of the first NN is perturbed, then the computation is performed, and the composition by the second NN takes place. Finally, line 14-15 assert that the prediction for the given output label is always the preferred one.
  • Figure 4: CAISAR specification about the output equality up-to $\delta$ between two NNs. The specification begins by declaring a verification goal named equality_up_to_delta. Lines 2-3 introduce two NNs by loading their ONNX model files, followed by line 4 which sets the tolerance threshold $\delta$ to $0.125$. Lines 5-6 constraint the input vector to be of length 5 with elements in $[ 0.0, 1.0 ]$. Lines 7-8 then apply both NNs to this input using the @@ operator for model application. Finally, line 9 asserts that the absolute difference between the first output components of both networks is bounded by $\delta$.
  • Figure 5: Example of an ONNX graph, obtained using the netron tool. Gemm stands for GEneralized Matrix Multiplication, while Relu is the Rectified Linear Unit function.
  • ...and 4 more figures