
The 6th International Verification of Neural Networks Competition (VNN-COMP 2025): Summary and Results

Konstantin Kaulen, Tobias Ladner, Stanley Bak, Christopher Brix, Hai Duong, Thomas Flinkow, Taylor T. Johnson, Lukas Koller, Edoardo Manino, ThanhVu H Nguyen, Haoze Wu

TL;DR

VNN-COMP 2025 advances the fair comparison of neural network verifiers through standardized ONNX/VNN-LIB formats and automatic evaluation on AWS instances. The competition highlights GPU-accelerated bound propagation combined with branch-and-bound (BaB) search, with αβ-CROWN achieving the leading performance, and features a broad benchmark suite spanning vision, NLP, robotics, and safety domains. The results show strong gains from standardized pipelines and automated artifact handling, while issues with the validity of counterexamples across different hardware motivate future work on robust verification and soundness proofs. The event strengthens community collaboration and provides a platform for competition-driven progress toward scalable, certified neural network verification in safety-critical settings.
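
The TL;DR credits GPU-accelerated bound propagation combined with branch-and-bound (BaB) search. As a rough illustration of that pattern only, the Python sketch below uses plain interval bound propagation (IBP) to bound the output over an input box, checks the box midpoint for a concrete counterexample, and otherwise bisects the widest input dimension. This is a deliberately simplified stand-in and not αβ-CROWN's method, which builds on linear (CROWN-style) bound propagation and can branch on ReLU neurons as well as inputs. All names (`ibp`, `verify`, `NET`) and the toy network are hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy representation: each layer is (weight rows, bias vector).
Layer = Tuple[List[List[float]], List[float]]


def affine_bounds(weights, bias, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b with interval arithmetic."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        low, up = b, b
        for w, x_lo, x_hi in zip(row, lo, hi):
            # A positive weight pairs with the lower input for the lower bound;
            # a negative weight pairs with the upper input.
            if w >= 0:
                low += w * x_lo
                up += w * x_hi
            else:
                low += w * x_hi
                up += w * x_lo
        out_lo.append(low)
        out_hi.append(up)
    return out_lo, out_hi


def ibp(layers, lo, hi):
    """Interval bound propagation; ReLU after every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        lo, hi = affine_bounds(weights, bias, lo, hi)
        if i < len(layers) - 1:
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return lo, hi


def forward(layers, x):
    """Concrete forward pass, used to check candidate counterexamples."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def verify(layers, lo, hi, threshold, max_splits=1000):
    """Branch and bound over input boxes for the property y_0 >= threshold.

    Returns ("verified", None), ("falsified", x) or ("unknown", None).
    """
    queue = [(lo, hi)]
    splits = 0
    while queue:
        box_lo, box_hi = queue.pop()
        out_lo, _ = ibp(layers, box_lo, box_hi)
        if out_lo[0] >= threshold:
            continue  # bound: this subdomain is proven safe
        mid = [(a + b) / 2 for a, b in zip(box_lo, box_hi)]
        if forward(layers, mid)[0] < threshold:
            return "falsified", mid  # concrete counterexample found
        if splits >= max_splits:
            return "unknown", None  # budget exhausted
        splits += 1
        # Branch: bisect the widest input dimension.
        d = max(range(len(box_lo)), key=lambda k: box_hi[k] - box_lo[k])
        left_hi = list(box_hi)
        left_hi[d] = mid[d]
        right_lo = list(box_lo)
        right_lo[d] = mid[d]
        queue.append((box_lo, left_hi))
        queue.append((right_lo, box_hi))
    return "verified", None


# Toy network computing y = relu(x) - relu(x) = 0; plain IBP misses the cancellation.
NET: List[Layer] = [([[1.0], [1.0]], [0.0, 0.0]), ([[1.0, -1.0]], [0.0])]
```

On the box [-1, 1], `ibp(NET, [-1.0], [1.0])` yields the loose bounds ([-1.0], [1.0]), so y ≥ -0.1 cannot be proven in one step, yet `verify(NET, [-1.0], [1.0], -0.1)` proves it after a few splits, while `verify(NET, [-1.0], [1.0], 0.5)` returns the counterexample [0.0]. Splitting tightens the bounds because each subdomain is narrower, which is the basic reason BaB complements bound propagation.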

Abstract

This report summarizes the 6th International Verification of Neural Networks Competition (VNN-COMP 2025), held as part of the 8th International Symposium on AI Verification (SAIV), which was colocated with the 37th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specifications (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2025 iteration, eight teams competed on a diverse set of 16 regular and 9 extended benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of the competition.
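
The abstract names VNN-LIB as the standardized specification format. As an illustration only, the sketch below generates a local-robustness query in the SMT-LIB-style syntax that VNN-LIB files use: `X_i` and `Y_j` declare network inputs and outputs, and the assertions describe the unsafe region, so any point satisfying all of them is a counterexample. The function name `robustness_vnnlib`, the clipping of inputs to [0, 1], and the disjunctive output encoding are illustrative assumptions; the VNN-LIB standard and the individual benchmark files define the exact conventions.

```python
from typing import List


def robustness_vnnlib(x: List[float], eps: float, true_label: int,
                      num_outputs: int, lo: float = 0.0, hi: float = 1.0) -> str:
    """Build a VNN-LIB query for L-infinity robustness around input x.

    The assertions describe the *unsafe* set: a point satisfying all of them
    is a counterexample (some other class scores at least as high).
    """
    lines = [f"(declare-const X_{i} Real)" for i in range(len(x))]
    lines += [f"(declare-const Y_{j} Real)" for j in range(num_outputs)]
    # Input box of radius eps, clipped to the assumed valid range [lo, hi].
    for i, v in enumerate(x):
        lines.append(f"(assert (<= X_{i} {min(hi, v + eps)}))")
        lines.append(f"(assert (>= X_{i} {max(lo, v - eps)}))")
    # Output condition: a disjunction over every competing class.
    disjuncts = " ".join(
        f"(and (>= Y_{j} Y_{true_label}))"
        for j in range(num_outputs) if j != true_label
    )
    lines.append(f"(assert (or {disjuncts}))")
    return "\n".join(lines) + "\n"


print(robustness_vnnlib([0.0, 1.0], eps=0.25, true_label=0, num_outputs=3))
```

Encoding the negated property (the unsafe set) means a verifier either finds a satisfying assignment, which is a counterexample, or shows that none exists, which proves the property.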

Paper Structure

This paper contains 72 sections, 58 figures, and 68 tables.

Figures (58)

  • Figure 1: Accuracy-Efficient Architecture for the GTSRB and Belgium Datasets
  • Figure 2: Accuracy-Efficient Architecture for the Chinese Dataset
  • Figure 3: XNOR(QConv) architecture
  • Figure 4: Generic approach to generating the NLP verification pipelines (casadio2023antonio; casadio2024nlp) deployed to obtain the safeNLP benchmark.
  • Figure 5: Cactus Plot for All Instances.
  • ...and 53 more figures