Table of Contents
Fetching ...

Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings

Berkant Turan, Suhrab Asadulla, David Steinmann, Kristian Kersting, Wolfgang Stammer, Sebastian Pokutta

TL;DR

The paper tackles the challenge of scaling verifiable Prover-Verifier Games (PVGs) to high-dimensional inputs like images by introducing Neural Concept Verifier (NCV), which performs PVG-style verification over interpretable concept encodings rather than raw pixels. NCV combines a weakly supervised concept extractor with a Merlin-Arthur-style nonlinear verifier, using two competing provers to select sparse concept subsets and enforce verifiable, robust decisions with a rejection option. Empirically, NCV scales to datasets from synthetic CLEVR variants to large-scale real-world benchmarks, achieving high completeness and near-perfect soundness, narrowing the interpretability–accuracy gap of Concept Bottleneck Models, and producing human-understandable concept-based explanations. The results indicate NCV as a promising path toward concept-level verifiability and robust, trustworthy AI in complex visual domains, while acknowledging dependencies on concept extractors and training costs as areas for future work.

Abstract

While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, expressive concept encodings effectively allow to translate such data into interpretable concepts but are often utilised in the context of low-capacity linear predictors. In this work, we push towards real-world verifiability by combining the strengths of both approaches. We introduce Neural Concept Verifier (NCV), a unified framework combining PVGs for formal verifiability with concept encodings to handle complex, high-dimensional inputs in an interpretable way. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier, implemented as a nonlinear predictor, uses exclusively for decision-making. Our evaluations show that NCV outperforms classic concept-based models and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward concept-level, verifiable AI.

Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings

TL;DR

The paper tackles the challenge of scaling verifiable Prover-Verifier Games (PVGs) to high-dimensional inputs like images by introducing Neural Concept Verifier (NCV), which performs PVG-style verification over interpretable concept encodings rather than raw pixels. NCV combines a weakly supervised concept extractor with a Merlin-Arthur-style nonlinear verifier, using two competing provers to select sparse concept subsets and enforce verifiable, robust decisions with a rejection option. Empirically, NCV scales to datasets from synthetic CLEVR variants to large-scale real-world benchmarks, achieving high completeness and near-perfect soundness, narrowing the interpretability–accuracy gap of Concept Bottleneck Models, and producing human-understandable concept-based explanations. The results indicate NCV as a promising path toward concept-level verifiability and robust, trustworthy AI in complex visual domains, while acknowledging dependencies on concept extractors and training costs as areas for future work.

Abstract

While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, expressive concept encodings effectively allow to translate such data into interpretable concepts but are often utilised in the context of low-capacity linear predictors. In this work, we push towards real-world verifiability by combining the strengths of both approaches. We introduce Neural Concept Verifier (NCV), a unified framework combining PVGs for formal verifiability with concept encodings to handle complex, high-dimensional inputs in an interpretable way. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier, implemented as a nonlinear predictor, uses exclusively for decision-making. Our evaluations show that NCV outperforms classic concept-based models and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward concept-level, verifiable AI.

Paper Structure

This paper contains 52 sections, 17 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Challenges of Prover-Verifier Games (PVGs) in image classification: (i) It is non-trivial to scale up for high-dimensional data. (ii) Furthermore, the learned explanation masks on the pixel level remain difficult for humans to understand.
  • Figure 2: Overview of the Neural Concept Verifier (NCV). The input image is first processed by a concept extractor to produce symbolic concept encodings. A prover–verifier game is then played over these encodings: a cooperative prover selects a sparse concept subset supporting the true class, while an adversarial prover selects misleading concepts. Finally, the nonlinear verifier makes a prediction based only on these selected concepts, ensuring verifiable and robust classification.
  • Figure 3: Comparison of explanations from NCV vs. Pixel-MAC. (top) Merlin–Arthur training on pixel space yields uninformative masks. (bottom) MAC on a concept-space via NCV translates into combinations of high-level concepts and, in turn, in an interpretable prediction.
  • Figure 4: Comparison of explanations from NCV vs. Pixel-MAC for CLEVR-Hans3 images of all three classes. (a) Merlin–Arthur training on pixel space yields uninformative masks. (b) NCV provides clear explanations by highlighting object features corresponding to the class rule. The single-object images are reconstructions from the respective slots selected by Merlin (prover).
  • Figure 5: Effect of the weighting parameter $\gamma$ on completeness and soundness for (a) CIFAR-100, (b) ImageNet-1k validation, and (c) COCOLogic validation (balanced metrics). All curves show means over 3 random seeds.