Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings
Berkant Turan, Suhrab Asadulla, David Steinmann, Kristian Kersting, Wolfgang Stammer, Sebastian Pokutta
TL;DR
The paper tackles the challenge of scaling verifiable Prover-Verifier Games (PVGs) to high-dimensional inputs such as images by introducing the Neural Concept Verifier (NCV), which performs PVG-style verification over interpretable concept encodings rather than raw pixels. NCV combines a weakly supervised concept extractor with a Merlin-Arthur-style nonlinear verifier, using two competing provers to select sparse concept subsets and enforce verifiable, robust decisions with a rejection option. Empirically, NCV scales from synthetic CLEVR variants to large-scale real-world benchmarks, achieving high completeness and near-perfect soundness, narrowing the interpretability–accuracy gap of Concept Bottleneck Models, and producing human-understandable concept-based explanations. The results position NCV as a promising path toward concept-level verifiability and robust, trustworthy AI in complex visual domains. The authors note the dependence on concept extractors and the training cost as areas for future work.
Abstract
While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, expressive concept encodings can effectively translate such data into interpretable concepts, but they are often used only with low-capacity linear predictors. In this work, we push towards real-world verifiability by combining the strengths of both approaches. We introduce Neural Concept Verifier (NCV), a unified framework combining PVGs for formal verifiability with concept encodings to handle complex, high-dimensional inputs in an interpretable way. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier, implemented as a nonlinear predictor, uses exclusively for decision-making. Our evaluations show that NCV outperforms classic concept-based models and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward concept-level, verifiable AI.
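
The select-then-verify flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the concept names, the hand-set weights, the threshold, and the helper names (`select_top_k`, `apply_mask`, `verifier`) are all hypothetical, and the learned prover and verifier networks are replaced by fixed toy rules. The sketch assumes a concept extractor has already produced a vector of concept activations. It shows only the information flow: a prover picks a sparse subset of concepts, the verifier sees nothing but that subset, and the verifier may reject when the selected concepts do not support any class.

```python
from typing import List, Optional, Sequence

REJECT = None  # stands in for the verifier's rejection option


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Prover step: return a 0/1 mask that keeps the k highest-scoring concepts."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def apply_mask(concepts: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out every concept the prover did not select."""
    return [c * m for c, m in zip(concepts, mask)]


def verifier(masked: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float],
             threshold: float) -> Optional[int]:
    """Toy nonlinear verifier: one ReLU unit per class, reject if none is confident."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, masked)) + b)
        for row, b in zip(weights, biases)
    ]
    best = max(range(len(hidden)), key=lambda j: hidden[j])
    return best if hidden[best] >= threshold else REJECT


# Hypothetical concept activations for one image: [red, blue, cube, sphere].
concepts = [0.9, 0.1, 0.8, 0.0]

# Hand-set weights (an assumption): class 0 fires on "red AND cube",
# class 1 fires on "blue AND sphere".
weights = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0]]
biases = [-1.0, -1.0]

# Cooperative prover: reveals the two most active concepts.
cooperative_mask = select_top_k(concepts, k=2)
cooperative_decision = verifier(apply_mask(concepts, cooperative_mask),
                                weights, biases, threshold=0.5)

# Adversarial prover: reveals the two least active concepts instead.
adversarial_mask = select_top_k([-c for c in concepts], k=2)
adversarial_decision = verifier(apply_mask(concepts, adversarial_mask),
                                weights, biases, threshold=0.5)

print(cooperative_mask, cooperative_decision)
print(adversarial_mask, adversarial_decision)
```

In this toy setting the cooperative selection gives the verifier enough evidence to commit to a class, while the adversarial selection leaves it below the threshold, so it rejects rather than guessing. In the actual framework both the provers and the verifier would be trained models, and the selection budget `k` and rejection rule would be design choices of the method rather than the fixed values assumed here.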
