Provenance Networks: End-to-End Exemplar-Based Explainability

Ali Kayyam, Anusha Madan Gopal, M. Anthony Lewis

TL;DR

Provenance networks address the opacity of deep models by embedding end-to-end, exemplar-based explainability directly into the architecture, enabling predictions to be traced to concrete training exemplars. The approach blends neural representations with KNN-like retrieval through single-branch and two-branch designs (class-independent and class-conditional), and extends to a scalable two-stage variant for large datasets. It systematically analyzes memorization versus generalization, robustness to distortions, data provenance, and membership inference, and demonstrates practical gains via subset-based scaling and multi-task learning with a VAE head for generation. The work highlights interpretability, data integrity, and robustness as core benefits while outlining scalability challenges and future directions, including applicability to larger modalities and deployment in real-world AI systems.

Abstract

We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability. Unlike conventional post-hoc methods, provenance networks learn to link each prediction directly to its supporting training examples as part of the model's normal operation, embedding interpretability into the architecture itself. Conceptually, the model operates similarly to a learned KNN, where each output is justified by concrete exemplars weighted by relevance in the feature space. This approach facilitates systematic investigations of the trade-off between memorization and generalization, enables verification of whether a given input was included in the training set, aids in the detection of mislabeled or anomalous data points, enhances resilience to input perturbations, and supports the identification of similar inputs contributing to the generation of a new data point. By jointly optimizing the primary task and the explainability objective, provenance networks offer insights into model behavior that traditional deep networks cannot provide. While the model introduces additional computational cost and currently scales to moderately sized datasets, it provides a complementary approach to existing explainability techniques. In particular, it addresses critical challenges in modern deep learning, including model opaqueness, hallucination, and the assignment of credit to data contributors, thereby improving transparency, robustness, and trustworthiness in neural models.
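The "learned KNN" intuition in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: it assumes fixed toy feature vectors rather than learned ones, uses negative squared Euclidean distance as the relevance score, and the names `explain_prediction`, `temperature`, and `top_k` are hypothetical. It only shows how a prediction can be returned together with the training exemplars that support it.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def explain_prediction(query, train_feats, train_labels, temperature=1.0, top_k=2):
    """Hypothetical helper: soft-KNN prediction with exemplar attribution.

    Assumption: features are given; in a provenance network they would be
    learned jointly with the primary task.
    """
    # Relevance of each training exemplar: negative squared distance to the query.
    scores = [
        -sum((q - x) ** 2 for q, x in zip(query, feat)) / temperature
        for feat in train_feats
    ]
    weights = softmax(scores)

    # Each class score is the total weight of the exemplars carrying that label.
    class_scores = {}
    for w, y in zip(weights, train_labels):
        class_scores[y] = class_scores.get(y, 0.0) + w
    predicted = max(class_scores, key=class_scores.get)

    # The supporting evidence: indices of the most relevant training exemplars.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return predicted, [(i, weights[i]) for i in ranked[:top_k]]


# Toy data (made up for illustration): two clusters with labels "a" and "b".
train_feats = [[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.2, 2.9]]
train_labels = ["a", "a", "b", "b"]
label, support = explain_prediction([2.9, 3.1], train_feats, train_labels)
print(label, [i for i, _ in support])
```

Here the query sits next to training exemplars 2 and 3, so they receive nearly all of the weight and the prediction can be traced back to them. A very small distance to the top exemplar would also hint that the query may itself be a training point, which is the intuition behind the membership check mentioned in the abstract.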

Paper Structure

This paper contains 60 sections, 19 equations, 28 figures, 7 tables.

Figures (28)

  • Figure 1: Provenance network schematic.
  • Figure 2: Trade-off between generalization and memorization in the single-branch network, with test samples overlaid alongside their two most similar training examples.
  • Figure 3: Top (left): t-SNE visualization of the penultimate layer in the index branch of a two-branch class-conditional network. Top (right): Misattributed test samples alongside their five nearest training samples in the index branch. Bottom: t-SNE visualization of k-means clusters from the same layer, with corresponding training samples for digits 6 (left) and FashionMNIST dresses (right).
  • Figure 4: Accuracy per epoch for the index branch and class branch (insets) of Small (left) and XLarge (right) models. Each curve represents a different level of parameter sharing (on MNIST). See also Appendix \ref{appx:layers}.
  • Figure 5: Comparison of the single-branch index-prediction network with varying levels of label mixing against an isolated CNN. Plots show Top-1 and Top-5 accuracy under 9 distortion types plus a baseline without distortion. The variation in the isolated CNN (blue curves) across different index-mixing levels arises from the use of different test sets at each level. Intermediate levels of memorization improve robustness: for distortions like occlusion and blur, partial label mixing (20–30%) yields higher accuracy than the isolated CNN. Performance on the remaining 5 distortions is shown in Appendix \ref{appx:robust}.
  • ...and 23 more figures