Provenance Networks: End-to-End Exemplar-Based Explainability
Ali Kayyam, Anusha Madan Gopal, M. Anthony Lewis
TL;DR
Provenance networks address the opacity of deep models by building end-to-end, exemplar-based explainability directly into the architecture, so that each prediction can be traced to concrete training exemplars. The approach combines neural representations with KNN-like retrieval through single-branch and two-branch designs (class-independent and class-conditional), and extends to a scalable two-stage variant for large datasets. The work systematically analyzes memorization versus generalization, robustness to distortions, data provenance, and membership inference, and demonstrates practical gains through subset-based scaling and multi-task learning with a VAE head for generation. It highlights interpretability, data integrity, and robustness as core benefits, while outlining scalability challenges and future directions, including extension to larger-scale data and other modalities and deployment in real-world AI systems.
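To make "KNN-like retrieval" concrete, here is a minimal sketch of exemplar-weighted prediction. It is not the authors' architecture. It assumes embeddings have already been produced by some learned encoder, and it uses cosine similarity with a temperature-scaled softmax as a stand-in for relevance in the feature space. The function name `exemplar_predict` and its parameters `k` and `temperature` are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def exemplar_predict(query_emb, train_embs, train_labels, num_classes, k=3, temperature=0.1):
    """Hypothetical sketch: predict a class and return the exemplars that support it.

    query_emb:    (d,) embedding of the input (assumed to come from a learned encoder)
    train_embs:   (n, d) embeddings of the training exemplars
    train_labels: (n,) integer class labels
    """
    # Cosine similarity between the query and every training exemplar.
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q
    # Indices of the k most similar exemplars, in decreasing order of similarity.
    top = np.argsort(-sims)[:k]
    # Relevance weights over the retrieved exemplars (they sum to 1).
    weights = softmax(sims[top] / temperature)
    # Each retrieved exemplar casts a weighted vote for its own class.
    scores = np.zeros(num_classes)
    np.add.at(scores, train_labels[top], weights)
    return int(np.argmax(scores)), top, weights

# Toy usage: two well-separated clusters in a 2-D embedding space.
train_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_labels = np.array([0, 0, 1, 1])
pred, idx, w = exemplar_predict(np.array([1.0, 0.05]), train_embs, train_labels, num_classes=2, k=3)
print(pred, idx.tolist())  # prints: 0 [0, 1, 3]
```

The returned indices play the role of the explanation: the prediction for the query is justified by training exemplars 0 and 1, which carry almost all of the weight.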
Abstract
We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability. Unlike conventional post-hoc methods, provenance networks learn to link each prediction directly to its supporting training examples as part of the model's normal operation, embedding interpretability into the architecture itself. Conceptually, the model operates like a learned KNN: each output is justified by concrete exemplars weighted by their relevance in the feature space. This design enables systematic study of the trade-off between memorization and generalization, allows verification of whether a given input was included in the training set, aids in detecting mislabeled or anomalous data points, enhances resilience to input perturbations, and supports identifying similar inputs that contributed to the generation of a new data point. By jointly optimizing the primary task and the explainability objective, provenance networks offer insights into model behavior that traditional deep networks cannot provide. Although the model introduces additional computational cost and currently scales only to moderately sized datasets, it offers a complementary approach to existing explainability techniques. In particular, it addresses critical challenges in modern deep learning, including model opacity, hallucination, and the assignment of credit to data contributors, thereby improving the transparency, robustness, and trustworthiness of neural models.
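Two of the uses listed above, checking whether an input was in the training set and spotting mislabeled points, can be illustrated with generic nearest-neighbor heuristics in an embedding space. The sketch below is only an assumption-laden illustration, not the paper's procedure. `likely_in_training_set` treats a near-duplicate nearest exemplar (cosine similarity at or above a hypothetical `threshold` of 0.99) as evidence of membership. `flag_suspect_labels` flags points whose label disagrees with the majority label of their `k` nearest neighbors. Both names and both parameter values are hypothetical.

```python
import numpy as np

def normalize(x):
    # L2-normalize along the last axis so that dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def likely_in_training_set(query_emb, train_embs, threshold=0.99):
    # Hypothetical heuristic: flag the query as a probable training member
    # if its most similar exemplar is a near-duplicate.
    sims = normalize(train_embs) @ normalize(query_emb)
    best = int(np.argmax(sims))
    return bool(sims[best] >= threshold), best

def flag_suspect_labels(train_embs, train_labels, k=3):
    # Hypothetical heuristic: return indices whose label disagrees with the
    # majority label among their k nearest neighbors.
    t = normalize(train_embs)
    sims = t @ t.T
    np.fill_diagonal(sims, -np.inf)  # exclude each point from its own neighborhood
    suspects = []
    for i in range(len(t)):
        nbrs = np.argsort(-sims[i])[:k]
        majority = np.bincount(train_labels[nbrs]).argmax()
        if majority != train_labels[i]:
            suspects.append(i)
    return suspects

# Toy data: two clusters; index 3 sits in cluster 0 but carries label 1.
train_embs = np.array([[1.0, 0.0], [0.98, 0.02], [0.96, 0.04], [0.94, 0.06],
                       [0.0, 1.0], [0.02, 0.98], [0.04, 0.96], [0.06, 0.94]])
train_labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])
member, nearest = likely_in_training_set(np.array([0.98, 0.02]), train_embs)
outsider, _ = likely_in_training_set(np.array([0.7, 0.7]), train_embs)
suspects = flag_suspect_labels(train_embs, train_labels, k=3)
print(member, nearest, outsider, suspects)  # prints: True 1 False [3]
```

A fixed similarity threshold is a crude proxy. Unseen inputs that lie very close to a training exemplar would also be flagged, which is one reason a learned, exemplar-aware model could be preferable to a post-hoc lookup like this one.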
