Does a Neural Network Really Encode Symbolic Concepts?

Mingjie Li; Quanshi Zhang

TL;DR

This paper examines the trustworthiness of interaction concepts from four perspectives and verifies that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which partially aligns with human intuition.

Abstract

Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and to define such interactions as concepts encoded by the DNN. Strictly speaking, however, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which partially aligns with human intuition.

Paper Structure (26 sections, 3 equations, 13 figures, 6 tables)

Introduction
Related works
Understanding black-box representation of DNNs
Explainable AI (XAI) theories based on game-theoretic interactions
Emergence of transferable concepts
Preliminaries: representing network inferences using interaction concepts
Visualization of interaction concepts
Does a DNN really learn symbolic concepts?
Sparsity of the encoded concepts
Transferability over different samples
Transferability across different DNNs
Discrimination power of concepts
When DNNs do not learn transferable concepts
Conclusion, discussions, and future challenges
Axioms and theorems of the Harsanyi dividend
...and 11 more sections

Figures (13)

Figure 1: Visualization of interaction concepts $S$ extracted by PointNet on different samples in the ShapeNet dataset. The histograms show the distribution of interaction effects $I(S|\boldsymbol{x})$ over samples in the "motorbike" category, where $S$ is extracted as a salient concept.
Figure 2: Visualization of interaction concepts $S$ extracted by two MLP-5 networks, which are trained on (a) the wifi dataset and (b) the tic-tac-toe dataset. The histograms show (a) the distribution of interaction effects $I(S|\boldsymbol{x})$ over samples in the $4^{\text{th}}$ category, and (b) the distribution of interaction effects $I(S|\boldsymbol{x})$ over samples in sub-categories with patterns $x_4\!=\!x_5\!=\!x_6\!=\!1$ and $x_3\!=\!x_6\!=\!x_9\!=\!1$.
Figure 3: Normalized strength of interaction effects of different concepts in descending order. DNNs trained for different tasks all encode sparse salient concepts.
Figure 4: The change of the average explanation ratio $\rho(k)$ along with the size $k$ of the concept dictionary $\mathbf{D}_k$.
Figure 5: The average discrimination power of concepts in different frequency intervals, i.e., $\alpha\in(0.0,0.2],(0.2,0.4],\ldots,(0.8,1.0]$. The weighted average discrimination power $\bar{\beta}$ over concepts of all frequencies is shown beside the curve.
...and 8 more figures
