Does a Neural Network Really Encode Symbolic Concepts?

Mingjie Li, Quanshi Zhang

TL;DR

This paper examines the trustworthiness of interaction concepts from four perspectives and verifies that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.

Abstract

Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and define such interactions as concepts encoded by the DNN. However, strictly speaking, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.

Paper Structure (26 sections, 3 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: Visualization of interaction concepts $S$ extracted by PointNet on different samples in the ShapeNet dataset. The histograms show the distribution of interaction effects $I(S|\boldsymbol{x})$ over samples in the "motorbike" category, where $S$ is extracted as a salient concept.
  • Figure 2: Visualization of interaction concepts $S$ extracted by two MLP-5 networks, which are trained on (a) the wifi dataset and (b) the tic-tac-toe dataset. The histograms show (a) the distribution of interaction effects $I(S|\boldsymbol{x})$ over samples in the $4^{\text{th}}$ category, and (b) the distribution of interaction effects $I(S|\boldsymbol{x})$ over samples in sub-categories with patterns $x_4\!=\!x_5\!=\!x_6\!=\!1$ and $x_3\!=\!x_6\!=\!x_9\!=\!1$.
  • Figure 3: Normalized strength of interaction effects of different concepts, sorted in descending order. DNNs trained for different tasks all encode sparse salient concepts.
  • Figure 4: Change in the average explanation ratio $\rho(k)$ as a function of the size $k$ of the concept dictionary $\mathbf{D}_k$.
  • Figure 5: The average discrimination power of concepts in different frequency intervals, i.e., $\alpha\in(0.0,0.2],(0.2,0.4],...,(0.8,1.0]$. The weighted average discrimination power $\bar{\beta}$ over concepts of all frequencies is shown beside the curve.
  • ...and 8 more figures
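
The captions above refer to interaction effects $I(S|\boldsymbol{x})$ and to their sorted, normalized strength, but this excerpt does not define the quantity. As a minimal sketch, assuming $I(S|\boldsymbol{x})$ is the Harsanyi interaction $I(S|\boldsymbol{x})=\sum_{T\subseteq S}(-1)^{|S|-|T|}\,v(\boldsymbol{x}_T)$, where $v(\boldsymbol{x}_T)$ is the model output when only the variables in $T$ are left unmasked, the following toy example enumerates all interactions and ranks them by strength. The function `toy_model` and every name here are hypothetical stand-ins, not the authors' code or a real DNN.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a tuple, including the empty one."""
    items = tuple(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def harsanyi_interactions(v, n):
    """Return {S: I(S)} for all S within {0, ..., n-1}, assuming
    I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(frozenset(T)) for T in subsets(S))
        for S in subsets(range(n))
    }


def toy_model(kept):
    """Hypothetical stand-in for a network output on a masked input.
    `kept` holds the indices of the unmasked input variables."""
    out = 0.0
    if {0, 1} <= kept:   # AND-like pattern between variables 0 and 1
        out += 2.0
    if 2 in kept:        # single-variable effect of variable 2
        out -= 1.0
    return out


interactions = harsanyi_interactions(toy_model, n=4)

# Rank concepts by interaction strength |I(S)|, strongest first,
# and normalize by the largest strength.
ranked = sorted(interactions.items(), key=lambda kv: abs(kv[1]), reverse=True)
max_strength = abs(ranked[0][1])
normalized = [(S, abs(val) / max_strength) for S, val in ranked]

# Concepts with non-negligible effect; the threshold is an arbitrary choice.
salient = [S for S, val in interactions.items() if abs(val) > 1e-9]
```

In this toy setting only two of the 16 candidate subsets carry a non-zero effect, which is the kind of sparsity the Figure 3 caption describes. Enumerating all $2^n$ subsets is only feasible for a small number of input variables.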