
Error-margin Analysis for Hidden Neuron Activation Labels

Abhilekha Dalal, Rushrukh Rayan, Pascal Hitzler

TL;DR

The paper addresses how to interpret high-level concept representations in neural networks by introducing error-margin analysis to quantify precision alongside recall for neuron-concept labels. It defines Non-TLA and TLA, computes activation percentages across multiple thresholds, and uses Concept Induction to assign candidate labels, with validation on Google Images and ADE20K augmented by MTurk annotations. Statistical evaluation with Mann-Whitney U and Wilcoxon tests shows ADE20K exhibits lower Non-TLA and robust significance across thresholds (e.g., aggregated $p=5.63\times10^{-7}$), supporting the reliability and generalizability of the approach. The work enhances explainable AI by providing probabilistic, dataset-aware neuron-label mappings that reduce false positives in concept-based interpretations of hidden activations.
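The TL;DR combines two computations: activation percentages at several thresholds, and a rank-based test comparing those percentages. The sketch below is a minimal, hypothetical illustration of both. It assumes that the Non-TLA percentage at a threshold is simply the share of non-target images on which the neuron's activation is strictly above that threshold. That definition is our assumption and may not match the paper's exact one. The helpers `activation_percentage` and `mann_whitney_u`, and the activation values, are illustrative names and made-up data rather than the authors' code or results. The test uses the textbook normal approximation with no tie or continuity correction.

```python
import math
from typing import List, Sequence, Tuple


def activation_percentage(activations: Sequence[float], threshold: float) -> float:
    """Percentage of images whose activation is strictly above `threshold`."""
    if not activations:
        return 0.0
    hits = sum(1 for a in activations if a > threshold)
    return 100.0 * hits / len(activations)


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for `x` and a two-sided p-value from the normal approximation.

    Simplification (assumption): no tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = _average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return u1, p_two_sided


if __name__ == "__main__":
    thresholds = [0.0, 0.2, 0.4, 0.6, 0.8]
    # Hypothetical activations of one neuron on non-target images from two datasets.
    non_target_a = [0.05, 0.3, 0.45, 0.7, 0.1, 0.9, 0.25]
    non_target_b = [0.0, 0.1, 0.15, 0.3, 0.05, 0.2, 0.12]
    pct_a = [activation_percentage(non_target_a, t) for t in thresholds]
    pct_b = [activation_percentage(non_target_b, t) for t in thresholds]
    u, p = mann_whitney_u(pct_a, pct_b)
    print(pct_a, pct_b, u, p)
```

With only a handful of thresholds per sample the normal approximation is rough. For real analyses, SciPy's `scipy.stats.mannwhitneyu` (and `scipy.stats.wilcoxon` for paired comparisons) computes exact or tie-corrected p-values.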

Abstract

Understanding how high-level concepts are represented within artificial neural networks is a fundamental challenge in artificial intelligence. Existing literature in explainable AI emphasizes the importance of labeling neurons with concepts to understand their functioning, but it mostly focuses on identifying which stimulus activates a neuron in most cases; this corresponds to the notion of recall in information retrieval. We argue that this is only the first part of a two-part job: it is imperative to also investigate neuron responses to other stimuli, i.e., their precision. We call this the error margin of neuron labels.
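The recall/precision distinction in the abstract can be made concrete with a small sketch. It assumes each image is annotated with whether it depicts the neuron's candidate label, and it counts the neuron as "firing" when its activation is strictly above a threshold. The `Observation` type and `label_recall_precision` function are hypothetical names. The metrics use the standard information-retrieval definitions and are not necessarily the paper's exact formulation.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Observation:
    has_concept: bool   # image depicts the neuron's candidate label (hypothetical annotation)
    activation: float   # the neuron's activation on that image


def label_recall_precision(
    observations: Iterable[Observation], threshold: float
) -> Tuple[float, float]:
    """Recall and precision of a neuron label at a given activation threshold.

    recall    = concept images that fire / all concept images
    precision = concept images that fire / all images that fire
    """
    obs = list(observations)
    fired = [o for o in obs if o.activation > threshold]
    n_concept = sum(1 for o in obs if o.has_concept)
    true_pos = sum(1 for o in fired if o.has_concept)
    recall = true_pos / n_concept if n_concept else 0.0
    precision = true_pos / len(fired) if fired else 0.0
    return recall, precision
```

Consider two concept images with activations 0.9 and 0.8 and two non-concept images with activations 0.7 and 0.1, evaluated at threshold 0.5. The label reaches recall 1.0, yet precision is only about 0.67 because the neuron also fires on one non-concept image. An analysis that looks only at recall would miss this error margin.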

Paper Structure (9 sections, 6 tables)