From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks
Jae Hee Lee, Sergio Lanza, Stefan Wermter
TL;DR
Neural networks are powerful but often opaque, motivating a survey of methods to explain concepts learned by networks and thereby bridge learning with reasoning. The paper categorizes explanations into neuron-level and layer-level approaches, detailing similarity-based dissection, causal analyses, Concept Activation Vectors (CAVs), probing, and Concept Bottleneck Models (CBMs), with examples like MILAN and CLIP-based mappings. It highlights how these methods support neuro-symbolic AI by exposing or injecting concepts, enabling grounded explanations, debugging, and potential symbolic reasoning. Overall, the survey maps a rapidly evolving landscape toward more transparent and controllable AI systems and emphasizes the need for empirical comparisons and integration across concept-explanation techniques.
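Concept Activation Vectors (CAVs) are among the layer-level methods named above. The following is a minimal NumPy sketch of the commonly used recipe, offered as an illustration rather than code from the survey. A linear classifier is fit to separate a layer's activations on concept examples from its activations on random examples, and the unit normal of its decision boundary is taken as the CAV. A TCAV-style score then reports the fraction of inputs whose class-logit gradient has a positive component along that direction. The function names, the plain gradient-descent logistic regression, and the synthetic activations are all assumptions made for this example.

```python
import numpy as np

def compute_cav(concept_acts, random_acts, lr=0.1, steps=500):
    """Fit a logistic-regression separator between concept and random
    activations; return the unit normal of its decision boundary (the CAV)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept | activation)
        err = p - y                              # gradient of mean log-loss w.r.t. logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w / np.linalg.norm(w)

def tcav_score(logit_grads, cav):
    """Fraction of inputs whose class-logit gradient (w.r.t. the layer's
    activations) has a positive directional derivative along the CAV."""
    return float(np.mean(logit_grads @ cav > 0))

# Hypothetical synthetic activations: the "concept" shifts dimension 0 by +3.
rng = np.random.default_rng(0)
concept = rng.normal(size=(200, 8)) + 3.0 * np.eye(8)[0]
random_ = rng.normal(size=(200, 8))
cav = compute_cav(concept, random_)
```

Because the concept only shifts dimension 0 in this toy setup, the recovered CAV should point almost entirely along that axis. In a real network, the activations would come from a chosen layer and the gradients from backpropagating a class logit to that layer.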
Abstract
In this paper, we review recent approaches for explaining concepts in neural networks. Concepts can act as a natural link between learning and reasoning: once the concepts that a neural learning system uses have been identified, one can integrate them with a reasoning system for inference, or use a reasoning system to act upon them to improve the learning system. Furthermore, knowledge need not only be extracted from neural networks: concept knowledge can also be inserted into neural network architectures. Since integrating learning and reasoning is at the core of neuro-symbolic AI, the insights gained from this survey can serve as an important step towards realizing neuro-symbolic AI based on explainable concepts.
