Artificial Neural Nets and the Representation of Human Concepts
Timo Freiesleben
TL;DR
This work questions the prevailing claim that artificial neural networks (ANNs) store and operate on human concepts within individual units. Using coactivation and functional role as criteria for concept representation, it surveys evidence from transfer learning, TCAV (Testing with Concept Activation Vectors), and adversarial examples to assess whether ANNs learn human concepts and how they store them. The author concludes that ANNs can perform complex prediction tasks and may learn both human and non-human concepts to do so, but that the evidence for storage of human concepts in single units is weak and often mixed; it points instead toward distributed, context-dependent representations. The discussion highlights methodological risks, advocates falsifiable hypotheses, and urges exploration beyond supervised learning to better understand when and how concepts emerge in AI systems and what this implies for interpretability and safety.
Abstract
What do artificial neural networks (ANNs) learn? The machine learning (ML) community shares the narrative that ANNs must develop abstract human concepts to perform complex tasks. Some go even further and believe that these concepts are stored in individual units of the network. Based on current research, I systematically investigate the assumptions underlying this narrative. I conclude that ANNs are indeed capable of performing complex prediction tasks, and that they may learn human and non-human concepts to do so. However, evidence indicates that ANNs do not represent these concepts in individual units.
