What should a neuron aim for? Designing local objective functions based on information theory

Andreas C. Schneider, Valentin Neuhaus, David A. Ehrlich, Abdullah Makkeh, Alexander S. Ecker, Viola Priesemann, Michael Wibral

TL;DR

The paper tackles the opacity of neuron-level learning in globally trained networks by introducing infomorphic neurons that optimize local objectives derived from Partial Information Decomposition (PID). Each neuron's inputs are split into feedforward $F$, context $C$, and lateral $L$ classes, and its objective is a weighted sum of PID atoms, $G = \bm{\gamma}^T \bm{\Pi}$, so that the goal of every neuron can be read directly as an information-processing target. The authors demonstrate both bivariate and trivariate instantiations and show that trivariate networks reach classification performance on MNIST close to that of backpropagation. Hyperparameter optimization and ablation of the goal parameters $\bm{\gamma}$ indicate which PID atoms drive learning. The work advances a principled, information-theoretic foundation for local learning with interpretable dynamics, and releases code to reproduce the results.
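The objective $G = \bm{\gamma}^T \bm{\Pi}$ is a plain dot product once the PID atoms have been estimated. The following is a minimal sketch for the bivariate case, not the authors' implementation: the atom names, their ordering, the inclusion of a residual-entropy term, and all numeric values are assumptions made for illustration.

```python
import numpy as np

# Hypothetical naming and ordering of the bivariate terms:
# two unique atoms, redundancy, synergy, and residual entropy.
ATOMS = ("unq_F", "unq_C", "red", "syn", "res")

def goal(gamma, pi):
    """Weighted sum G = gamma^T Pi over PID terms (both given as dicts)."""
    g = np.array([gamma[a] for a in ATOMS], dtype=float)
    p = np.array([pi[a] for a in ATOMS], dtype=float)
    return float(g @ p)

# Illustrative weights: reward only information that F and C carry redundantly.
gamma = {"unq_F": 0.0, "unq_C": 0.0, "red": 1.0, "syn": 0.0, "res": 0.0}

# Made-up atom values in bits; these are not results from the paper.
pi = {"unq_F": 0.25, "unq_C": 0.05, "red": 0.5, "syn": 0.1, "res": 0.1}

print(goal(gamma, pi))  # 0.5
```

Changing the sign or magnitude of an entry in `gamma` changes which kind of information the neuron is pushed to encode, which is what makes the goal readable.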

Abstract

In modern deep neural networks, the learning dynamics of the individual neurons are often obscure, as the networks are trained via global optimization. Conversely, biological systems build on self-organized, local learning, achieving robustness and efficiency with limited global information. We here show how self-organization between individual artificial neurons can be achieved by designing abstract bio-inspired local learning goals. These goals are parameterized using a recent extension of information theory, Partial Information Decomposition (PID), which decomposes the information that a set of information sources holds about an outcome into unique, redundant and synergistic contributions. Our framework enables neurons to locally shape the integration of information from various input classes, i.e., feedforward, feedback, and lateral, by selecting which of the three inputs should contribute uniquely, redundantly or synergistically to the output. This selection is expressed as a weighted sum of PID terms, which, for a given problem, can be directly derived from intuitive reasoning or via numerical optimization, offering a window into understanding task-relevant local information processing. Achieving neuron-level interpretability while enabling strong performance using local learning, our work advances a principled information-theoretic foundation for local learning strategies.
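To make the distinction between redundant and synergistic contributions concrete, the sketch below computes classical mutual information for two toy distributions. It relies on the standard bivariate PID consistency equations, under which $I(Y;F) + I(Y;C) - I(Y;F,C)$ equals redundancy minus synergy. The helper names and the two toy systems are illustrative assumptions, and no particular PID measure is implemented here.

```python
import numpy as np
from itertools import product

def entropy(p):
    """Shannon entropy in bits of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint, axes_a, axes_b):
    """I(A;B) in bits; axes_a and axes_b are tuples of axis indices of `joint`."""
    all_axes = set(range(joint.ndim))

    def marginal(keep):
        drop = tuple(sorted(all_axes - set(keep)))
        return joint.sum(axis=drop) if drop else joint

    return (entropy(marginal(axes_a)) + entropy(marginal(axes_b))
            - entropy(marginal(axes_a + axes_b)))

def co_information(joint):
    """I(Y;F) + I(Y;C) - I(Y;F,C) for a joint array indexed as p[F, C, Y]."""
    return (mutual_information(joint, (2,), (0,))
            + mutual_information(joint, (2,), (1,))
            - mutual_information(joint, (2,), (0, 1)))

# Toy system 1: Y = F XOR C with independent, uniform binary F and C.
xor = np.zeros((2, 2, 2))
for f, c in product((0, 1), repeat=2):
    xor[f, c, f ^ c] = 0.25

# Toy system 2: F, C and Y are always identical copies of one fair coin.
copy = np.zeros((2, 2, 2))
copy[0, 0, 0] = copy[1, 1, 1] = 0.5

print(co_information(xor))   # -1.0: neither source alone informs about Y
print(co_information(copy))  # 1.0: both sources carry the same bit about Y
```

A system that mixes one bit of redundancy with one bit of synergy would yield zero here, which is why classical quantities alone cannot separate the two and a decomposition is needed.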

Paper Structure

This paper contains 19 sections, 8 equations, 11 figures, 3 tables, 5 algorithms.

Figures (11)

  • Figure 1: Infomorphic neurons are abstract, information-theoretic neurons inspired by the structure of pyramidal neurons \cite{wibral2017goal-function}. They are trained by adjusting their synaptic weights according to a PID-based goal function. A, B. Inspired by the distinction between apical and basal dendrites in cortical pyramidal neurons, infomorphic neurons are defined as computational units with separate feedforward ($F$) and contextual ($C$) input classes. C. Partial information decomposition (PID) allows one to dissect the total entropy of the neuron into explainable components. D. PID enables one to distinguish how much information comes uniquely from either the feedforward input $F$ ($\Pi_\mathrm{unq,F}$) or the contextual input $C$ ($\Pi_\mathrm{unq,C}$), and how much the two contribute redundantly ($\Pi_\mathrm{red}$) or synergistically ($\Pi_\mathrm{syn}$). Classic information theory (top) cannot disentangle these information atoms: each classic quantity covers several atoms, so that effectively one can only measure redundant minus synergistic information. E. Formulating goal functions $G_\mathrm{neuron}$ in terms of PID atoms $\bm{\Pi}_{\mathrm{neuron}}$ enables one to specify how strongly redundant, unique, or synergistic information should contribute to a neuron's output. Figure adapted from \cite{makkeh2023general}.
  • Figure 2: Adding lateral connections as a third input class enables neurons to self-organize to encode relevant and unique information. A, B. Infomorphic neuron with three input classes: feedforward ($F$), contextual ($C$) and lateral ($L$). C. With three input classes, the number of PID atoms increases to 18, represented by different colors, plus the residual entropy $H_\mathrm{res}$ in the outer circle. Classical information-theoretic quantities such as the entropy $H(Y)$ and the mutual information $I$ with individual sources are depicted by ovals, indicating how they can be built from PID atoms. D. Three input classes allow for the optimization of more complex goals based on 19 different terms (compared to only five in \ref{fig:PID_neuron_model}.C). The trivariate PID allows combining two bivariate objective functions: in a supervised learning task, the intuitive goal is to maximize information in the neuron's output that is redundant between the feedforward input $F$ and the label $C$, while simultaneously ensuring that the neuron's output stays unique with respect to the lateral neurons $L$. While bivariate goal functions only allow for optimizing one of these objectives at a time, both objectives can be combined into the goal of maximizing the single atom $\Pi_{\{F\}\{C\}}$ in the trivariate case.
  • Figure 3: Infomorphic networks with three input classes approach the performance of a similar network trained with backpropagation and outperform networks with two input classes on the MNIST handwritten digit classification task. A. In the hidden layer of the network, the infomorphic neurons receive feedforward and lateral connections as well as either the ground-truth label or feedback from the output layer, outlined in three setups. B. For a hidden layer size of 100 neurons, networks with trivariate infomorphic neurons outperform models with bivariate neurons with both heuristic and optimized goals. Only the setup with a feedback signal instead of the label performs worse at this layer size, but it outperforms the bivariate models for larger hidden layers, as shown in \ref{fig:all_performances}. C. Infomorphic networks achieve performance similar to that of the same network trained with backpropagation. For larger layers, using sparse connectivity significantly improves performance (setup 2). The lines indicate mean values, with the intervals depicting the maximum and minimum of 10 runs. The goal function parameters have been optimized for networks with a hidden layer size of 100 neurons, indicated by the dashed line in C.
  • Figure 4: Given a set of optimized goal parameters, an ablation study allows for the identification and interpretation of the most critical neural subgoals for a given task. A. The heuristically defined goal function (see \ref{fig:trivariate_neuron}.D) achieves 92.8% test accuracy on MNIST for $N_\mathrm{hid}=100$; optimizing the goal function using hyperparameter optimization increases the test accuracy to 94.9%. The optimized goal parameters include the intuitive $\gamma_{\{F\}\{C\}}$, but also additional PID atoms to be maximized or minimized at the same time. B. To identify the most important goal parameters, we performed an ablation study (individually setting $\gamma$ parameters to 0) and measured the change in validation accuracy compared to a network trained with the full goal. C. A successive ablation of parameters in order from lowest to highest individual effect identifies four parameters as being crucial for network performance (see \ref{tab:goal_parameters} for their definition). The lines indicate mean values, with the intervals depicting the maximum and minimum of 10 runs. D. To test whether more complex image classification tasks require different goal parameters, we perform a separate hyperparameter optimization for setup 1 with 100 hidden neurons on CIFAR-10 and reach a median test accuracy of $42.5 \%$ (compared to $42.2 \%$ using backprop and $41.1 \%$ using the goal function optimized for MNIST).
  • Figure 5: The median self-cosine distance of trivariate neurons for different layer sizes during the course of training in a densely (left) and a sparsely (right) laterally connected setup.
  • ...and 6 more figures
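
The atom counts quoted in the captions of Figures 1 and 2 can be checked by enumeration. The sketch below assumes the atoms are indexed by the antichains of the Williams-Beer redundancy lattice, which is consistent with labels such as $\Pi_{\{F\}\{C\}}$ but is an assumption about the paper's indexing. The function names are hypothetical.

```python
from itertools import combinations

def pid_atoms(sources):
    """Enumerate antichains of non-empty subsets of `sources`.

    Assumption: each antichain labels one PID atom, as in the
    Williams-Beer redundancy lattice.
    """
    subsets = [frozenset(c)
               for r in range(1, len(sources) + 1)
               for c in combinations(sources, r)]
    atoms = []
    for k in range(1, len(subsets) + 1):
        for combo in combinations(subsets, k):
            # Antichain: no member is a proper subset of another member.
            if all(not (a < b or b < a) for a, b in combinations(combo, 2)):
                atoms.append(combo)
    return atoms

def label(atom, sources):
    """Render an antichain as e.g. '{F}{C}', keeping the order of `sources`."""
    return "".join("{" + "".join(s for s in sources if s in subset) + "}"
                   for subset in atom)

bivariate = pid_atoms(("F", "C"))
trivariate = pid_atoms(("F", "C", "L"))

print(len(bivariate), len(trivariate))  # 4 18
print([label(a, ("F", "C")) for a in bivariate])
```

Adding the residual entropy $H_\mathrm{res}$ to each count gives the five bivariate and 19 trivariate terms mentioned in the caption of Figure 2.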