Increasing biases can be more efficient than increasing weights
Carlo Metta, Marco Fantozzi, Andrea Papini, Gianluca Amato, Matteo Bergamaschi, Silvia Giulia Galfrè, Alessandro Marchetti, Michelangelo Vegliò, Maurizio Parton, Francesco Morandin
TL;DR
The paper argues that adding weights is not the most efficient way to boost neural network performance and introduces Dendrite-Activated Connections (DAC), which use unshared biases and pre-activation to preserve information as it flows between layers. In place of the standard post-activation unit, DAC applies the activation to each incoming connection with its own bias: $y_{i,j} = \varphi(b_{i,j} + z_j)$ and $z_i = \sum_j w_{i,j} y_{i,j}$, which gives greater per-parameter expressivity. Empirically, DAC yields consistent accuracy gains on SGEMM regression, CIFAR-10/100, Imagenette/Imagewoof, and ISIC, with modest increases in parameters and FLOPs; ablation studies show that pre-activation with unshared biases often outperforms alternatives that modify only the activations or rely on replicated inputs. Theoretically, DAC increases representational power (e.g., $PL_k$ can be represented with $2k$ DAC parameters versus $3k+1$ for standard networks) and enables granular gradient masking, supporting more efficient information propagation. Overall, the work shows that increasing biases can be a more efficient route to performance gains than increasing weights, with implications for architecture design and information flow in neural networks.
Abstract
We introduce a novel computational unit for neural networks that features multiple biases, challenging the traditional perceptron structure. This unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next, applying activation functions later in the process with specialized biases for each unit. Through both empirical and theoretical analyses, we show that by focusing on increasing biases rather than weights, there is potential for significant enhancement in a neural network model's performance. This approach offers an alternative perspective on optimizing information flow within neural networks. See source code at https://github.com/CuriosAI/dac-dev.
