The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units
Oswaldo Ludwig
TL;DR
This work investigates using the condition number (κ) of neural weight tensors as a scale-invariant proxy for information encoding, linking anisotropic singular-value distributions to efficient, task-relevant information processing. It develops a theoretical framework for Gaussian linear units that relates singular values, entropy, and scale invariance, and argues that, at a fixed weight norm, high anisotropy (large κ) concentrates discriminative information while reducing overall output entropy. The author validates the approach with KappaTune, a selective fine-tuning method that unfreezes only low-κ tensors, and reports better mitigation of catastrophic forgetting in multimodal LLM and ASR tasks without requiring pre-training data. The results suggest practical, data-free adaptation strategies guided by intrinsic properties of weight tensors, and leave broader validation and generalization to future work.
Abstract
This paper explores the relationship between the condition number of a neural network's weight tensor and the extent of information encoded by the associated processing unit, viewed through the lens of information theory. It argues that a high condition number, though not sufficient for effective knowledge encoding, may indicate that the unit has learned to selectively amplify and compress information. This intuition is formalized for linear units with Gaussian inputs, linking the condition number and the transformation's log-volume scaling factor to the output entropy and the geometric properties of the learned transformation. The analysis shows that, for a fixed weight norm, a singular-value spectrum dominated by a few directions (high condition number) corresponds to reduced overall information transfer, indicating a specialized and efficient encoding strategy. Furthermore, the entropy bound for the linear stage upper-bounds post-activation information for contractive, element-wise nonlinearities, supporting the condition number as a scale-invariant proxy for encoding capacity in practical neural networks. An empirical case study applies these principles to guide selective fine-tuning of Large Language Models for both a new task and a new input modality. The experiments show that the proposed method, named KappaTune, effectively mitigates catastrophic forgetting. Many existing methods for mitigating catastrophic forgetting rely on access to pre-training statistics, which are often unavailable; KappaTune's selective fine-tuning avoids this requirement.
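
The quantities named in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's implementation. It assumes a square, invertible weight matrix W and inputs x ~ N(0, I), for which the differential entropy of y = Wx is (n/2) log(2πe) plus the sum of the log singular values. Flattening higher-order tensors to 2-D, the helper names, and the `budget` parameter are all hypothetical choices made for illustration; the selection rule is only a guess at how "unfreeze low-κ tensors" might be realized.

```python
import numpy as np


def condition_number(weight):
    """kappa = sigma_max / sigma_min, with the tensor flattened to 2-D (an assumption)."""
    matrix = np.asarray(weight, dtype=np.float64)
    matrix = matrix.reshape(matrix.shape[0], -1)
    singular_values = np.linalg.svd(matrix, compute_uv=False)  # sorted descending
    if singular_values[-1] == 0.0:
        return float("inf")  # rank-deficient tensor
    return float(singular_values[0] / singular_values[-1])


def gaussian_output_entropy(weight):
    """Differential entropy in nats of y = W x, for x ~ N(0, I) and square invertible W."""
    n = weight.shape[0]
    singular_values = np.linalg.svd(weight, compute_uv=False)
    log_volume = float(np.sum(np.log(singular_values)))  # log |det W|
    return 0.5 * n * np.log(2.0 * np.pi * np.e) + log_volume


def diagonal_with_norm(singular_values, frobenius_norm):
    """Diagonal matrix with the given singular-value profile, rescaled to a fixed norm."""
    s = np.asarray(singular_values, dtype=np.float64)
    return np.diag(s * frobenius_norm / np.linalg.norm(s))


def select_low_kappa(named_weights, budget):
    """Hypothetical selection rule: names of the `budget` tensors with the smallest kappa."""
    ranked = sorted(named_weights, key=lambda name: condition_number(named_weights[name]))
    return ranked[:budget]


# Three toy "layers" that share the same Frobenius norm but differ in anisotropy.
isotropic = diagonal_with_norm([1.0, 1.0, 1.0, 1.0], frobenius_norm=2.0)
mild = diagonal_with_norm([2.0, 1.0, 1.0, 1.0], frobenius_norm=2.0)
anisotropic = diagonal_with_norm([10.0, 1.0, 1.0, 1.0], frobenius_norm=2.0)

layers = {"layer_a": isotropic, "layer_b": anisotropic, "layer_c": mild}
trainable = select_low_kappa(layers, budget=2)

for name, weight in layers.items():
    print(name, condition_number(weight), gaussian_output_entropy(weight))
print("unfrozen:", trainable)
```

In this toy setting, rescaling a matrix leaves κ unchanged while shifting the entropy by a constant, and at equal Frobenius norm the more anisotropic matrix has the lower output entropy, which matches the abstract's claim for Gaussian linear units. In a real model, the selected names would determine which parameters are left trainable; how the paper chooses the number of unfrozen tensors is not stated here.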
