Transformers have emerged as the state-of-the-art neural network architecture
for natural language processing and computer vision. In the foundation model
paradigm, large transformer models (BERT, GPT-3/4, BLOOM, ViT) are pre-trained
on self-supervised tasks such as word or image masking and then adapted
through fine-tuning for downstream user applications, including instruction
following and question answering. While many approaches have been developed for
model fine-tuning, including low-rank weight update strategies (e.g., LoRA),
the underlying mathematical principles that enable network adaptation without
knowledge loss remain poorly understood. Here, we introduce a differential
geometry framework, functionally invariant paths (FIP), that provides flexible
and continuous adaptation of neural networks for a range of machine learning
goals and network sparsification objectives. We conceptualize the weight space
of a neural network as a curved Riemannian manifold equipped with a metric
tensor whose spectrum defines low-rank subspaces in weight space that
accommodate network adaptation without loss of prior knowledge. We formalize
adaptation as movement along a geodesic path in weight space while searching
for networks that accommodate secondary objectives. With modest computational
resources, the FIP algorithm achieves performance comparable to the state of
the art on continual learning and sparsification tasks for language models
(BERT), vision transformers (ViT, DeiT), and CNNs. Broadly, we
conceptualize a neural network as a mathematical object that can be iteratively
transformed into distinct configurations by the path-sampling algorithm, thereby
defining a sub-manifold of weight space that can be harnessed to achieve user
goals.
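
To make the central idea concrete, the following is a minimal toy sketch, not the authors' FIP implementation. It assumes a single tanh layer, a metric tensor built as g = J^T J from the output Jacobian J averaged over a few inputs, and an L2 weight penalty standing in for the secondary objective. It takes projected steps inside the low-eigenvalue subspace of g rather than integrating a true geodesic. All function names and parameters (`outputs`, `metric`, `fip_step`, `eta`, `rel_tol`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def outputs(w, X, m):
    """Toy network: one tanh layer with weights w reshaped to (m, n)."""
    W = w.reshape(m, -1)
    return np.tanh(X @ W.T)  # shape (N, m)

def metric(w, X, m):
    """Assumed metric: g = mean over inputs of J^T J, J = d(output)/d(weights)."""
    W = w.reshape(m, -1)
    g = np.zeros((w.size, w.size))
    for x in X:
        slope = 1.0 - np.tanh(W @ x) ** 2                     # tanh derivative, (m,)
        J = slope[:, None] * np.kron(np.eye(m), x[None, :])   # (m, m*n)
        g += J.T @ J
    return g / len(X)

def fip_step(w, X, m, grad_secondary, eta=0.1, rel_tol=1e-8):
    """Step against the secondary gradient, restricted to flat directions of g."""
    evals, evecs = np.linalg.eigh(metric(w, X, m))
    flat = evecs[:, evals < rel_tol * evals.max()]  # near-zero eigenvalue subspace
    direction = -flat @ (flat.T @ grad_secondary(w))
    return w + eta * direction

rng = np.random.default_rng(0)
m, n, N = 2, 6, 3                      # fewer inputs than input dims, so flat directions exist
X = rng.normal(size=(N, n))
w0 = 0.3 * rng.normal(size=m * n)

def grad_l2(w):
    return w                           # gradient of 0.5 * ||w||^2

w = w0.copy()
for _ in range(50):
    w = fip_step(w, X, m, grad_l2)

# Contrast: the same number of unconstrained shrinkage steps.
w_naive = w0 * (1.0 - 0.1) ** 50

drift = np.max(np.abs(outputs(w, X, m) - outputs(w0, X, m)))
drift_naive = np.max(np.abs(outputs(w_naive, X, m) - outputs(w0, X, m)))
print("weight norm before / after:", np.linalg.norm(w0), np.linalg.norm(w))
print("output drift (projected vs. naive):", drift, drift_naive)
```

In this toy setting the flat subspace is exact, so the projected path lowers the weight norm while leaving the outputs on `X` essentially unchanged, whereas the unconstrained steps change them. In a deep network the metric varies along the path and the low-eigenvalue directions are only approximately invariant, which is why the metric is recomputed at every step here.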