Inheritance Between Feedforward and Convolutional Networks via Model Projection
Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay
TL;DR
The paper unifies FFN and CNN representations in a node-level formalism with tensor-valued activations and proves that generalized FFNs form a strict subset of generalized CNNs. It then introduces model projection, which freezes pretrained per-input-channel filters and learns a single scalar gate $\gamma_{jk}$ for each (output channel, input channel) contribution, giving a parameter-efficient method that keeps every convolutional layer adaptable. Projected nodes take the generalized FFN form, so projected CNNs can inherit FFN techniques that do not rely on homogeneous layer inputs. Empirical results across ImageNet-pretrained backbones and multiple datasets show that model projection is a strong, robust transfer-learning baseline, often matching or surpassing standard fine-tuning with far fewer trainable parameters. The work provides both theoretical guarantees and practical tooling for efficient cross-class inheritance between model families.
Abstract
Techniques for feedforward networks (FFNs) and convolutional networks (CNNs) are frequently reused across families, but the relationship between the underlying model classes is rarely made explicit. We introduce a unified node-level formalization with tensor-valued activations and show that generalized feedforward networks form a strict subset of generalized convolutional networks. Motivated by the mismatch in per-input parameterization between the two families, we propose model projection, a parameter-efficient transfer learning method for CNNs that freezes pretrained per-input-channel filters and learns a single scalar gate for each (output channel, input channel) contribution. Projection keeps all convolutional layers adaptable to downstream tasks while substantially reducing the number of trained parameters in convolutional layers. We prove that projected nodes take the generalized FFN form, enabling projected CNNs to inherit feedforward techniques that do not rely on homogeneous layer inputs. Experiments across multiple ImageNet-pretrained backbones and several downstream image classification datasets show that model projection is a strong transfer learning baseline under simple training recipes.
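The projection idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact equation, so the form used here (output channel $j$ is $\sum_k \gamma_{jk}\,(W_{jk} \star x_k)$, with $j$ indexing output channels and $k$ input channels) is an assumption, as are the omission of a bias term, the initialization of all gates to 1, and the helper names `per_channel_responses` and `projected_conv`.

```python
import numpy as np


def per_channel_responses(x, filters):
    """Valid cross-correlation of each input channel with each frozen filter.

    x:       (C_in, H, W) input feature map.
    filters: (C_out, C_in, kh, kw) pretrained filters, treated as constants.
    Returns r of shape (C_out, C_in, H-kh+1, W-kw+1), where r[j, k] is the
    response of filters[j, k] applied to input channel k alone.
    """
    _, _, kh, kw = filters.shape
    # windows has shape (C_in, H-kh+1, W-kw+1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(x, (kh, kw), axis=(1, 2))
    return np.einsum("khwab,jkab->jkhw", windows, filters)


def projected_conv(x, filters, gamma):
    """Assumed projected layer: one scalar gate per (output, input) channel pair.

    gamma: (C_out, C_in) array, the only trainable quantity in this sketch.
    """
    r = per_channel_responses(x, filters)
    # Weight each per-channel response by its gate, then sum over input channels.
    return np.einsum("jk,jkhw->jhw", gamma, r)


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))           # C_in = 3
filters = rng.normal(size=(4, 3, 3, 3))  # C_out = 4, 3x3 kernels (frozen)
gamma = np.ones((4, 3))                  # assumed initialization: all gates 1

# An ordinary convolution sums the per-channel responses with unit weights.
standard = per_channel_responses(x, filters).sum(axis=1)
projected = projected_conv(x, filters, gamma)

n_filter_params = filters.size  # parameters updated by full fine-tuning
n_gate_params = gamma.size      # parameters updated under projection
```

With all gates set to 1, the projected layer reproduces the frozen convolution, so under this assumed form training starts from the pretrained function. In this toy layer the trainable count drops from `C_out * C_in * kh * kw` to `C_out * C_in`, a factor equal to the kernel area.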
