Inheritance Between Feedforward and Convolutional Networks via Model Projection

Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay

TL;DR

The paper unifies FFN and CNN representations in a tensor-compatible formalism and proves that generalized FFNs (GFFNs) form a strict subset of generalized CNNs (GCNNs). It then introduces model projection, which freezes pretrained per-input-channel filters and learns a single scalar gate $\gamma_{jk}$ for each (output channel, input channel) pair, yielding a parameter-efficient method that keeps every convolutional layer adaptable. Projected CNN nodes take the generalized FFN form, so FFN techniques that do not rely on homogeneous layer inputs can be transferred to CNNs. Empirical results across ImageNet-pretrained backbones and multiple datasets show that model projection is a strong, robust transfer-learning baseline, often matching or surpassing standard fine-tuning with far fewer trainable parameters. The work provides both theoretical guarantees and practical tooling for efficient cross-class inheritance between model families.
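To make "far fewer trainable parameters" concrete, the following is a rough counting sketch for a single convolutional layer. It assumes square kernels, ignores biases, and uses a hypothetical layer size (64 input channels, 128 output channels, 3x3 kernels); the function names are illustrative and do not come from the paper.

```python
def conv_trainable_params(c_in, c_out, kernel_size):
    """Weights updated when a standard conv layer is fully fine-tuned.

    Assumption: square kernels, bias terms not counted.
    """
    return c_out * c_in * kernel_size * kernel_size


def projected_trainable_params(c_in, c_out):
    """Gates updated under projection: one scalar per
    (output channel, input channel) pair; the filters stay frozen."""
    return c_out * c_in


# Hypothetical layer size, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
full = conv_trainable_params(c_in, c_out, k)
proj = projected_trainable_params(c_in, c_out)
print(full, proj, full // proj)
```

Under these assumptions the reduction factor for a layer equals the number of spatial positions in the kernel, so larger kernels save proportionally more.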

Abstract

Techniques for feedforward networks (FFNs) and convolutional networks (CNNs) are frequently reused across families, but the relationship between the underlying model classes is rarely made explicit. We introduce a unified node-level formalization with tensor-valued activations and show that generalized feedforward networks form a strict subset of generalized convolutional networks. Motivated by the mismatch in per-input parameterization between the two families, we propose model projection, a parameter-efficient transfer learning method for CNNs that freezes pretrained per-input-channel filters and learns a single scalar gate for each (output channel, input channel) contribution. Projection keeps all convolutional layers adaptable to downstream tasks while substantially reducing the number of trained parameters in convolutional layers. We prove that projected nodes take the generalized FFN form, enabling projected CNNs to inherit feedforward techniques that do not rely on homogeneous layer inputs. Experiments across multiple ImageNet-pretrained backbones and several downstream image classification datasets show that model projection is a strong transfer learning baseline under simple training recipes.
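A minimal sketch of one projected node, based only on the description above: each input channel is first processed by its frozen pretrained filter, and the node then forms a gated sum with one trainable scalar per input channel. The helper names, the "valid" cross-correlation, and the toy values are assumptions made for illustration; the paper's actual implementation may differ.

```python
def correlate2d_valid(x, w):
    """'Valid' 2D cross-correlation of map x (list of rows) with kernel w."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(w[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def projected_node(inputs, frozen_filters, gammas, bias=0.0):
    """Pre-activation output of one projected node (one output channel).

    inputs:         one 2D map per input channel
    frozen_filters: one pretrained 2D kernel per input channel (never updated)
    gammas:         one trainable scalar gate per input channel
    """
    # Fixed sub-functions: spatial filtering with the frozen kernels.
    processed = [correlate2d_valid(x, w) for x, w in zip(inputs, frozen_filters)]
    h, wd = len(processed[0]), len(processed[0][0])
    # FFN-style weighted sum over the processed tensors.
    return [
        [bias + sum(g * p[i][j] for g, p in zip(gammas, processed)) for j in range(wd)]
        for i in range(h)
    ]


# Toy example with two input channels (values are arbitrary).
inputs = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
filters = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
unit_gates = projected_node(inputs, filters, [1.0, 1.0])
regated = projected_node(inputs, filters, [0.5, 0.0])
print(unit_gates)
print(regated)
```

With every gate set to 1 the output equals the pre-activation of the original frozen node, so in this sketch the gates only rescale each input channel's contribution. Whether the paper initializes the gates this way is not stated here.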

Paper Structure

This paper contains 20 sections, 7 theorems, 13 equations, 4 figures, 5 tables, 1 algorithm.

Key Result

Theorem 3.5

GFFNs are a strict subset of GCNNs.

Figures (4)

  • Figure 1: (left) Standard CNN node: each input channel contributes through multiple learned weights because of the spatial extent of the kernels. (right) Model projection: the node has exactly one trainable weight per input channel. Spatial structure is preserved by fixed sub-functions, while channel interaction is reduced to a linear weighted sum. The resulting computation is structurally identical to an FFN node with processed tensors as inputs.
  • Figure 2: Selected results from the first experiments. The solid lines show the single-stage setup, while the dashed lines show the two-stage setup. Each row corresponds to a particular convolutional base, organized from oldest to most recent. The columns show the results on CIFAR-10, CIFAR-100, and Oxford Flowers, respectively. In all charts, the x-axis is epochs and the y-axis is test accuracy.
  • Figure 3: Results from the first set of experiments. The blue lines show the performance of model projection. The green and red lines show the performance of single-layer logistic regression and full fine-tuning, respectively. Each column corresponds to a particular convolutional base, organized from oldest to most recent. Each row shows the results on a particular dataset, organized from largest to smallest. In all charts, the x-axis is epochs and the y-axis is test accuracy.
  • Figure 4: Results from the second set of experiments. The orange lines show the performance of 2-step fine-tuning. The pink lines show the performance of 2-step fine-tuning using projection in the first step, and the light blue lines show the performance of 2-step fine-tuning using projection in both steps. Each column corresponds to a particular convolutional base, organized from oldest to most recent. Each row shows the results on a particular dataset, organized from largest to smallest. In all charts, the x-axis is epochs and the y-axis is test accuracy.

Theorems & Definitions (21)

  • Definition 3.1
  • Remark 3.2
  • Definition 3.3
  • Definition 3.4
  • Theorem 3.5
  • Corollary 3.6
  • Definition 3.7
  • Definition 3.8
  • Definition 3.9
  • Definition 3.10
  • ...and 11 more