Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks

Jy-yong Sohn, Dohyun Kwon, Seoyeon An, Kangwook Lee

TL;DR

The paper introduces Fine-Tuning Capacity (FTC) to quantify the memorization-like limits of additive fine-tuning of pre-trained networks. It analyzes two concrete architectures, 2-layer and 3-layer ReLU networks used as side networks $g$ added to a frozen $f$, and derives tight bounds on the neuron count $m$ required to arbitrarily change $N$ labels in a dataset of size $K$. Specifically, 2-layer ReLU fine-tuning requires $m = \Theta(N)$ neurons, while 3-layer ReLU fine-tuning reduces this to $m = \Theta(\sqrt{N})$; in both cases the bounds hold no matter how large $K$ is. The results bridge memorization capacity and fine-tuning, providing constructive upper-bound networks, lower-bound arguments, and extensions to deeper architectures, complemented by synthetic experiments that validate the square-root scaling. Overall, FTC offers a theoretical framework for understanding the resource efficiency of fine-tuning large pre-trained models.

Abstract

Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons ($m$) needed to arbitrarily change $N$ labels among $K$ samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network $f$ and a neural network $g$ (with $m$ neurons) designed for fine-tuning. When $g$ is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that $N$ samples can be fine-tuned with $m=\Theta(N)$ neurons for 2-layer networks, and with $m=\Theta(\sqrt{N})$ neurons for 3-layer networks, no matter how large $K$ is. Our results recover the known memorization capacity results when $N = K$ as a special case.
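The additive setup can be made concrete with a toy construction. The sketch below is a minimal illustration under simplifying assumptions (scalar inputs, distinct sample points, a stand-in frozen network $f(x) = x^2$); it is not the paper's construction, and all helper names are hypothetical. For each of the $N$ indices in $T$ it places a triangular bump built from three ReLU neurons, so $g$ is zero on the $K-N$ samples that $f$ already fits and equals $y_i - f(x_i)$ on $T$. This uses $3N$ neurons however large $K$ is, consistent with an $m = O(N)$ upper bound for 2-layer networks.

```python
# Toy 1-D sketch (hypothetical helpers, NOT the paper's construction):
# a 2-layer ReLU net g with 3 neurons per changed label, so that f + g
# matches all K targets while using O(N) neurons regardless of K.


def relu(z):
    return z if z > 0.0 else 0.0


def build_bump_net(xs, residuals, T):
    """Return neurons (a, w, b) so that g(x) = sum(a * relu(w * x + b)).

    For each i in T, three ReLUs form a triangle that peaks at xs[i] with
    height residuals[i] and is zero at every other sample point.
    Assumes the entries of xs are distinct."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    rank = {i: r for r, i in enumerate(order)}
    neurons = []
    for i in T:
        r, c, h = rank[i], xs[i], residuals[i]
        # Nearest sample on each side; use a unit offset at the boundary.
        left = xs[order[r - 1]] if r > 0 else c - 1.0
        right = xs[order[r + 1]] if r + 1 < len(xs) else c + 1.0
        s_left, s_right = h / (c - left), h / (right - c)
        neurons.append((s_left, 1.0, -left))            # ramp up from left
        neurons.append((-(s_left + s_right), 1.0, -c))  # slope turns down at peak
        neurons.append((s_right, 1.0, -right))          # slope back to 0 at right
    return neurons


def g(neurons, x):
    return sum(a * relu(w * x + b) for a, w, b in neurons)


def f(x):
    # Stand-in for the frozen pre-trained network (an assumption).
    return x * x


# K = 14 samples, N = 4 labels to change (0-based indices).
xs = [0.1 * k for k in range(14)]
T = [3, 6, 8, 13]
ys = [f(x) + (1.0 if i in T else 0.0) for i, x in enumerate(xs)]
residuals = [y - f(x) for x, y in zip(xs, ys)]
net = build_bump_net(xs, residuals, T)
fitted = [f(x) + g(net, x) for x in xs]
print(len(net))  # 12 neurons = 3 * N
```

Here `T` mirrors Figure 4's $T = \{4, 7, 9, 14\}$ with $K = 14$, shifted to 0-based indexing. Adding more correctly-fit samples to `xs` changes the bump widths but not the neuron count.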

Paper Structure (19 sections, 9 theorems, 57 equations, 10 figures)

Key Result

Theorem 4.1

Let $K \ge 3$. Then,

Figures (10)

  • Figure 1: Additive fine-tuning scenario where the pre-trained network $f$ is fine-tuned to $f+g_{\theta}$ in order to fit the dataset $D=\{({\bm{x}}_i, y_i)\}_{i=1}^K$. Here, the pre-trained network already fits the $K-N$ samples $\{({\bm{x}}_i, y_i) \}_{i \in [K]\setminus T}$, where $T \subseteq [K]$, with $|T| = N$, is the set of indices where $y_i \ne f({\bm{x}}_i)$. We use $g_{\theta}$ to fill the gap between $f({\bm{x}}_i)$ and $y_i$ for $i \in T$.
  • Figure 2: Proving Theorem \ref{thm:ftc_add_bound} for $K=14, N=4$.
  • Figure 3: Proving Theorem \ref{thm:ftc_add_bound} for $K=9, N=4$.
  • Figure 4: Illustration of the neural network constructed in Theorem \ref{lem:ftc_upp} with $K=14$ and $T = \{4, 7, 9, 14\}$. $\mathcal{P}([K]\setminus T)$ is the partition $\mathcal{P}(I)$ given in Example \ref{ex:l2}. The gray points are the removed ones.
  • Figure 5: Visualization of FTC for the 3-layer network in Corollary \ref{coro:ftc3NN}. In this figure, $A = \left\lfloor\frac{m^2}{108}-\frac{2}{3}\right\rfloor$ and $B = \frac{m^2+m}{6}$.
  • ...and 5 more figures
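Figure 5's caption gives closed forms for $A$ and $B$ as functions of the neuron count $m$. Reading them as lower and upper bounds on the FTC of a 3-layer ReLU network with $m$ neurons (an assumption, since the corollary statement is not reproduced here), the sketch below inverts both to see how many neurons $N$ fine-tuned samples require. The function names are hypothetical.

```python
import math

# A and B from the Figure 5 caption, read (as an assumption) as the lower
# and upper bounds on the FTC of a 3-layer ReLU network with m neurons.


def ftc3_lower(m):
    return math.floor(m * m / 108 - 2 / 3)


def ftc3_upper(m):
    return (m * m + m) / 6


def min_neurons(N, capacity):
    """Smallest m with capacity(m) >= N (capacity is nondecreasing in m)."""
    m = 1
    while capacity(m) < N:
        m += 1
    return m


for N in (100, 10_000):
    lo = min_neurons(N, ftc3_upper)  # fewer neurons cannot suffice
    hi = min_neurons(N, ftc3_lower)  # this many neurons always suffice
    print(N, lo, hi, round(lo / math.sqrt(N), 2), round(hi / math.sqrt(N), 2))
```

Both thresholds grow like $\sqrt{N}$: $B(m) \ge N$ means $m^2 + m \ge 6N$, so $m \approx \sqrt{6N}$, while $A(m) \ge N$ needs $m^2 \gtrsim 108\,(N + 2/3)$, so $m \approx \sqrt{108N}$. Neither depends on $K$.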

Theorems & Definitions (26)

  • Definition 1.1: FTC
  • Definition 3.1: FTC, equivalent form
  • Definition 3.2: Memorization Capacity (Yun et al., 2019)
  • Remark 1
  • Definition 3.3: FTC, equivalent form, in terms of # neuron
  • Theorem 4.1
  • Remark 2
  • Corollary 4.2: FTC of 2-layer FC ReLU
  • Proof
  • Definition 4.3
  • ...and 16 more