Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks

Jy-yong Sohn; Dohyun Kwon; Seoyeon An; Kangwook Lee

TL;DR

The paper introduces Fine-Tuning Capacity (FTC), which extends memorization capacity to additive fine-tuning of pre-trained networks. It analyzes 2-layer and 3-layer ReLU networks used as a side network $g$ added to a frozen network $f$, deriving tight bounds on the number of neurons $m$ needed to arbitrarily change $N$ labels in a dataset of size $K$. A 2-layer ReLU side network needs $m = \Theta(N)$ neurons, while a 3-layer one needs only $m = \Theta(\sqrt{N})$; both orders hold no matter how large $K$ is, although some of the explicit 3-layer upper bounds include terms in $K$. The results connect memorization capacity to fine-tuning through constructive upper bounds, matching lower bounds, and extensions to deeper architectures, complemented by synthetic experiments that validate the square-root scaling. Overall, FTC offers a theoretical framework for understanding how few resources are needed to fine-tune large pre-trained models.

Abstract

Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons ($m$) needed to arbitrarily change $N$ labels among $K$ samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network $f$ and a neural network $g$ (with $m$ neurons) designed for fine-tuning. When $g$ is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that $N$ samples can be fine-tuned with $m=\Theta(N)$ neurons for 2-layer networks, and with $m=\Theta(\sqrt{N})$ neurons for 3-layer networks, no matter how large $K$ is. Our results recover the known memorization capacity results when $N = K$ as a special case.
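To make the additive setting concrete, the following is a minimal sketch for scalar inputs, and not the construction used in the paper's proofs. It assumes sorted, distinct 1-D samples and builds a 2-layer ReLU side network $g$ from one three-neuron "hat" per label to be changed, so it uses $3N$ neurons regardless of $K$. The names `build_side_net` and `g`, the use of `np.sin` as a stand-in for the frozen network $f$, and the numerical values are all illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def build_side_net(x, residual, T):
    """Return (w, b, v) with g(t) = sum_k v[k] * relu(w[k] * t + b[k]).

    x        : sorted, distinct 1-D inputs (length K)
    residual : residual[i] = y_i - f(x_i), the correction wanted at x_i
    T        : indices whose labels must change (length N)
    One 3-neuron hat per index in T, so 3N hidden neurons in total.
    """
    w, b, v = [], [], []
    for i in T:
        c = x[i]
        left = x[i - 1] if i > 0 else c - 1.0            # previous sample or dummy knot
        right = x[i + 1] if i < len(x) - 1 else c + 1.0  # next sample or dummy knot
        up, down = 1.0 / (c - left), 1.0 / (right - c)
        # hat(t) = up*relu(t-left) - (up+down)*relu(t-c) + down*relu(t-right)
        # equals 1 at c and 0 at every other sample point.
        for knot, coef in ((left, up), (c, -(up + down)), (right, down)):
            w.append(1.0)
            b.append(-knot)
            v.append(residual[i] * coef)
    return np.array(w), np.array(b), np.array(v)

def g(t, w, b, v):
    # Hidden layer: relu(w * t + b); output layer: weighted sum with v.
    return relu(np.outer(t, w) + b) @ v

f = np.sin                               # stand-in for the frozen network (assumption)
x = np.linspace(0.0, 13.0, 14)           # K = 14 samples
T = [3, 6, 8, 13]                        # N = 4, i.e. T = {4, 7, 9, 14} in 1-based indexing
y = f(x).copy()
y[T] += np.array([1.0, -2.0, 0.5, 3.0])  # labels that f no longer fits

w, b, v = build_side_net(x, y - f(x), T)
fine_tuned = f(x) + g(x, w, b, v)
```

Each hat is supported between the two samples neighbouring $x_i$, so it leaves every other sample untouched; this is why the neuron count in the sketch grows with $N$ and not with $K$.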

Paper Structure (19 sections, 9 theorems, 57 equations, 10 figures)

Introduction
Related Works
Fine-Tuning
Memorization
Fine-Tuning Capacity
FTC of 2-layer FC ReLU Networks
Proof of Lower Bound on $m^{\star}$
Proof of Upper Bound on $m^{\star}$
FTC of 3-layer ReLU Network
Proof of Lower Bound on $m$
Proof of Upper Bound on $m$
Proof of $m \le 4\sqrt{K}$
Proof of $m \le 2\sqrt{K} + 3N$
Proof of $m \le 6 \sqrt{3N+2}$
Extension to other neural networks
...and 4 more sections
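
The three 3-layer upper bounds named in the section titles above can be compared numerically. The sketch below assumes that each bound applies to the given $(K, N)$ without further preconditions (the theorem statement is not reproduced here) and simply reports which expression is smallest; the function names are illustrative.

```python
import math

def three_layer_upper_bounds(K, N):
    """Evaluate the three upper bounds on m listed in the section titles."""
    return {
        "4*sqrt(K)": 4 * math.sqrt(K),
        "2*sqrt(K) + 3N": 2 * math.sqrt(K) + 3 * N,
        "6*sqrt(3N + 2)": 6 * math.sqrt(3 * N + 2),
    }

def tightest(K, N):
    """Return the name and value of the smallest of the three bounds."""
    bounds = three_layer_upper_bounds(K, N)
    name = min(bounds, key=bounds.get)
    return name, bounds[name]

small_case = tightest(14, 4)       # K and N comparable
mid_case = tightest(9, 1)          # small K, very small N
large_case = tightest(10**6, 4)    # K much larger than N
```

Under these assumptions, the $K$-free bound $6\sqrt{3N+2}$ becomes the smallest once $K$ is large relative to $N$, which matches the claim that $\Theta(\sqrt{N})$ neurons suffice no matter how large $K$ is.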

Key Result

Theorem 4.1

Let $K \ge 3$. Thus,

Figures (10)

Figure 1: Additive fine-tuning scenario where the pre-trained network $f$ is fine-tuned to $f+g_{\theta}$, in order to fit the dataset $D=\{({\bm{x}}_i, y_i)\}_{i=1}^K$. Here, the pre-trained network already fits the $K-N$ samples $\{({\bm{x}}_i, y_i) \}_{i \in [K]\setminus T}$, where $T \subseteq [K]$ with $|T| = N$ is the set of indices where $y_i \ne f({\bm{x}}_i)$. We use $g_{\theta}$ to fill the gap between $f({\bm{x}}_i)$ and $y_i$, for $i \in T$.
Figure 2: Proving Theorem \ref{thm:ftc_add_bound} for $K=14, N=4$.
Figure 3: Proving Theorem \ref{thm:ftc_add_bound} for $K=9, N=4$.
Figure 4: Illustration of the neural network constructed in Theorem \ref{lem:ftc_upp} with $K=14$ and $T = \{4, 7, 9, 14\}$. $\mathcal{P}([K]\setminus T)$ is the partition $\mathcal{P}(I)$ given in Example \ref{ex:l2}. The gray points are the removed ones.
Figure 5: Visualization of FTC for 3-layer network in Corollary \ref{coro:ftc3NN}. In this figure, $A = \left\lfloor\frac{m^2}{108}-\frac{2}{3}\right\rfloor$ and $B = \frac{m^2+m}{6}$.
...and 5 more figures
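
The quantity $A$ in the Figure 5 caption is consistent with inverting the bound $m \le 6\sqrt{3N+2}$ from the section titles. As a short worked step, assuming that bound is read as "$6\sqrt{3N+2}$ neurons suffice to fine-tune $N$ samples" (an interpretation, since the corollary itself is not reproduced here), a budget of $m$ neurons covers every $N$ satisfying:

```latex
\begin{align*}
6\sqrt{3N+2} \le m
  &\iff 36\,(3N+2) \le m^2 \\
  &\iff N \le \frac{m^2}{108} - \frac{2}{3}.
\end{align*}
% The largest integer N satisfying the last line is
% \left\lfloor \frac{m^2}{108} - \frac{2}{3} \right\rfloor = A.
```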

Theorems & Definitions (26)

Definition 1.1: FTC
Definition 3.1: FTC, equivalent form
Definition 3.2: Memorization Capacity (Yun et al., 2019)
Remark 1
Definition 3.3: FTC, equivalent form, in terms of the number of neurons
Theorem 4.1
Remark 2
Corollary 4.2: FTC of 2-layer FC ReLU
Proof
Definition 4.3
...and 16 more
