COLORA: Efficient Fine-Tuning for Convolutional Models with a Study Case on Optical Coherence Tomography Image Classification

Mariano Rivera, Angello Hoyos

TL;DR

CoLoRA bridges LoRA and CNNs by factorizing convolutional updates into separable depthwise and pointwise components, maintaining a frozen backbone while training a small set of parameters. Updates are merged into the pretrained weights after each epoch to preserve inference cost, yielding a practical parameter-efficient fine-tuning approach. On OCTMNISTv2, CoLoRA applied to VGG16 and ResNet50v2 achieves competitive accuracy and AUC with significantly fewer trainable parameters and roughly 20% faster per-epoch training, illustrating strong applicability to medical image classification. The work highlights stability, deployability, and potential for broad adoption across CNN-based models and modalities, with future directions toward larger datasets, 1D/3D domains, and hybrid efficiency strategies.

Abstract

We introduce CoLoRA (Convolutional Low-Rank Adaptation), a parameter-efficient fine-tuning method for convolutional neural networks (CNNs). CoLoRA extends LoRA to convolutional layers by decomposing kernel updates into lightweight depthwise and pointwise components. This design reduces the number of trainable parameters to 0.2 compared to conventional fine-tuning, preserves the original model size, and allows the updates to be merged into the pretrained weights after each epoch, keeping inference complexity unchanged. On OCTMNISTv2, CoLoRA applied to VGG16 and ResNet50 achieves improvements of up to 1 percent in accuracy and 0.013 in AUC over strong baselines (Vision Transformers, state-space models, and Kolmogorov-Arnold models) while reducing per-epoch training time by nearly 20 percent. Results indicate that CoLoRA provides a stable and effective alternative to full fine-tuning for medical image classification.
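The merging step can be illustrated with a small sketch that uses only the Python standard library. It is not the authors' implementation. It assumes the ordering given in the Figure 5 caption (1×1 pointwise mixing followed by a per-channel depthwise filter), stride 1, no padding, no bias, and no rank bottleneck or scaling factor, none of which are specified in this summary. All names (`conv2d`, `pointwise_then_depthwise`, `merge_update`, `rand`) are hypothetical. Under these assumptions the residual branch is equivalent to a single full kernel with entries `d[o][u][v] * p[o][i]`, which can therefore be added to the frozen weights.

```python
import random

def conv2d(x, w):
    """'Valid' cross-correlation. x: [C_in][H][W]; w: [C_out][C_in][k][k]."""
    h, wd, k = len(x[0]), len(x[0][0]), len(w[0][0])
    return [[[sum(w[o][i][u][v] * x[i][r + u][c + v]
                  for i in range(len(x)) for u in range(k) for v in range(k))
              for c in range(wd - k + 1)]
             for r in range(h - k + 1)]
            for o in range(len(w))]

def pointwise_then_depthwise(x, p, d):
    """Residual branch. p: [C_out][C_in] 1x1 mixing; d: [C_out][k][k] filters."""
    h, wd, k = len(x[0]), len(x[0][0]), len(d[0])
    out = []
    for o in range(len(p)):
        # 1x1 pointwise mixing of the input channels at every location
        z = [[sum(p[o][i] * x[i][r][c] for i in range(len(x)))
              for c in range(wd)] for r in range(h)]
        # depthwise step: one k x k spatial filter per output channel
        out.append([[sum(d[o][u][v] * z[r + u][c + v]
                         for u in range(k) for v in range(k))
                     for c in range(wd - k + 1)]
                    for r in range(h - k + 1)])
    return out

def merge_update(p, d):
    """Equivalent full kernel: dw[o][i][u][v] = d[o][u][v] * p[o][i]."""
    k = len(d[0])
    return [[[[d[o][u][v] * p[o][i] for v in range(k)] for u in range(k)]
             for i in range(len(p[0]))] for o in range(len(p))]

def rand(*shape):
    """Nested lists of uniform random values with the given shape."""
    if len(shape) == 1:
        return [random.uniform(-1, 1) for _ in range(shape[0])]
    return [rand(*shape[1:]) for _ in range(shape[0])]

random.seed(0)
c_in, c_out, k, size = 3, 4, 3, 6          # toy sizes, chosen for illustration
x = rand(c_in, size, size)
w0 = rand(c_out, c_in, k, k)               # frozen pretrained kernel
p, d = rand(c_out, c_in), rand(c_out, k, k)  # trainable adapter weights

# Training-time forward pass: frozen convolution plus the residual branch.
base, resid = conv2d(x, w0), pointwise_then_depthwise(x, p, d)
y_train = [[[base[o][r][c] + resid[o][r][c] for c in range(len(base[o][r]))]
            for r in range(len(base[o]))] for o in range(c_out)]

# Merged forward pass: a single convolution with w0 + dw.
dw = merge_update(p, d)
w_merged = [[[[w0[o][i][u][v] + dw[o][i][u][v] for v in range(k)]
              for u in range(k)] for i in range(c_in)] for o in range(c_out)]
y_merged = conv2d(x, w_merged)

max_err = max(abs(y_train[o][r][c] - y_merged[o][r][c])
              for o in range(c_out)
              for r in range(len(y_train[o]))
              for c in range(len(y_train[o][r])))
full_params = c_out * c_in * k * k               # 108 for these toy sizes
adapter_params = c_out * c_in + c_out * k * k    # 12 + 36 = 48
print(max_err < 1e-9, full_params, adapter_params)
```

For this unbottlenecked factorization, the ratio of adapter to full-kernel parameters is 1/k² + 1/C_in, so the saving grows with the number of input channels. The toy sizes above are not meant to reproduce the parameter reduction reported in the paper.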

Paper Structure

This paper contains 16 sections, 12 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Schematic of Rebuffi’s CNN adapter. A $1\times1$ adapter after each convolutional block mixes preceding activations for domain-specific adaptation while freezing backbone filters.
  • Figure 2: (a) LoRA on dense layers. (b) 2D convolutional kernel. (c) Depthwise–pointwise factorization of a 2D convolution.
  • Figure 3: CoLoRA layer: a trainable depthwise–pointwise residual is added to a frozen convolution; updates are merged after training to preserve inference complexity.
  • Figure 4: Inception-style CoLoRA: pointwise mixing precedes depthwise spatial filtering.
  • Figure 5: Operations of the CoLoRA layer (2D): $1{\times}1$ pointwise mixing per location followed by per-channel depthwise spatial convolution.
  • ...and 8 more figures