CAD: Memory Efficient Convolutional Adapter for Segment Anything

Joohyeok Kim, Joonhyeon Song, Seohwan Yun, Seongho Yoon, Sangmin Lee

TL;DR

CAD is an adapter architecture that connects in parallel with SAM's image encoder, so the image encoder's activations and gradients do not need to be stored during training. It achieved competitive experimental results while using less than half the GPU memory of SAM Adapter.

Abstract

Segment Anything (SAM), a foundation model for image segmentation, has been actively studied in various fields since it was proposed. Many methods have been proposed to adapt SAM to specific domains, and one notable approach adds and trains lightweight adapter modules. While adapter-based fine-tuning has been reported to be parameter-efficient and to yield significant performance improvements, it faces an often overlooked issue: excessive GPU memory consumption relative to the number of trainable parameters. To address this issue, this paper proposes a memory-efficient parallel convolutional adapter architecture. The architecture connects in parallel with SAM's image encoder, eliminating the need to store the image encoder's activations and gradients during training. The proposed architecture achieved competitive experimental results while using less than half the GPU memory of SAM Adapter, indicating its value as an alternative to simple decoder fine-tuning when hardware limitations preclude adapter-based learning. Our code implementation is available on our GitHub.
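The memory argument in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: `FrozenEncoder` is a tiny hypothetical stand-in for SAM's image encoder, `ParallelConvAdapter` is an illustrative convolutional branch, and fusing the two by addition is an assumption made only for this example. The point it shows is that when the adapter sits beside the encoder rather than inside it, the encoder can run under `torch.no_grad()`, so autograd keeps no encoder activations for the backward pass.

```python
import torch
import torch.nn as nn


class FrozenEncoder(nn.Module):
    """Hypothetical, very small stand-in for SAM's image encoder."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )
        # The encoder is never trained in this sketch.
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class ParallelConvAdapter(nn.Module):
    """Illustrative trainable convolutional branch that reads the raw image."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def encode(encoder: nn.Module, adapter: nn.Module, image: torch.Tensor) -> torch.Tensor:
    # No autograd graph is built for the encoder, so its intermediate
    # activations are freed as soon as the forward pass finishes.
    with torch.no_grad():
        frozen = encoder(image)
    # Assumption: the two branches are fused by simple addition.
    return frozen + adapter(image)


torch.manual_seed(0)
encoder, adapter = FrozenEncoder().eval(), ParallelConvAdapter()
image = torch.randn(2, 3, 16, 16)

features = encode(encoder, adapter, image)
loss = features.pow(2).mean()  # placeholder loss, for illustration only
loss.backward()                # gradients flow only through the adapter
```

By contrast, an adapter inserted inside the encoder's blocks forces backpropagation through the encoder itself, so its activations must be retained even though its weights are frozen.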

Paper Structure (14 sections, 8 figures, 4 tables)


Figures (8)

  • Figure 1: LoRA architecture.
  • Figure 2: SAM Adapter architecture.
  • Figure 3: Adapters in transformer block.
  • Figure 4: CAD architecture.
  • Figure 5: The process to generate high-frequency components.
  • ...and 3 more figures