
Group Orthogonalization Regularization For Vision Models Adaptation and Robustness

Yoav Kurtz, Noga Bar, Raja Giryes

TL;DR

Group Orthogonalization Regularization (GOR) targets redundant parameter coupling by enforcing orthonormality within groups of filters in each layer, which is substantially cheaper to compute than orthogonalizing the full layer. By aligning filter-group partitions with normalization groups, GOR enhances the expressivity and diversity of the learned filters, improving adaptation performance for Vision Transformers and diffusion models and boosting robustness under adversarial training. The approach integrates smoothly with LoRA/AdaptFormer-style adapters, yielding consistent gains across CIFAR-10, downstream ViT tasks, and text-to-image generation, while remaining efficient as layer dimensionality grows. Overall, GOR offers a practical, scalable regularization that reduces redundancy, strengthens transfer and robustness, and complements existing normalization strategies.

Abstract

As neural networks become deeper, the redundancy within their parameters increases. This phenomenon has led to several methods that attempt to reduce the correlation between convolutional filters. We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer. Our experiments show that when incorporated into recent adaptation methods for diffusion models and vision transformers (ViTs), this regularization improves performance on downstream tasks. We further show improved robustness when group orthogonality is enforced during adversarial training. Our code is available at https://github.com/YoavKurtz/GOR.
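
A minimal sketch of a group-orthonormality penalty, assuming the usual soft form: each layer's weight is reshaped so every filter is one row, the rows are split into contiguous groups of $N$ filters (the contiguous split, mirroring how GroupNorm partitions channels, is our assumption), and each group's Gram matrix is pushed toward the identity, $\sum_{g=1}^{G}\|W_g W_g^\top - I_N\|_F^2$. The function name `gor_penalty` and the NumPy implementation are illustrative only, not the authors' code from the linked repository.

```python
import numpy as np


def gor_penalty(weight: np.ndarray, group_size: int) -> float:
    """Group orthonormality penalty: sum over groups of ||W_g W_g^T - I||_F^2.

    weight: (c_out, ...) array, e.g. a conv kernel (c_out, c_in, k, k) or a
    linear weight (c_out, c_in). Each filter is flattened into one row.
    Filters are split into contiguous groups of `group_size` (assumption:
    this mirrors GroupNorm's contiguous channel groups).
    """
    c_out = weight.shape[0]
    if group_size <= 0 or c_out % group_size != 0:
        raise ValueError("group_size must be a positive divisor of c_out")
    rows = weight.reshape(c_out, -1)                            # (c_out, d)
    groups = rows.reshape(c_out // group_size, group_size, -1)  # (G, N, d)
    gram = groups @ groups.transpose(0, 2, 1)                   # (G, N, N) per-group Gram
    residual = gram - np.eye(group_size)                        # identity broadcast over G
    return float(np.sum(residual ** 2))


# Orthonormal filters give zero penalty; identical filters are penalized.
print(gor_penalty(np.eye(4), group_size=2))        # 0.0
print(gor_penalty(np.ones((4, 3)), group_size=2))  # 52.0
```

In the second call each group's Gram matrix is [[3, 3], [3, 3]], so each group contributes 26. Inner products between filters in different groups never enter the sum, which is what makes the grouped penalty cheaper than a full-layer one. For training, the same expression would be written in an autodiff framework and added to the task loss with a weighting coefficient (a hypothetical λ here); a zero penalty is only attainable when $N$ does not exceed the flattened filter dimension.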

Paper Structure

This paper contains 23 sections, 3 equations, 13 figures, and 8 tables.

Figures (13)

  • Figure 1: Qualitative comparisons on Pokemon-BLIP between a baseline model fine-tuned using LoRA (top) and a model fine-tuned along with GOR (bottom), using the same seed. The green rectangle is zoomed in by a factor of $1.5$. Note the improved quality with GOR.
  • Figure 2: Visualization of GOR's group partition for $N=3$. GOR enforces orthonormal regularization on groups of weights in the network layers. Best viewed in color.
  • Figure 3: Qualitative comparisons on Oxford102 between a baseline fine-tuned model and a model fine-tuned along with GOR, using the same seed. The green rectangle is zoomed in by a factor of $1.5$. In each of the two rows, the top is the LoRA baseline and the bottom is LoRA with our method. The two models generate the flowers themselves comparably, with similar artifacts, while our model is more successful at generating the background grass. This may be explained by the fact that we encourage orthogonality in the weights, which may help the model capture finer detail.
  • Figure 4: Qualitative comparisons on FS-COCO between a baseline fine-tuned model and a model fine-tuned along with GOR, using the same seed. The green rectangle is zoomed in by a factor of $1.5$. In each of the two rows, the top is the LoRA baseline and the bottom is LoRA with our method. Our method improves generation quality both by aligning more closely with the text prompt (second image from the right) and by removing artifacts.
  • Figure 5: For different values of the group size $N$, we report (a) runtime and (b) multiply-accumulate operations (MACs). GOR improves over SO in MACs and memory while also improving accuracy.
  • ...and 8 more figures
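
Figure 5 reports measured runtime and MACs. As a rough back-of-the-envelope counterpart, the sketch below counts only the multiply-accumulates in the Gram-matrix products, assuming a layer with $c_{out}$ filters of flattened dimension $d$: a full-layer penalty forms one $c_{out} \times c_{out}$ Gram matrix ($c_{out}^2 d$ MACs), whereas $G = c_{out}/N$ groups of size $N$ need $G N^2 d = c_{out} N d$ MACs, a factor of $c_{out}/N$ fewer. The helper `gram_macs` and the example layer shape are hypothetical, and the count ignores the subtraction, the norm, and the backward pass.

```python
from typing import Optional


def gram_macs(c_out: int, d: int, group_size: Optional[int] = None) -> int:
    """Count multiply-accumulates in the Gram products of an orthogonality penalty.

    group_size=None models a full-layer penalty (one c_out x c_out Gram matrix);
    otherwise the c_out filters are split into c_out // group_size groups, each
    with its own group_size x group_size Gram matrix. Each Gram entry is a dot
    product of two d-dimensional filters, i.e. d MACs.
    """
    n = c_out if group_size is None else group_size
    if n <= 0 or c_out % n != 0:
        raise ValueError("group_size must be a positive divisor of c_out")
    num_groups = c_out // n
    return num_groups * n * n * d


if __name__ == "__main__":
    # Hypothetical layer: 512 filters of a 3x3 conv over 512 input channels.
    c_out, d = 512, 512 * 3 * 3
    full = gram_macs(c_out, d)
    for n in (8, 16, 32, 64):
        grouped = gram_macs(c_out, d, n)
        print(f"N={n:>3}: {grouped:>13,d} MACs ({full // grouped}x fewer than full)")
```

Because the grouped count grows linearly in $N$ while the full count grows with $c_{out}^2$, the savings ratio $c_{out}/N$ widens as layers get wider for a fixed group size, in line with the claim that GOR stays efficient as layer dimensionality grows.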