GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization

Mahmoud Soliman, Omar Abdelaziz, Ahmed Radwan, Anand, Mohamed Shehata

TL;DR

GNN-MoE addresses the challenge of domain generalization for Vision Transformers by introducing a graph-based, context-aware routing mechanism that directs patches to multiple Kronecker adapter experts. The method combines a GNN router operating on inter-patch graphs with parameter-efficient Kronecker adapters, achieving high performance with few trainable parameters. Empirically, it delivers state-of-the-art or competitive results on five DG benchmarks, significantly outperforming full fine-tuning and standard PEFT baselines while maintaining parameter efficiency. This graph-based routing strategy enhances robustness to domain shifts and can be extended to larger ViT families.

Abstract

Domain generalization (DG) seeks robust Vision Transformer (ViT) performance on unseen domains. Efficiently adapting pretrained ViTs for DG is challenging; standard fine-tuning is costly and can impair generalization. We propose GNN-MoE, enhancing Parameter-Efficient Fine-Tuning (PEFT) for DG with a Mixture-of-Experts (MoE) framework using efficient Kronecker adapters. Instead of token-based routing, a novel Graph Neural Network (GNN) router (GCN, GAT, SAGE) operates on inter-patch graphs to dynamically assign patches to specialized experts. This context-aware GNN routing leverages inter-patch relationships for better adaptation to domain shifts. GNN-MoE achieves state-of-the-art or competitive DG benchmark performance with high parameter efficiency, highlighting the utility of graph-based contextual routing for robust, lightweight DG.
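To make the parameter-efficiency claim concrete, here is a minimal sketch of the idea behind a Kronecker adapter: a large weight update is represented as the Kronecker product of two small factors, so only the factors are trained. The factor shapes (32×32 and 24×24 for a 768×768 update) and the helper names `kron` and `kron_param_count` are illustrative assumptions, not values or code from the paper.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a_row[j // cols_b] * b[i][j % cols_b]
         for j in range(len(a_row) * cols_b)]
        for a_row in a
        for i in range(rows_b)
    ]


def kron_param_count(d_out, d_in, a_rows, a_cols):
    """Trainable parameters when a (d_out x d_in) update is factored as A (x) B.

    A has shape (a_rows x a_cols); B must then have shape
    (d_out / a_rows) x (d_in / a_cols).
    """
    if d_out % a_rows or d_in % a_cols:
        raise ValueError("factor shape must divide the full weight shape")
    b_rows, b_cols = d_out // a_rows, d_in // a_cols
    return a_rows * a_cols + b_rows * b_cols


# Tiny worked example: a 2x2 factor and a 2x2 factor expand to a 4x4 update.
delta_w = kron([[1, 2], [3, 4]], [[0, 1], [1, 0]])
print(delta_w)  # [[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]

# Hypothetical ViT-Base-sized projection (768 x 768), factored as 32x32 (x) 24x24.
print(kron_param_count(768, 768, 32, 32))  # 1600
print(768 * 768)                           # 589824 for the dense update
```

Under these assumed shapes, the factored update trains 1,600 values in place of 589,824, which is the kind of saving that makes it affordable to keep several such adapters as experts.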

Paper Structure

This paper contains 24 sections, 10 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: GNN-Routed Mixture-of-Experts Architecture for Domain Generalization. The architecture combines a frozen pretrained backbone with trainable GNN-based routing and domain-specific expert adapters. The GNN router analyzes input structure to generate routing weights, enabling adaptive combination of domain experts for robust cross-domain performance.
  • Figure 2: An image of an old kettle from the OfficeHome dataset, broken down into graph nodes. An effective GNN router should capture this relational graph in order to route patches to the appropriate experts.
  • Figure 3: A t-SNE projection of the features learned by our model versus a baseline model on the Art domain of OfficeHome. Best viewed when zoomed in.
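
The routing idea described in Figures 1 and 2 can be sketched roughly as follows. This is a simplified illustration under several assumptions that are not confirmed by the text above: edges connect spatially adjacent patches on a grid (the paper may build the graph differently, for example from feature similarity), the router is a single linear GCN-style layer, and routing weights are a dense softmax over experts instead of a top-k selection. All function names are hypothetical.

```python
import math


def grid_adjacency(rows, cols):
    """4-neighbour adjacency with self-loops over a rows x cols patch grid."""
    n = rows * cols
    adj = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            adj[i][i] = 1.0  # self-loop, i.e. A + I as in GCN
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj, feats, weight):
    """One linear GCN-style step: D^-1/2 (A + I) D^-1/2 X W."""
    n, in_dim, out_dim = len(adj), len(feats[0]), len(weight[0])
    deg = [sum(row) for row in adj]
    # Aggregate neighbour features with symmetric degree normalisation.
    agg = [
        [sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * feats[j][k]
             for j in range(n))
         for k in range(in_dim)]
        for i in range(n)
    ]
    # Project aggregated features to one logit per expert.
    return [
        [sum(agg[i][k] * weight[k][e] for k in range(in_dim))
         for e in range(out_dim)]
        for i in range(n)
    ]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(adj, feats, weight):
    """Per-patch routing weights over experts (each row sums to 1)."""
    return [softmax(logits) for logits in gcn_layer(adj, feats, weight)]


# Three patches in a row; patches 1 and 2 have identical features
# but different neighbourhoods. Two experts, identity projection.
adj = grid_adjacency(1, 3)
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
routes = route(adj, feats, weight)
for patch_id, weights in enumerate(routes):
    print(patch_id, [round(w, 3) for w in weights])
```

In this toy example, patches 1 and 2 carry the same features yet receive different routing weights because their neighbourhoods differ. A router that scores each token independently would assign them identical weights, which is the distinction between context-aware and token-based routing.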