Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research

Qinglong Cao; Yuntian Chen; Lu Lu; Hao Sun; Zhenzhong Zeng; Xiaokang Yang; Dongxiao Zhang

TL;DR

Generalized Domain Prompt Learning (GDPL) bridges the gap between powerful natural-domain VLMs and domain-specific research. By leveraging domain-specific foundation models, quaternion networks, and cross-modal low-rank adaptation, it propagates domain knowledge into both the language and vision branches to create domain-aware VLMs with minimal data. Experiments across remote sensing, medical imaging, geology, SAR, and fluid dynamics show consistent improvements over natural-domain prompt baselines, along with gains in cross-dataset and category generalization. By enabling effective domain transfer with limited data and resources, the framework supports sustainable, equitable VLM research in academia and can accelerate domain-specific AI research while preserving vision-language alignment.

Abstract

Large-scale Vision-Language Models (VLMs) have demonstrated exceptional performance in natural vision tasks, motivating researchers across domains to explore domain-specific VLMs. However, constructing powerful domain-specific VLMs demands vast amounts of annotated data, substantial electrical energy, and computing resources that are primarily accessible to industry, which hinders VLM research in academia. To address this challenge and foster sustainable and equitable VLM research, we present the Generalized Domain Prompt Learning (GDPL) framework. GDPL facilitates the transfer of VLMs' robust recognition capabilities from natural vision to specialized domains without the need for extensive data or resources. By leveraging small-scale domain-specific foundation models and minimal prompt samples, GDPL empowers the language branch with domain knowledge through quaternion networks, uncovering cross-modal relationships between domain-specific vision features and natural vision-based contextual embeddings. Simultaneously, GDPL guides the vision branch into specific domains through hierarchical propagation of generated vision prompt features, grounded in well-matched vision-language relations. Furthermore, to fully harness the domain adaptation potential of VLMs, we introduce a novel low-rank adaptation approach. Extensive experiments across diverse domains, including remote sensing, medical imaging, geology, Synthetic Aperture Radar, and fluid dynamics, validate the efficacy of GDPL, demonstrating its ability to achieve state-of-the-art domain recognition performance in a prompt learning paradigm. Our framework paves the way for sustainable and inclusive VLM research, transcending the barriers between academia and industry.
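
The abstract mentions a low-rank adaptation approach without giving its details here. As a rough illustration of the general idea only, the sketch below shows standard low-rank adaptation of a single frozen linear layer in NumPy. It is not the paper's cross-modal variant, and the function name `lora_forward`, the scaling factor `alpha`, and all dimensions are assumptions made for this example.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Apply a frozen weight W plus a low-rank update B @ A (generic LoRA).

    Hypothetical helper for illustration; not taken from the GDPL paper.
    """
    rank = A.shape[0]
    # Effective weight = frozen weight + scaled low-rank correction.
    W_eff = W + (alpha / rank) * (B @ A)
    return x @ W_eff.T

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 4, 2           # assumed toy dimensions

W = rng.normal(size=(d_out, d_in))    # frozen pretrained weight
A = rng.normal(size=(rank, d_in))     # trainable down-projection
B = np.zeros((d_out, rank))           # trainable up-projection, zero-initialised

x = rng.normal(size=(3, d_in))        # batch of 3 input feature vectors
y = lora_forward(x, W, A, B)

# Only A and B would be trained; W stays fixed.
trainable = A.size + B.size
frozen = W.size
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training touches only the `A` and `B` factors. That small trainable footprint is what makes this family of methods attractive when data and compute are limited.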

Paper Structure

This paper contains 22 sections, 25 equations, 7 figures, 7 tables.

Introduction
Overview
Problem Setting
Generalized Domain Prompt Learning
Results
Datasets and Evaluation
Evaluation Results
Case Study
Discussion
Methods
Preliminaries
Prompting Language Branch
Prompting Vision Branch
Cross-Modal Low-Rank Adaptation
Final Classification
...and 7 more sections

Figures (7)

Figure 1: The main concept of the proposed method. (A) Existing large-scale vision-language models (VLMs) for the natural vision domain. (B) Applying the same approach as (A) to build VLMs for specific domains is feasible for industry but inaccessible to academia. (C) Domain-specific foundation models prompt VLMs into specific domains. HPC: High Performance Computing Center.
Figure 2: The network of our proposed generalized domain prompt learning. Utilizing the domain-specific foundation model to provide domain knowledge, the vision and language branches are prompted into the specific domain. Meanwhile, through our proposed cross-modal low-rank adaptation, the domain adaptation potential of VLMs is mined in a cross-modal update manner.
Figure 3: Comparison between our method and SOTA methods in terms of average performance. Our method outperforms the compared methods across five domains. A red frame highlights our performance. HM denotes the harmonic mean score, and the others are accuracy scores.
Figure 4: Visualization for different datasets. The values shown are similarity scores; a higher score indicates that the sample is more likely to belong to the corresponding category.
Figure 5: Comparisons with SOTA methods for (A) cross-dataset generalization and (B) single-source multi-target domain generalization with the MLRSNet dataset as the source dataset. The proposed method achieves better performance than the compared methods.
...and 2 more figures
