An overview of domain-specific foundation model: key technologies, applications and challenges

Haolong Chen; Hanzhi Chen; Zijian Zhao; Kaifeng Han; Guangxu Zhu; Yichen Zhao; Ying Du; Wei Xu; Qingjiang Shi

TL;DR

This paper surveys methodologies to tailor foundation models for domain-specific tasks by outlining a five-module multi-modality FM architecture, training and scaling principles, and a three-tier customization framework. It systematically analyzes plug-and-play versus fine-tuning strategies, pre-trained-module augmentation, and scratch-built designs, illustrated with concrete examples like ImageBind and CoDi-2. The article also reviews applications across diverse domains and identifies data, architectural, cost, and security challenges, offering actionable guidance for researchers and practitioners. By connecting modular design choices with practical deployment considerations, it provides a comprehensive reference for building domain-specific FMs that leverage cross-modal capabilities while remaining resource-conscious and secure.

Abstract

The impressive performance of ChatGPT and other foundation-model-based products in human language understanding has prompted both academia and industry to explore how these models can be tailored for specific industries and application scenarios. This process, known as the customization of domain-specific foundation models (FMs), addresses the limitations of general-purpose models, which may not fully capture the unique patterns and requirements of domain-specific data. Despite its importance, there is a notable lack of comprehensive overview papers on building domain-specific FMs, while numerous resources exist for general-purpose models. To bridge this gap, this article provides a timely and thorough overview of the methodology for customizing domain-specific FMs. It introduces basic concepts, outlines the general architecture, and surveys key methods for constructing domain-specific models. Furthermore, the article discusses various domains that can benefit from these specialized models and highlights the challenges ahead. Through this overview, we aim to offer valuable guidance and reference for researchers and practitioners from diverse fields to develop their own customized FMs.

Paper Structure (28 sections, 6 equations, 7 figures, 6 tables)

This paper contains 28 sections, 6 equations, 7 figures, 6 tables.

Introduction
Preliminaries of multi-modality foundation models
Architecture of foundation models
Model training
Gain of scaling up
Performance comparison
Key technologies for building domain-specific foundation models
Domain-specific enhancement based on general-purpose foundation model
Plug-and-play domain-specific enhancement
Domain-specific enhancement based on fine-tuning
Customization of the foundation model based on pre-trained modules
Customization of modality encoder
Customization of backbone calculator
Customization of modality decoder
Construction of the foundation model without pre-trained modules
...and 13 more sections

Figures (7)

Figure 1: The organization structure of this article.
Figure 2: The framework of multi-modality FMs with language as the central modality.
Figure 3: The structure of autoencoder.
Figure 6: Plug-and-play domain-specific enhancement. (a) Invoking existing knowledge; (b) Knowledge embedding by prompts; (c) Knowledge embedding by external knowledge base.
Figure 8: Fine-tuning-based domain-specific enhancement. (a) Adapter-based tuning; (b) Low-rank-decomposition-based tuning; (c) Full fine-tuning.
...and 2 more figures
