Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education

Yanhao Jia, Xinyi Wu, Hao Li, Qinglin Zhang, Yuxiao Hu, Shuai Zhao, Wenqi Fan

TL;DR

Uni-Retrieval tackles multi-style retrieval in STEM education by introducing a Prototype Learning Module and a continuously updatable Prompt Bank, enabling style-conditioned prompt expansion for vision-language retrieval. The STEM Education Retrieval Dataset (SER) provides 24k diverse, multi-modal queries to train and evaluate the system. Empirical results show Uni-Retrieval outperforms existing baselines with minimal parameter updates and modest inference overhead, while remaining effective under unknown styles via prototype-based retrieval. The approach offers a practical, scalable solution for educators to access diverse, style-tailored resources across text, image, and audio modalities. Together, these contributions push toward adaptive, context-aware educational resource retrieval in real-world STEM settings.

Abstract

In AI-facilitated teaching, leveraging various query styles to interpret abstract text descriptions is crucial for ensuring high-quality teaching. However, current retrieval models primarily focus on natural text-image retrieval, making them insufficiently tailored to educational scenarios due to the ambiguities in the retrieval process. In this paper, we propose a diverse expression retrieval task tailored to educational scenarios, supporting retrieval based on multiple query styles and expressions. We introduce the STEM Education Retrieval Dataset (SER), which contains over 24,000 query pairs of different styles, and Uni-Retrieval, an efficient and style-diversified retrieval vision-language model based on prompt tuning. Uni-Retrieval extracts query style features as prototypes and builds a continuously updated Prompt Bank containing prompt tokens for diverse queries. This bank can be updated at test time to represent domain-specific knowledge for different subject retrieval scenarios. Our framework demonstrates scalability and robustness by dynamically retrieving prompt tokens based on prototype similarity, effectively facilitating learning for unknown queries. Experimental results indicate that Uni-Retrieval outperforms existing retrieval models in most retrieval tasks. This advancement provides a scalable and precise solution for diverse educational needs.
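To make the prototype-based lookup described in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the `PromptBank` class, its `add` and `retrieve` methods, the three-dimensional prototype vectors, and the token names are all hypothetical. The use of cosine similarity is also an assumption, since the abstract only speaks of "prototype similarity".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


class PromptBank:
    """Hypothetical store mapping style prototypes to prompt tokens."""

    def __init__(self):
        self.entries = []  # list of (prototype_vector, prompt_tokens)

    def add(self, prototype, prompt_tokens):
        # New entries can be appended at any time, including test time.
        self.entries.append((list(prototype), list(prompt_tokens)))

    def retrieve(self, style_feature, top_k=1):
        # Rank stored prototypes by similarity to the query's style feature
        # and return the prompt tokens of the top_k closest prototypes.
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(style_feature, entry[0]),
            reverse=True,
        )
        return [tokens for _, tokens in ranked[:top_k]]


# Toy prototypes for three query styles (illustrative values only).
bank = PromptBank()
bank.add([1.0, 0.0, 0.0], ["<sketch_0>", "<sketch_1>"])
bank.add([0.0, 1.0, 0.0], ["<art_0>", "<art_1>"])
bank.add([0.0, 0.0, 1.0], ["<lowres_0>", "<lowres_1>"])

# A query whose style feature lies closest to the sketch prototype.
query_style = [0.9, 0.2, 0.1]
selected = bank.retrieve(query_style, top_k=1)
top_two = bank.retrieve(query_style, top_k=2)
```

In this toy setup, a query of unknown style still receives the prompt tokens of its nearest stored prototype, which is the intuition behind handling unseen query styles. In a real system the prototypes and prompt tokens would be learned embeddings rather than hand-written lists.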

Paper Structure

This paper contains 22 sections, 9 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: This advancement provides a scalable and precise solution for diverse educational needs. (b) Previous retrieval models focus on text-query retrieval data. (c) Our style-diversified retrieval setting accommodates the various query styles preferred by real educational content.
  • Figure 2: Data construction pipeline. 1. STEM education knowledge base. 2. Data sources: online resources and research datasets. 3. Data processing: extracting essential information from the collected data and using AIGC algorithms to generate diverse modalities. 4. Retrieval dataset: constructing a multi-modal STEM education dataset with 24,000 images in total.
  • Figure 3: The architecture of the Uni-Retrieval model.
  • Figure 4: The case study for our Uni-Retrieval and the FreestyleRet baseline.
  • Figure 5: The SER Dataset contains 24,000+ text captions and their corresponding queries with various styles, including Natural, Sketch, Art, Low-Resolution (Low-Res) images and audio clips from different STEM subjects.
  • ...and 2 more figures