Multi-Facet Blending for Faceted Query-by-Example Retrieval

Heejin Do; Sangwon Ryu; Jonghwi Kim; Gary Geunbae Lee

TL;DR

This work tackles faceted query-by-example (QBE), where user intent is conveyed through facet constraints rather than whole-document similarity. FaBle introduces a modular, LLM-guided augmentation pipeline that decomposes documents into facet units, generates facet-specific similar and dissimilar fragments, and recomposes them into facet-conditioned training pairs without predefined facet labels. Through triplet-based fine-tuning of document embeddings, FaBle improves facet-aware retrieval, achieving notable gains on CSFCube and, importantly, transferring to educational items via the new FEIR benchmark, demonstrating domain generalization in data-scarce settings. The approach reduces reliance on large labeled datasets or domain-specific annotations and is complemented by releasing FEIR and accompanying code on GitHub for broader adoption and future research.
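The triplet-based fine-tuning mentioned above can be illustrated with a standard triplet margin loss. This is a minimal sketch, not the authors' implementation: the Euclidean distance, the `margin` value, and the function names `euclidean` and `triplet_margin_loss` are all assumptions chosen for illustration, since the summary does not specify the exact objective.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed objective, for illustration).

    The loss is zero once the anchor is closer to the facet-relevant
    positive than to the facet-irrelevant negative by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

With toy two-dimensional embeddings, a triplet whose positive already sits much closer to the anchor than the negative contributes no loss, while a triplet with the roles reversed is penalized in proportion to the gap.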

Abstract

With the growing demand to serve fine-grained user intents, faceted query-by-example (QBE), which retrieves documents similar to a query document with respect to specific facets, has recently gained attention. However, because facet-level relevance datasets are scarce, prior approaches rely mainly on document-level comparisons based on coarse indicators such as citations. This restricts them to citation-based domains and fails to capture the intricacies of facet constraints. In this paper, we propose multi-facet blending (FaBle), an augmentation method that exploits modularity by decomposing and recomposing documents to explicitly synthesize facet-specific training sets. We automatically decompose documents into facet units and generate relevant and irrelevant pairs for each unit by leveraging the intrinsic distinguishing capabilities of LLMs; dynamically recomposing the units then yields document pairs annotated with facet-wise relevance. This modularization eliminates the need for predefined facet knowledge or labels. Further, to demonstrate FaBle's efficacy in a new domain beyond citation-based scientific paper retrieval, we release a benchmark dataset for QBE over educational exam items. FaBle augmentation applied to only 1K documents substantially improves the training of facet-conditional embeddings.
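The decompose-and-recompose idea described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the released FaBle code: documents are assumed to be already decomposed into a dictionary of facet units, the facet-specific similar and dissimilar fragments (which the paper obtains from an LLM) are passed in as plain strings, and non-target facets are simply copied from the anchor. The names `recompose`, `render`, and `build_triplet` are hypothetical.

```python
def recompose(facet_units, target_facet, replacement):
    """Return a copy of the facet units with one facet swapped out."""
    blended = dict(facet_units)
    blended[target_facet] = replacement
    return blended


def render(facet_units):
    """Join facet units back into a single document string."""
    return " ".join(facet_units.values())


def build_triplet(anchor_units, target_facet, similar, dissimilar):
    """Build one facet-conditioned (anchor, positive, negative) triplet.

    `similar` and `dissimilar` stand in for LLM-generated fragments that
    are relevant or irrelevant to the anchor on `target_facet`.
    """
    if target_facet not in anchor_units:
        raise KeyError(f"unknown facet: {target_facet}")
    return {
        "facet": target_facet,
        "anchor": render(anchor_units),
        "positive": render(recompose(anchor_units, target_facet, similar)),
        "negative": render(recompose(anchor_units, target_facet, dissimilar)),
    }
```

Calling `build_triplet` once per facet of a document produces one training triplet per facet, each of which differs from the anchor only in the targeted unit. How the non-target facets are actually filled during recomposition is a design choice of the method that this sketch does not attempt to reproduce.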
