CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation
Tongke Ni, Yang Fan, Junru Zhou, Xiangping Wu, Qingcai Chen
TL;DR
CrossFormer introduces the Cross-Segment Fusion Module (CSFM) to address cross-segment information loss in long-document text semantic segmentation. By computing per-segment representations $h_{\text{seg}}$, deriving a global embedding $h_{\text{global}}$, and fusing them as $h_{\text{[FEA]}}$, the method predicts segmentation boundaries with improved coherence. CSFM also enables CrossFormer to function as a semantically aware RAG chunk splitter, improving retrieval-context quality and downstream answer generation. Empirically, CrossFormer achieves state-of-the-art results on public segmentation benchmarks (e.g., WIKI-727k, WIKI-zh) and yields notable gains in LongBench RAG tasks, demonstrating the practical impact of cross-segment semantic fusion for long-form document understanding.
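The data flow described above (per-segment vectors, one global vector, one fused feature per segment) can be illustrated with a minimal sketch. The summary does not specify how the global embedding is pooled or how the fusion is performed, so the element-wise max pooling and the concatenation used here are assumptions chosen for illustration, and the function names `global_embedding` and `fuse_segments` are hypothetical rather than taken from the CrossFormer code.

```python
from typing import List

Vector = List[float]


def global_embedding(h_seg: List[Vector]) -> Vector:
    """Pool all segment vectors into one document-level vector.

    Assumption: element-wise max pooling. The source only states that a
    global embedding is derived from the per-segment representations.
    """
    if not h_seg:
        raise ValueError("at least one segment representation is required")
    if len({len(v) for v in h_seg}) != 1:
        raise ValueError("all segment vectors must have the same dimension")
    # zip(*h_seg) iterates over dimensions; max picks the largest value per dimension.
    return [max(column) for column in zip(*h_seg)]


def fuse_segments(h_seg: List[Vector]) -> List[Vector]:
    """Build one fused feature vector per segment.

    Assumption: fusion is plain concatenation of the segment vector with
    the shared global vector.
    """
    h_global = global_embedding(h_seg)
    return [segment + h_global for segment in h_seg]


# Two segments with 2-dimensional representations (toy values).
h_seg = [[1.0, -2.0], [0.5, 3.0]]
h_fea = fuse_segments(h_seg)
print(h_fea)  # [[1.0, -2.0, 1.0, 3.0], [0.5, 3.0, 1.0, 3.0]]
```

In this sketch every fused vector carries the same document-level context, so a downstream boundary classifier sees information from segments other than its own.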
Abstract
Text semantic segmentation partitions a document into semantically coherent paragraphs based on its subject matter, contextual information, and structure. Traditional approaches typically pre-split documents into segments to satisfy input length constraints, which discards critical semantic information that spans segments. To address this, we present CrossFormer, a transformer-based model featuring a novel cross-segment fusion module that dynamically models latent semantic dependencies across document segments, substantially improving segmentation accuracy. Additionally, CrossFormer can replace rule-based chunking methods in Retrieval-Augmented Generation (RAG) systems, producing more semantically coherent chunks that improve the system's effectiveness. Comprehensive evaluations confirm CrossFormer's state-of-the-art performance on public text semantic segmentation datasets, alongside considerable gains on RAG benchmarks.
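To make the RAG use case concrete, the following minimal sketch shows how per-sentence boundary predictions from a segmentation model could be turned into chunks, in place of a fixed-length or rule-based splitter. The function name `split_into_chunks`, the convention that each score is the probability that a sentence ends a segment, and the 0.5 threshold are all assumptions made for illustration; the scores below are toy values, not model output.

```python
from typing import List


def split_into_chunks(
    sentences: List[str],
    boundary_probs: List[float],
    threshold: float = 0.5,
) -> List[str]:
    """Group sentences into chunks using predicted segment boundaries.

    boundary_probs[i] is assumed to be the probability that sentences[i]
    is the last sentence of a segment.
    """
    if len(sentences) != len(boundary_probs):
        raise ValueError("sentences and boundary_probs must have equal length")

    chunks: List[str] = []
    current: List[str] = []
    for sentence, prob in zip(sentences, boundary_probs):
        current.append(sentence)
        if prob >= threshold:
            # A predicted boundary closes the current chunk.
            chunks.append(" ".join(current))
            current = []
    if current:
        # Sentences after the last predicted boundary form a final chunk.
        chunks.append(" ".join(current))
    return chunks


sentences = ["Cats are mammals.", "They purr.", "Rust is a language.", "It has a borrow checker."]
boundary_probs = [0.1, 0.9, 0.2, 0.3]  # toy scores
print(split_into_chunks(sentences, boundary_probs))
# ['Cats are mammals. They purr.', 'Rust is a language. It has a borrow checker.']
```

Each resulting chunk would then be embedded and indexed by the retriever as usual; only the splitting step changes.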
