CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

Tongke Ni, Yang Fan, Junru Zhou, Xiangping Wu, Qingcai Chen

TL;DR

CrossFormer introduces the Cross-Segment Fusion Module (CSFM) to address cross-segment information loss in long-document text semantic segmentation. By computing per-segment representations $h_{\text{seg}}$, deriving a global embedding $h_{\text{global}}$, and fusing them as $h_{\text{[FEA]}}$, the method predicts segmentation boundaries with improved coherence. CSFM also enables CrossFormer to function as a semantically aware RAG chunk splitter, improving retrieval-context quality and downstream answer generation. Empirically, CrossFormer achieves state-of-the-art results on public segmentation benchmarks (e.g., WIKI-727k, WIKI-zh) and yields notable gains in LongBench RAG tasks, demonstrating the practical impact of cross-segment semantic fusion for long-form document understanding.
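The fusion step can be illustrated with a minimal sketch in plain Python. This summary does not say how $h_{\text{global}}$ is derived or how the fusion is performed, so the element-wise max pooling and the concatenation below are assumptions made for illustration only, and the names `max_pool`, `fuse_segments`, and `boundary_vecs` are hypothetical.

```python
from typing import List

Vector = List[float]


def max_pool(vectors: List[Vector]) -> Vector:
    """Element-wise maximum over a non-empty list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def fuse_segments(
    h_seg: List[Vector],
    boundary_vecs: List[List[Vector]],
) -> List[List[Vector]]:
    """Attach a document-level vector to every boundary-candidate vector.

    h_seg:         one vector per segment (stand-in for encoder output)
    boundary_vecs: for each segment, one vector per candidate boundary
    Returns the fused vectors, playing the role of h_[FEA].
    """
    # ASSUMPTION: h_global is obtained by max pooling over segment vectors.
    h_global = max_pool(h_seg)
    # ASSUMPTION: fusion is plain concatenation of local and global vectors.
    return [[vec + h_global for vec in segment] for segment in boundary_vecs]


# Toy example: two segments with 2-dimensional representations.
h_seg = [[1.0, 0.0], [0.5, 2.0]]
boundary_vecs = [[[0.1, 0.2]], [[0.3, 0.4], [0.5, 0.6]]]
fused = fuse_segments(h_seg, boundary_vecs)
```

In this sketch every candidate boundary sees the same pooled summary of all segments, which is one simple way for information to cross segment borders. The fused vectors would then be passed to a classifier that labels each candidate as a boundary or not.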

Abstract

Text semantic segmentation involves partitioning a document into multiple paragraphs with continuous semantics based on the subject matter, contextual information, and document structure. Traditional approaches have typically relied on preprocessing documents into segments to address input length constraints, resulting in the loss of critical semantic information across segments. To address this, we present CrossFormer, a transformer-based model featuring a novel cross-segment fusion module that dynamically models latent semantic dependencies across document segments, substantially improving segmentation accuracy. Additionally, CrossFormer can replace rule-based chunking methods within a Retrieval-Augmented Generation (RAG) system, producing more semantically coherent chunks that improve the system's effectiveness. Comprehensive evaluations confirm CrossFormer's state-of-the-art performance on public text semantic segmentation datasets, alongside considerable gains on RAG benchmarks.

Paper Structure

This paper contains 16 sections, 3 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: 1a illustrates methods that leverage the sentences neighboring a candidate segmentation boundary to capture contextual information (Lukasik et al., 2020). 1b presents approaches that divide the document into segments and then model intra-segment semantic interaction. 1c introduces our proposed CrossFormer, which uses CSFM to extract cross-segment semantic interaction, depicted by green lines.
  • Figure 2: Architecture of CrossFormer. 2a illustrates the pipeline of CrossFormer for the text semantic segmentation task and its architecture, which consists of a pre-trained language model, the Cross-Segment Fusion Module (CSFM), and a linear classifier. 2b shows an example of a preprocessed document segment as input to the model. 2c shows the detailed structure of CSFM.
  • Figure 3: A flowchart depicting the integration of CrossFormer into the RAG system as a text chunk splitter. The process begins with document input, followed by CrossFormer-based chunking. The top-k semantically relevant chunks are retrieved, and a large language model generates the final answer using the retrieved chunks.
  • Figure 4: The influence of the maximum input length of CrossFormer on the WIKI-50 dataset (Koshorek et al., 2018). Models without the prefix "CrossFormer" are ablations that do not contain CSFM.
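
The RAG flow described for Figure 3 (chunking, top-k retrieval, answer generation) can be sketched as follows. This is only an illustration under stated assumptions: the boundary labels are taken as given rather than predicted by a model, the word-overlap score is a toy stand-in for whatever retriever is actually used, and all function names (`chunks_from_boundaries`, `overlap_score`, `top_k_chunks`, `build_prompt`) are hypothetical.

```python
from typing import List


def chunks_from_boundaries(sentences: List[str], is_boundary: List[bool]) -> List[str]:
    """Group sentences into chunks; is_boundary[i] means a chunk ends after sentence i."""
    chunks, current = [], []
    for sentence, ends_chunk in zip(sentences, is_boundary):
        current.append(sentence)
        if ends_chunk:
            chunks.append(" ".join(current))
            current = []
    if current:  # trailing sentences with no closing boundary
        chunks.append(" ".join(current))
    return chunks


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance: count of distinct lowercase words shared by query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def top_k_chunks(query: str, chunks: List[str], k: int) -> List[str]:
    """Return the k highest-scoring chunks; ties keep document order."""
    return sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:k]


def build_prompt(query: str, retrieved: List[str]) -> str:
    """Assemble the text that would be sent to a large language model."""
    context = "\n\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


sentences = [
    "Cats purr.",
    "Cats sleep a lot.",
    "Rust is a language.",
    "Rust has ownership.",
]
# ASSUMPTION: these labels stand in for a segmentation model's predictions.
is_boundary = [False, True, False, False]

chunks = chunks_from_boundaries(sentences, is_boundary)
retrieved = top_k_chunks("what is rust", chunks, k=1)
prompt = build_prompt("what is rust", retrieved)
```

The point of the sketch is that the splitter only changes where chunk borders fall; retrieval and generation stay the same, so a semantically aware splitter can be swapped in for a rule-based one without touching the rest of the pipeline.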