BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

Zhecheng Sheng, Tianhao Zhang, Chen Jiang, Dongyeop Kang

TL;DR

BBScore is a Brownian-bridge–based, reference-free metric for text coherence that captures both global and local coherence without end-to-end model training. A contrastively trained Brownian encoder maps text into a latent space, from which a domain-specific diffusion coefficient $\hat{\sigma}_m^2$ is estimated to define the BBScore $B(\mathbf{s}|\hat{\sigma}_m^2)$ via a Brownian-bridge likelihood; lower scores indicate higher coherence. Evaluated on WikiSection with global and local shuffle tasks and on downstream tasks that discriminate human-written from LLM-generated text, BBScore (with or without a simple classifier) achieves results competitive with or superior to baselines such as Entity Grid and Unified Coherence, and generalizes to cross-model and cross-domain settings. The work shows that a training-light, coherence-oriented metric is practical, can be integrated as a feature in diverse NLP tasks, and offers insight into model-generated versus human-authored text in controlled and real-world contexts.
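The pipeline summarized above (latent trajectory, estimated diffusion coefficient, bridge likelihood) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard Brownian-bridge marginal, where a point at step $t$ between pinned endpoints $z_0$ and $z_T$ is Gaussian with mean $(1 - t/T)\,z_0 + (t/T)\,z_T$ and variance $\sigma^2\, t(T-t)/T$ per dimension, and it scores a trajectory by its average per-step negative log-likelihood under those marginals. The function names (`bridge_residuals`, `estimate_sigma2`, `bridge_score`, `simulate_bridge`) are hypothetical, synthetic latents stand in for the trained encoder's output, and the exact form of $B(\mathbf{s}|\hat{\sigma}_m^2)$ in the paper may differ.

```python
import numpy as np


def bridge_residuals(z):
    """Squared distance of each interior point from the bridge mean,
    divided by the bridge variance factor t(T - t)/T.

    z: array of shape (T + 1, d), one latent vector per sentence.
    """
    z = np.asarray(z, dtype=float)
    T = z.shape[0] - 1
    if T < 2:
        raise ValueError("need at least one interior point (T >= 2)")
    t = np.arange(1, T)                       # interior steps 1 .. T-1
    frac = (t / T)[:, None]
    mean = (1 - frac) * z[0] + frac * z[-1]   # straight line between endpoints
    var_factor = t * (T - t) / T              # Var[z_t] = sigma^2 * var_factor
    sq_dist = ((z[1:-1] - mean) ** 2).sum(axis=1)
    return sq_dist / var_factor, var_factor


def estimate_sigma2(trajectories):
    """Pooled estimate of the diffusion coefficient from in-domain trajectories
    (assumed estimator: mean scaled residual per latent dimension)."""
    d = np.asarray(trajectories[0]).shape[1]
    scaled = np.concatenate([bridge_residuals(z)[0] for z in trajectories])
    return scaled.mean() / d


def bridge_score(z, sigma2):
    """Average per-step Gaussian negative log-likelihood under the bridge
    marginals. Lower means the trajectory looks more like a bridge."""
    d = np.asarray(z).shape[1]
    scaled, var_factor = bridge_residuals(z)
    nll = 0.5 * (d * np.log(2 * np.pi * sigma2 * var_factor) + scaled / sigma2)
    return nll.mean()


def simulate_bridge(start, end, T, sigma2, rng):
    """Synthetic stand-in for encoder output: a Brownian bridge from start to end."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    steps = rng.normal(scale=np.sqrt(sigma2), size=(T, start.size))
    walk = np.vstack([np.zeros(start.size), np.cumsum(steps, axis=0)])
    frac = (np.arange(T + 1) / T)[:, None]
    # Subtracting frac * walk[-1] pins the path to zero at both ends.
    return start + frac * (end - start) + (walk - frac * walk[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = [simulate_bridge(np.zeros(8), np.full(8, 5.0), 20, 0.5, rng)
            for _ in range(200)]
    sigma2_hat = estimate_sigma2(docs)
    original = docs[0]
    shuffled = original.copy()
    shuffled[1:-1] = original[rng.permutation(np.arange(1, 20))]
    print("sigma2_hat:", sigma2_hat)
    print("original score:", bridge_score(original, sigma2_hat))
    print("shuffled score:", bridge_score(shuffled, sigma2_hat))
```

In this toy setting, shuffling the interior points moves them away from the positions the bridge predicts, so the shuffled trajectory receives a higher score than the original. This mirrors the shuffle-discrimination setup described above, with the caveat that real sentence latents come from the contrastively trained encoder rather than a simulator.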

Abstract

Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their capacity to measure the overarching coherence of long texts. In this paper, we posit that coherent texts inherently manifest a sequential and cohesive interplay among sentences, effectively conveying the central theme, purpose, or standpoint. To explore this abstract relationship, we introduce the "BBScore," a novel reference-free metric grounded in Brownian bridge theory for assessing text coherence. Our findings showcase that when synergized with a simple additional classification component, this metric attains a performance level comparable to state-of-the-art techniques on standard artificial discrimination tasks. We also establish in downstream tasks that this metric effectively differentiates between human-written documents and text generated by large language models under a specific domain. Furthermore, we illustrate the efficacy of this approach in detecting written styles attributed to diverse large language models, underscoring its potential for generalizability. In summary, we present a novel Brownian bridge coherence metric capable of measuring both local and global text coherence, while circumventing the need for end-to-end model training. This flexibility allows for its application in various downstream tasks.

Paper Structure (33 sections, 13 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: Utterance flow in latent space
  • Figure 2: Procedure to generate the Brownian bridge score, $B(\mathbf{s}|\hat{\sigma}_m^2)$.
  • Figure 3: Example window pair for the local discrimination task. The left window is from the locally shuffled article; the right window is from the original article.
  • Figure 4: Left: BBScore distribution for different shuffle-block sizes. Right: Test-set BBScore-based AUC for different shuffle-block sizes.
  • Figure 5: Left and right: train and test BBScore-based AUC for different LLMs, respectively.
  • ...and 4 more figures