Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling

Anas Belfathi, Nicolas Hernandez, Laura Monceaux, Warren Bonnard, Mary Catherine Lavissiere, Christine Jacquin, Richard Dufour

TL;DR

This work introduces SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity (category, rhetorical function, and step), and proposes two prototype-based methods that integrate local context with global representations.

Abstract

Rhetorical Role Labeling (RRL) identifies the functional role of each sentence in a document, a key task for discourse understanding in domains such as law and medicine. While hierarchical models capture local dependencies effectively, they are limited in modeling global, corpus-level features. To address this limitation, we propose two prototype-based methods that integrate local context with global representations. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss to structure the latent space, while Prototype-Conditioned Modulation (PCM) constructs corpus-level prototypes and injects them during both training and inference. Given the scarcity of RRL resources, we introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show consistent improvements over strong baselines, with Macro-F1 gains of 4 points on low-frequency roles. We further analyze the implications of these findings in the era of Large Language Models (LLMs) and complement them with an expert evaluation.
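The abstract describes PBR only at a high level. As a rough illustration, the sketch below assumes one prototype vector per rhetorical role and uses a prototypical-network style loss (a softmax over negative squared Euclidean distances) as the "distance-based auxiliary loss". The function names, the weighting factor `lam`, and the exact loss form are hypothetical assumptions, not the authors' formulation. It is written in plain Python so the arithmetic is visible; in a real model the prototypes would be trainable parameters in an autodiff framework, updated jointly with the encoder.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_aux_loss(
    embeddings: Sequence[Vector],
    labels: Sequence[int],
    prototypes: Sequence[Vector],
) -> float:
    """Hypothetical distance-based auxiliary loss (prototypical-network style).

    Each sentence embedding is scored against every role prototype by its
    negative squared distance; the loss is the mean cross-entropy of the
    gold role under a softmax over those scores.
    """
    total = 0.0
    for h, y in zip(embeddings, labels):
        logits = [-squared_distance(h, p) for p in prototypes]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[y]  # -log softmax probability of the gold role
    return total / len(labels)


def pbr_total_loss(
    task_loss: float,
    embeddings: Sequence[Vector],
    labels: Sequence[int],
    prototypes: Sequence[Vector],
    lam: float = 0.1,  # hypothetical weight of the auxiliary term
) -> float:
    """Task loss (e.g. sentence-level cross-entropy) plus the weighted auxiliary term."""
    return task_loss + lam * prototype_aux_loss(embeddings, labels, prototypes)


if __name__ == "__main__":
    protos = [[0.0, 0.0], [1.0, 1.0]]  # one toy prototype per role
    embs = [[0.1, 0.0], [0.9, 1.0]]
    print(prototype_aux_loss(embs, [0, 1], protos))  # each embedding is near its gold prototype
    print(prototype_aux_loss(embs, [1, 0], protos))  # labels swapped, so the loss is larger
```

Embeddings that lie closest to their own role's prototype incur a small auxiliary loss, while mismatched ones are penalized; under these assumptions, this is the sense in which such a term pulls same-role sentences together in the latent space.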

Paper Structure

This paper contains 78 sections, 7 equations, 8 figures, and 9 tables.

Figures (8)

  • Figure 1: Example of document segments in SCOTUS-Law, annotated with discursive categories, rhetorical functions, and attributes, which together form the step annotation (the full hierarchy is shown in the final annotation-scheme figure).
  • Figure 2: Illustration of our methods for injecting global representations into hierarchical architectures. PBR (left) learns soft prototypes jointly with the model to regularize the latent space. PCM (right) dynamically injects precomputed role prototypes during encoding via modulation mechanisms.
  • Figure 3: t-SNE projection of sentence embeddings under the baseline, PBR, and PCM models.
  • Figure 4: Topical, Temporal, and Authorial Diversity in our annotated corpus.
  • Figure 5: Distribution of Rhetorical Functions by Relative Position, revealing a structured rhetorical flow in judicial reasoning, from the initial announcement to the final resolution.
  • ...and 3 more figures
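Figure 2 describes PCM as injecting precomputed role prototypes during encoding via modulation mechanisms. One possible reading is sketched below under explicit assumptions: prototypes are per-role mean embeddings over the training corpus; because gold roles are unknown at inference, each sentence attends over all prototypes with scaled dot-product weights; and the resulting context vector modulates the sentence representation through a FiLM-style scale and shift. The helper names and the scalar `gamma`/`beta` (which a real model would replace with learned projections) are hypothetical and are not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def build_prototypes(
    embeddings: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Corpus-level prototypes, assumed here to be the mean embedding of each role."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for h, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(h)
        for i, x in enumerate(h):
            sums[y][i] += x
        counts[y] += 1
    return {y: [s / counts[y] for s in vec] for y, vec in sums.items()}


def prototype_context(h: Sequence[float], prototypes: Dict[int, List[float]]) -> List[float]:
    """Label-free prototype context: softmax(h . p_k / sqrt(d)) weighted sum of prototypes."""
    keys = sorted(prototypes)
    d = len(h)
    scores = [sum(a * b for a, b in zip(h, prototypes[k])) / math.sqrt(d) for k in keys]
    m = max(scores)  # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * prototypes[k][i] for w, k in zip(weights, keys)) for i in range(d)]


def modulate(
    h: Sequence[float], context: Sequence[float], gamma: float = 0.5, beta: float = 0.5
) -> List[float]:
    """FiLM-style modulation: scale and shift each dimension of h by the prototype context."""
    return [x * (1.0 + gamma * c) + beta * c for x, c in zip(h, context)]


if __name__ == "__main__":
    # Precompute prototypes once from (toy) training embeddings and gold roles.
    protos = build_prototypes([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1])
    # At training or inference time, condition a new sentence on the prototypes.
    sentence = [1.0, 0.0]
    print(modulate(sentence, prototype_context(sentence, protos)))
```

Computing the context from similarity weights rather than from the gold label keeps this sketch usable at inference time, which matches the abstract's statement that PCM injects prototypes during both training and inference.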