TOPol: Capturing and Explaining Multidimensional Semantic Polarity Fields and Vectors

Gabin Taibi, Lucia Gomez

TL;DR

This work addresses the limitation of unidimensional sentiment polarity by introducing TOPol, a semi-unsupervised framework that reconstructs multidimensional semantic polarity fields across human-on-the-loop (HoTL) defined contextual boundaries. It combines transformer-based embeddings, UMAP, and Leiden clustering to produce topic-centric polarity vectors, anchors them into a vector field, and uses a contrastive LLM-based explainability module to label and interpret the dimensions. Empirical results on macroeconomic speeches and Amazon reviews show that TOPol captures both non-affective and affective polarity transitions, with the dominance of sentiment varying across domains, and that it is robust to perturbations in embedding space and clustering granularity. The approach offers a scalable, interpretable extension of polarity analysis beyond sentiment, enabling context-sensitive analysis of multidimensional narrative shifts with practical applicability to discourse analysis and domain-specific polarity research.

Abstract

Traditional approaches to semantic polarity in computational linguistics treat sentiment as a unidimensional scale, overlooking the multidimensional structure of language. This work introduces TOPol (Topic-Orientation POLarity), a semi-unsupervised framework for reconstructing and interpreting multidimensional narrative polarity fields under human-on-the-loop (HoTL) defined contextual boundaries (CBs). The framework embeds documents using a transformer-based large language model (tLLM), applies neighbor-tuned UMAP projection, and segments topics via Leiden partitioning. Given a CB between discourse regimes A and B, TOPol computes directional vectors between corresponding topic-boundary centroids, yielding a polarity field that quantifies fine-grained semantic displacement during regime shifts. This vectorial representation enables assessing CB quality and detecting polarity changes, guiding HoTL CB refinement. To interpret identified polarity vectors, the tLLM compares their extreme points and produces contrastive labels with estimated coverage. Robustness analyses show that only CB definitions (the main HoTL-tunable parameter) significantly affect results, confirming methodological stability. We evaluate TOPol on two corpora: (i) U.S. Central Bank speeches around a macroeconomic breakpoint, capturing non-affective semantic shifts, and (ii) Amazon product reviews across rating strata, where affective polarity aligns with NRC valence. Results demonstrate that TOPol consistently captures both affective and non-affective polarity transitions, providing a scalable, generalizable, and interpretable framework for context-sensitive multidimensional discourse analysis.
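The centroid-to-centroid step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the documents have already been embedded and projected, that topic labels are given (standing in for the UMAP and Leiden stages), and that a "topic-boundary centroid" is simply the mean of a topic's points on one side of the contextual boundary. The function name `polarity_field` and the toy data are hypothetical.

```python
import numpy as np


def polarity_field(embeddings, clusters, regimes):
    """Return {cluster_id: (anchor, vector)} for clusters present in both regimes.

    Assumption: the anchor is the regime-A centroid of the cluster and the
    vector is the displacement from it to the regime-B centroid.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    clusters = np.asarray(clusters)
    regimes = np.asarray(regimes)

    field = {}
    for c in np.unique(clusters):
        in_a = (clusters == c) & (regimes == "A")
        in_b = (clusters == c) & (regimes == "B")
        if not in_a.any() or not in_b.any():
            # The topic appears on only one side of the boundary,
            # so there is no corresponding centroid to compare against.
            continue
        centroid_a = embeddings[in_a].mean(axis=0)
        centroid_b = embeddings[in_b].mean(axis=0)
        field[int(c)] = (centroid_a, centroid_b - centroid_a)
    return field


# Toy 2-D points: hypothetical stand-ins for projected document embeddings.
points = np.array([
    [0.0, 0.0], [0.0, 2.0],        # topic 0, regime A
    [2.0, 1.0], [4.0, 1.0],        # topic 0, regime B
    [10.0, 10.0],                  # topic 1, regime A
    [10.0, 12.0], [10.0, 14.0],    # topic 1, regime B
    [20.0, 20.0],                  # topic 2, regime A only
])
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 2])
regimes = np.array(["A", "A", "B", "B", "A", "B", "B", "A"])

field = polarity_field(points, clusters, regimes)
for topic, (anchor, vector) in field.items():
    print(topic, anchor, vector, np.linalg.norm(vector))
```

In this toy setup each topic contributes one arrow anchored at its regime-A centroid. The arrow's direction and length describe how that topic moved across the boundary, and topics observed in only one regime are skipped.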

Paper Structure

This paper contains 20 sections, 11 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Architecture of TOPol
  • Figure 2: TOPol cluster projections under HoTL-defined (left) and randomized (right) contextual boundaries. Blue and red points correspond to documents from regimes A and B, respectively. White squares indicate cluster centroids; arrows represent semantic drift vectors between regime-specific centroids.