Probing BERT in Hyperbolic Spaces

Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

TL;DR

This work introduces a Poincaré probe that projects BERT embeddings into a hyperbolic subspace to reveal hierarchical linguistic information. The probe jointly optimizes two kinds of hyperbolic distance: pairwise distances that reflect tree structure and distances that reflect word depth. The results suggest that syntax in BERT may be better captured in hyperbolic geometry than in Euclidean space, especially for deeper trees and longer sentences. The same framework is extended to a sentiment subspace with lexically-controlled contextualization. Empirically, the Poincaré probe does not behave like a parser when applied to non-contextual baselines, yet it recovers syntax more accurately for deeper, more complex structures and reveals a finer-grained organization of sentiment. Visualizations and curvature analyses support the interpretation that BERT's syntactic and semantic information can be organized in hyperbolic geometries, which opens new avenues for hierarchical representation learning and probing.

Abstract

Recently, a variety of probing tasks have been proposed to discover linguistic properties learned in contextualized word embeddings. Many of these works implicitly assume these embeddings lie in certain metric spaces, typically the Euclidean space. This work considers a family of geometrically special spaces, the hyperbolic spaces, that exhibit better inductive biases for hierarchical structures and may better reveal linguistic hierarchies encoded in contextualized representations. We introduce a Poincaré probe, a structural probe projecting these embeddings into a Poincaré subspace with explicitly defined hierarchies. We focus on two probing objectives: (a) dependency trees, where the hierarchy is defined as head-dependent structures; (b) lexical sentiments, where the hierarchy is defined as the polarity of words (positivity and negativity). We argue that a key desideratum of a probe is its sensitivity to the existence of linguistic structures. We apply our probes to BERT, a typical contextualized embedding model. In a syntactic subspace, our probe recovers tree structures better than Euclidean probes, revealing the possibility that the geometry of BERT syntax may not necessarily be Euclidean. In a sentiment subspace, we reveal two possible meta-embeddings for positive and negative sentiments and show how lexically-controlled contextualization would change the geometric localization of embeddings. We demonstrate the findings with our Poincaré probe via extensive experiments and visualization. Our results can be reproduced at https://github.com/FranxYao/PoincareProbe.
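
To make the geometric intuition concrete, the following is a minimal sketch of the standard geodesic distance in the unit Poincaré ball (curvature -1). It is not taken from the paper's repository: the function names `poincare_distance` and `depth` and the toy 2-D points are hypothetical, chosen only to illustrate why points near the boundary of the ball are far apart and why the distance to the origin can serve as a notion of depth.

```python
import math

def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    # Standard closed form: arcosh(1 + 2|x-y|^2 / ((1-|x|^2)(1-|y|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y))
    return math.acosh(arg)

def depth(x):
    """Hyperbolic distance from the origin, used here as a stand-in for tree depth."""
    return poincare_distance([0.0] * len(x), x)

# Hypothetical toy points: one near the origin, two "leaves" near the boundary.
shallow = [0.5, 0.0]
leaf_a = [0.9, 0.0]
leaf_b = [0.0, 0.9]

print("depth(shallow):", depth(shallow))
print("depth(leaf_a): ", depth(leaf_a))
print("leaf_a to leaf_b (hyperbolic):", poincare_distance(leaf_a, leaf_b))
print("leaf_a to leaf_b (Euclidean): ", math.dist(leaf_a, leaf_b))
```

In this toy setting the hyperbolic distance between the two leaves is close to the sum of their depths, which is how distances behave along a path through a common ancestor in a tree; the Euclidean distance between the same points is much smaller.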

Paper Structure

This paper contains 20 sections, 9 equations, 19 figures, 10 tables.

Figures (19)

  • Figure 1: Visualization of different spaces. (A, B) Comparison between trees embedded in Euclidean space and hyperbolic space. We use geodesics, the analogue of straight lines in hyperbolic spaces, to connect nodes in (B). Line/geodesic segments connecting nodes are approximately of the same length in their corresponding spaces. Intuitively, nodes embedded in Euclidean space look more "crowded", while the hyperbolic space provides sufficient capacity to embed trees and enough distance between leaf nodes. (C) A syntax tree embedded in a Poincaré ball. Hierarchy levels correspond to syntactic depths. The higher a word sits in the syntax tree, the closer it is to the origin. (D) Sentiment words embedded in a Poincaré ball. Hierarchy is defined as the sentiment polarity. We assume two meta-embeddings, [POS] and [NEG], at the highest level. Words with stronger sentiments are closer to their corresponding meta-embeddings.
  • Figure 2: Comparison between the two probes. (A) Middle-layer embeddings show richer syntactic information. (B) All probes recover syntax best at approximately rank 64, and Poincaré probes are especially better at low ranks. (C) Poincaré probes recover syntax better for longer sentences. (D) As the curvature approaches 0, Poincaré probes behave more similarly to Euclidean probes.
  • Figure 3: Left: comparison of edge length distributions. The distribution of the Poincaré probe aligns better with the ground truth than that of the Euclidean probe. Right: edge prediction recall for the edge types with the longest average length. The Poincaré probe is especially better at recovering edges of longer average length.
  • Figure 4: PCA projection of dependency trees for the sentence "it was a pretty wild day". Yellow lines/geodesics denote the ground truth and blue dashed lines/geodesics are predicted by the probe. Blue points denote root words of sentences. Word depths are more clearly organized in the Poincaré ball (D) than in the Euclidean space (C). The closer a word is to the origin, the higher it is in the tree.
  • Figure 5: (A, B) PCA projection of the sentence "a good-looking but ultimately pointless political thriller with plenty of action and almost no substance". Words are connected to the closer meta-embedding. Dashed lines mark words whose distances to the two meta-embeddings do not differ significantly (neutral words). (C) Layerwise accuracy. Sentiment emerges at deeper layers (around layer 9) than syntax (around layer 7).
  • ...and 14 more figures
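
The observation in Figure 2 (D), that Poincaré probes approach Euclidean behavior as the curvature goes to 0, can be illustrated numerically. The sketch below assumes the commonly used parameterization of a Poincaré ball with curvature -c, where distance is defined through Möbius addition; whether the paper uses exactly this parameterization is an assumption, and the names `mobius_add` and `distance_c` and the sample points are hypothetical.

```python
import math

def mobius_add(x, y, c):
    """Möbius addition of x and y in a Poincaré ball of curvature -c (c > 0)."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def distance_c(x, y, c):
    """Geodesic distance: (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) (+)_c y||)."""
    neg_x = [-a for a in x]
    norm = math.sqrt(sum(v * v for v in mobius_add(neg_x, y, c)))
    sqrt_c = math.sqrt(c)
    return 2.0 / sqrt_c * math.atanh(sqrt_c * norm)

# Hypothetical sample points, well inside the ball for every c used below.
x = [0.1, 0.2]
y = [-0.3, 0.4]
euclidean = math.dist(x, y)

for c in (1.0, 0.1, 0.01, 1e-6):
    print(f"c={c:g}  hyperbolic={distance_c(x, y, c):.6f}  2*euclidean={2 * euclidean:.6f}")
```

Under this parameterization the hyperbolic distance tends to twice the Euclidean distance as c shrinks, so a probe trained with near-zero curvature is effectively measuring (rescaled) Euclidean distances.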