Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings

Hiroaki Yamagiwa, Yusuke Takase, Hidetoshi Shimodaira

TL;DR

Axis Tour addresses the arbitrariness of axis ordering in ICA-transformed word embeddings by proposing an axis-order optimization that maximizes semantic continuity. It constructs axis embeddings from the top-$k$ words per axis and solves a traveling salesman problem to order axes, then reduces dimensionality by projecting consecutive axes weighted by axis skewness. Empirical results across static and dynamic embeddings show improved axis continuity and competitive downstream performance in analogy, similarity, and categorization tasks, with GPT-model evaluations confirming more coherent axis relationships. This work enhances interpretability and usability of ICA-based word embeddings, offering a principled approach to preserve axis similarities in low-dimensional representations.
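The ordering step described above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it assumes each axis embedding is the mean of the normalized embeddings of the top-k words on that axis, and it swaps in a greedy nearest-neighbor heuristic where the paper solves a traveling salesman problem. The function names `axis_embeddings` and `greedy_axis_order` are hypothetical, and plain Python is used to keep the sketch dependency-free.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def axis_embeddings(S, k):
    """Return one embedding per axis of S (rows are words, columns are axes).

    Assumption: the axis embedding is the mean of the normalized
    embeddings of the k words with the largest value on that axis.
    """
    S_hat = [normalize(row) for row in S]
    d = len(S[0])
    V = []
    for axis in range(d):
        top = sorted(range(len(S_hat)),
                     key=lambda i: S_hat[i][axis], reverse=True)[:k]
        V.append([sum(S_hat[i][j] for i in top) / len(top) for j in range(d)])
    return V


def greedy_axis_order(V, start=0):
    """Order axes so that consecutive axis embeddings are similar.

    Stand-in for a TSP solver: repeatedly move to the unvisited axis
    whose embedding has the highest cosine similarity to the current one.
    """
    order = [start]
    remaining = set(range(len(V))) - {start}
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: cosine(V[last], V[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

A greedy tour is generally not optimal, so a dedicated TSP solver would be expected to give a higher total similarity between consecutive axes. The sketch only shows how the objective is built from the axis embeddings.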

Abstract

Word embedding is one of the most important components in natural language processing, but interpreting high-dimensional embeddings remains a challenging problem. To address this problem, Independent Component Analysis (ICA) has been identified as an effective solution. ICA-transformed word embeddings reveal interpretable semantic axes; however, the order of these axes is arbitrary. In this study, we focus on this property and propose a novel method, Axis Tour, which optimizes the order of the axes. Inspired by Word Tour, a one-dimensional word embedding method, we aim to improve the clarity of the word embedding space by maximizing the semantic continuity of the axes. Furthermore, we show through experiments on downstream tasks that Axis Tour yields low-dimensional embeddings that are better than or comparable to those of both PCA and ICA.

Paper Structure (65 sections, 28 equations, 36 figures, 14 tables)

Figures (36)

  • Figure 1: Scatterplots of normalized ICA-transformed word embeddings whose axes are ordered by Axis Tour and Skewness Sort. In the upper part, Axis Tour is applied to 300-dimensional GloVe, with nine consecutive axes arranged counterclockwise. In the lower part, these nine axes are rearranged clockwise in descending order of skewness. The embeddings are projected onto two dimensions along these axes. The top five embeddings on each axis are labeled by their words. Each word is assigned the color of the axis on which it has the highest value. In both cases, words that cross the horizontal axes are not displayed. Refer to Appendix \ref{app:fig-explanation} for more details.
  • Figure 2: Histogram of $\cos(\mathbf{v}_\ell, \mathbf{v}_{\ell+1})$. As an additional baseline, we sampled 300 random words from the Random Order embeddings and arranged them in random order. The dashed lines represent the average similarity for each method. The distribution for Axis Tour shifts towards a more positive mean, while the others roughly follow a normal distribution with means close to 0. For more details, refer to Appendix \ref{app:dist-cos-sim}.
  • Figure 3: Comparison of the number of related axes in the GPT models. In each model, Axis Tour exhibits a greater number of related axes than Skewness Sort.
  • Figure 4: The performance of dimensionality reduction for embeddings. Each value represents the average of 30 analogy tasks, 8 word similarity tasks, or 6 categorization tasks. See Appendix \ref{app:experiments} for detailed experimental results.
  • Figure 5: Relationship between the skewness $\gamma_\ell$ and the average of two consecutive cosines $(\cos(\mathbf{v}_{\ell-1}, \mathbf{v}_\ell) + \cos(\mathbf{v}_\ell, \mathbf{v}_{\ell+1}))/2$ for all the axes $\ell=1,\ldots, d$ in (a) Axis Tour and (b) Skewness Sort. The left plot shows the skewness and the average of two cosines on both $y$-axes, while the right plot shows the scatter plot of these values. Spearman's rank correlation is 0.43 for Axis Tour, while it is 0.04 for Skewness Sort.
  • ...and 31 more figures
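
The two quantities compared in Figure 5 can be computed along the following lines. This is a hedged sketch rather than the authors' code: it assumes the skewness $\gamma_\ell$ is the standardized third moment of the values on axis $\ell$, and that the axis order is treated as cyclic so that the first and last axes are neighbors. The function names are hypothetical.

```python
import math


def skewness(values):
    """Standardized third moment of a list of numbers (0.0 if constant)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    if var == 0:
        return 0.0
    third = sum((x - mean) ** 3 for x in values) / n
    return third / var ** 1.5


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mean_adjacent_cosine(V):
    """For each axis embedding in V, average its cosine similarity with
    the previous and the next axis embedding.

    Assumption: indices wrap around, i.e. the ordering is a closed tour.
    """
    d = len(V)
    return [
        (cosine(V[(l - 1) % d], V[l]) + cosine(V[l], V[(l + 1) % d])) / 2
        for l in range(d)
    ]
```

Given a list of per-axis skewness values and the output of `mean_adjacent_cosine`, a rank correlation such as Spearman's can then be computed with any standard statistics library.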