Understanding Higher-Order Correlations Among Semantic Components in Embeddings

Momose Oyama; Hiroaki Yamagiwa; Hidetoshi Shimodaira

TL;DR

This work quantifies the non-independencies that remain between ICA-estimated components of embeddings using higher-order correlations. It shows that a large higher-order correlation between two components indicates a strong semantic association between them, with many words sharing meanings with both components.

Abstract

Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.
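The distinction between "uncorrelated" and "independent" that the abstract relies on can be illustrated with a minimal sketch. It uses the quantity $\mathrm{E}(S_{i}^2 S_{j}^2)$ named in the figure captions; everything else is an assumption made for illustration: the function name `higher_order_correlation`, the explicit standardization of each component to zero mean and unit variance, and the toy data, which are synthetic and not the embeddings used in the paper.

```python
import numpy as np

def higher_order_correlation(S):
    """Return the matrix of sample averages of S_i^2 * S_j^2.

    S has shape (n_words, n_components), one column per component.
    Columns are standardized first (an assumption of this sketch), so a pair
    of truly independent components gives a value close to 1.
    """
    Z = (S - S.mean(axis=0)) / S.std(axis=0)
    Z2 = Z ** 2
    # Entry (i, j) is the mean over words of Z_i^2 * Z_j^2.
    return Z2.T @ Z2 / Z.shape[0]

rng = np.random.default_rng(0)
n = 20000

# Toy components (synthetic): s0 and s1 share a random scale, which makes
# them uncorrelated but not independent; s2 is independent of both.
scale = rng.exponential(size=n)
s0 = scale * rng.standard_normal(n)
s1 = scale * rng.standard_normal(n)
s2 = rng.laplace(size=n)
S = np.column_stack([s0, s1, s2])

corr = np.corrcoef(S, rowvar=False)   # ordinary correlation E(S_i S_j)
hoc = higher_order_correlation(S)     # higher-order correlation E(S_i^2 S_j^2)

print("correlation (0,1):         ", round(float(corr[0, 1]), 3))
print("higher-order corr. (0,1):  ", round(float(hoc[0, 1]), 3))
print("higher-order corr. (0,2):  ", round(float(hoc[0, 2]), 3))
```

In this toy setting the ordinary correlation of the pair (0, 1) stays near zero while its higher-order correlation is well above the independent baseline of about 1 seen for the pair (0, 2). The diagonal of `hoc` holds the fourth moments of the individual components and is not a pairwise measure.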

Paper Structure (41 sections, 4 equations, 12 figures, 9 tables)

Introduction
Review: ICA-Transformed Embeddings
Procedure of ICA.
Comparison of PCA and ICA.
Higher-Order Correlations Among Estimated Independent Components
Non-Independence in Real-World Data.
Higher-Order Correlation.
Interpretation of Higher-Order Correlations as Semantic Relevance
Degree of Semantic Relevance
Results: Top Row of Table \ref{tab:example_energy}.
Results: Bottom Row of Table \ref{tab:example_energy}.
Quantitative Evaluation via GPT-4o mini
Settings.
Results and Discussion.
Decomposition of Semantic Relevance
...and 26 more sections

Figures (12)

Figure 1: Heatmap visualization of 300-dimensional SGNS embeddings transformed by PCA and ICA, with axes sorted by variance and skewness, respectively. Each embedding has been normalized to have a norm of 1 for better visual interpretation. For each axis, the top 4 words (frequency $n_w \geq 100$ in text8) with largest component values were used. The first 100 axes are displayed in the top panels, and the first 5 axes with the word labels are displayed in the bottom panels. See Appendices \ref{appendix:settings}, \ref{app:axis57} and \ref{appendix:experiment_results} for details.
Figure 2: Scatterplots of normalized word embeddings along the 10th and 20th axes. The axes for PCA and ICA-transformed embeddings were arranged in descending order of variance and skewness, respectively. In both transformations, the components are uncorrelated.
Figure 3: Heatmaps of the correlation coefficient $\mathrm{E}(S_{i}S_{j})$ and the higher-order correlation $\mathrm{E}(S_{i}^2 S_{j}^2)$ of component pairs $(S_{i},S_{j})$ from ICA on 300-dimensional SGNS embeddings. See Appendix \ref{app:higher-order-correlations} for details.
Figure 4: Scatter plots of normalized word embeddings for axis pairs (10, 2) and (27, 64) with large values of higher-order correlations. Blue-labeled words are the top 4 words for each axis's component values, while red-labeled words are the top 6 words for the values of $\mathbf{S}_{t,i}^{2} \mathbf{S}_{t,j}^{2}$. See Appendix \ref{app:higher-order-correlations} for all the pairs in Table \ref{tab:xy_words}.
Figure 5: Subtrees of MST $T_{150}$ defined in Sec. \ref{sec:visualization}. Each node represents an independent component $S_k$ (i.e., Axis $k$) estimated by ICA. The label of each node is "$k$ : $\mathrm{TopWord}(k)$", where $\mathrm{TopWord}(k)$ is the word with the largest component value along axis $k$ among words with frequency $n_w \geq 100$ in the text8 corpus. The color of the edge between nodes $(i, j)$ represents the magnitude of the $\mathrm{E}(S_{i}^2 S_{j}^2)$ value between the components, with darker edge colors indicating larger values.
...and 7 more figures
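
The tree structure described in the Figure 5 caption, with components as nodes and $\mathrm{E}(S_{i}^2 S_{j}^2)$ as edge weights, can be sketched with a generic maximum spanning tree routine. This is a minimal sketch only: it uses Prim's algorithm on a dense weight matrix, the function name `maximum_spanning_tree` is hypothetical, the 4-by-4 matrix holds made-up illustrative values, and the paper's own construction of $T_{150}$ (defined in its visualization section) may differ.

```python
import numpy as np

def maximum_spanning_tree(W):
    """Prim's algorithm on a dense symmetric weight matrix W.

    Returns a list of (i, j, weight) edges, preferring larger weights.
    Diagonal entries are never used as edges.
    """
    n = W.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                      # start from node 0
    best_w = W[0].astype(float).copy()     # best weight linking each node to the tree
    best_from = np.zeros(n, dtype=int)     # tree node that achieves that weight
    edges = []
    for _ in range(n - 1):
        # Pick the outside node with the heaviest link into the tree.
        cand = np.where(in_tree, -np.inf, best_w)
        j = int(np.argmax(cand))
        i = int(best_from[j])
        edges.append((i, j, float(W[i, j])))
        in_tree[j] = True
        # Update links for nodes still outside the tree.
        better = (~in_tree) & (W[j] > best_w)
        best_w[better] = W[j][better]
        best_from[better] = j
    return edges

# Hypothetical pairwise weights for 4 components (illustrative values only).
W = np.array([
    [0.0, 3.0, 1.2, 1.0],
    [3.0, 0.0, 2.5, 1.1],
    [1.2, 2.5, 0.0, 1.8],
    [1.0, 1.1, 1.8, 0.0],
])
tree = maximum_spanning_tree(W)
for i, j, w in tree:
    print(f"axis {i} -- axis {j}  (weight {w})")
```

A maximum spanning tree keeps, for each component, a path through its strongest links while using only one fewer edge than there are nodes, which is what makes it convenient for drawing the overall structure of pairwise weights.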
