Unsupervised categorization of similarity measures

Yoshiyuki Ohmura, Wataru Shimaya, Yasuo Kuniyoshi

TL;DR

The paper addresses unsupervised categorization of similarity measures across object features by formulating algebraic independence between neural network transformations and linking it to an invariant transformation equation. It proposes a dual-encoder/decoder architecture with two latent transformations operating on separate latent spaces, enforcing a $G_P$–$F$ commutativity constraint to realize independent feature spaces. In experiments on alphabet letters rendered in multiple colors and fonts, the method learns two distinct metric spaces, one for color and one for shape, demonstrating that single-feature transformations can be learned without supervision and that the latent spaces separate the respective invariances. The work provides a mathematical framework for unsupervised categorization of feature-specific similarity measures and discusses practical limitations and avenues for scaling to more features.
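The commutativity idea in this summary can be illustrated with a small toy sketch. It is not the authors' implementation: the linear maps, the 4-dimensional latent vector, the block layout, and the names `commutative_loss` and `block_map` are all assumptions made for illustration. The sketch only shows that two transformations acting on separate latent blocks give the same result in either order, whereas two transformations sharing a block generally do not.

```python
import numpy as np

def commutative_loss(f0, f1, x):
    """Mean squared difference between f0(f1(x)) and f1(f0(x))."""
    return float(np.mean((f0(f1(x)) - f1(f0(x))) ** 2))

def block_map(matrix, start, dim=4):
    """Linear map that applies `matrix` to a 2-dim block of the latent
    vector (beginning at index `start`) and leaves the rest untouched."""
    weights = np.eye(dim)
    weights[start:start + 2, start:start + 2] = matrix
    return lambda z: z @ weights.T

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))   # batch of 8 toy latent vectors

# Hypothetical layout: dims 0-1 form one latent space, dims 2-3 the other.
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2))

f0_separate = block_map(A, 0)   # acts only on dims 0-1
f1_separate = block_map(B, 2)   # acts only on dims 2-3
f1_overlap = block_map(B, 0)    # acts on the same block as f0_separate

loss_separate = commutative_loss(f0_separate, f1_separate, x)
loss_overlap = commutative_loss(f0_separate, f1_overlap, x)

print("separate blocks:", loss_separate)
print("shared block:   ", loss_overlap)
```

In this toy setting the loss is near zero only when the two maps touch disjoint blocks, which is the intuition behind using a commutative loss as a training signal for separating feature spaces.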

Abstract

In general, objects can be distinguished on the basis of their features, such as color or shape. In particular, it is assumed that similarity judgments about such features can be processed independently in different metric spaces. However, the unsupervised categorization mechanism of metric spaces corresponding to object features remains unknown. Here, we show that an artificial neural network system can autonomously categorize metric spaces through representation learning to satisfy the algebraic independence between neural networks, and project sensory information onto multiple high-dimensional metric spaces to independently evaluate the differences and similarities between features. Conventional methods often constrain the axes of the latent space to be mutually independent or orthogonal. However, independent axes are not suitable for categorizing metric spaces. High-dimensional metric spaces that are independent of each other are not uniquely determined by mutually independent axes, because any combination of independent axes can form mutually independent spaces. In other words, mutually independent axes cannot be used to naturally categorize different feature spaces, such as color space and shape space. Therefore, constraining the axes to be mutually independent makes it difficult to categorize high-dimensional metric spaces. To overcome this problem, we developed a method that constrains only the spaces, and not the axes composing them, to be mutually independent. Our theory provides general conditions for the unsupervised categorization of independent metric spaces, thus advancing the mathematical theory of functional differentiation of neural networks.

Paper Structure

This paper contains 19 sections, 17 equations, 3 figures.

Figures (3)

  • Figure 1: Information flow. The system consists of two networks, $G_P$ and $G_N$, where $G_P$ is an encoder from the sensory input to the latent space, and $G_N$ is a decoder from the latent space to the sensory input. Transformations $f_0$ and $f_1$ in the latent space transform latent vector $x$ into latent vector $y$.
  • Figure 2: A brief timeline of the algebraic independence learning. (a) Results with commutative learning. $F_0 X$ converged to the shape transformation. $F_1 X$ converged to the color transformation. (b) A typical example of an ablation study. Both color and shape transformations were performed by $F_1 X$ alone. (c) Learning changes in invariance in the control condition. (d) Same changes in the ablation condition. Blue, color invariance of $F_0 X$; red, color invariance of $F_1 X$; cyan, shape invariance of $F_0 X$; and magenta, shape invariance of $F_1 X$. (e) Median invariances in the control and ablation groups were 0.8919 and 0.473; the distributions in the two groups differed significantly (Mann–Whitney $U = 10000$, $n = 100$, $P < 2.5 \times 10^{-34}$, two-tailed). (f) Median commutative losses after training in the control and ablation groups were $1.275 \times 10^{-4}$ and $4.934 \times 10^{-4}$; the distributions in the two groups also differed significantly (Mann–Whitney $U = 1869$, $n = 100$, $P < 2.025 \times 10^{-4}$, two-tailed).
  • Figure 3: Learned metric spaces. (a) Metric space corresponding to the color transformation. (b) Metric space corresponding to the shape transformation. Sixteen different letters in 32 different colors were plotted in the learned metric spaces. Each space was projected to 2D using PCA for visualization. Contribution rate: 99.3% in the color space, 64.8% in the shape space. For visualization, the background color was changed from black to white.
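
The 2D PCA projection and contribution rate mentioned in the Figure 3 caption can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's pipeline. It assumes that "contribution rate" means the fraction of total variance captured by the first two principal components. The 8-dimensional latent size, the random data, and the function name `pca_2d` are hypothetical.

```python
import numpy as np

def pca_2d(points):
    """Project points onto their first two principal components and
    return the projection plus the fraction of variance they capture."""
    centered = points - points.mean(axis=0)
    # Rows of vt are principal directions; s holds the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    projected = centered @ vt[:2].T
    return projected, float(explained[:2].sum())

rng = np.random.default_rng(0)

# Synthetic stand-in for latent vectors: 512 points (16 letters x 32
# colors) in a hypothetical 8-dim space, lying close to a 2D plane.
basis = rng.normal(size=(2, 8))
latent = rng.normal(size=(512, 2)) @ basis + 0.01 * rng.normal(size=(512, 8))

coords, rate = pca_2d(latent)
print("projected shape:", coords.shape)
print("contribution rate of first two components:", rate)
```

Under this reading, a rate close to 1 means the 2D plot loses little of the space's variance, while a lower rate means the plot shows only part of a higher-dimensional structure.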