
Dynamic Latent Separation for Deep Learning

Yi-Lin Tuan, Zih-Yun Chiu, William Yang Wang

TL;DR

The paper introduces atom modeling, a general method that enhances expressiveness. It learns token importances for the sub-components of each data sample and dynamically distances samples according to their intra-sample structure. A Coulomb-inspired loss $\mathcal{L}_\mathcal{A}$ governs inter-sample separation through atom-like quantities $\bar{d}_{AB}$ and $d_{ij}$, and a soft constraint system stabilizes training. The approach improves performance on synthetic classification, GAN-based image generation, CNN-based image classification, and Transformer-based text classification, and it offers partial interpretability through the learned token importances. The method is model-agnostic, applies to models from small to larger scale, and provides a practical path to richer representations and more diverse outputs without any supervision of the latent space.
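The summary names $d_{ij}$ and $\bar{d}_{AB}$ but does not define them. The sketch below is a minimal illustration under two hypothetical assumptions: that $d_{ij}$ is the Euclidean distance between token $i$ of sample $A$ and token $j$ of sample $B$, and that $\bar{d}_{AB}$ averages these pairwise distances weighted by the learned token importances. The function names `token_distance` and `weighted_atom_distance` and the weighting scheme are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import List, Sequence


def token_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed d_ij: Euclidean distance between two token embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_atom_distance(
    tokens_a: List[Sequence[float]],
    importance_a: List[float],
    tokens_b: List[Sequence[float]],
    importance_b: List[float],
) -> float:
    """Hypothetical d̄_AB: importance-weighted mean of all token-pair distances.

    Each pair (token i of sample A, token j of sample B) contributes d_ij
    with weight importance_a[i] * importance_b[j].
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for x, w_a in zip(tokens_a, importance_a):
        for y, w_b in zip(tokens_b, importance_b):
            w = w_a * w_b
            weighted_sum += w * token_distance(x, y)
            total_weight += w
    if total_weight == 0.0:
        raise ValueError("token importances must not all be zero")
    return weighted_sum / total_weight


# Two toy samples with two 2-D tokens each; only the first token of A matters.
sample_a = [[0.0, 0.0], [1.0, 0.0]]
sample_b = [[3.0, 4.0], [0.0, 0.0]]
print(weighted_atom_distance(sample_a, [1.0, 0.0], sample_b, [1.0, 1.0]))
```

Shifting importance from one token of $A$ to the other changes $\bar{d}_{AB}$ even though the embeddings are unchanged, which is the sense in which learned importances let the separation depend on intra-sample structure in this toy.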

Abstract

A core problem in machine learning is to learn expressive latent variables for model prediction on complex data that involves multiple sub-components in a flexible and interpretable fashion. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications. The key idea is to dynamically distance data samples in the latent space and thus enhance the output diversity. Our dynamic latent separation method, inspired by atomic physics, relies on the jointly learned structures of each data sample, which also reveal the importance of each sub-component for distinguishing data samples. This approach, atom modeling, requires no supervision of the latent space and allows us to learn extra partially interpretable representations in addition to the model's original objective. We empirically demonstrate that the algorithm also improves the performance of models ranging from small to larger scale on various classification and generation problems.

Paper Structure (26 sections, 6 theorems, 27 equations, 9 figures, 8 tables, 1 algorithm)

Key Result

Lemma 1

A G-Lipschitz function $o(\cdot)$ whose inverse is K-Lipschitz bounds the output-space distance as follows:

$$\frac{1}{K}\,\|\mathbf{v}-\mathbf{u}\| \le \|o(\mathbf{v}) - o(\mathbf{u})\| \le G\,\|\mathbf{v}-\mathbf{u}\|,$$

where $\mathbf{v}$ and $\mathbf{u}$ are any vectors in the latent space.
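For a G-Lipschitz $o$ with a K-Lipschitz inverse, the upper bound $\|o(\mathbf{v})-o(\mathbf{u})\| \le G\|\mathbf{v}-\mathbf{u}\|$ is the Lipschitz property itself, and the lower bound $\frac{1}{K}\|\mathbf{v}-\mathbf{u}\| \le \|o(\mathbf{v})-o(\mathbf{u})\|$ follows by applying the inverse's Lipschitz property to $o(\mathbf{v})$ and $o(\mathbf{u})$. The sketch below checks this two-sided bound numerically. The output map $x \mapsto 2x + \sin x$ (applied elementwise) is a hypothetical example, not from the paper: its derivative $2 + \cos x$ lies in $[1, 3]$, so $G = 3$ and $K = 1$.

```python
import math
import random


def o(v):
    """Hypothetical output map applied elementwise: x -> 2x + sin(x).

    Its derivative 2 + cos(x) lies in [1, 3], so o is 3-Lipschitz (G = 3)
    and its inverse is 1-Lipschitz (K = 1).
    """
    return [2.0 * x + math.sin(x) for x in v]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def diff(a, b):
    return [x - y for x, y in zip(a, b)]


def check_lemma_bounds(G=3.0, K=1.0, dim=4, trials=1000, seed=0):
    """Return True if (1/K)||v-u|| <= ||o(v)-o(u)|| <= G||v-u|| on random pairs."""
    rng = random.Random(seed)
    eps = 1e-9  # tolerance for floating-point rounding
    for _ in range(trials):
        v = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        u = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        latent = norm(diff(v, u))
        output = norm(diff(o(v), o(u)))
        if not (latent / K - eps <= output <= G * latent + eps):
            return False
    return True


print(check_lemma_bounds())         # bounds hold with the correct constants
print(check_lemma_bounds(G=1.5))    # an upper constant that is too small fails
```

Passing a constant smaller than the true Lipschitz constant makes the check fail, which is a quick way to see that both constants in the bound matter.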

Figures (9)

  • Figure 1: Illustration of an atom modeling use case. Consider a model $f_\theta = o(\ell(\mathbf{z}))$; data samples are transformed into the latent space, and their latent representations are distanced by atom modeling together with the training criterion for the output $y$. The colors on each image in the latent space represent the learned token importance, indicating which parts are most crucial for identifying the data sample.
  • Figure 2: $\mathcal{L}_\mathcal{A}$ under varying atomic-structure similarity $k$. The distance that minimizes the loss depends on the intra-sample structures: as the structures become more similar (decaying $k$), the loss-minimizing distance grows. In all cases, that distance is nonzero.
  • Figure 3: (a) Visualization of the latent space of synthetic data learned with cross-entropy loss alone, or combined with hinge losses (L1 and L2), SimCLR, or atom modeling. Blue and red indicate the two ground-truth classes. Dashed circles mark overlaps between learned representations from different classes; these overlaps correlate with how easily the samples can be classified. Atom modeling separates the representations with a gap while using no latent supervision. (b) Visualization of token importance.
  • Figure 4: Comparison among cross-entropy, p-norm distance, SimCLR, and atom modeling.
  • Figure 5: Examples of generated images and the token importance learned by atom modeling on unconditional image generation. The distributions show that an importance score close to one marks a part of the image that is crucial for distinguishing it from other images.
  • ...and 4 more figures
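Figure 2's caption describes a loss whose minimizer sits at a nonzero distance that grows as intra-sample structures become more similar. The summary does not give the formula for $\mathcal{L}_\mathcal{A}$, so the sketch below uses a hypothetical toy potential, a Coulomb-like repulsion $1/d$ plus a linear attraction $k\,d$, chosen only to reproduce that qualitative shape. `toy_atom_loss`, `loss_minimizing_distance`, and the role of `k` here are illustrative assumptions, not the paper's loss.

```python
import math


def toy_atom_loss(d: float, k: float) -> float:
    """Hypothetical stand-in for L_A (not the paper's formula).

    1/d is a Coulomb-like repulsion that diverges as d -> 0;
    k * d is an attraction whose strength k mimics the similarity
    parameter in Figure 2.
    """
    if d <= 0.0 or k <= 0.0:
        raise ValueError("d and k must be positive")
    return 1.0 / d + k * d


def loss_minimizing_distance(k: float) -> float:
    """Setting dL/dd = -1/d**2 + k = 0 gives d* = 1/sqrt(k)."""
    if k <= 0.0:
        raise ValueError("k must be positive")
    return 1.0 / math.sqrt(k)


# Decaying k (more similar structures in Figure 2's terms) pushes d* outward.
for k in (4.0, 1.0, 0.25):
    d_star = loss_minimizing_distance(k)
    print(f"k={k}: d*={d_star:.2f}, loss={toy_atom_loss(d_star, k):.2f}")
```

In this toy, the minimizer $d^* = 1/\sqrt{k}$ moves from 0.5 to 1 to 2 as $k$ decays from 4 to 0.25, and the $1/d$ term keeps it away from zero, mirroring the two properties the caption states.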

Theorems & Definitions (6)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Theorem 1
  • Theorem 2