Dynamic Latent Separation for Deep Learning
Yi-Lin Tuan, Zih-Yun Chiu, William Yang Wang
TL;DR
The paper introduces atom modeling, a general method that requires no supervision of the latent space. It enhances expressiveness by learning token importances for the sub-components within each data sample and by dynamically distancing data samples in the latent space according to that intra-sample structure. A Coulomb-inspired loss $\mathcal{L}_\mathcal{A}$ governs inter-sample separation via atom-like quantities $\bar{d}_{AB}$ and $d_{ij}$, with a soft constraint system to stabilize training. The authors report improved performance on synthetic classification, GAN-based image generation, CNN-based image classification, and Transformer-based text classification, and the learned token importances offer partial interpretability. The method is not restricted to a specific application or architecture, applies to both small and larger-scale models, and offers a practical path to richer representations and greater output diversity.
Abstract
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data that involves multiple sub-components in a flexible and interpretable fashion. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications. The key idea is to dynamically distance data samples in the latent space and thus enhance the output diversity. Our dynamic latent separation method, inspired by atomic physics, relies on the jointly learned structures of each data sample, which also reveal the importance of each sub-component for distinguishing data samples. This approach, atom modeling, requires no supervision of the latent space and allows us to learn additional, partially interpretable representations alongside the original goal of a model. We empirically demonstrate that the algorithm also enhances the performance of small- to larger-scale models in various classification and generation problems.
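
To make the idea of distancing samples by their learned sub-component importances more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's loss: the function name `repulsion_loss`, the softmax weighting, the importance-weighted pooling, and the plain $1/d$ penalty are all hypothetical choices made for this example. The exact form of the paper's loss is not given in this summary.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def repulsion_loss(latents, importance_logits, eps=1e-6):
    """Hypothetical Coulomb-like repulsion between data samples.

    latents:            array of shape (n_samples, n_tokens, dim)
    importance_logits:  array of shape (n_samples, n_tokens)
    Requires n_samples >= 2. This is an assumed stand-in, not the
    loss defined in the paper.
    """
    # Assumption: token importances are a softmax over learned logits.
    weights = softmax(importance_logits, axis=-1)

    # Assumption: each sample is summarized by its importance-weighted
    # average of token latents, giving one point per sample.
    centers = np.einsum("nt,ntd->nd", weights, latents)

    # Pairwise Euclidean distances between sample summaries.
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + eps)

    # Penalize closeness with a 1/d term over distinct pairs, so that
    # minimizing the loss pushes samples apart.
    upper = np.triu_indices(centers.shape[0], k=1)
    return float((1.0 / dist[upper]).mean())


rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 5, 8))   # 4 samples, 5 tokens, 8-dim latents
logits = rng.normal(size=(4, 5))       # one importance logit per token

loss_near = repulsion_loss(latents, logits)
loss_far = repulsion_loss(2.0 * latents, logits)  # same structure, spread out
```

In this sketch, scaling the latents by two doubles the distances between the sample summaries, so `loss_far` comes out smaller than `loss_near`. Because the importances enter the loss through the pooling step, a gradient-based optimizer would update them jointly with the latents, which is the general mechanism the abstract describes.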
