Structured Probabilistic Coding

Dou Hu; Lingwei Wei; Yaxin Liu; Wei Zhou; Songlin Hu

TL;DR

Structured Probabilistic Coding (SPC) is an encoder-only probabilistic coding framework that maps inputs to Gaussian latent variables and jointly optimizes task prediction with a probabilistic encoding objective. A structured regularization term derived from the target label space promotes uniformity across classes in the latent space, so the model captures more task-specific information while preserving the Gaussian structure. Empirical results on 12 natural language understanding tasks show that SPC improves classification and regression performance, generalizes better under limited data and to out-of-distribution domains, and yields more compact, cluster-friendly representations. The approach offers a robust, encoder-centric alternative to encoder-decoder information bottleneck (IB) methods, with practical value for improving pre-trained language models under noise, data scarcity, and domain shift.

Abstract

This paper presents a new supervised representation learning framework, structured probabilistic coding (SPC), which learns compact and informative representations of the input that are relevant to the target task. SPC is an encoder-only probabilistic coding technique with a structured regularization derived from the target space. It can enhance the generalization ability of pre-trained language models for better language understanding. Specifically, the probabilistic coding performs information encoding and task prediction simultaneously in one module, making fuller use of the effective information in the input data. It uses variational inference in the output space to reduce randomness and uncertainty. In addition, to better control the learning of probabilistic representations, a structured regularization is proposed to promote uniformity across classes in the latent space. With this regularization term, SPC preserves the Gaussian structure of the latent code and covers the hidden space more uniformly across classes. Experimental results on 12 natural language understanding tasks demonstrate that SPC effectively improves the performance of pre-trained language models on classification and regression. Further experiments show that SPC enhances generalization capability, robustness to label noise, and the clustering quality of output representations.
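The abstract describes three ingredients: a task loss, a probabilistic encoding term on a Gaussian latent code in the output space, and a regularizer that encourages uniformity across classes. The sketch below shows one way such an objective could be assembled in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `spc_style_loss`, the KL term to a standard normal prior, the negative-entropy form of the class-uniformity regularizer, and the default values of `beta` and `gamma` are all hypothetical choices made for this example.

```python
import numpy as np

def spc_style_loss(mu, logvar, labels, beta=1e-3, gamma=1e-1, rng=None):
    """Hypothetical SPC-style objective (illustrative, not the paper's code).

    mu, logvar: (batch, num_classes) Gaussian parameters produced by an encoder.
    labels:     (batch,) integer class ids.
    beta, gamma: assumed trade-off weights; default values are placeholders.
    """
    rng = np.random.default_rng(0) if rng is None else rng

    # Reparameterized sample t ~ N(mu, diag(exp(logvar))) in the output space.
    eps = rng.standard_normal(mu.shape)
    t = mu + np.exp(0.5 * logvar) * eps

    # Softmax over the sampled code gives class probabilities.
    z = t - t.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)

    # Task term: cross-entropy on the sampled code.
    n = mu.shape[0]
    ce = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

    # Encoding term (assumed form): KL(N(mu, sigma^2) || N(0, I)), batch mean.
    kl = np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))

    # Uniformity term (assumed form): negative entropy of the batch-averaged
    # class distribution; it is smallest when classes are used uniformly.
    class_marginal = probs.mean(axis=0)
    neg_entropy = np.sum(class_marginal * np.log(class_marginal + 1e-12))

    total = ce + beta * kl + gamma * neg_entropy
    return total, {"ce": ce, "kl": kl, "neg_entropy": neg_entropy}

# Usage with toy encoder outputs (4 examples, 3 classes).
mu = np.zeros((4, 3))
logvar = np.zeros((4, 3))
labels = np.array([0, 1, 2, 0])
total, parts = spc_style_loss(mu, logvar, labels)
```

In this sketch, `beta` scales the encoding term and `gamma` scales the uniformity term, which loosely mirrors the two trade-off weights named in the figure captions; how the paper defines each term exactly should be checked against its methodology section.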

Paper Structure

This paper contains 32 sections, 5 equations, 5 figures, and 8 tables.

Introduction
Methodology
Probabilistic Coding
Structured Regularization
Structured Probabilistic Coding
Applications for Downstream Tasks
Experiments
Experimental Setups
Datasets and Downstream Tasks
Comparison Methods
Evaluation Metrics
Implementation Details
Overall Results
Performance on Classification Tasks
Performance on Regression Tasks
...and 17 more sections

Figures (5)

Figure 1: Comparison of our SPC with existing deterministic embedding and probabilistic embedding methods.
Figure 2: Results of different methods across training-set sizes with the RoBERTa backbone.
Figure 3: Clustering performance of the output representations learned by different optimization objectives. Silhouette coefficient (SC) and adjusted Rand index (ARI) are used to measure data-related and task-related clustering abilities, respectively. We experiment with the RoBERTa backbone.
Figure 4: Performance against different trade-off weights $\beta$ of probabilistic coding for classification tasks. The experiments are conducted with the RoBERTa backbone. The grey line represents the results of the CE baseline.
Figure 5: Performance of the optimal trade-off weight $\gamma$ for classification tasks. We experiment with the RoBERTa backbone. The Y-axis shows the relative improvement of SPC over its variant without the structured regularization.
