Improving Deep Representation Learning via Auxiliary Learnable Target Coding

Kangjun Liu, Ke Chen, Kui Jia, Yaowei Wang

TL;DR

Addresses the limitation of fixed target codes in supervised deep learning under class imbalance by introducing Learnable Target Coding (LTC) as an auxiliary regularizer that learns class-specific target codes and enforces geometric properties via a margin-based triplet loss and a correlation-consistency loss. The method encompasses both a Hadamard-target-code regularization (HTC) baseline and a fully learnable LTC variant with a learnable weight matrix and STE-based binarization. Extensive experiments on fine-grained classification and retrieval, plus imbalanced-data benchmarks, demonstrate consistent improvements over baselines and competitive results against transformer-based approaches; results and code are provided.

Abstract

Deep representation learning is a subfield of machine learning that focuses on learning meaningful and useful representations of data through deep neural networks. However, existing methods for semantic classification typically employ pre-defined target codes such as the one-hot and the Hadamard codes, which either fail to model inter-class correlation or model it with limited flexibility. In light of this, this paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning, which can not only incorporate latent dependency across classes but also impose geometric properties of target codes on the representation space. Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations, by enlarging between-class margins in representation space and by favoring equal semantic correlation of learnable target codes, respectively. Experimental results on several popular visual classification and retrieval benchmarks demonstrate the effectiveness of our method in improving representation learning, especially for imbalanced data. Source code is publicly available at https://github.com/AkonLau/LTC.
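
To make the contrast between the two pre-defined target codes concrete, the following minimal sketch builds Hadamard codes with the Sylvester construction and compares them with one-hot codes. The function names and the choice of codes in {-1, +1} are assumptions made for illustration; the paper may derive its codes differently (for example, by dropping rows or columns or mapping to {0, 1}).

```python
def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix with entries +1/-1 (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)


num_classes = 8
# Hypothetical assignment: class k uses row k of the Hadamard matrix as its target code.
hadamard_codes = sylvester_hadamard(num_classes)
one_hot_codes = [[1 if i == j else 0 for j in range(num_classes)]
                 for i in range(num_classes)]

# Distinct Hadamard rows are mutually orthogonal and differ in half of their
# positions; distinct one-hot rows differ in only two positions.
hadamard_distances = {hamming(hadamard_codes[i], hadamard_codes[j])
                      for i in range(num_classes) for j in range(i)}
one_hot_distances = {hamming(one_hot_codes[i], one_hot_codes[j])
                     for i in range(num_classes) for j in range(i)}
```

Both code families are fixed before training and treat every pair of classes identically, which is the inflexibility that the learnable target codes are meant to address.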

Paper Structure

This paper contains 18 sections, 9 equations, 8 figures, 11 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison of the one-hot code, the Hadamard code, and the proposed learnable code. The projected representations at the top are extracted from the same last block of the ResNet backbone for all three methods and visualized with t-SNE [Maaten2008VisualizingDU]; class IDs 141-147 are seven different kinds of terns from the CUB benchmark [wah2011caltech]. The model trained with the proposed LTC yields better representations because it discovers inter-class relations (see the LTC subsection) and imposes geometric properties of target codes (see the semantic-correlation subsection): sample features become more compact within each category and more separated between classes. The symbols $v$ and $s$ denote the semantic and target code vectors, respectively.
  • Figure 2: The pipeline of regularization with auxiliary Hadamard target codes (HTC). The lower branch is a cascade of representation learning and classification, while the upper branch is the HTC regularization module, which is not limited to specific classification networks and can be readily applied to other classifiers. The semantic encoder aims to incorporate the latent correlation across classes carried by the Hadamard codes into the semantic representations.
  • Figure 3: The module of our proposed LTC regularizer, which replaces the pre-defined Hadamard target codes with learnable parameters. To impose geometric properties, the learnable codes are further constrained with a margin-based triplet loss that pulls samples from the same category closer and pushes samples from different categories farther apart. Moreover, a correlation consistency loss ensures that the learnable target codes of different categories are optimized toward orthogonality with consistent semantic correlation. The semantic encoder and the main branch are similar to those shown in Figure 2.
  • Figure 4: Examples of four different tern species from the CUB benchmark [wah2011caltech]. Each row shows different instances of one species. Inter-class variations among fine-grained species are usually small, while intra-class variations within the same class are large.
  • Figure 5: Effects of hyper-parameters for the HTC constraint with an auxiliary MSE loss, and the margin-based triplet loss $L_\text{triplet}$ and the target correlation consistency loss $L_\text{corr}$ in our proposed LTC method.
  • ...and 3 more figures
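
The two losses named in the Figure 3 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's formulation: the function names, the squared Euclidean distance, the hardest-negative choice, and the identity target for the code correlation matrix are all assumptions. The sign function stands in for the binarization step; during training, a straight-through estimator (STE) would pass gradients through it unchanged, which plain Python cannot show.

```python
def binarize(weights):
    """Forward pass of binarization: map each learnable weight to +1 or -1."""
    return [[1.0 if w >= 0 else -1.0 for w in row] for row in weights]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(v, label, codes, margin):
    """Margin-based triplet loss with the semantic vector v as anchor.

    Positive: the target code of v's own class.
    Negative: the closest target code of any other class (assumed choice).
    """
    pos = sq_dist(v, codes[label])
    neg = min(sq_dist(v, c) for k, c in enumerate(codes) if k != label)
    return max(0.0, margin + pos - neg)


def correlation_loss(codes):
    """Mean squared gap between the normalized code Gram matrix and the identity.

    The loss is zero when codes of different classes are mutually orthogonal.
    """
    n, length = len(codes), len(codes[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            corr = sum(a * b for a, b in zip(codes[i], codes[j])) / length
            target = 1.0 if i == j else 0.0
            total += (corr - target) ** 2
    return total / (n * n)


# Hypothetical learnable weights for 3 classes with code length 4.
weights = [[0.3, -0.2, 0.9, -0.4],
           [0.1, 0.5, -0.7, -0.6],
           [-0.8, 0.2, 0.4, -0.1]]
codes = binarize(weights)

v = [0.9, -0.8, 1.1, -1.0]  # semantic vector lying close to the code of class 0
loss_correct = triplet_loss(v, 0, codes, margin=1.0)
loss_wrong = triplet_loss(v, 1, codes, margin=1.0)
loss_corr = correlation_loss(codes)
```

In this toy setup the triplet term vanishes when the semantic vector already sits near its own class code and grows when it sits near another class's code, while the correlation term penalizes any pair of class codes that is not orthogonal.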