Latent Enhancing AutoEncoder for Occluded Image Classification

Ketan Kotwal, Tanay Deshmukh, Preeti Gopal

TL;DR

This paper tackles occlusion-robust image classification, where large, unseen occluders induce significant out-of-distribution (OOD) shifts. It introduces LEARN, an autoencoder that operates on the backbone's features and is inserted before the classifier head without modifying the backbone weights; it is trained with both occluded and clean data. LEARN uses a multi-term loss: $\mathcal{L}_{\text{rec-f}}$ for reconstructing the features of occluded inputs, $\mathcal{L}_{\text{intra-z}}$ to keep same-class latents close, $\mathcal{L}_{\text{inter-z}}$ to separate latents of different classes, and $\mathcal{L}_{\text{cls-o}}$ for alignment with the classifier, combined as $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{rec-f}} + \lambda_{\text{intra-z}} \mathcal{L}_{\text{intra-z}} + \lambda_{\text{inter-z}} \mathcal{L}_{\text{inter-z}} + \lambda_{\text{cls-o}} \mathcal{L}_{\text{cls-o}}$. Experiments on OccludedPASCAL3D+ and MS-COCO Occluded Vehicles show large gains over the standard backbones and improvements over competing state-of-the-art methods, with strong cross-dataset transfer and preserved accuracy on clean data, while adding only a small number of parameters. The approach offers practical, plug-in robustness for real-world systems without retraining the backbone.
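
The combined objective can be sketched in code. This is a minimal NumPy sketch, not the authors' implementation: the summary names the four terms but not their exact forms, so the following are assumptions. $\mathcal{L}_{\text{rec-f}}$ is taken as the mean squared error between the autoencoder's reconstructed features and the clean features, $\mathcal{L}_{\text{intra-z}}$ as the mean squared distance of latents to their class centroid, $\mathcal{L}_{\text{inter-z}}$ as a squared hinge on pairwise centroid distances below a hypothetical `margin`, and $\mathcal{L}_{\text{cls-o}}$ as softmax cross-entropy of the classifier head's logits. The weights `lam_intra`, `lam_inter`, and `lam_cls` are hypothetical defaults.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def intra_class_loss(z, labels):
    """Assumed L_intra-z: mean squared distance of each latent to its class centroid."""
    total = 0.0
    for c in np.unique(labels):
        zc = z[labels == c]
        total += np.sum((zc - zc.mean(axis=0)) ** 2)
    return float(total / len(z))


def inter_class_loss(z, labels, margin=1.0):
    """Assumed L_inter-z: squared hinge penalty on class centroids closer than `margin`."""
    classes = np.unique(labels)
    centroids = np.stack([z[labels == c].mean(axis=0) for c in classes])
    penalties = []
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = np.linalg.norm(centroids[i] - centroids[j])
            penalties.append(max(0.0, margin - d) ** 2)
    return float(np.mean(penalties)) if penalties else 0.0


def cross_entropy(logits, labels):
    """Assumed L_cls-o: softmax cross-entropy of head logits, averaged over the batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def total_loss(f_rec, f_clean, z, logits, labels,
               lam_intra=0.1, lam_inter=0.1, lam_cls=1.0, margin=1.0):
    """L_total = L_rec-f + lam_intra*L_intra-z + lam_inter*L_inter-z + lam_cls*L_cls-o."""
    return (mse(f_rec, f_clean)  # rec-f: assumed MSE to clean-image features
            + lam_intra * intra_class_loss(z, labels)
            + lam_inter * inter_class_loss(z, labels, margin)
            + lam_cls * cross_entropy(logits, labels))


rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])             # integer class ids
f_clean = rng.normal(size=(6, 16))                # features of the clean images
f_rec = f_clean + 0.1 * rng.normal(size=(6, 16))  # AE output for occluded images
z = rng.normal(size=(6, 4))                       # AE latent codes
logits = rng.normal(size=(6, 3))                  # classifier-head logits on f_rec
print(total_loss(f_rec, f_clean, z, logits, labels))
```

The intra- and inter-class terms act only on the latent codes `z`, so they shape the autoencoder's bottleneck, while the reconstruction and classification terms tie its output back to the frozen classifier.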

Abstract

Large occlusions cause a significant decline in image classification accuracy. During inference, diverse types of unseen occlusions introduce out-of-distribution data to the classification model, causing accuracy to drop to as low as 50%. Because occlusions cover spatially connected regions, conventional feature-reconstruction methods are inadequate for improving classification performance. We introduce LEARN (Latent Enhancing feAture Reconstruction Network), an autoencoder-based network that can be inserted into the classification model before its classifier head without modifying the weights of the classification model. In addition to reconstruction and classification losses, LEARN is trained with intra- and inter-class losses computed over its latent space, which improve the recovery of the latent representation of occluded data while preserving its class-specific discriminative information. On the OccludedPASCAL3D+ dataset, LEARN outperforms standard classification models (VGG16 and ResNet-50) by a large margin and state-of-the-art methods by up to 2%. In cross-dataset testing, our method improves the average classification accuracy by more than 5% over the state-of-the-art methods. In every experiment, our model maintains high accuracy on in-distribution data.
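
The plug-in placement can be illustrated with a toy forward pass. In this sketch, for illustration only, the backbone and classifier head are hypothetical random linear maps (`W_backbone`, `W_head`) that stay frozen, and `LatentAutoEncoder` is a hypothetical one-layer encoder/decoder whose weights would be the only ones trained. LEARN's actual architecture and layer sizes are not given in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen components: random linear maps standing in for a
# pretrained backbone (32-dim input -> 16-dim features) and its classifier
# head (16-dim features -> 5 class logits). Neither is updated below.
W_backbone = rng.normal(size=(32, 16))
W_head = rng.normal(size=(16, 5))


def relu(x):
    return np.maximum(x, 0.0)


class LatentAutoEncoder:
    """Hypothetical minimal autoencoder: features f -> latent z -> reconstruction f_hat."""

    def __init__(self, feat_dim, latent_dim, rng):
        self.W_enc = 0.1 * rng.normal(size=(feat_dim, latent_dim))
        self.W_dec = 0.1 * rng.normal(size=(latent_dim, feat_dim))

    def __call__(self, f):
        z = relu(f @ self.W_enc)  # latent code, where intra/inter-class losses would apply
        f_hat = z @ self.W_dec    # reconstruction with the same shape as f
        return f_hat, z

    def num_params(self):
        return self.W_enc.size + self.W_dec.size


def classify(x, ae=None):
    """Frozen backbone -> optional autoencoder -> frozen classifier head."""
    f = relu(x @ W_backbone)
    if ae is not None:
        f, _ = ae(f)
    return f @ W_head


ae = LatentAutoEncoder(feat_dim=16, latent_dim=8, rng=rng)
x = rng.normal(size=(4, 32))  # a batch of 4 flattened "images"
print(classify(x).shape, classify(x, ae).shape)  # (4, 5) (4, 5)
print(ae.num_params())                           # 256
```

Because the autoencoder maps features back to their original dimensionality, the existing classifier head can consume its output unchanged, which is what allows the backbone and head weights to stay fixed.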

Paper Structure

This paper contains 5 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Examples of clean and occluded images from the OccludedPASCAL3D+ dataset. Clean images are in the top row; their occluded versions with noise patches (level 5), texture patches (level 5), and random objects (level 9) are shown in the subsequent rows, respectively.
  • Figure 2: Schematic of the proposed LEARN: (a) shows the overall training pipeline along with the loss functions, and (b) provides a simple illustration of the individual loss components. The green block depicts LEARN as an autoencoder.
  • Figure 3: t-SNE plots of the features input to the classifier head of the VGG16 backbone, for test images from the Pascal dataset with object occlusions of level L3 (60-80%).
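
The occlusion levels in the captions are tied to how much of each object is hidden; Figure 3, for instance, uses level L3 for 60-80% occlusion, presumably the fraction of the object's area that is covered. The following minimal NumPy sketch computes such a ratio from binary masks and bins it. The mask representation, the half-open bins, and the L1/L2 ranges are assumptions for illustration, not taken from this summary, and the numbered levels in Figure 1 (e.g., level 9) may follow a different scheme.

```python
import numpy as np

# Assumed level bins. Only L3 = 60-80% is stated (Figure 3 caption); L1 and L2
# are assumed here to be the preceding 20-point bins, and bins are half-open.
LEVEL_BINS = {"L1": (0.2, 0.4), "L2": (0.4, 0.6), "L3": (0.6, 0.8)}


def occlusion_ratio(object_mask, occluder_mask):
    """Fraction of the object's pixels covered by the occluder (boolean HxW masks)."""
    obj = np.asarray(object_mask, dtype=bool)
    occ = np.asarray(occluder_mask, dtype=bool)
    n_obj = int(obj.sum())
    if n_obj == 0:
        raise ValueError("object mask is empty")
    return float((obj & occ).sum() / n_obj)


def occlusion_level(ratio):
    """Map an occlusion ratio to a level name, or None if it falls outside all bins."""
    for name, (lo, hi) in LEVEL_BINS.items():
        if lo <= ratio < hi:
            return name
    return None


obj = np.zeros((10, 10), dtype=bool)
obj[2:8, 2:8] = True   # 36-pixel square object
occ = np.zeros((10, 10), dtype=bool)
occ[:, :6] = True      # occluder covers the left six columns
r = occlusion_ratio(obj, occ)  # 24 of 36 object pixels covered, so level L3
print(r, occlusion_level(r))
```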