Learning Robust Representations by Projecting Superficial Statistics Out

Haohan Wang, Zexue He, Zachary C. Lipton, Eric P. Xing

TL;DR

The paper tackles distribution shift and domain generalization by separating superficial texture statistics from semantic content. It introduces a differentiable neural gray-level co-occurrence matrix (NGLCM) to capture texture and a framework, HEX, that projects this texture information out of the learned representation, forcing the model to rely on semantics. Two training strategies are proposed: a reverse-gradient adversarial variant and a projection-based orthogonalization. Neither requires access to target-domain data. Across synthetic tests, nuisance-background tasks, pattern-attached MNIST, MNIST-rotation, and PACS, HEX generalizes as well as or better than methods that require target-domain samples. The authors acknowledge limitations such as imperfect texture suppression and potential trivial solutions.
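To make the texture descriptor concrete, here is a minimal sketch of a classical (non-differentiable) gray-level co-occurrence matrix. It only illustrates the counting idea that the NGLCM builds on; it is not the paper's differentiable NGLCM. The function name `glcm`, the number of gray levels, and the pixel offset are assumptions chosen for illustration.

```python
import numpy as np

def glcm(image, levels=4, offset=(0, 1)):
    """Count how often gray level i appears next to gray level j.

    image  : 2-D integer array with values in [0, levels)
    offset : (row, col) displacement from a pixel to its neighbour
             (hypothetical default: the pixel immediately to the right)
    """
    dr, dc = offset
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=np.int64)
    # Visit only pixels whose neighbour at the given offset is inside the image.
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            counts[image[r, c], image[r + dr, c + dc]] += 1
    return counts

# Toy 3x3 image already quantized to 4 gray levels.
toy = np.array([[0, 0, 1],
                [1, 2, 2],
                [3, 3, 0]])
toy_glcm = glcm(toy, levels=4)
```

Because the matrix only records which gray levels sit next to each other, it is sensitive to local texture but carries no information about where in the image those pairs occur.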

Abstract

Despite impressive performance as evaluated on i.i.d. holdout data, deep neural networks depend heavily on superficial statistics of the training data and are liable to break under distribution shift. For example, subtle changes to the background or texture of an image can break a seemingly powerful classifier. Building on previous work on domain generalization, we hope to produce a classifier that will generalize to previously unseen domains, even when domain identifiers are not available during training. This setting is challenging because the model may extract many distribution-specific (superficial) signals together with distribution-agnostic (semantic) signals. To overcome this challenge, we incorporate the gray-level co-occurrence matrix (GLCM) to extract patterns that our prior knowledge suggests are superficial: they are sensitive to texture but unable to capture the gestalt of an image. Then we introduce two techniques for improving our networks' out-of-sample performance. The first method builds on the reverse gradient method, which pushes our model to learn representations from which the GLCM representation is not predictable. The second method builds on the independence introduced by projecting the model's representation onto the subspace orthogonal to that of the GLCM representation. We test our method on a battery of standard domain generalization data sets and, interestingly, achieve performance comparable to or better than that of other domain generalization methods that explicitly require samples from the target distribution for training.
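The second technique, projecting a representation onto the subspace orthogonal to the GLCM representation, can be sketched with ordinary linear algebra. The snippet below is an illustration under stated assumptions, not the authors' implementation: the names `project_out`, `f_all`, and `f_texture` are hypothetical, the features are random placeholders, and a least-squares solve is used in place of an explicit matrix inverse for numerical stability.

```python
import numpy as np

def project_out(f_all, f_texture):
    """Remove from f_all the component lying in the column space of f_texture.

    f_all     : (batch, d) matrix standing in for the full representation
    f_texture : (batch, k) matrix standing in for the texture-only representation
    Both names are placeholders for illustration.
    """
    # Least-squares coefficients that best explain f_all using f_texture.
    coeffs, *_ = np.linalg.lstsq(f_texture, f_all, rcond=None)
    # Subtracting the explained part leaves the orthogonal residual.
    return f_all - f_texture @ coeffs

rng = np.random.default_rng(0)
f_all = rng.normal(size=(8, 3))      # placeholder features
f_texture = rng.normal(size=(8, 2))  # placeholder texture features
f_clean = project_out(f_all, f_texture)

# Every column of f_clean is orthogonal to every column of f_texture.
residual_overlap = f_texture.T @ f_clean
```

When `f_texture` has full column rank, this matches the standard orthogonal projector `(I - G (G^T G)^{-1} G^T) A` with `G = f_texture` and `A = f_all`. Applying the projection a second time changes nothing, which is a quick sanity check that the texture component has been fully removed.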

Paper Structure

This paper contains 22 sections, 13 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Example illustration of train/validation/test data. The first row shows the "happiness" sentiment and the second row the "sadness" sentiment. The background and sentiment labels are correlated in the training and validation sets but independent in the test set.
  • Figure 2: Introduction of Neural Gray-level Co-occurrence Matrix (NGLCM) and HEX.
  • Figure 3: Averaged testing accuracy and standard deviation of five repeated experiments with different correlation level on sentiment with nuisance background data. Notations: baseline CNN (B), Ablation Tests (M (replacing NGLCM with MLP) and N (training without HEX projection)), ADVE (E), ADV (A), HEX (H), HEX-ADV (V), HEX-ALL (L), DANN (D), and InfoDropout (I).
  • Figure 4: Averaged testing accuracy and standard deviation of five repeated experiments with different strategies of attaching patterns to MNIST data. Notations: baseline CNN (B), Ablation Tests (M (replacing NGLCM with MLP) and N (training without HEX projection)), ADVE (E), ADV (A), HEX (H), HEX-ADV (V), HEX-ALL (L), DANN (D), and InfoDropout (I).
  • Figure A1: A closer look at the Office data set: we visualize the first 10 images of each data set. We show 12 of the 31 labels; the remaining labels tell a similar story. The images clearly show that many DSLR and Webcam images share a similar background, while the Amazon images have a distinct background. Top row: Amazon; middle row: DSLR; bottom row: Webcam.
  • ...and 2 more figures