MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning

Jingshan Hong; Haigen Hu; Huihuang Zhang; Qianwei Zhou; Zhao Li

TL;DR

MaskAnyNet addresses the information loss that masking causes in supervised learning by introducing a dual-branch architecture that reuses masked regions as auxiliary knowledge. A mask-region reuse branch reconstructs and reintegrates the masked content and, coupled with a feature fusion and alignment module, lets the model learn both global semantics and local detail on CNN and Transformer backbones. On CIFAR, ImageNet, and downstream detection and segmentation tasks, MaskAnyNet yields consistent Top-1 gains, and ablation studies confirm the contributions of masking, reuse, and fusion. The approach increases semantic diversity and pixel utilization, offering a practical path to stronger generalization with masked inputs.

Abstract

In supervised learning, traditional image masking faces two key issues: (i) discarded pixels are underutilized, leading to a loss of valuable contextual information; (ii) masking may remove small or critical features, especially in fine-grained tasks. In contrast, masked image modeling (MIM) has demonstrated that masked regions can be reconstructed from partial input, revealing that even incomplete data can exhibit strong contextual consistency with the original image. This highlights the potential of masked regions as sources of semantic diversity. Motivated by this, we revisit image masking and propose treating masked content as auxiliary knowledge rather than discarding it. On this basis, we propose MaskAnyNet, which combines masking with a relearning mechanism to exploit both visible and masked information. It extends to any model by adding a branch that jointly learns from the recomposed masked region. This approach leverages the semantic diversity of the masked regions to enrich features and preserve fine-grained details. Experiments on CNN and Transformer backbones show consistent gains across multiple benchmarks. Further analysis confirms that the proposed method improves semantic diversity through the reuse of masked content.
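The idea that masked pixels are kept rather than thrown away can be illustrated with a small input-preparation sketch. This is not the authors' implementation: the helper name `split_by_patch_mask`, the patch size, the mask ratio, the zero-filling, and the choice to leave masked patches in their original positions are all assumptions made for illustration. The abstract does not specify how the masked region is recomposed, so the complement image below only stands in for that step.

```python
import numpy as np


def split_by_patch_mask(image, patch=4, mask_ratio=0.5, seed=0):
    """Split an (H, W, C) image into a visible view and a masked-region view.

    Hypothetical helper: patch size, ratio, and zero-filling are assumptions,
    not details taken from the paper.
    """
    h, w = image.shape[:2]
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch")
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to mask, without replacement.
    rng = np.random.default_rng(seed)
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    patch_mask = flat.reshape(gh, gw)

    # Upsample the patch-level mask to pixel resolution, then add a
    # trailing axis so it broadcasts over the channel dimension.
    pixel_mask = np.repeat(np.repeat(patch_mask, patch, axis=0), patch, axis=1)
    pixel_mask = pixel_mask[..., None]

    visible = image * ~pixel_mask     # would feed the main branch
    masked_only = image * pixel_mask  # would feed the auxiliary (reuse) branch
    return visible, masked_only, patch_mask


image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
visible, masked_only, patch_mask = split_by_patch_mask(image)
```

Under these assumptions every pixel lands in exactly one of the two views, so `visible + masked_only` reproduces the original image. In a dual-branch setup, the two views would be encoded separately and their features fused before the classifier.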
