IMC-Net: A Lightweight Content-Conditioned Encoder with Multi-Pass Processing for Image Classification
YiZhou Li
TL;DR
IMC-Net addresses the inefficiency of fixed-depth encoders with content-conditioned multi-pass processing driven by region-wise scores. A single lightweight core block is selectively re-applied, using a percentile-based mask and a compact representation cache to provide input-conditioned depth while keeping the architectural footprint minimal. Without relying on distillation or large-scale pretraining, the approach reaches competitive accuracy on ImageNet and transfer tasks with substantially fewer parameters and FLOPs and higher throughput. The design is deployment-friendly, generalizes across the evaluated datasets, and offers a practical path toward scalable, resource-efficient visual recognition.
Abstract
We present a compact encoder for image classification that prioritizes computational efficiency through content-conditioned multi-pass processing. The model uses a single lightweight core block that can be re-applied a small number of times, while a simple score-based selector decides, for each region unit in the feature map, whether further passes are beneficial. This design provides input-conditioned depth without heavy auxiliary modules or specialized pretraining. On standard benchmarks, the approach attains competitive accuracy with fewer parameters, fewer floating-point operations, and faster inference than similarly sized baselines. The method keeps the architecture minimal, reuses a single module to control its footprint, and keeps training stable via mild regularization on the selection scores. We discuss implementation choices for efficient masking, pass control, and representation caching, and show that the multi-pass strategy transfers well to several datasets without task-specific customization.
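To make the control flow concrete, the following is a minimal sketch of percentile-masked multi-pass processing with a representation cache. It is not the IMC-Net implementation. The function names, the toy `core_block`, the mean-absolute-activation score, the strict "above the percentile" rule, and the default values are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def percentile_threshold(scores, q):
    """Return the q-th percentile (0-100) of scores, using linear interpolation."""
    ordered = sorted(scores)
    pos = (q / 100.0) * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def core_block(region):
    """Toy stand-in (assumption) for the shared lightweight block: a residual tanh update."""
    return [x + 0.5 * math.tanh(x) for x in region]


def region_score(region):
    """Hypothetical selector score (assumption): mean absolute activation of the region."""
    return sum(abs(x) for x in region) / len(region)


def multi_pass_encode(regions, max_passes=3, keep_percentile=50.0):
    """Re-apply one shared block only to regions whose score exceeds a percentile.

    Returns the cached representations and the number of passes each region received.
    """
    cache = [list(r) for r in regions]      # representation cache, one entry per region
    passes_used = [0] * len(regions)
    active = list(range(len(regions)))      # the first pass processes every region

    for _ in range(max_passes):
        if not active:
            break
        for i in active:
            cache[i] = core_block(cache[i])  # same block (same weights) on every pass
            passes_used[i] += 1
        scores = {i: region_score(cache[i]) for i in active}
        threshold = percentile_threshold(list(scores.values()), keep_percentile)
        # Percentile-based mask: only regions strictly above the threshold continue.
        active = [i for i in active if scores[i] > threshold]

    return cache, passes_used


regions = [[0.1, 0.1], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
cache, passes_used = multi_pass_encode(regions)
print(passes_used)  # [1, 1, 2, 3]
```

Regions that drop out of the mask keep their cached representation from the last pass they received, so the effective depth varies per region while only one block's parameters are stored. In a real encoder the mask would be applied to batched tensors rather than Python lists.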
