LeMoRe: Learn More Details for Lightweight Semantic Segmentation
Mian Muhammad Naeem Abid, Nancy Mehta, Zongwei Wu, Radu Timofte
TL;DR
LeMoRe tackles the efficiency–accuracy trade-off in semantic segmentation by integrating explicit Cartesian views with implicitly learned views via Nested Attention. It introduces three components—Cartesian Encoder, Nested Attention, and a Gated Fusion Module—to enable multiview feature modeling with reduced computation and memory. Across ADE20K, CityScapes, PASCAL Context, and COCO-Stuff, LeMoRe delivers competitive accuracy while achieving substantial GFLOPs and parameter reductions, outperforming many lightweight baselines. This explicit–implicit multiview approach offers a practical path toward real-time segmentation on resource-constrained devices.
Abstract
Lightweight semantic segmentation is essential for many downstream vision tasks. Unfortunately, existing methods often struggle to balance efficiency and performance due to the complexity of feature modeling. Many existing approaches are constrained by rigid architectures and implicit representation learning, and are often characterized by parameter-heavy designs and a reliance on computationally intensive Vision Transformer-based frameworks. In this work, we introduce an efficient paradigm that synergizes explicit and implicit modeling to balance computational efficiency with representational fidelity. Our method combines well-defined Cartesian directions as explicitly modeled views with implicitly inferred intermediate representations, efficiently capturing global dependencies through a nested attention mechanism. Extensive experiments on challenging datasets, including ADE20K, CityScapes, PASCAL Context, and COCO-Stuff, demonstrate that LeMoRe strikes an effective balance between performance and efficiency.
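The abstract does not spell out how the Cartesian views or the gated fusion are computed, so the following is only a rough illustration of the general idea, not the authors' implementation. It assumes that an explicit Cartesian view is obtained by mean-pooling a single-channel feature map along each axis, and that fusion is a per-position sigmoid gate between the local features and a context map built from those views. All function names and the exact gating form are hypothetical, and the nested attention that infers the implicit views is not modeled here.

```python
import math


def cartesian_views(feat):
    """Pool an H x W map along each Cartesian axis (assumption: mean pooling)."""
    h, w = len(feat), len(feat[0])
    row_view = [sum(row) / w for row in feat]  # one value per row
    col_view = [sum(feat[i][j] for i in range(h)) / h for j in range(w)]  # one per column
    return row_view, col_view


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local, context):
    """Blend two same-shaped maps with a per-position gate (assumed form)."""
    fused = []
    for l_row, c_row in zip(local, context):
        # The gate is derived from the context value at each position.
        fused.append([sigmoid(c) * l + (1.0 - sigmoid(c)) * c
                      for l, c in zip(l_row, c_row)])
    return fused


def fuse_with_axis_context(feat):
    """Build an H x W context map from the two axis views, then gate it in."""
    row_view, col_view = cartesian_views(feat)
    # Hypothetical context: average of the row and column summaries.
    context = [[0.5 * (r + c) for c in col_view] for r in row_view]
    return gated_fusion(feat, context)
```

The point of the sketch is the cost profile: the two axis views hold H + W values instead of H × W, which is one plausible reason explicit directional views can be cheaper than dense global attention.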
