DARN: Dynamic Adaptive Regularization Networks for Efficient and Robust Foundation Model Adaptation

Dhenenjay Yadav, Rohan Sawai

TL;DR

DARN addresses the challenge of adapting geospatial foundation models by replacing fixed decoder regularization with a per-sample adaptive scheme. It introduces three components embedded in a Pyramid Decoder backbone: a Task Complexity Predictor (TCP) that outputs a per-sample complexity score $c_i$, Adaptive Dropout Modulation (ADM) that sets a per-sample dropout rate $p_i$, and Dynamic Capacity Gating (DCG) that modulates channel activation. Together they tailor regularization and capacity to each input. The approach comes with theoretical justification (convergence to stationary points; an information-bottleneck perspective) and empirical validation: state-of-the-art mean IoU on GeoBench under full fine-tuning (86.66%) and SOTA-competitive accuracy with frozen backbones, along with gains in OOD generalization, corruption robustness, and minority-class performance.

Abstract

Foundation models (FMs) offer powerful representations for geospatial analysis, but adapting them effectively remains challenging. Standard adaptation methods, whether full fine-tuning or efficient frozen-backbone approaches, typically employ decoders with fixed regularization strategies, failing to account for the significant heterogeneity in satellite imagery. We introduce Dynamic Adaptive Regularization Networks (DARN), a novel decoder architecture designed to address this limitation. DARN integrates three key innovations: (1) a lightweight Task Complexity Predictor (TCP) that estimates per-sample difficulty, (2) Adaptive Dropout Modulation (ADM), dynamically adjusting dropout rates (from 0.1 to 0.5) based on predicted complexity, and (3) Dynamic Capacity Gating (DCG) that modulates channel activation. We provide theoretical justifications linking DARN's optimization to stationary point convergence and its mechanism to adaptive information bottlenecks. Empirically, DARN demonstrates exceptional performance across both major adaptation paradigms. In full fine-tuning (unfrozen backbone), DARN achieves a new state-of-the-art on the multi-task GeoBench benchmark (86.66% mIoU, +5.56 pp over prior SOTA). In efficient adaptation (frozen backbone), DARN achieves SOTA-competitive accuracy (90.5% mIoU on Sen1Floods11) while delivering substantial advantages crucial for real-world deployment: superior out-of-distribution (OOD) generalization (+9.5 pp mIoU on AI4SmallFarms), enhanced robustness (17% relative reduction in corruption error), and improved performance on minority classes. DARN offers a more intelligent, robust, and efficient approach to leveraging FMs in critical geospatial applications.
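To make the ADM idea concrete, here is a minimal sketch of how a per-sample complexity score could be turned into a per-sample dropout rate within the quoted range of 0.1 to 0.5. The abstract does not give the mapping, so the linear interpolation, the assumption that harder samples receive more dropout, and the names `adaptive_dropout_rate` and `per_sample_dropout` are all illustrative choices, not the authors' implementation.

```python
import random

P_MIN, P_MAX = 0.1, 0.5  # dropout range quoted in the abstract


def adaptive_dropout_rate(c, p_min=P_MIN, p_max=P_MAX):
    """Map a complexity score c in [0, 1] to a dropout rate.

    Assumption: a linear, increasing mapping. The source only states
    that the rate varies between 0.1 and 0.5 with predicted complexity.
    """
    c = min(max(c, 0.0), 1.0)  # clamp scores that fall outside [0, 1]
    return p_min + (p_max - p_min) * c


def per_sample_dropout(batch, complexities, rng, training=True):
    """Apply inverted dropout to each sample using its own rate.

    batch        -- list of feature vectors (lists of floats)
    complexities -- one complexity score per sample
    rng          -- a random.Random instance, for reproducibility
    """
    if not training:
        # Dropout is disabled at inference time; return copies unchanged.
        return [list(features) for features in batch]

    out = []
    for features, c in zip(batch, complexities):
        keep = 1.0 - adaptive_dropout_rate(c)
        # Surviving activations are rescaled by 1/keep so that the
        # expected value of each activation is preserved.
        out.append([v / keep if rng.random() < keep else 0.0 for v in features])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    for c in (0.0, 0.5, 1.0):
        print(f"c={c:.1f} -> p={adaptive_dropout_rate(c):.2f}")
    print(per_sample_dropout([[1.0] * 8, [1.0] * 8], [0.0, 1.0], rng))
```

In a real decoder the same idea would be applied to feature maps in a deep-learning framework; the pure-Python version above only illustrates that each sample in a batch can carry its own dropout probability.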

Paper Structure

This paper contains 36 sections, 9 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Comparative Segmentation Results of DARN V6 vs. Baseline UPerNet20. This figure displays Ground Truth, Baseline UPerNet predictions (with IoU), and DARN V6 predictions (with IoU) for five diverse remote sensing samples (m-SA-crop-type, m-nz-cattle, m-pv4ger-seg, m-cashew-plant, and m-chesapeake). DARN V6 consistently achieves higher IoU scores, demonstrating superior segmentation performance.
  • Figure 2: Conceptual diagram of the DARN decoder architecture. It takes features $F_1, \dots, F_4$ from a frozen encoder. The TCP predicts complexity $c$ from $F_1$, which modulates channel gating in DCG and sets adaptive dropout in ADM layers within the pyramid decoder backbone.
  • Figure 3: GeoBench overall mean mIoU leaderboard. DARN outperforms prior models when fully fine-tuned. Competitor scores are from official GeoBench data [9]. Our result (86.66%) is from a single run.
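
The data flow described in the Figure 2 caption (the TCP reads $F_1$, produces a complexity $c$, and $c$ then drives channel gating in DCG) can be sketched as follows. The caption does not specify the internals of either module, so the pooling-plus-linear predictor, the per-channel sigmoid gates, and every function and parameter name here are hypothetical stand-ins chosen only to show how a scalar complexity can modulate channel activations.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_complexity(f1, weights, bias):
    """Toy stand-in for the TCP (assumed design, not from the paper).

    f1 is a list of channels, each a flat list of activations.
    Each channel is global-average-pooled, then a linear layer and a
    sigmoid squash the result to a complexity score in (0, 1).
    """
    pooled = [sum(channel) / len(channel) for channel in f1]
    logit = sum(w * v for w, v in zip(weights, pooled)) + bias
    return sigmoid(logit)


def channel_gates(c, gate_weights, gate_biases):
    """Toy stand-in for DCG: one sigmoid gate per decoder channel,
    each driven by the scalar complexity c (assumed form)."""
    return [sigmoid(w * c + b) for w, b in zip(gate_weights, gate_biases)]


def apply_gates(features, gates):
    """Scale every activation in a channel by that channel's gate."""
    return [[g * v for v in channel] for channel, g in zip(features, gates)]


if __name__ == "__main__":
    f1 = [[0.2, 0.4, 0.6], [1.0, 1.5, 2.0]]           # two channels of F_1
    c = predict_complexity(f1, weights=[0.5, 1.0], bias=-1.0)
    gates = channel_gates(c, gate_weights=[2.0, -2.0], gate_biases=[0.0, 1.0])
    decoder_features = [[1.0, 1.0], [1.0, 1.0]]       # placeholder decoder channels
    print("complexity:", c)
    print("gated features:", apply_gates(decoder_features, gates))
```

The same score `c` would also set the dropout rate in the ADM layers, so a single lightweight prediction controls both regularization strength and effective capacity for each input.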