MIDAS: Mosaic Input-Specific Differentiable Architecture Search

Konstanty Subbotko

TL;DR

MIDAS addresses the instability and lack of input specificity of differentiable NAS by making architecture parameters input-conditioned, computing them with patchwise self-attention. It introduces a mosaic scheme that computes architecture choices separately for each spatial patch, and a topology-aware, parameter-free search space for selecting the two incoming edges of each node, while maintaining DARTS-like efficiency. Across the NAS-Bench-201, DARTS, and RDARTS search spaces, MIDAS achieves state-of-the-art or near-optimal results on CIFAR-10 and CIFAR-100 and transfers to ImageNet; analyses show stable, predominantly unimodal, class-aware architecture distributions. The approach offers a scalable, robust path to automated architecture search that exploits local context and topology without additional parameter overhead, with potential implications for hardware-aware NAS and extension to broader search spaces and tasks.

Abstract

Differentiable Neural Architecture Search (NAS) provides efficient, gradient-based methods for automatically designing neural networks, yet its adoption remains limited in practice. We present MIDAS, a novel approach that modernizes DARTS by replacing static architecture parameters with dynamic, input-specific parameters computed via self-attention. To improve robustness, MIDAS (i) localizes the architecture selection by computing it separately for each spatial patch of the activation map, and (ii) introduces a parameter-free, topology-aware search space that models node connectivity and simplifies selecting the two incoming edges per node. We evaluate MIDAS on the DARTS, NAS-Bench-201, and RDARTS search spaces. In DARTS, it reaches 97.42% top-1 on CIFAR-10 and 83.38% on CIFAR-100. In NAS-Bench-201, it consistently finds globally optimal architectures. In RDARTS, it sets the state of the art on two of four search spaces on CIFAR-10. We further analyze why MIDAS works, showing that patchwise attention improves discrimination among candidate operations, and the resulting input-specific parameter distributions are class-aware and predominantly unimodal, providing reliable guidance for decoding.

Paper Structure (38 sections, 17 equations, 12 figures, 6 tables)

Figures (12)

  • Figure 1: Computing input-specific architecture with attention. For a given node, each candidate operation $o^{(j)}$ applied to an incoming feature $x^{(i)}$ produces an activation map $F^{(i,j)} = o^{(j)}(x^{(i)})$. We project the node's concatenated input into a query and the candidate activation maps into keys, and apply dot-product attention to obtain architecture weights. In the mosaic variant, we partition each activation map into $P^2$ patches, compute attention independently within each patch, and average the patch-level distributions to obtain an image-level architecture. For clarity, the topology-aware search space is not illustrated.
  • Figure 2: Learned input-specific architecture parameters in the first two cells in the DARTS search space on CIFAR-10, averaged over four runs. We compare three variants: no patch (global average pooling only), PS=4 (patch size $4\times4$), and PS=8 (patch size $8\times8$). The horizontal line denotes uniform importance across operations. The no-patch variant fails to discriminate among the learnable operations, assigning essentially the same weight to all four.
  • Figure 3: (Left) Histogram of $p$-values from Hartigan's dip test across all architecture parameters. The vertical line marks the $0.05$ rejection threshold. Most $p$-values exceed $0.05$, so unimodality is not rejected for the majority, indicating predominantly unimodal behaviour. (Right) Distribution of an example input-specific parameter, with the vertical line marking the mean value.
  • Figure 4: Cosine similarity between CIFAR-10 classes derived from input-specific architecture parameters (higher is more similar). Best viewed in color.
  • Figure 5: Standard deviation of input-specific architecture parameters $p^{(i, j)}$, averaged over candidate edges, nodes, and four seeds.
  • ...and 7 more figures
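
The mosaic attention described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: each patch is summarized by average pooling, the projections are plain matrices (`W_q`, `W_k`), scores are scaled by the square root of the projection width and normalized with a softmax, and `P` is read as the number of patches per side, giving $P^2$ patches. The function names `patch_pool` and `mosaic_arch_weights` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def patch_pool(fmap, P):
    """Average-pool a (C, H, W) map into P*P patch descriptors, shape (P*P, C).

    Assumption: each patch is summarized by its channel-wise mean.
    """
    C, H, W = fmap.shape
    assert H % P == 0 and W % P == 0, "H and W must be divisible by P"
    # Split H and W into P blocks each, then average inside every block.
    blocks = fmap.reshape(C, P, H // P, P, W // P)
    return blocks.mean(axis=(2, 4)).reshape(C, P * P).T

def mosaic_arch_weights(node_input, candidate_maps, W_q, W_k, P):
    """Image-level architecture distribution over K candidates.

    node_input:     (C_in, H, W) concatenated input of the node
    candidate_maps: (K, C, H, W) activation maps F^{(i,j)}, one per candidate
    W_q: (C_in, D) and W_k: (C, D) are hypothetical projection matrices
    """
    q = patch_pool(node_input, P) @ W_q                      # (P*P, D)
    k = np.stack([patch_pool(f, P) @ W_k for f in candidate_maps],
                 axis=1)                                     # (P*P, K, D)
    # Dot-product attention computed independently within each patch.
    scores = np.einsum("pd,pkd->pk", q, k) / np.sqrt(q.shape[-1])
    patch_dist = softmax(scores, axis=-1)                    # (P*P, K)
    # Average the patch-level distributions into one image-level distribution.
    return patch_dist.mean(axis=0)                           # (K,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 8, 8))       # node input
    F = rng.normal(size=(5, 4, 8, 8))    # five candidate activation maps
    W_q = rng.normal(size=(6, 3))
    W_k = rng.normal(size=(4, 3))
    print(mosaic_arch_weights(x, F, W_q, W_k, P=2))
```

Setting `P=1` collapses the sketch to a single global-average-pooled descriptor per map, which corresponds to the "no patch" variant compared in Figure 2.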