Intriguing Properties of Dynamic Sampling Networks
Dario Morle, Reid Zaffino
TL;DR
This work addresses the theoretical fragmentation around dynamic sampling in vision models by introducing warping, a minimal forward operator $y(t)=x(t+\varepsilon(t))$ that unifies deformable convolutions, spatial transformer networks (STNs), and active convolutional units (ACUs). It develops both discrete and continuous analyses, revealing an intrinsic asymmetry between the forward and backward passes and two key instability mechanisms. From these it derives practical stability conditions and shows that warping belongs to a class of orthogonal operators distinct from the translationally invariant operators defined by convolutions. The study combines statistical analyses (under IID and homogeneous random-field input models) with continuous transforms, ablation studies, and loss-landscape visualizations to show how to stabilize training and to explain learning dynamics in dynamic sampling networks. The results inform the design of more stable, expressive architectures that exploit input-adaptive sampling and long-range dependencies while maintaining training efficiency.
Abstract
Dynamic sampling mechanisms in deep learning architectures have demonstrated utility across many computer vision models, though the theoretical analysis of these structures has not yet been unified. In this paper we connect the various dynamic sampling methods by developing and analyzing a novel operator, which we term "warping", that generalizes existing methods. Warping provides a minimal implementation of dynamic sampling which is amenable to analysis, and can be used to reconstruct existing architectures including deformable convolutions, active convolutional units, and spatial transformer networks. Using our formalism, we provide a statistical analysis of the operator by modeling the inputs both as IID variables and as homogeneous random fields. Extending this analysis, we discover a unique asymmetry between the forward and backward passes during model training. We demonstrate that these mechanisms represent a class of orthogonal operators entirely different from the traditional translationally invariant operators defined by convolutions. With a combination of theoretical analysis and empirical investigation, we find the conditions necessary to ensure stable training of dynamic sampling networks. In addition, we provide a statistical analysis of discretization effects. Finally, we introduce a novel loss landscape visualization which utilizes gradient update information directly, to better understand learning behavior.
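To make the operator $y(t)=x(t+\varepsilon(t))$ concrete, the following is a minimal one-dimensional sketch on a discrete signal. It is an illustration under stated assumptions, not the authors' implementation: the function name `warp_1d`, the use of linear interpolation for fractional sample positions, and the clamping of positions at the signal boundary are all choices made here for the example.

```python
import numpy as np

def warp_1d(x, eps):
    """Discrete sketch of y[t] = x[t + eps[t]].

    Assumptions (not from the paper): fractional positions are resolved by
    linear interpolation, and out-of-range positions are clamped to the edges.
    """
    x = np.asarray(x, dtype=float)
    eps = np.asarray(eps, dtype=float)
    n = x.shape[0]
    # Sampling position for every output index, clamped to the valid range.
    pos = np.clip(np.arange(n) + eps, 0, n - 1)
    lo = np.floor(pos).astype(int)      # left neighbour of each position
    hi = np.minimum(lo + 1, n - 1)      # right neighbour, kept in range
    frac = pos - lo                     # interpolation weight in [0, 1)
    return (1.0 - frac) * x[lo] + frac * x[hi]

x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
y_identity = warp_1d(x, np.zeros(5))    # zero offsets leave x unchanged
y_shift = warp_1d(x, np.ones(5))        # constant offset: a plain shift
y_half = warp_1d(x, np.full(5, 0.5))    # fractional offset: interpolated samples
```

A constant offset field reduces to an ordinary translation, while an offset field that varies with $t$ (and, in a network, with the input) gives the input-adaptive sampling that the paper analyzes.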
