Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
Zhizhen Wu, Zhe Cao, Yuchi Huo
TL;DR
This work tackles the high computational cost of applying large, complex convolution kernels on resource-limited devices by introducing a differentiable kernel decomposition that represents a dense kernel as a sequence of optimized sparse layers. The approach enables end-to-end gradient-based learning of sparse kernel samples and introduces a filter-space interpolation scheme to decouple kernel synthesis from image resolution for spatially varying filtering. It achieves higher fidelity than heuristic methods and significantly lower runtime compared with low-rank decompositions, enabling real-time mobile imaging and integration into learning pipelines. By combining robust initialization, differentiable optimization, and a compact basis for per-pixel filters, the method provides a practical, scalable solution for advanced image filtering in graphics and vision tasks.
Abstract
Image convolution with complex kernels is a fundamental operation in photography, scientific imaging, and animation effects, yet direct dense convolution is computationally prohibitive on resource-limited devices. Existing approximations, such as simulated annealing or low-rank decompositions, either lack efficiency or fail to capture non-convex kernels. We introduce a differentiable kernel decomposition framework that represents a target spatially-variant, dense, complex kernel using a set of sparse kernel samples. Our approach features (i) a decomposition that enables differentiable optimization of sparse kernels, (ii) a dedicated initialization strategy for non-convex shapes to avoid poor local minima, and (iii) a kernel-space interpolation scheme that extends single-kernel filtering to spatially varying filtering without retraining or additional runtime overhead. Experiments on Gaussian and non-convex kernels show that our method achieves higher fidelity than simulated annealing and significantly lower cost than low-rank decompositions. Our approach provides a practical solution for mobile imaging and real-time rendering, while remaining fully differentiable for integration into broader learning pipelines.
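To make the idea of replacing a dense kernel with a few sparse layers concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the layers are applied in sequence, so that the effective kernel is the convolution of the layers. The helper names (`conv2d_full`, `sparse_layer`) and the tap positions and weights are hypothetical and hand-picked for illustration; in the method described above they would be obtained by gradient-based optimization rather than chosen by hand.

```python
import numpy as np

def conv2d_full(a, b):
    """Full 2D convolution of two small arrays, written in plain NumPy."""
    ha, wa = a.shape
    hb, wb = b.shape
    out = np.zeros((ha + hb - 1, wa + wb - 1))
    for i in range(hb):
        for j in range(wb):
            # Each tap of b adds a shifted, scaled copy of a.
            out[i:i + ha, j:j + wa] += b[i, j] * a
    return out

def sparse_layer(size, taps):
    """Build a size x size kernel that is zero except at the given taps."""
    k = np.zeros((size, size))
    for (r, c), w in taps.items():
        k[r, c] = w
    return k

# Three hypothetical sparse layers with 4 taps each and growing tap spacing.
layers = [
    sparse_layer(2, {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}),
    sparse_layer(3, {(0, 0): 0.25, (0, 2): 0.25, (2, 0): 0.25, (2, 2): 0.25}),
    sparse_layer(5, {(0, 0): 0.25, (0, 4): 0.25, (4, 0): 0.25, (4, 4): 0.25}),
]

# Applying the layers one after another is equivalent to a single
# convolution with the composition of all layers.
effective = np.ones((1, 1))
for k in layers:
    effective = conv2d_full(effective, k)

sparse_taps = sum(int(np.count_nonzero(k)) for k in layers)
dense_taps = int(np.count_nonzero(effective))

print("effective kernel shape:", effective.shape)
print("taps per pixel, sparse cascade:", sparse_taps)
print("taps per pixel, equivalent dense kernel:", dense_taps)
```

In this toy case the three layers compose into a uniform 8x8 box filter, so 12 samples per pixel stand in for 64. A box filter is an easy target; arbitrary or non-convex kernels generally have no exact hand-built factorization of this kind, which is the situation the optimization and initialization strategy in the abstract are meant to address.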
