RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li

TL;DR

RS-DGC addresses the high communication costs of distributed RS learning by introducing a Neighborhood Statistics Indicator (NSI) and a dynamic layer-wise compression policy. NSI uses the mean and standard deviation within a $p\times p$ neighborhood to quantify gradient importance, so that informative gradients can be selected even when their magnitudes are small. The per-layer compression ratio adapts as training progresses, improving accuracy while maintaining high compression ratios. Experiments on RS image classification and multi-modal scene classification show that RS-DGC improves accuracy (e.g., by $0.51\%$ on NWPU-RESISC45 with VGG-19) while reducing communication by more than $50\times$, demonstrating practical benefits for RS big-data scenarios.

Abstract

Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increasing amount of open data produced daily by Earth observation programs. However, the high communication cost of sending model updates among multiple nodes is a significant bottleneck for scalable distributed learning. Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs and thus accelerating training. Existing state-of-the-art gradient sparsification methods are mostly based on the "larger-absolute-more-important" criterion and ignore small gradients, which are generally observed to affect performance. Inspired by the informative representation of manifold structures from neighborhood information, we propose a simple yet effective dynamic gradient compression scheme leveraging a neighborhood statistics indicator for RS image interpretation, termed RS-DGC. We first enhance the interdependence between gradients by introducing the gradient neighborhood to reduce the effect of random noise. The key component of RS-DGC is a Neighborhood Statistics Indicator (NSI), which quantifies the importance of gradients within a specified neighborhood on each node in order to sparsify the local gradients before gradient transmission in each iteration. Further, a layer-wise dynamic compression scheme is proposed to track the importance changes of each layer in real time. Extensive downstream tasks validate the superiority of our method for intelligent interpretation of RS images. For example, we achieve an accuracy improvement of 0.51% with more than 50 times communication compression on the NWPU-RESISC45 dataset using the VGG-19 network.
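
To make the neighborhood idea concrete, the following is a minimal NumPy sketch of block-wise sparsification on one layer's gradient matrix. It is not the authors' implementation: the abstract does not give the NSI formula, so the score used here (mean of absolute values plus standard deviation within each non-overlapping $p\times p$ block) is an assumption, and the names `nsi_sparsify`, `p`, and `keep_ratio` are hypothetical.

```python
import numpy as np

def nsi_sparsify(grad, p=2, keep_ratio=0.25):
    """Keep the highest-scoring p x p neighborhoods of `grad`; zero the rest.

    Assumption: the score is mean(|g|) + std(g) per block. The paper's
    actual NSI definition may combine these statistics differently.
    """
    rows, cols = grad.shape
    if rows % p or cols % p:
        raise ValueError("this sketch assumes both dimensions are divisible by p")

    # View the matrix as a grid of non-overlapping p x p blocks,
    # then flatten each block to a vector of p * p entries.
    blocks = grad.reshape(rows // p, p, cols // p, p).swapaxes(1, 2)
    flat = blocks.reshape(rows // p, cols // p, p * p)

    # Hypothetical neighborhood score built from the block mean and std.
    score = np.abs(flat).mean(axis=-1) + flat.std(axis=-1)

    # Select the n_keep blocks with the largest score.
    n_keep = max(1, int(round(keep_ratio * score.size)))
    top = np.argpartition(score.ravel(), -n_keep)[-n_keep:]
    block_mask = np.zeros(score.size, dtype=bool)
    block_mask[top] = True
    block_mask = block_mask.reshape(score.shape)

    # Expand the block-level mask back to element resolution.
    mask = block_mask.repeat(p, axis=0).repeat(p, axis=1)
    return grad * mask, mask
```

Because whole neighborhoods are kept, a small gradient that sits next to large or highly variable ones survives, which a plain top-k rule on absolute values would discard. Under this sketch, `keep_ratio=0.02` keeps one block in fifty, which roughly corresponds to a $50\times$ reduction in transmitted values before index overhead.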

Paper Structure

This paper contains 27 sections, 12 equations, 8 figures, 5 tables, and 1 algorithm.

Figures (8)

  • Figure 1: An illustration of the compressing criterion for absolute-based approach and the proposed method. We interpret a layer's gradients as a matrix. Coordinate values are color coded (positive, negative), where deeper color denotes larger absolute value of the gradient. For the previous criterion, only the gradients with the largest absolute value are kept. In contrast, RS-DGC selects gradient neighborhoods that contribute more to network updating.
  • Figure 2: Main workflow of RS-DGC. First, using the dynamic compression ratio strategy, each node calculates the compression ratio for every layer of the local network. Then, each node trains independently on its own data. After backward propagation, gradients are compressed based on the NSI of each neighborhood and the compression ratio of the corresponding layer.
  • Figure 3: Framework of the dynamic gradient compression method. We assess neuron significance based on the absolute magnitude of weights, postulating that layers with greater average weights hold more significance. The sparsity of each layer within the model is established through cross-layer global importance ranking, which determines the layer's dynamic compression ratio.
  • Figure 4: Example images from the UCML-21 dataset containing 21 land-use scene categories: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court.
  • Figure 5: Example images from the NWPU-45 dataset containing 45 land-use scene categories: airplane, airport, baseball diamond, basketball court, beach, bridge, chaparral, church, circular farmland, cloud, commercial area, dense residential, desert, forest, freeway, golf course, ground track field, harbor, industrial area, intersection, island, lake, meadow, medium residential, mobile home park, mountain, overpass, palace, parking lot, railway, railway station, rectangular farmland, river, roundabout, runway, sea ice, ship, snowberg, sparse residential, stadium, storage tank, tennis court, terrace, thermal power station, and wetland.
  • ...and 3 more figures
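
The layer-wise policy described in the Figure 3 caption (importance from absolute weight magnitude, per-layer sparsity from a cross-layer global ranking) can be sketched as follows. This is one plausible reading rather than the paper's algorithm: it pools the absolute weights of all layers, takes a single global threshold, and sets each layer's keep ratio to the fraction of its weights above that threshold. The function name `layer_keep_ratios` and the parameter `global_keep` are hypothetical.

```python
import numpy as np

def layer_keep_ratios(weights, global_keep=0.02):
    """Map each layer name to the fraction of its gradient entries to keep.

    Assumption: importance is |weight|, ranked jointly across all layers,
    so layers with larger average weights receive a larger share.
    """
    # Pool the absolute weights of every layer into one vector.
    all_abs = np.concatenate([np.abs(w).ravel() for w in weights.values()])

    # The k-th largest pooled value acts as a global importance threshold.
    k = max(1, int(round(global_keep * all_abs.size)))
    threshold = np.partition(all_abs, -k)[-k]

    # Per-layer keep ratio: share of the layer's weights above the threshold.
    return {
        name: float((np.abs(w) >= threshold).mean())
        for name, w in weights.items()
    }
```

Recomputing these ratios from the current weights at each round is what would make the policy dynamic in this sketch. A layer whose weights all fall below the threshold gets a ratio of zero here, so a practical variant would likely enforce a small per-layer minimum.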