Table of Contents
Fetching ...

Neighborhood Feature Pooling for Remote Sensing Image Classification

Fahimeh Orvati Nia, Amirmohammad Mohammadi, Salim Al Kharsa, Pragati Naikare, Zigfried Hampel-Arias, Joshua Peeples

TL;DR

This work introduces Neighborhood Feature Pooling (NFP), a texture-aware pooling mechanism that computes local neighborhood similarities to preserve fine-grained spatial structure in remote sensing images. By forming an affinity stack through center-neighbor comparisons $S_n = d(I_n(mbda), I_c(mbda))$ and fusing with GAP, NFP enhances feature representations with minimal parameter cost. Extensive experiments across five datasets and three backbones show NFP consistently improves accuracy over GAP and other texture pooling methods, particularly on texture-rich datasets like GTOS-Mobile, and provides more interpretable Grad-CAM and t-SNE visualizations. The approach demonstrates robustness and efficiency, with potential applications beyond classification to detection and segmentation in remote sensing domains.

Abstract

In this work, we propose neighborhood feature pooling (NFP) as a novel texture feature extraction method for remote sensing image classification. The NFP layer captures relationships between neighboring inputs and efficiently aggregates local similarities across feature dimensions. Implemented using convolutional layers, NFP can be seamlessly integrated into any network. Results comparing the baseline models and the NFP method indicate that NFP consistently improves performance across diverse datasets and architectures while maintaining minimal parameter overhead.

Neighborhood Feature Pooling for Remote Sensing Image Classification

TL;DR

This work introduces Neighborhood Feature Pooling (NFP), a texture-aware pooling mechanism that computes local neighborhood similarities to preserve fine-grained spatial structure in remote sensing images. By forming an affinity stack through center-neighbor comparisons and fusing with GAP, NFP enhances feature representations with minimal parameter cost. Extensive experiments across five datasets and three backbones show NFP consistently improves accuracy over GAP and other texture pooling methods, particularly on texture-rich datasets like GTOS-Mobile, and provides more interpretable Grad-CAM and t-SNE visualizations. The approach demonstrates robustness and efficiency, with potential applications beyond classification to detection and segmentation in remote sensing domains.

Abstract

In this work, we propose neighborhood feature pooling (NFP) as a novel texture feature extraction method for remote sensing image classification. The NFP layer captures relationships between neighboring inputs and efficiently aggregates local similarities across feature dimensions. Implemented using convolutional layers, NFP can be seamlessly integrated into any network. Results comparing the baseline models and the NFP method indicate that NFP consistently improves performance across diverse datasets and architectures while maintaining minimal parameter overhead.

Paper Structure

This paper contains 15 sections, 1 equation, 5 figures, 3 tables.

Figures (5)

  • Figure 1: NFP illustration. (a) Overview of the Neighborhood Feature Pooling framework. (b) Example with $r=1$, showing the input patch, directional kernels, and the resulting similarity maps that encode local texture relationships. Together these demonstrate how NFP captures local structural information that is later integrated with global features (\ref{['fig:architecture']}).
  • Figure 2: Full architecture illustration of the proposed model using MobileNetV3 howard2019searching with NFP. Each block represents a stage in the feature extraction pipeline. After feature maps are extracted from the input, the features are aggregated through two branches: global average pooling (GAP) and NFP. The NFP branch first extract the similarity maps then the similarity values are aggregated through GAP. The average NFP features are then upsampled to the same dimension using a $1 \times 1$ convolution. The final step is for the GAP and NFP feature vectors to be multiplied before being passed into the output classification layer.
  • Figure 3: Visualization of feature representations from the first layer of MobileNetV3 howard2019searching on the UC Merced dataset Yang2010UCMerced, class "intersection." (b) The standard feature map highlights low-level edges and textures, (c) NFP (cosine similarity) enhances local neighborhood structures, and (d) lacunarity pooling emphasizes spatial gaps and texture distributions. Feature map visualizations are channel-averaged and normalized to the [0,1] range for comparison.
  • Figure 4: Grad-CAM selvaraju2017gradcam visualizations for an "Airplane" sample from the UC Merced dataset Yang2010UCMerced using MobileNetV3 howard2019searching with different pooling strategies: (a) Original image, (b) GAP (Baseline), (c) NFP (Ours), (d) DeepTEN, (e) Fractal, (f) Lacunarity, and (g) RADAM. Brighter regions indicate higher model attention. NFP consistently yields more focused and semantically meaningful activations, localizing key object regions (e.g., airplane body and wings) better than GAP and other texture pooling methods. These results demonstrate NFP's superior ability to preserve fine-grained, class-relevant spatial structure.
  • Figure 5: t-SNE visualizations of penultimate-layer features extracted from models trained on GTOS-Mobile (31 classes) with different pooling methods: GAP, NFP (Ours), DeepTEN, Fractal, Lacunarity, and RADAM. Each point represents a test image, colored by ground-truth class. The Silhouette Score (computed on the original feature space) is shown beneath each panel, quantifying the compactness and separability of class clusters. NFP achieves the highest Silhouette Score, indicating its superior ability to produce discriminative and well-separated feature embeddings for texture classification. All runs used a shared random seed for fair comparison.