G-ZAP: A Generalizable Zero-Shot Framework for Arbitrary-Scale Pansharpening

Zhiqi Yang, Shan Yin, Jingze Liang, Liang-Jian Deng

Abstract

Pansharpening aims to fuse a high-resolution panchromatic (PAN) image and a low-resolution multispectral (LRMS) image to produce a high-resolution multispectral (HRMS) image. Recent deep models have achieved strong performance, yet they typically rely on large-scale pretraining and often generalize poorly to unseen real-world image pairs. Prior zero-shot approaches improve real-scene generalization but require per-image optimization, which hinders weight reuse, and both families of methods are usually limited to a fixed scale. To address these issues, we propose G-ZAP, a generalizable zero-shot framework for arbitrary-scale pansharpening, designed to handle cross-resolution, cross-scene, and cross-sensor generalization. G-ZAP adopts a feature-based implicit neural representation (INR) fusion network as the backbone and introduces a multi-scale, semi-supervised training scheme to enable robust generalization. Extensive experiments on multiple real-world datasets show that G-ZAP achieves state-of-the-art results under PAN-scale fusion in both visual quality and quantitative metrics. Notably, G-ZAP supports weight reuse across image pairs while remaining competitive with per-pair retraining, demonstrating strong potential for efficient real-world deployment.

Paper Structure (26 sections, 10 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Comparison of traditional deep learning methods, traditional zero-shot methods, feature-based INR methods, and the proposed G-ZAP framework in terms of data requirement, efficiency, and available scale. G-ZAP achieves arbitrary-scale pansharpening with high efficiency using only test data through an INR-based zero-shot training strategy.
  • Figure 2: The framework adopts a three-level collaborative training strategy. All INRConv modules share the same parameters across different levels. By jointly optimizing constructed supervisory losses across multiple cross-resolution settings and unsupervised losses at full resolution, the model learns resolution-consistent and generalizable representations. After training, the learned INR-based fusion model can be seamlessly extended to inference at arbitrary spatial resolutions.
  • Figure 3: Target coordinates are generated according to the specified scale factor. For each target point, the MLP takes relative coordinates, cell size, and neighboring feature-map values as input, and computes the query value via area-weighted aggregation. The resulting target image is then refined by two lightweight convolutional layers to produce the final output.
  • Figure 4: Fusion result and HQNR map on a full-resolution WV3 example.
  • Figure 5: Fusion result and HQNR map on a full-resolution WV2 example.
  • ...and 5 more figures
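
The query procedure described in the Figure 3 caption can be illustrated with a minimal pure-Python sketch. It assumes a local-ensemble scheme in which each of the four nearest feature-map cells produces a prediction and the predictions are blended by the area of the diagonally opposite rectangle; the paper's actual INRConv module may differ. The names `make_target_coords`, `query`, and `decoder` are hypothetical, `decoder` stands in for the learned MLP, and the two refinement convolutions are omitted.

```python
import math

def make_target_coords(h_out, w_out):
    """Pixel-center coordinates in [-1, 1] for an h_out x w_out target grid."""
    ys = [-1 + (2 * i + 1) / h_out for i in range(h_out)]
    xs = [-1 + (2 * j + 1) / w_out for j in range(w_out)]
    return [(y, x) for y in ys for x in xs]

def query(feat, coord, cell, decoder):
    """Area-weighted aggregation over the four nearest feature-map cells.

    feat:    H x W nested list of feature vectors (lists of floats)
    coord:   (y, x) query position in [-1, 1]
    cell:    (cell_h, cell_w) target pixel size in normalized coordinates
    decoder: callable(feature, rel_y, rel_x, cell_h, cell_w) -> float,
             a hypothetical stand-in for the learned MLP
    """
    H, W = len(feat), len(feat[0])
    y, x = coord
    # Continuous index of the query in the feature grid (centers at integers).
    fy = (y + 1) * H / 2 - 0.5
    fx = (x + 1) * W / 2 - 0.5
    y0, x0 = math.floor(fy), math.floor(fx)

    preds, areas = [], []
    for dy in (0, 1):
        for dx in (0, 1):
            # Clamp neighbor indices at the image border.
            iy = min(max(y0 + dy, 0), H - 1)
            ix = min(max(x0 + dx, 0), W - 1)
            cy = -1 + (2 * iy + 1) / H
            cx = -1 + (2 * ix + 1) / W
            # Relative offsets and cell size, rescaled to feature-grid units.
            rel_y = (y - cy) * H
            rel_x = (x - cx) * W
            preds.append(decoder(feat[iy][ix], rel_y, rel_x,
                                 cell[0] * H, cell[1] * W))
            areas.append(abs(rel_y * rel_x) + 1e-9)

    # Each prediction is weighted by the area of the diagonally opposite
    # rectangle, so closer neighbors receive larger weights.
    areas = areas[::-1]
    total = sum(areas)
    return sum(p * a for p, a in zip(preds, areas)) / total

# Usage: upsample a 2 x 2 single-channel feature map to 4 x 4.
feat = [[[0.0], [1.0]], [[2.0], [3.0]]]
identity = lambda f, ry, rx, ch, cw: f[0]  # ignores coordinates and cell size
cell = (2 / 4, 2 / 4)
out = [query(feat, c, cell, identity) for c in make_target_coords(4, 4)]
```

With the identity decoder the aggregation reduces to bilinear interpolation of the feature values for interior queries, which is a convenient sanity check; a learned decoder would instead use the relative coordinates and cell size to predict scale-dependent detail. Because the target grid is generated from the requested output size alone, the same function serves any scale factor.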