Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images

Jiuchen Chen, Xinyu Yan, Qizhi Xu, Kaiqi Li

TL;DR

This work tackles haze removal in ultra-high-resolution imagery by easing GPU memory constraints through patch-based tokenization and a global-context Bottleneck, enabling end-to-end inference on images of up to $10240 \times 10240$ pixels in FP16. It introduces DehazeXL, a three-component architecture (Encoder, Bottleneck, Decoder) that fuses global context with local features, and a Dehazing Attribution Map (DAM) for interpreting how individual regions contribute to dehazing performance. To address the lack of high-resolution data, the authors present 8KDehaze, a dataset of 10,000 hazy/clear pairs of 8192 $\times$ 8192 aerial images. Experiments on 8KDehaze, 4KID, and O-HAZE show state-of-the-art PSNR/SSIM with favorable memory and compute profiles, and the DAM analysis highlights the importance of global information for coherent restoration.
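A rough calculation shows why full-resolution processing strains GPU memory. The sketch below counts the bytes in a single activation tensor; the 64-channel width and the 256 $\times$ 256 patch size are illustrative assumptions, not DehazeXL's settings, and a real network's peak memory also depends on how many such tensors are alive at once.

```python
def activation_bytes(height: int, width: int, channels: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to store one H x W x C activation tensor (default 2 bytes/element = FP16)."""
    return height * width * channels * bytes_per_elem

GIB = 1024 ** 3

# Hypothetical 64-channel feature map at the full 10240 x 10240 resolution, in FP16.
full = activation_bytes(10240, 10240, 64)
# The same hypothetical feature map for one 256 x 256 patch.
patch = activation_bytes(256, 256, 64)

print(f"full-resolution map: {full / GIB:.2f} GiB")
print(f"single 256x256 patch map: {patch / 2**20:.2f} MiB")
print(f"256x256 patches per 10240x10240 image: {(10240 // 256) ** 2}")
```

One such map at 10240 $\times$ 10240 already occupies 12.50 GiB, while the same map for a single 256 $\times$ 256 patch takes 8.00 MiB. Processing a bounded number of patches at a time therefore caps the per-layer activation cost instead of letting it grow with the full image.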

Abstract

Global contextual information and local detail features are essential for haze removal. Deep learning models perform well on small, low-resolution images, but they struggle with large, high-resolution ones because of GPU memory limitations. As a compromise, they often resort to image slicing or downsampling: the former diminishes global information, while the latter discards high-frequency details. To address these challenges, we propose DehazeXL, a haze removal method that effectively balances global context and local feature extraction, enabling end-to-end modeling of large images on mainstream GPU hardware. Additionally, to evaluate how effectively global context is utilized in haze removal, we design a visual attribution method tailored to the characteristics of haze removal tasks. Finally, recognizing the lack of benchmark datasets for haze removal in large images, we have developed an ultra-high-resolution haze removal dataset (8KDehaze) to support model training and testing. It includes 10,000 pairs of clear and hazy remote sensing images, each sized at 8192 $\times$ 8192 pixels. Extensive experiments demonstrate that DehazeXL can infer images of up to 10240 $\times$ 10240 pixels with only 21 GB of memory, achieving state-of-the-art results among all evaluated methods. The source code and experimental dataset are available at https://github.com/CastleChen339/DehazeXL.
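Figure 4 refers to a baseline image and a path function, which is the vocabulary of path-integrated-gradient attribution. The sketch below shows that general technique (integrated gradients along a straight-line path) on a toy model; the scalar score `toy_model`, its parameters `w`, the all-zero baseline, and the straight-line path are all hypothetical stand-ins, and the paper's DAM defines its own baseline and path function.

```python
import numpy as np

def toy_model(x: np.ndarray, w: np.ndarray) -> float:
    """Hypothetical scalar score F(x); stands in for a regional dehazing quality measure."""
    return float(np.sum(np.tanh(w * x)))

def toy_grad(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Analytic gradient dF/dx of toy_model."""
    return w * (1.0 - np.tanh(w * x) ** 2)

def integrated_gradients(x, baseline, w, steps=256):
    """Integrated gradients along the straight line gamma(a) = baseline + a * (x - baseline).

    The integral over a in [0, 1] is approximated with a midpoint Riemann sum.
    The straight-line path is an assumption; DAM's own path function may differ.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([toy_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(8, 8))  # stand-in "hazy image"
baseline = np.zeros_like(x)             # hypothetical baseline image
w = rng.normal(size=(8, 8))             # hypothetical model parameters
attr = integrated_gradients(x, baseline, w)

# Completeness: attributions sum to F(x) - F(baseline), up to discretization error.
print(attr.sum(), toy_model(x, w) - toy_model(baseline, w))
```

Summing the per-pixel attributions inside a region, such as the red box in Figure 4, gives that region's contribution to the score change under these assumptions.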

Paper Structure

This paper contains 12 sections, 2 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Comparison between different methods for handling large images in haze removal tasks. (a) Downsampling approach, which reduces the image size but loses critical high-frequency details. (b) Image slicing technique, which processes larger inputs but compromises global contextual information and object coherence. (c) The proposed method, which aims to effectively balance global context and local feature extraction to enhance haze removal performance in high-resolution images.
  • Figure 2: Comparison of GPU memory usage across various models. When processing large images, DehazeXL reduces memory usage by approximately 65% to 80% relative to the other methods. Notably, with FP16 inference, DehazeXL can process 10,240 $\times$ 10,240 pixel images with only 21 GB of memory.
  • Figure 3: Overall architecture of the proposed model. It begins by partitioning the hazy image into uniform-sized patches, which are then encoded into tokens by the Encoder. The Bottleneck injects global information into each token, enhancing the contextual representation. Subsequently, the Decoder reconstructs the tokens back into image patches, forming the final output image. Notably, to minimize memory consumption, both the Encoder and Decoder employ an asynchronous processing strategy, handling the input in multiple mini-batches sequentially rather than simultaneously. This design optimizes memory efficiency while ensuring effective haze removal.
  • Figure 4: Illustration of the baseline image and the path function. The region enclosed by the red box indicates the attribution area.
  • Figure 5: Dehazed results on the 8KDehaze dataset. The patches for comparison are marked with red boxes in the original images. PSNR / SSIM is calculated based on the patches to better reflect the performance difference. The proposed DehazeXL can directly infer images with a resolution of 8192 $\times$ 8192 without the need for slicing inference. Compared to other methods, the proposed method effectively eliminates segmentation artifacts and achieves superior visual quality.
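The Figure 3 pipeline can be illustrated with a small NumPy sketch. Everything in it is a hypothetical stand-in: the functions `to_patches`, `from_patches`, `encode`, `bottleneck`, `decode`, and `dehaze_sketch`, the linear Encoder and Decoder, the single-head self-attention Bottleneck, and the patch, token, and mini-batch sizes are illustrative assumptions, not DehazeXL's actual layers. It only shows the data flow: uniform patches become tokens in sequential mini-batches, the Bottleneck mixes information across all tokens, and the Decoder maps tokens back to patches that are reassembled into the output.

```python
import numpy as np

def to_patches(img: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into (N, p, p, C) non-overlapping patches in row-major order."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "sketch assumes H and W are multiples of p"
    return img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4).reshape(-1, p, p, c)

def from_patches(patches: np.ndarray, h: int, w: int) -> np.ndarray:
    """Inverse of to_patches: reassemble (N, p, p, C) patches into an (H, W, C) image."""
    _, p, _, c = patches.shape
    return patches.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4).reshape(h, w, c)

def encode(patches, w_enc, batch):
    """Hypothetical linear Encoder run on `batch` patches at a time, one mini-batch after another.

    In a deep encoder, large intermediate activations would exist for only one mini-batch
    at a time; here the encoder is a single matmul, so this shows the control flow only.
    """
    flat = patches.reshape(len(patches), -1)
    return np.concatenate([flat[i:i + batch] @ w_enc for i in range(0, len(flat), batch)])

def bottleneck(tokens):
    """Hypothetical global-context stage: single-head self-attention over ALL tokens."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return tokens + attn @ tokens  # residual: local token plus globally mixed context

def decode(tokens, w_dec, p, c, batch):
    """Hypothetical linear Decoder mapping tokens back to patches, also in mini-batches."""
    out = np.concatenate([tokens[i:i + batch] @ w_dec for i in range(0, len(tokens), batch)])
    return out.reshape(-1, p, p, c)

def dehaze_sketch(img, p, w_enc, w_dec, batch):
    """Patchify, encode, fuse global context, decode, and reassemble."""
    h, w, c = img.shape
    tokens = encode(to_patches(img, p), w_enc, batch)
    return from_patches(decode(bottleneck(tokens), w_dec, p, c, batch), h, w)

rng = np.random.default_rng(0)
p, c, d = 4, 3, 16                 # patch size, channels, token width (all illustrative)
img = rng.random((16, 24, c))      # stand-in hazy image: 4 x 6 = 24 patches
w_enc = rng.normal(size=(p * p * c, d)) / 10
w_dec = rng.normal(size=(d, p * p * c)) / 10

small = dehaze_sketch(img, p, w_enc, w_dec, batch=5)       # 5 patches in flight at a time
whole = dehaze_sketch(img, p, w_enc, w_dec, batch=10_000)  # all patches at once
print(small.shape, np.allclose(small, whole))
```

Because the Encoder and Decoder in this sketch act on each patch independently, the mini-batch size changes only how many patches are processed at once, not the result. The Bottleneck is the one stage that must see every token together, which is where global context enters.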