Table of Contents
Fetching ...

Parameter-Inverted Image Pyramid Networks

Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

TL;DR

Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost.

Abstract

Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks. Our code and models are available at https://github.com/OpenGVLab/PIIP.

Parameter-Inverted Image Pyramid Networks

TL;DR

Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost.

Abstract

Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks. Our code and models are available at https://github.com/OpenGVLab/PIIP.
Paper Structure (19 sections, 2 equations, 5 figures, 15 tables)

This paper contains 19 sections, 2 equations, 5 figures, 15 tables.

Figures (5)

  • Figure 1: Different parameter-resolution designs of image pyramid networks.(a) Plain network which lacks multi-scale features. (b)(c) Inefficient image pyramid networks (shared weights / separate weights with interactions) using equivalently large networks for all scales. (d) Parameter-direct image pyramid network which processes high-resolution images with large models, leading to high computational cost. (e) Our efficient parameter-inverted image pyramid network (PIIP), which pairs models of increasing parameter sizes inversely with images of decreasing resolution. It delivers better performance than those of (b)(c)(d) with much lower computational cost.
  • Figure 2: Overall architecture of our method. We use multi-resolution branches to process images of different resolutions, where larger images are handled by smaller models. Interaction Units build connections between branches. Branch merging combines the features of all branches to form the final output. Our architecture can leverage pre-trained models with different model sizes to build efficient image pyramids.
  • Figure 3: Structure of an interaction unit.
  • Figure 4: Performance of different PIIP variants by adjusting input resolutions. Detailed resolution configuration and results are provided in the appendix.
  • Figure 5: Ablation on model variants and number of interactions.