
Parallax to Align Them All: An OmniParallax Attention Mechanism for Distributed Multi-View Image Compression

Haotian Zhang, Feiyue Long, Yixin Yu, Jian Xue, Haocheng Tang, Tongda Xu, Zhenning Shi, Yan Wang, Siwei Ma, Jiaqi Zhang

TL;DR

This work proposes the OmniParallax Attention Mechanism (OPAM), a general mechanism for explicitly modeling correlations and aligned features between arbitrary pairs of information sources, together with a Parallax Multi Information Fusion Module (PMIFM) that adaptively integrates information from different sources.

Abstract

Multi-view image compression (MIC) aims to achieve high compression efficiency by exploiting inter-image correlations, playing a crucial role in 3D applications. As a subfield of MIC, distributed multi-view image compression (DMIC) offers performance comparable to MIC while eliminating the need for inter-view information at the encoder side. However, existing methods in DMIC typically treat all images equally, overlooking the varying degrees of correlation between different views during decoding, which leads to suboptimal coding performance. To address this limitation, we propose a novel OmniParallax Attention Mechanism (OPAM), which is a general mechanism for explicitly modeling correlations and aligned features between arbitrary pairs of information sources. Building upon OPAM, we propose a Parallax Multi Information Fusion Module (PMIFM) to adaptively integrate information from different sources. PMIFM is incorporated into both the joint decoder and the entropy model to construct our end-to-end DMIC framework, ParaHydra. Extensive experiments demonstrate that ParaHydra is the first DMIC method to significantly surpass state-of-the-art MIC codecs, while maintaining low computational overhead. Performance gains become more pronounced as the number of input views increases. Compared with LDMIC, ParaHydra achieves bitrate savings of 19.72% on WildTrack(3) and up to 24.18% on WildTrack(6), while significantly improving coding efficiency (as much as $65\times$ in decoding and $34\times$ in encoding).

Paper Structure (21 sections, 19 equations, 7 figures, 3 tables)


Figures (7)

  • Figure 1: Visualization of OPAM correlations. Row 1: Input views. Row 2: Difference maps. Row 3: Correlations, where consistent regions (green) are prioritized over occlusions (red).
  • Figure 2: (a) The proposed ParaHydra framework. $\hat{y}_{\mathcal{K}\setminus \{k\}}$ and ${f}_{\mathcal{K}\setminus \{k\}}$ denote the sets of all view features excluding the $k$-th view feature ${\hat{y}}_k$ and ${f}_k$, respectively. (b) Parallax Multi Information Fusion Module (PMIFM). PMIFM integrates the aligned feature $f_k^{t}$ based on the semantic relevance $C_k$ between each side source $f_k$ and the main source $f_i$. (c) Parallax Entropy Model. The figure illustrates the decoding process for the slice $\hat{y}^i_k$. Each latent slice $\hat{y}^i_k$ is partitioned into anchor $\hat{y}^i_{k,\text{ac}}$ and non-anchor $\hat{y}^i_{k,\text{na}}$ parts. Anchor $\hat{y}^i_{k,\text{ac}}$ is decoded first using Gaussian parameters $(\mu^i_{k,\text{ac}}, \sigma^i_{k,\text{ac}})$ predicted from the channel context $\Phi^i_{k,\text{ch}}$ (provided by previous slices $\hat{y}^{<i}_k$) and the hyperprior $\Phi_{H_k}$. Non-anchor $\hat{y}^i_{k,\text{na}}$ is decoded next using Gaussian parameters $(\mu^i_{k,\text{na}}, \sigma^i_{k,\text{na}})$ predicted from the local context $\Phi^i_{k,\text{lc}}$ (derived from anchor $\hat{y}^i_{k,\text{ac}}$), the channel context $\Phi^i_{k,\text{ch}}$, the global context $\Phi^i_{k,\text{gc}}$, and the hyperprior $\Phi_{H_k}$.
  • Figure 3: Overview of OmniParallax Attention Mechanism (OPAM). Left: Parallax attention. Middle: Two-stage parallax attention in OPAM. OPAM applies horizontal (red) and vertical (blue) parallax attention sequentially to capture the full 2D spatial context. Right: Receptive fields of the aligned features. Each position in $f_l^{\text{hor}}$ attends to one row of $f_r$, and each position in $f_l^{\text{ver}}$ attends to one column of $f_l^{\text{hor}}$, allowing each position in $f_l^{\text{ver}}$ to attend to the entire 2D spatial domain of $f_r$.
  • Figure 4: Rate-distortion curves of ParaHydra compared against various baselines.
  • Figure 5: Ablation study on WildTrack with 3 input views.
  • ...and 2 more figures
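
The two-stage receptive field described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, and it assumes the vertical stage reuses $f_l$ as the query. The function names `row_attention` and `two_stage_parallax_attention` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_attention(query, source):
    """Attention restricted to rows (illustrative, no learned projections).

    query, source: arrays of shape (H, W, C).
    Each position (h, w) of `query` attends to all W positions in row h
    of `source`. Returns the aligned features (H, W, C) and the attention
    weights (H, W, W).
    """
    channels = query.shape[-1]
    scores = np.einsum("hwc,hvc->hwv", query, source) / np.sqrt(channels)
    weights = softmax(scores, axis=-1)
    aligned = np.einsum("hwv,hvc->hwc", weights, source)
    return aligned, weights


def two_stage_parallax_attention(f_l, f_r):
    """Horizontal attention followed by vertical attention.

    Stage 1: each position of f_hor attends to one row of f_r.
    Stage 2: each position of f_ver attends to one column of f_hor.
    Assumption: f_l serves as the query in both stages.
    """
    f_hor, a_hor = row_attention(f_l, f_r)
    # Swap H and W so that columns become rows, then reuse row_attention.
    f_ver_t, a_ver = row_attention(
        f_l.transpose(1, 0, 2), f_hor.transpose(1, 0, 2)
    )
    f_ver = f_ver_t.transpose(1, 0, 2)
    return f_hor, f_ver, a_hor, a_ver
```

Under these assumptions, perturbing a single position of `f_r` that lies in a different row and a different column from a given output position leaves `f_hor` at that position unchanged but changes `f_ver` there, which is the full 2D coverage the caption describes. The cost per stage scales with the row or column length rather than with the full $H \times W$ grid.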