Asynchronous Feedback Network for Perceptual Point Cloud Quality Assessment
Yujie Zhang, Qi Yang, Ziyu Shan, Yiling Xu
TL;DR
This work addresses no-reference point cloud quality assessment (NR-PCQA) by introducing AFQ-Net, a dual-branch architecture that mimics human visual processing through asynchronous global-to-local guidance. The global branch uses a Vision Transformer to extract attention maps from multi-view texture and depth projections, whose outputs are combined by occupancy-weighted fusion into a global feature $f_g$. The attention maps then guide a region-aware local feature extractor, which applies dynamic convolution with region-specific masks $M$ and filters $W$ to produce a local feature $f_l$. A coarse-to-fine regression combines $f_g$ and $f_l$ through two heads, trained with the losses $L_{reg}$, $L_{dis}$, and $L_{rank}$ to promote progressive refinement. Experiments on three PCQA datasets show that AFQ-Net achieves state-of-the-art correlation with subjective mean opinion scores (MOS) and remains robust across distortion types and in cross-dataset settings, which suggests practical value for NR-PCQA in real-world pipelines, including compression scenarios.
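To make the occupancy-weighted fusion step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each projected view yields one feature vector and one occupancy ratio (the fraction of projection pixels covered by points), and that $f_g$ is the average of the view features weighted by normalized occupancy. The function name `occupancy_weighted_fusion` and the toy inputs are hypothetical.

```python
from typing import List, Sequence


def occupancy_weighted_fusion(
    view_features: Sequence[Sequence[float]],
    occupancies: Sequence[float],
) -> List[float]:
    """Average per-view feature vectors, weighting each view by its occupancy.

    Assumption (not stated in the source): weights are the occupancy
    ratios normalized to sum to 1.
    """
    if not view_features or len(view_features) != len(occupancies):
        raise ValueError("need one occupancy value per view feature")
    total = sum(occupancies)
    if total <= 0:
        raise ValueError("occupancies must sum to a positive value")

    dim = len(view_features[0])
    fused = [0.0] * dim
    for feat, occ in zip(view_features, occupancies):
        weight = occ / total  # normalized contribution of this view
        for i in range(dim):
            fused[i] += weight * feat[i]
    return fused


# Toy example: two views with 2-D features; the first view is denser.
f_g = occupancy_weighted_fusion([[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
```

Under this assumption, views in which the object covers more of the projection plane contribute more to $f_g$, while sparse or mostly empty views are down-weighted.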
Abstract
Recent years have witnessed the success of deep learning-based techniques in no-reference point cloud quality assessment (NR-PCQA). To predict quality more accurately, many previous studies have attempted to capture global and local features in a bottom-up manner, but have ignored the interaction and mutual promotion between them. To solve this problem, we propose a novel asynchronous feedback quality prediction network (AFQ-Net). Motivated by human visual perception mechanisms, AFQ-Net employs a dual-branch structure to handle global and local features, simulating the left and right hemispheres of the human brain, and constructs a feedback module between the two branches. Specifically, the input point clouds are first fed into a transformer-based global encoder to generate attention maps that highlight semantically rich regions; these maps are then merged into the global feature. Next, we use the generated attention maps to perform dynamic convolution for different semantic regions and obtain the local feature. Finally, a coarse-to-fine strategy is adopted to merge the two features into the final quality score. We conduct comprehensive experiments on three datasets and achieve superior performance over state-of-the-art approaches on all of them. The code will be available at https://github.com/zhangyujie-1998/AFQ-Net.
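The idea of using attention maps to process different semantic regions differently can be illustrated with a small sketch. It is a simplified stand-in, not the network described above: it assumes the attention map is split into regions by fixed thresholds, and it replaces each region-specific convolution filter with a single scalar weight (in effect a 1x1 filter on a one-channel map). The names `region_masks` and `region_aware_filter`, the threshold, and the weights are all hypothetical.

```python
from typing import List

Grid = List[List[float]]


def region_masks(attention: Grid, thresholds: List[float]) -> List[Grid]:
    """Split an attention map into len(thresholds) + 1 binary region masks.

    Assumption: regions are defined by attention level; each pixel
    belongs to exactly one mask.
    """
    bounds = [float("-inf")] + sorted(thresholds) + [float("inf")]
    masks = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        masks.append(
            [[1.0 if lo <= a < hi else 0.0 for a in row] for row in attention]
        )
    return masks


def region_aware_filter(x: Grid, masks: List[Grid], weights: List[float]) -> Grid:
    """Apply weights[k] to the pixels selected by masks[k] and sum the results.

    Each scalar weight stands in for a region-specific filter.
    """
    out = [[0.0 for _ in row] for row in x]
    for mask, w in zip(masks, weights):
        for r, row in enumerate(x):
            for c, value in enumerate(row):
                out[r][c] += mask[r][c] * w * value
    return out


# Toy example: one threshold gives a low-attention and a high-attention region.
attention = [[0.9, 0.1], [0.4, 0.8]]
x = [[1.0, 2.0], [3.0, 4.0]]
M = region_masks(attention, thresholds=[0.5])
y = region_aware_filter(x, M, weights=[0.5, 2.0])
```

In this toy setup, pixels with attention of at least 0.5 are scaled by 2.0 and the rest by 0.5, so the global branch decides which filter each location receives. A real dynamic convolution would use learned spatial kernels per region rather than scalars.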
