MS-SCANet: A Multiscale Transformer-Based Architecture with Dual Attention for No-Reference Image Quality Assessment

Mayesha Maliha R. Mithila, Mylene C. Q. Farias

TL;DR

MS-SCANet addresses no-reference image quality assessment by leveraging a dual-branch, multiscale transformer that jointly processes fine- and coarse-scale image features. It introduces a cross-branch attention mechanism to fuse multiscale representations and two consistency losses—Cross-Branch Consistency Loss and Adaptive Pooling Consistency Loss—to preserve spatial integrity during scale changes, formalized within a total loss $\mathcal{L}_{total} = \mathcal{L}_{L1} + \mathcal{L}_{CB} + \mathcal{L}_{AP}$. Empirical results on KonIQ-10k, LIVE, LIVE Challenge, and CSIQ show strong correlations with human judgments and robust cross-dataset generalization, while maintaining computational efficiency via window-based attention with complexity $O(N_w^2 \cdot d)$. The work advances NR-IQA by integrating multiscale transformer features with targeted consistency losses and provides an open-source implementation for broader adoption.
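
As a rough illustration of how the three terms in $\mathcal{L}_{total}$ combine, here is a minimal NumPy sketch. Only the unweighted sum comes from the summary above. The concrete forms of the two consistency terms (`cross_branch_consistency`, `adaptive_pooling_consistency`) and the helper `adaptive_avg_pool` are assumptions made for illustration and are not the authors' definitions.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error between predicted and subjective quality scores."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))


def adaptive_avg_pool(feat, out_hw):
    """Average-pool an (H, W, C) map down to (out_h, out_w, C).

    Hypothetical helper: only integer pooling factors are handled.
    """
    h, w, c = feat.shape
    oh, ow = out_hw
    assert h % oh == 0 and w % ow == 0, "sketch needs integer pooling factors"
    return feat.reshape(oh, h // oh, ow, w // ow, c).mean(axis=(1, 3))


def cross_branch_consistency(fine, coarse):
    """ASSUMED form of L_CB: MSE between the pooled fine map and the coarse map."""
    pooled = adaptive_avg_pool(fine, coarse.shape[:2])
    return float(np.mean((pooled - coarse) ** 2))


def adaptive_pooling_consistency(feat, out_hw):
    """ASSUMED form of L_AP: MSE between a map and its pooled-then-upsampled copy."""
    h, w, _ = feat.shape
    pooled = adaptive_avg_pool(feat, out_hw)
    # Nearest-neighbour upsampling back to the original resolution.
    up = pooled.repeat(h // out_hw[0], axis=0).repeat(w // out_hw[1], axis=1)
    return float(np.mean((feat - up) ** 2))


def total_loss(pred, target, fine, coarse):
    """Unweighted sum L_L1 + L_CB + L_AP, matching the formula in the summary."""
    return (
        l1_loss(pred, target)
        + cross_branch_consistency(fine, coarse)
        + adaptive_pooling_consistency(fine, coarse.shape[:2])
    )
```

With identical constant feature maps in both branches, the two consistency terms vanish and `total_loss` reduces to the L1 term alone, which is a quick sanity check for any alternative definitions swapped in.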

Abstract

We present the Multi-Scale Spatial Channel Attention Network (MS-SCANet), a transformer-based architecture designed for no-reference image quality assessment (IQA). MS-SCANet features a dual-branch structure that processes images at multiple scales, effectively capturing both fine and coarse details, an improvement over traditional single-scale methods. By integrating tailored spatial and channel attention mechanisms, our model emphasizes essential features while minimizing computational complexity. A key component of MS-SCANet is its cross-branch attention mechanism, which enhances the integration of features across different scales, addressing limitations in previous approaches. We also introduce two new consistency loss functions, Cross-Branch Consistency Loss and Adaptive Pooling Consistency Loss, which maintain spatial integrity during feature scaling, outperforming conventional linear and bilinear techniques. Extensive evaluations on datasets like KonIQ-10k, LIVE, LIVE Challenge, and CSIQ show that MS-SCANet consistently surpasses state-of-the-art methods, offering a robust framework with stronger correlations with subjective human scores.
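
The cross-branch attention idea can be sketched with generic scaled dot-product cross-attention, in which tokens from the fine-scale branch query tokens from the coarse-scale branch. This is a minimal NumPy sketch under assumptions: the function name `cross_branch_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the choice of which branch supplies the queries are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_branch_attention(fine_tokens, coarse_tokens, w_q, w_k, w_v):
    """Fuse coarse-scale context into fine-scale tokens (assumed design).

    fine_tokens:   (N_fine, d) tokens from the fine-scale branch
    coarse_tokens: (N_coarse, d) tokens from the coarse-scale branch
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical parameters)
    """
    q = fine_tokens @ w_q        # queries come from the fine branch
    k = coarse_tokens @ w_k      # keys come from the coarse branch
    v = coarse_tokens @ w_v      # values come from the coarse branch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    fused = fine_tokens + weights @ v    # residual connection (assumption)
    return fused, weights
```

Each fine-scale token receives a weighted mixture of coarse-scale values, so the output keeps the fine branch's token count while carrying coarse context. A symmetric call with the roles swapped would inject fine detail into the coarse branch.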

Paper Structure (5 sections, 8 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Architecture of the proposed Multi-Scale SCANet (MS-SCANet) framework.
  • Figure 2: Scatter plots and linear regression fit lines of our proposed MS-SCANet method on benchmark datasets.
  • Figure 3: Box plots of SROCC for cross-dataset evaluation of four models, WaDIQaM [bosse2018deep], DBCNN [zhang2018blind], HyperIQA [su2020blindly], and MS-SCANet, across three train-test combinations. An x-axis label of the form dataset1-dataset2 means the model was trained on dataset1 and tested on dataset2.
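
SROCC, the Spearman rank-order correlation coefficient reported in Figure 3, is the Pearson correlation computed on the ranks of predicted and subjective scores. The following is a minimal NumPy sketch of that computation. The function names `average_ranks` and `srocc` are illustrative, and the sketch is not taken from the paper's implementation.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def srocc(predicted, subjective):
    """Spearman rank-order correlation between model scores and human scores."""
    rp = average_ranks(predicted)
    rs = average_ranks(subjective)
    return float(np.corrcoef(rp, rs)[0, 1])
```

Because only ranks matter, any monotonically increasing rescaling of the predictions leaves SROCC unchanged. This makes the metric suitable for cross-dataset comparison, where the subjective score ranges of the training and test datasets may differ.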