BiRQA: Bidirectional Robust Quality Assessment for Images

Aleksandr Gushchin, Dmitriy S. Vatolin, Anastasia Antsiferova

TL;DR

To the authors' knowledge, BiRQA is the only FR IQA model that combines competitive accuracy with real-time throughput and strong adversarial resilience.

Abstract

Full-Reference image quality assessment (FR IQA) is important for image compression, restoration and generative modeling, yet current neural metrics remain slow and vulnerable to adversarial perturbations. We present BiRQA, a compact FR IQA model that processes four fast complementary features within a bidirectional multiscale pyramid. A bottom-up attention module injects fine-scale cues into coarse levels through an uncertainty-aware gate, while a top-down cross-gating block routes semantic context back to high resolution. To enhance robustness, we introduce Anchored Adversarial Training, a theoretically grounded strategy that uses clean "anchor" samples and a ranking loss to bound pointwise prediction error under attacks. On five public FR IQA benchmarks, BiRQA outperforms or matches the previous state of the art (SOTA) while running ~3x faster than previous SOTA models. Under unseen white-box attacks, it lifts SROCC from 0.30-0.57 to 0.60-0.84 on KADID-10k, a substantial robustness gain. To our knowledge, BiRQA is the only FR IQA model combining competitive accuracy with real-time throughput and strong adversarial resilience.
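
The abstract reports accuracy as SROCC (Spearman rank-order correlation), and the figure captions also use PLCC (Pearson linear correlation). As a rough illustration of what those numbers measure, the following pure-Python sketch computes both between metric predictions and subjective scores. The function names are illustrative and are not taken from the paper's code; IQA evaluations often fit a nonlinear mapping before computing PLCC, which is omitted here.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def srocc(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return plcc(average_ranks(x), average_ranks(y))

# Hypothetical predictions that are monotone but nonlinear in the subjective scores.
subjective = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 4.0, 9.0, 16.0]
rank_corr = srocc(predicted, subjective)
linear_corr = plcc(predicted, subjective)
```

Because SROCC depends only on ordering, the monotone but nonlinear predictions above still reach a rank correlation of 1, while their PLCC falls below 1.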

Paper Structure (27 sections, 1 theorem, 20 equations, 6 figures, 10 tables, 1 algorithm)

Key Result

Theorem 1.1

Assume: Then Moreover, if the worst-error index $j^\star\in\arg\max_j|\tilde{y}_j-y_j|$ happens to be an anchor ($j^\star\in\mathcal{S}$), then $E\le \varepsilon$.
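
The quantities named in the last clause of the theorem can be made concrete with a small sketch. It only locates the worst-error index $j^\star$ and checks whether it belongs to the anchor set $\mathcal{S}$; it does not verify the bound, since the theorem's assumptions are not reproduced in this summary. Reading $y_j$ as clean scores and $\tilde{y}_j$ as scores under attack is an assumption, and all numbers below are made up for illustration.

```python
def worst_error_index(y, y_tilde):
    """Return (j_star, max_error) with j_star in argmax_j |y_tilde[j] - y[j]|."""
    errors = [abs(t - c) for t, c in zip(y_tilde, y)]
    j_star = max(range(len(errors)), key=errors.__getitem__)
    return j_star, errors[j_star]

# Hypothetical scores (assumed: y is clean, y_tilde is under attack).
y = [0.20, 0.55, 0.80, 0.35]
y_tilde = [0.25, 0.50, 0.68, 0.36]
anchors = {0, 2}  # hypothetical anchor index set S

j_star, max_err = worst_error_index(y, y_tilde)
is_anchor = j_star in anchors  # the "moreover" clause applies only when this is True
```

In this toy case the largest deviation occurs at index 2, which is an anchor, so the special case described in the theorem's final clause would be the relevant one.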

Figures (6)

  • Figure 1: Overall scheme of BiRQA. A reference–distorted pair yields four feature maps per pyramid level. Cross-Scale Residual Attention Module (CSRAM) and Spatial Cross-Gating Block (SCGB) allow the model to pass information in both directions between scales. A Reliability-Aware Head (GeM + dual MLPs) estimates per-level impact and reliability.
  • Figure 2: Scheme of the Cross-Scale Residual Attention Module (CSRAM) that lifts high-resolution cues to the next scale and injects them via uncertainty-aware gated residuals (strength $\alpha$, confidence $\rho$, and residual $R$) to refine the lower-resolution features.
  • Figure 3: Computational efficiency (FPS) vs. performance (PLCC) on the PDAP-HDDS dataset with an image size of $1920 \times 1080$ pixels. Our model achieves PLCC comparable to the SOTA method TOPIQ while being $\sim3$ times faster and having notably fewer parameters.
  • Figure 4: SROCC and inference FPS on KADID-10k for different feature sets. The chosen quartet lies on the Pareto frontier, offering the best accuracy. FPS was measured on images with $1920\times1080$ resolution.
  • Figure 5: (a): Convergence of the Anchored Ranking Loss over 1,000 iterations. (b): Comparison of the bounds provided by Theorem 1 with the empirical maximum pointwise errors.
  • ...and 1 more figure
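
The Figure 1 caption mentions GeM in the Reliability-Aware Head. GeM usually denotes generalized-mean pooling, which reduces a feature map to one value via $\left(\frac{1}{N}\sum_i x_i^p\right)^{1/p}$. The sketch below shows that generic operation on a single-channel map; the exponent `p=3.0`, the clamping constant `eps`, and the function name are assumptions for illustration and are not taken from the paper.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over a 2-D list of non-negative activations.

    p = 1 reduces to average pooling; large p approaches max pooling.
    Values are clamped to eps so the power is well defined.
    """
    values = [max(v, eps) for row in feature_map for v in row]
    mean_pow = sum(v ** p for v in values) / len(values)
    return mean_pow ** (1.0 / p)

# Hypothetical 2x2 feature map.
fmap = [[1.0, 2.0], [3.0, 4.0]]
avg_like = gem_pool(fmap, p=1.0)   # equals the arithmetic mean
max_like = gem_pool(fmap, p=50.0)  # close to, but below, the maximum
```

In learned models the exponent is often a trainable parameter, which lets the pooling interpolate between average-like and max-like behavior.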

Theorems & Definitions (2)

  • Theorem 1.1: Pointwise $\ell_\infty$ bound from max-near anchored hinge
  • proof