Enhancing Content Representation for AR Image Quality Assessment Using Knowledge Distillation

Aymen Sekhri; Seyed Ali Amirshahi; Mohamed-Chaker Larabi

TL;DR

This paper tackles AR image quality assessment under visual confusion by introducing TransformAR, a lightweight transformer-based FR-IQA framework that leverages content-aware encoders, shift representations, and cross-attention decoders to capture distortion-related quality information. The approach is enhanced with knowledge distillation, ground-truth class supervision, label smoothing, and elastic-net regularization, yielding three variants: TransformAR, TransformAR-KD, and TransformAR-KD+. Evaluations on the ARIQA dataset show state-of-the-art performance, with TransformAR-KD+ achieving the best metrics and ablation studies highlighting the contribution of each component. The work advances AR-IQA by addressing data scarcity and visual confusion through explicit content representation and distortion-driven reasoning, with implications for QoE optimization and dataset development for AR technologies.

Abstract

Augmented Reality (AR) is a major immersive media technology that enriches our perception of reality by overlaying digital content (the foreground) onto physical environments (the background). It has far-reaching applications, from entertainment and gaming to education, healthcare, and industrial training. Nevertheless, challenges such as visual confusion and classical distortions can cause user discomfort. Evaluating AR quality of experience is therefore essential for measuring user satisfaction and engagement, and for guiding the refinements needed to create immersive and robust experiences. However, the scarcity of data and the distinctive characteristics of AR technology make it challenging to develop effective quality assessment metrics. This paper presents a deep learning-based objective metric designed specifically for assessing image quality in AR scenarios. The approach entails four key steps: (1) fine-tuning a self-supervised pre-trained vision transformer to extract prominent features from reference images and distilling this knowledge to improve representations of distorted images; (2) quantifying distortions by computing shift representations; (3) employing cross-attention-based decoders to capture perceptual quality features; and (4) integrating regularization techniques and label smoothing to address overfitting. To validate the proposed approach, we conduct extensive experiments on the ARIQA dataset. The results show that all model variants, namely TransformAR, TransformAR-KD, and TransformAR-KD+, outperform existing state-of-the-art methods.
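The abstract names the building blocks but does not give their formulas, so the following is only a minimal sketch of how some of them are commonly written. It assumes that a shift representation is an element-wise difference between reference and distorted features, that distillation uses a mean squared error between teacher and student features, and that label smoothing and elastic-net regularization take their standard textbook forms. All function names and default values are hypothetical and are not taken from the paper.

```python
from typing import List


def shift_representation(ref_feats: List[float], dist_feats: List[float]) -> List[float]:
    """Element-wise difference between reference and distorted features.

    Assumption: the paper's exact definition of the shift may differ.
    """
    if len(ref_feats) != len(dist_feats):
        raise ValueError("feature vectors must have the same length")
    return [r - d for r, d in zip(ref_feats, dist_feats)]


def distillation_loss(teacher_feats: List[float], student_feats: List[float]) -> float:
    """Mean squared error between teacher and student features.

    Assumption: MSE is one common feature-distillation loss, used here
    only for illustration.
    """
    if len(teacher_feats) != len(student_feats):
        raise ValueError("feature vectors must have the same length")
    squared = [(t - s) ** 2 for t, s in zip(teacher_feats, student_feats)]
    return sum(squared) / len(squared)


def smooth_labels(class_index: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Standard label smoothing: mix a one-hot target with a uniform one."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == class_index else uniform
        for k in range(num_classes)
    ]


def elastic_net_penalty(weights: List[float], l1: float = 1e-4, l2: float = 1e-4) -> float:
    """Elastic-net regularizer: weighted sum of the L1 and squared L2 norms."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return l1 * l1_term + l2 * l2_term


if __name__ == "__main__":
    # Toy feature vectors standing in for encoder outputs.
    reference = [1.0, 2.0, 3.0]
    distorted = [0.5, 2.0, 2.0]
    print("shift:", shift_representation(reference, distorted))
    print("distillation loss:", distillation_loss(reference, distorted))
    print("smoothed target:", smooth_labels(class_index=1, num_classes=4))
    print("elastic-net penalty:", elastic_net_penalty([1.0, -2.0], l1=0.1, l2=0.01))
```

In a real model these quantities would be computed on transformer token embeddings rather than short lists, and the regularizer would be added to the training loss alongside the quality-prediction and distillation terms.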
