Improvement Of Audiovisual Quality Estimation Using A Nonlinear Autoregressive Exogenous Neural Network And Bitstream Parameters
Koffi Kossi, Stephane Coulombe, Christian Desrosiers, Ghyslain Gagnon
TL;DR
The paper addresses no-reference audiovisual quality estimation for videoconferencing under varying network conditions. It proposes a nonlinear autoregressive exogenous (NARX) neural network that takes QoS parameters as input and outputs the mean opinion score (MOS), capturing temporal dynamics through delayed inputs and outputs. On the INRS Bitstream Audiovisual Dataset, the NARX model outperforms state-of-the-art methods (e.g., MLP, Random Forest, Bagging); the best configuration, with delays $d_y = d_u = 3$, achieves $\text{MSE}=0.144$ and $R=0.932$. Because it needs only a single hidden layer and QoS inputs, the approach is practical for real-time audiovisual quality estimation and could support network-adaptive videoconferencing optimization; future work may extend it to additional QoS parameters.
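To make the role of the delayed inputs and outputs concrete, the following is a minimal sketch of how NARX regressors could be assembled, assuming the common series-parallel form $y(t) = f(y(t-1), \ldots, y(t-d_y), u(t-1), \ldots, u(t-d_u))$. The function name `narx_regressors` and the array layout are illustrative assumptions, not taken from the paper, and the paper's exact formulation may differ (for example, it may also include the current input $u(t)$).

```python
import numpy as np

def narx_regressors(u, y, d_u=3, d_y=3):
    """Build (features, target) pairs for a series-parallel NARX model.

    u : array of shape (T, m) or (T,), exogenous inputs per time step
        (assumed here to be QoS/bitstream parameters).
    y : array of shape (T,), target series (assumed here to be MOS).
    Each feature row holds the d_u previous inputs followed by the
    d_y previous outputs; the target is y at the current step.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    if u.ndim == 1:
        u = u[:, None]              # treat a 1-D input as a single feature
    d = max(d_u, d_y)               # first step with a full history
    rows = []
    for t in range(d, len(y)):
        past_u = u[t - d_u:t].ravel()   # u(t-d_u), ..., u(t-1)
        past_y = y[t - d_y:t]           # y(t-d_y), ..., y(t-1)
        rows.append(np.concatenate([past_u, past_y]))
    return np.array(rows), y[d:]
```

The resulting feature matrix can be fed to any single-hidden-layer regressor. At inference time the past outputs would be the model's own earlier predictions rather than ground-truth scores.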
Abstract
With the increasing demand for audiovisual services, telecom service providers and application developers are compelled to ensure that their services provide the best possible user experience. Services such as videoconferencing are particularly sensitive to network conditions, so their performance should be monitored in real time in order to adapt their parameters to any network perturbation. In this paper, we develop a parametric model for estimating the perceived audiovisual quality in videoconferencing services. Our model is based on the nonlinear autoregressive exogenous (NARX) recurrent neural network and estimates the perceived quality in terms of mean opinion score (MOS). We validate our model using the publicly available INRS bitstream audiovisual quality dataset, which contains bitstream parameters such as loss per frame, bit rate, and video duration. We compare the proposed model against state-of-the-art machine learning methods and show that it outperforms them in terms of mean square error (MSE=0.150) and Pearson correlation coefficient (R=0.931).
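As an illustration of the two evaluation metrics named above, here is a minimal sketch of how MSE and the Pearson correlation coefficient could be computed between subjective and predicted MOS values. The function names and sample numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    a = y_true - y_true.mean()      # center both series
    b = y_pred - y_pred.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

# Hypothetical MOS values, for illustration only.
subjective = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(mse(subjective, predicted), pearson_r(subjective, predicted))
```

The example shows that the two metrics capture different things: a constant offset in the predictions raises the MSE but leaves the correlation unchanged.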
