Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective

Chen Chen; Xiaolou Li; Zehua Liu; Lantian Li; Dong Wang

TL;DR

This paper presents a quantitative, information-theoretic analysis of audio-visual tasks that focuses on the information shared between modalities. The analysis helps explain both why audio-visual processing tasks are difficult and what can be gained by integrating modalities.

Abstract

In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention. Key components of this research include tasks such as lip reading, audio-visual speech recognition, and visual-to-speech synthesis. Although significant success has been achieved, theoretical analysis is still insufficient for audio-visual tasks. This paper presents a quantitative analysis based on information theory, focusing on information intersection between different modalities. Our results show that this analysis is valuable for understanding the difficulties of audio-visual processing tasks as well as the benefits that could be obtained by modality integration.
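As a rough illustration of what "information intersection" between two modalities can mean, the sketch below assumes it corresponds to the standard mutual information I(A;V) = H(A) + H(V) - H(A,V), computed with plug-in entropy estimates over discrete labels (for example, cluster indices assigned to audio and video frames). The abstract does not give the paper's formulas or estimator, so the function names `entropy` and `mutual_information` and the toy label sequences are hypothetical and serve only to make the idea concrete.

```python
import math
from collections import Counter


def entropy(labels):
    """Plug-in Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a, v):
    """I(A;V) = H(A) + H(V) - H(A,V) for two paired label sequences."""
    if len(a) != len(v):
        raise ValueError("label sequences must be paired frame by frame")
    joint = list(zip(a, v))  # each pair is one symbol of the joint variable
    return entropy(a) + entropy(v) - entropy(joint)


# Hypothetical toy data: four audio clusters, two video clusters.
# Each video label is fully determined by the audio label, but not vice versa.
audio = [0, 0, 1, 1, 2, 2, 3, 3]
video = [0, 0, 0, 0, 1, 1, 1, 1]

h_audio = entropy(audio)                    # 2.0 bits
h_video = entropy(video)                    # 1.0 bit
shared = mutual_information(audio, video)   # 1.0 bit

print(f"H(A)={h_audio:.2f}  H(V)={h_video:.2f}  I(A;V)={shared:.2f}")
```

In this toy case all of the visual information is shared with the audio (I(A;V) = H(V)), while one bit of audio information cannot be recovered from the video alone. That asymmetry is the kind of quantity such an analysis would use to reason about tasks like lip reading.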

Paper Structure (15 sections, 6 equations, 1 figure, 4 tables, 1 algorithm)

Introduction
Related Works
  Multivariate Information Analysis
  Information-Theoretical Analysis in Machine Learning
Method
  Entropy-based Quantitative Multimodal Information Model
  Clustering-based Entropy Estimation
Experiments
  Data
  Experimental Settings
  Main results
  Ablation study
    Number of clusters
    Raw and deep features
Conclusions

Figures (1)

Figure 1: Information diagram computed based on CNVSRC-Multi, using deep features. Note that only the information in the black box is related to the purpose of conversion/speech. The auditory and visual signals partly represent the purpose but also involve some subtle information that is not clearly shown.
