Fast Audio Codec Identification Using Overlapping LCS

Farzane Jafari

TL;DR

The paper tackles fast audio codec identification in network transmissions by introducing features derived from overlapped longest common subsequences and substrings, combined with statistics that characterize randomness and chaotic behavior. It applies PCA to reduce the feature set and uses classifiers such as SVM and Decision Tree, achieving about 97% accuracy on 8 KB packets. A key practical contribution is fragmenting each 8 KB packet into fifteen 1 KB sub-packets with 50% overlap, which yields roughly an 8x speedup in LCS feature extraction without a significant loss in accuracy. This approach supports real-time codec identification in security- and privacy-sensitive network environments, with potential applications in intrusion detection, traffic filtering, and monitoring of secure communications.
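A minimal sketch of the fragmentation step, assuming the packet payload is a raw byte string, that "1 KB" means 1024 bytes, and that a 50% overlap means a 512-byte hop between consecutive windows. The helper name split_overlapping is hypothetical and is not the author's code. Under these assumptions an 8 KB (8192-byte) packet yields (8192 - 1024) / 512 + 1 = 15 sub-packets, matching the count the paper reports.

```python
def split_overlapping(packet: bytes, window: int = 1024, overlap: float = 0.5) -> list[bytes]:
    """Split a payload into fixed-size windows that overlap by the given fraction.

    Hypothetical helper: trailing bytes that do not fill a whole window are dropped.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(window * (1 - overlap)))  # 512-byte step for 1 KB windows at 50% overlap
    if len(packet) < window:
        return [packet]  # too short to fragment; keep it whole
    return [packet[start:start + window]
            for start in range(0, len(packet) - window + 1, hop)]


packet = bytes(range(256)) * 32  # synthetic 8 KB payload (8192 bytes)
subpackets = split_overlapping(packet)
print(len(subpackets))  # 15
```

Each sub-packet shares its second half with the first half of the next one, so byte patterns that straddle a window boundary still appear intact in at least one window.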

Abstract

Audio data are widely exchanged over telecommunications networks. Because network resources are limited, these data are typically compressed before transmission, and many compression methods are available. To access the audio content, the codec used for compression must first be identified. One of the most effective approaches to audio codec identification analyzes the content of received packets: statistical features extracted from the packets are used to determine which codec was employed. This paper proposes a novel audio codec classification method based on features derived from the overlapped longest common substring and subsequence (LCS). Simulation results show an accuracy of 97% for 8 KB packets, outperforming conventional approaches. The method divides each 8 KB packet into fifteen 1 KB sub-packets with 50% overlap. The results indicate that this division has no significant impact on classification performance while making LCS feature extraction eight times faster than the traditional method.
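The two LCS quantities named in the abstract can be computed with the standard dynamic-programming recurrences, whose cost grows with the product of the two input lengths; this is why shorter 1 KB inputs reduce extraction time. The sketch below is an illustration, not the author's implementation: the paper summary does not say which byte sequences are compared, so pairwise_lcs_features assumes, hypothetically, that each sub-packet is compared with the next overlapping one.

```python
def longest_common_substring(a: bytes, b: bytes) -> int:
    """Length of the longest contiguous run shared by a and b (O(len(a) * len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, curr[j])
        prev = curr
    return best


def longest_common_subsequence(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence (elements in order, not necessarily adjacent)."""
    prev = [0] * (len(b) + 1)  # prev[j]: LCS length of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def pairwise_lcs_features(subpackets: list[bytes]) -> list[tuple[int, int]]:
    """Hypothetical features: (substring, subsequence) LCS lengths of consecutive sub-packets."""
    return [
        (longest_common_substring(x, y), longest_common_subsequence(x, y))
        for x, y in zip(subpackets, subpackets[1:])
    ]


print(longest_common_substring(b"ABCBDAB", b"BDCABA"))    # 2 ("AB" or "BD")
print(longest_common_subsequence(b"ABCBDAB", b"BDCABA"))  # 4 (e.g. "BCBA")
```

Both functions keep only two rows of the DP table, so memory stays linear in the length of b even though running time is quadratic.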

Paper Structure

This paper contains 23 sections, 10 equations, 6 tables.