Converting Time Series Data to Numeric Representations Using Alphabetic Mapping and k-mer strategy
Sarwan Ali, Tamkanat E Ali, Imdad Ullah Khan, Murray Patterson
TL;DR
This work addresses the challenge of extracting meaningful patterns from complex time series by mapping signal values onto a $26$-letter alphabet, one letter per value range, and applying $k$-mer-based analysis inspired by bioinformatics. The proposed pipeline flattens each time series, computes the range boundaries, maps values to letters, and builds a spectrum of $k$-mer counts that serves as a numeric embedding for conventional classifiers. Experiments on a smartphone-based human activity dataset show that the approach improves age-prediction performance over a baseline and outperforms several deep-learning baselines, with statistically significant gains. The method offers a resource-efficient, interpretable alternative for time series classification and opens avenues for transfer learning from biological sequence analysis to time series domains.
Abstract
Representing time series data in a form akin to biological sequences makes it possible to apply sequence analysis techniques developed in bioinformatics. Transforming time series signals into molecular-sequence-like representations lets us use these techniques (e.g., $k$-mer-based representations) to improve pattern recognition and uncover hidden patterns and relationships in complex, non-linear time series data. This paper proposes a method that performs this transformation with an alphabetic mapping: 26 ranges are generated, one for each of the 26 letters of the English alphabet, and each value in the time series is mapped to the letter of the range it falls in. The conversion allows sequence analysis algorithms typically used in bioinformatics to be applied to time series data. We demonstrate the effectiveness of the approach by converting real-world time series signals into character sequences and performing sequence classification. The resulting sequences can be used with a variety of sequence-based analysis techniques, offering a new perspective on time series representation and analysis.
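The mapping and the $k$-mer spectrum described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the 26 ranges are equal-width intervals between the minimum and maximum of the flattened signal, which the abstract does not state, and the function names (`flatten`, `range_boundaries`, `to_letters`, `kmer_spectrum`) and the default $k = 3$ are hypothetical choices made for illustration.

```python
from bisect import bisect_right
from collections import Counter
from itertools import product
import string

ALPHABET = string.ascii_uppercase  # 26 letters, one per value range


def flatten(series):
    """Flatten a 1-D series or a list of rows into one list of floats."""
    flat = []
    for item in series:
        if isinstance(item, (list, tuple)):
            flat.extend(float(v) for v in item)
        else:
            flat.append(float(item))
    return flat


def range_boundaries(lo, hi, n_ranges=len(ALPHABET)):
    """Interior edges of n_ranges equal-width ranges (an assumption here)."""
    width = (hi - lo) / n_ranges
    return [lo + width * i for i in range(1, n_ranges)]


def to_letters(series, lo=None, hi=None):
    """Map every value to the letter of the range it falls in."""
    values = flatten(series)
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    edges = range_boundaries(lo, hi)
    # bisect_right returns an index in 0..25; values outside [lo, hi]
    # are clamped to the first or last letter.
    return "".join(ALPHABET[bisect_right(edges, v)] for v in values)


def kmer_spectrum(sequence, k=3, alphabet=ALPHABET):
    """Count every length-k substring; return a vector of len(alphabet)**k counts."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Fixed lexicographic k-mer order so all embeddings share the same layout.
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]
```

For a signal `x`, `kmer_spectrum(to_letters(x), k=3)` yields a fixed-length vector of $26^3 = 17576$ counts that could be passed to a conventional classifier. Under this sketch, the embedding length grows as $26^k$, so small values of $k$ keep the representation manageable.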
