An Adaptive CMSA for Solving the Longest Filled Common Subsequence Problem with an Application in Audio Querying

Marko Djukanovic, Christian Blum, Aleksandar Kartelj, Ana Nikolikj, Guenther Raidl

TL;DR

This paper tackles the Longest Filled Common Subsequence (LFCS) problem, an NP-hard problem with applications in bioinformatics such as gene mutation prediction and genomic data reconstruction, using an adaptive Construct, Merge, Solve, Adapt (CMSA) framework that achieves state-of-the-art performance.

Abstract

This paper addresses the Longest Filled Common Subsequence (LFCS) problem, a challenging NP-hard problem with applications in bioinformatics, including gene mutation prediction and genomic data reconstruction. Existing approaches, including exact, metaheuristic, and approximation algorithms, have primarily been evaluated on small instances, which offer limited insight into their scalability. In this work, we introduce a new benchmark dataset with significantly larger instances and demonstrate that existing datasets lack the discriminative power needed to meaningfully assess algorithm performance at scale. To solve large instances efficiently, we use an adaptive Construct, Merge, Solve, Adapt (CMSA) framework that iteratively generates promising subproblems via component-based construction and refines them using feedback from prior iterations. Subproblems are solved using an external black-box solver. Extensive experiments on both standard and newly introduced benchmarks show that the proposed adaptive CMSA achieves state-of-the-art performance, outperforming five leading methods. Notably, on 1,510 problem instances with known optimal solutions, our approach solves 1,486 of them to optimality, achieving over 99.9% optimal solution quality and demonstrating exceptional scalability. As an engineering contribution, we additionally propose a novel application of LFCS to song identification from degraded audio excerpts, using real-world energy-profile instances from popular music. Finally, we conduct an empirical explainability analysis that identifies the critical feature combinations influencing algorithm performance, i.e., the key problem features that contribute to the success or failure of the approaches across different instance types.
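
To make the problem statement concrete, the following is a minimal sketch of LFCS on toy inputs. It assumes the commonly used formulation, which the abstract itself does not spell out: given a sequence `a`, an incomplete sequence `b`, and a multiset `m` of symbols, insert the symbols of `m` into `b` so that the longest common subsequence (LCS) with `a` is as long as possible. The function names (`lcs_length`, `fillings`, `lfcs_brute_force`) are hypothetical, and the exhaustive search is only an illustration of the objective, not the adaptive CMSA proposed in the paper.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(b)]


def fillings(b: str, m: str):
    """Yield every string obtained by inserting all symbols of m into b.

    Symbols are inserted one at a time at every possible position, so all
    interleavings are covered (some strings may be produced more than once).
    """
    if not m:
        yield b
        return
    first, rest = m[0], m[1:]
    for pos in range(len(b) + 1):
        yield from fillings(b[:pos] + first + b[pos:], rest)


def lfcs_brute_force(a: str, b: str, m: str) -> int:
    """Best LCS length between a and any filling of b with the symbols of m.

    Exponential in len(m); intended for toy inputs only.
    """
    return max(lcs_length(a, filled) for filled in fillings(b, m))


if __name__ == "__main__":
    # Without filling, "AD" shares only "AD" with "ABCD".
    print(lcs_length("ABCD", "AD"))
    # Inserting B and C between A and D recovers the full sequence.
    print(lfcs_brute_force("ABCD", "AD", "CB"))
```

The number of fillings grows exponentially with the size of `m`, which is consistent with the abstract's motivation for solving reduced subproblems rather than enumerating candidates.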

Paper Structure

This paper contains 20 sections, 6 equations, 7 figures, 12 tables, 2 algorithms.

Figures (7)

  • Figure 1: Visualization of an LFCSP solution.
  • Figure 2: Number of instances for which each of the six approaches finds an optimal solution on the benchmark set Small.
  • Figure 3: Pairwise post-hoc statistical analysis: the results of two benchmark sets combined.
  • Figure 4: SHAP summary plots showing the global feature importance and the direction of influence of different feature values (from low in blue to high in red) on the benchmark set Small (test data).
  • Figure 5: SHAP summary plots showing the global feature importance and the direction of influence of different feature values (from low in blue to high in red) on the benchmark set Large (test data).
  • ...and 2 more figures

Theorems & Definitions (1)

  • Example 1