A GRASP-based memetic algorithm with path relinking for the far from most string problem
José E. Gallardo, Carlos Cotta
TL;DR
The Far From Most String Problem (FFMSP) seeks a string whose Hamming distance ${\cal{HD}}$ to as many input strings as possible meets a threshold $d$; the problem is known to be computationally very hard. The authors propose a memetic algorithm (MA) that combines GRASP-based population initialization, path relinking for recombination, and hill-climbing local search, guided by a problem-specific heuristic function $h$ and a precomputed table $T$ that accelerates evaluation. In extensive experiments on random and biologically derived instances, the MA outperforms state-of-the-art GRASP-based methods with statistical significance, and a sensitivity analysis covers the greediness of the initialization and the use of path relinking. The results extend to very large real datasets, where the MA achieves substantial improvements, underscoring its potential as a competitive tool for string selection problems (SSPs) in biology and related domains. Future work includes parallelization and adaptation to related problems in bioinformatics.
Abstract
The Far From Most String Problem (FFMSP) is a string selection problem. The objective is to find a string whose distance to other strings in a certain input set is above a given threshold for as many of those strings as possible. This problem has links with some tasks in computational biology, and solving it has been shown to be very hard. We propose a memetic algorithm (MA) to tackle the FFMSP. This MA exploits a heuristic objective function for the problem and features initialization of the population via a Greedy Randomized Adaptive Search Procedure (GRASP) metaheuristic, intensive recombination via path relinking, and local improvement via hill climbing. An extensive empirical evaluation using problem instances of both random and biological origin is carried out to assess parameter sensitivity and to draw performance comparisons with other state-of-the-art techniques. The MA is shown to perform better than these techniques with statistical significance.
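To make the objective described in the abstract concrete, the following minimal Python sketch counts how many input strings lie far enough from a candidate. It is an illustration only, not the authors' implementation: the function names `hamming_distance` and `ffmsp_objective` are hypothetical, the distance is taken to be the Hamming distance over equal-length strings, and "above a given threshold" is assumed here to mean a distance of at least $d$. The heuristic function $h$ and the table $T$ used by the MA are not reproduced.

```python
from typing import Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def ffmsp_objective(candidate: str, strings: Sequence[str], d: int) -> int:
    """Count the input strings whose Hamming distance to `candidate`
    is at least `d` (assumed reading of the threshold condition)."""
    return sum(hamming_distance(candidate, s) >= d for s in strings)


if __name__ == "__main__":
    # Toy instance over the alphabet {A, C, G, T}; values are illustrative.
    inputs = ["AAAA", "AACC", "CCCC"]
    threshold = 3
    for cand in ("GGGG", "AAAC"):
        print(cand, ffmsp_objective(cand, inputs, threshold))
```

On this toy instance, `GGGG` differs from every input string in all four positions, so all three strings count as far; `AAAC` is far only from `CCCC`. A search method for the FFMSP aims to maximize this count over all candidate strings of the given length.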
