Reachability Analysis for Lexicase Selection via Community Assembly Graphs
Emily Dolson, Alexander Lalejini
TL;DR
Under lexicase selection, the outcome of selection depends on the current contents of the population, so static fitness landscapes cannot capture its dynamics. This work introduces ecology-inspired community assembly graphs, in which nodes denote stable phenotypic communities and edges model invasions by mutationally adjacent phenotypes. Stability is evaluated over a generation horizon using $P_{lex}$ and $P_{survival}$. Reachability is then analyzed via graph traversal, using a truncated exploration based on hitting probabilities and a PageRank-inspired reduction to handle cycles. Proof-of-concept experiments on NK landscapes with $N=3$, $K=2$ and on SignalGP-based genetic programming problems show that the graphs predict end states and reveal when optimal solutions may be unreachable under lexicase selection, depending on mutation rate and landscape structure. The approach provides a principled, graph-based toolkit for analyzing ecological dynamics in evolutionary algorithms and for comparing how subtle changes to selection schemes affect the set of reachable optima. It broadens reachability analysis beyond traditional fitness landscapes and offers a way to quantify and compare evolutionary dynamics across problems and representations.
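The traversal step of the reachability analysis summarized above can be illustrated with a minimal sketch. It assumes the community assembly graph has already been built and is stored as a dictionary from each community (a frozenset of phenotype labels) to the communities that a single invasion can lead to. The phenotype labels, the edges, and the names `reachable` and `toy_graph` are hypothetical and chosen only for illustration. The sketch does not compute $P_{lex}$ or $P_{survival}$, and it uses a plain breadth-first search rather than the hitting-probability or PageRank-inspired methods.

```python
from collections import deque


def reachable(graph, start):
    """Return every community reachable from `start` by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:  # the visited set makes cycles harmless
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy graph: nodes are communities, edges are invasions.
toy_graph = {
    frozenset({"A"}): [frozenset({"A", "B"}), frozenset({"C"})],
    frozenset({"A", "B"}): [frozenset({"B", "D"})],
    frozenset({"C"}): [frozenset({"A"})],  # cycle back to the start
    frozenset({"B", "D"}): [],             # sink: no further invasions
    frozenset({"OPT"}): [],                # optimum with no incoming edges
}

start = frozenset({"A"})
reached = reachable(toy_graph, start)

# Sinks among the reached nodes are candidate end states.
sinks = {node for node in reached if not toy_graph.get(node)}

# The optimum is a node of the graph, yet no path leads to it from `start`.
optimum_reachable = frozenset({"OPT"}) in reached
```

In this toy graph the only sink reachable from the starting community is `{"B", "D"}`, and the community containing the optimum is never reached, which is the kind of conclusion the graph-based analysis is meant to support.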
Abstract
Fitness landscapes have historically been a powerful tool for analyzing the search space explored by evolutionary algorithms. In particular, they facilitate understanding how easily reachable an optimal solution is from a given starting point. However, simple fitness landscapes are inappropriate for analyzing the search space seen by selection schemes like lexicase selection, in which the outcome of selection depends heavily on the current contents of the population (i.e., selection schemes with complex ecological dynamics). Here, we propose borrowing a tool from ecology to solve this problem: community assembly graphs. We demonstrate a simple proof-of-concept for this approach on an NK Landscape where we have perfect information. We then demonstrate that this approach can be successfully applied to a complex genetic programming problem. While further research is necessary to understand how best to use this tool, we believe it will be a valuable addition to our toolkit and will facilitate analyses that were previously impossible.
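The population dependence described above can be made concrete with a small sketch of standard lexicase selection: test cases are shuffled, and the candidate pool is repeatedly filtered to those individuals that are best on the next case. The score vectors below are hypothetical, higher scores are assumed to be better, and the function name `lexicase_select` is chosen for illustration rather than taken from the paper.

```python
import random


def lexicase_select(population, rng):
    """Return the index of one selected parent.

    `population` is a list of equal-length score tuples, one score per
    test case, where higher is better (an assumption of this sketch).
    """
    candidates = list(range(len(population)))
    cases = list(range(len(population[0])))
    rng.shuffle(cases)  # a fresh random case order for every selection event
    for case in cases:
        best = max(population[i][case] for i in candidates)
        candidates = [i for i in candidates if population[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)  # break any remaining ties at random


rng = random.Random(0)

# Two specialists: each one is the best on exactly one test case.
pair = [(1, 0), (0, 1)]
# The same two specialists plus a generalist that is best on both cases.
trio = pair + [(1, 1)]

wins_pair = [lexicase_select(pair, rng) for _ in range(200)]
wins_trio = [lexicase_select(trio, rng) for _ in range(200)]
```

With only the two specialists present, either can be selected depending on the case order. Once the generalist is added, it survives every filtering step and neither specialist is selected again, even though their own scores have not changed. A static fitness landscape assigns each individual a fixed value and so cannot represent this shift.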
