
Reachability Analysis for Lexicase Selection via Community Assembly Graphs

Emily Dolson, Alexander Lalejini

TL;DR

Lexicase selection exhibits population-dependent dynamics that static fitness landscapes do not capture. This work introduces ecology-inspired community assembly graphs, in which nodes denote stable phenotypic communities and edges model invasions by mutationally adjacent phenotypes. Community stability is evaluated over a horizon of generations using $P_{lex}$ and $P_{survival}$. Reachability is then analyzed via graph traversal, using a truncated exploration based on hitting probabilities and a PageRank-inspired reduction to handle cycles. Proof-of-concept experiments on NK landscapes with $N=3$, $K=2$ and on SignalGP-based genetic programming problems show that the graphs predict end states and reveal when optimal solutions may be unreachable under lexicase selection, depending on mutation rate and landscape structure. The approach provides a principled, graph-based toolkit for analyzing ecological dynamics in evolutionary algorithms and for comparing how subtle changes to selection schemes affect the set of reachable optima. The framework broadens reachability analysis beyond traditional fitness landscapes and offers a pathway to quantify and compare evolutionary dynamics across problems and representations.
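The stability test rests on two quantities. As a minimal sketch, assuming higher scores are better, criterion orderings are drawn uniformly at random, and ties left after the last criterion are broken uniformly, $P_{lex}$ for each member of a small community can be computed exactly by brute-force enumeration of orderings. The $P_{survival}$ formula used here (selected at least once among `pop_size` independent lexicase events in each of `generations` consecutive generations) is an assumption about how the horizon is applied, not necessarily the authors' definition; both function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence


def lexicase_probabilities(pop: Sequence[Sequence[float]]) -> List[float]:
    """Exact per-member probability of winning one lexicase selection event.

    Brute force over all criterion orderings; assumes higher is better and that
    ties remaining after the last criterion are broken uniformly at random.
    """
    n_criteria = len(pop[0])
    orders = list(permutations(range(n_criteria)))
    probs = [0.0] * len(pop)
    for order in orders:
        pool = list(range(len(pop)))
        for c in order:
            best = max(pop[i][c] for i in pool)
            # keep only candidates that are elite on criterion c
            pool = [i for i in pool if pop[i][c] == best]
            if len(pool) == 1:
                break
        # each ordering is equally likely; split its weight among tied survivors
        for i in pool:
            probs[i] += 1.0 / (len(orders) * len(pool))
    return probs


def survival_probability(p_lex: float, pop_size: int, generations: int) -> float:
    """Hypothetical P_survival: chosen at least once per generation for the whole horizon.

    Assumes pop_size independent lexicase events per generation.
    """
    per_generation = 1.0 - (1.0 - p_lex) ** pop_size
    return per_generation ** generations


# Two specialists and one dominated phenotype (scores on two criteria).
community = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
p = lexicase_probabilities(community)  # [0.5, 0.5, 0.0]
print([survival_probability(x, pop_size=10, generations=5) for x in p])
```

Enumerating orderings costs $O(k!)$ for $k$ criteria, which is fine for the three criteria of an $N=3$ NK landscape but not for large test suites, where a pruned recursive computation or Monte Carlo estimate would be needed.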

Abstract

Fitness landscapes have historically been a powerful tool for analyzing the search space explored by evolutionary algorithms. In particular, they help us understand how easily an optimal solution can be reached from a given starting point. However, simple fitness landscapes are inappropriate for analyzing the search space seen by selection schemes like lexicase selection, in which the outcome of selection depends heavily on the current contents of the population (i.e., selection schemes with complex ecological dynamics). Here, we propose borrowing a tool from ecology to solve this problem: community assembly graphs. We demonstrate a simple proof-of-concept for this approach on an NK landscape where we have perfect information. We then demonstrate that this approach can be successfully applied to a complex genetic programming problem. While further research is necessary to understand how best to use this tool, we believe it will be a valuable addition to our toolkit and will facilitate analyses that were previously impossible.
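Because the NK proof of concept has perfect information, its community assembly graph can be built exhaustively. The sketch below is a hypothetical illustration, not the authors' code. Its assumptions: genotypes are 3-bit integers (so ids need not match those in the figures), each gene's contribution depends on itself and its $K=2$ neighbours with wrap-around, the per-gene contributions are the lexicase criteria, and a community counts as stable once every member has nonzero lexicase selection probability (a simplification of the $P_{survival}$ horizon test). Edges are labelled with the invading genotype, and sink nodes are communities with no outgoing edges.

```python
import random
from itertools import permutations
from typing import Dict, FrozenSet, List, Sequence, Tuple

N, K = 3, 2  # landscape size used in the proof of concept
Community = FrozenSet[int]


def make_nk(seed: int = 0) -> List[List[float]]:
    """Random NK contribution tables: tables[i][ctx] for gene i and a (K+1)-bit context."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(2 ** (K + 1))] for _ in range(N)]


def phenotype(genotype: int, tables: List[List[float]]) -> Tuple[float, ...]:
    """Per-gene fitness contributions; each one is used as a lexicase criterion."""
    scores = []
    for i in range(N):
        ctx = 0
        for j in range(K + 1):  # gene i plus its K neighbours, wrapping around
            ctx = (ctx << 1) | ((genotype >> ((i + j) % N)) & 1)
        scores.append(tables[i][ctx])
    return tuple(scores)


def lexicase_probs(pop: Sequence[Tuple[float, ...]]) -> List[float]:
    """Exact lexicase selection probabilities by enumerating criterion orderings."""
    orders = list(permutations(range(len(pop[0]))))
    probs = [0.0] * len(pop)
    for order in orders:
        pool = list(range(len(pop)))
        for c in order:
            best = max(pop[i][c] for i in pool)
            pool = [i for i in pool if pop[i][c] == best]
        for i in pool:
            probs[i] += 1.0 / (len(orders) * len(pool))
    return probs


def stabilize(members: set, tables: List[List[float]]) -> Community:
    """Drop members with zero selection probability until every member survives."""
    current = sorted(members)
    while True:
        probs = lexicase_probs([phenotype(g, tables) for g in current])
        keep = [g for g, p in zip(current, probs) if p > 0.0]
        if len(keep) == len(current):
            return frozenset(current)
        current = keep


def assembly_graph(start: int, tables: List[List[float]]):
    """Exhaustively explore communities reachable by single-mutant invasions."""
    start_node = stabilize({start}, tables)
    graph: Dict[Community, Dict[int, Community]] = {}
    frontier = [start_node]
    while frontier:
        node = frontier.pop()
        if node in graph:
            continue
        graph[node] = {}
        # mutationally adjacent genotypes: one bit flip away from some member
        mutants = {g ^ (1 << b) for g in node for b in range(N)} - node
        for m in sorted(mutants):
            nxt = stabilize(set(node) | {m}, tables)
            if nxt != node:  # invasion changed the community: add an edge labelled m
                graph[node][m] = nxt
                frontier.append(nxt)
    return start_node, graph


tables = make_nk(seed=1)
start, graph = assembly_graph(0, tables)
sinks = [sorted(node) for node, out in graph.items() if not out]
print("communities:", len(graph), "sinks:", sinks)
```

Checking which genotypes appear in some sink, and which appear in every sink, mirrors the reasoning in the Figure 1 caption about genotypes that will probably not always be found.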

Paper Structure (18 sections, 2 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: A simple community assembly graph. This graph represents lexicase selection on a representative NK fitness landscape with N=3, K=2. The fitness contributions of each of the three genes serve as the three fitness criteria for lexicase selection. Node labels indicate the ids of the genotypes present in the community represented by each node. The genotype and phenotype corresponding to each id are shown below the graph. Edge labels indicate which genotype was added to the community to cause a transition from the edge's source node to its destination node. Here, for simplicity, evolution is arbitrarily assumed to start from a population containing only genotype 0. The two genotypes mutationally adjacent to genotype 0 are 1 and 2, so we consider the effect of adding either of them to the starting community. Both genotype 1 and genotype 2 can stably coexist with genotype 0 (each has a fitness criterion on which it outperforms genotype 0 and a fitness criterion on which it does not), so adding either one results in a two-genotype community. Ultimately, there are two different sink nodes in this graph; evolution will likely stagnate when it reaches either of them. Because genotype 1 appears in only one of the sink nodes, we can conclude that it will probably not always be found, despite having the highest score on objective 1.
  • Figure 2: Community assembly graph for the NK fitness landscape used in these experiments. N=3, K=2. The starting population is assumed to contain only the bitstring 000. Starting node is blue, sink nodes are red.
  • Figure 3: Community assembly graph of the 100 most accessible communities for the grade problem. Node colors and sizes indicate the PageRank of each node, which translates to the probability of a random walk ending on each node. Edge colors indicate the probability of choosing each edge. Nodes are arranged along the y axis according to how far away from the starting node they are (measured as shortest path). The starting node (representing a community containing only the worst-performing phenotype) is the lowest node on the y axis. Note that this portion of the graph contains only one true sink node (outlined in red). In this case, that node represents the optimal solution, indicating that the solution for this problem is indeed reachable.
  • Figure 4: Community assembly graph of the 100 most accessible communities for the median problem. Node colors and sizes indicate the PageRank of each node, which translates to the probability of a random walk ending on each node. Edge colors indicate the probability of choosing each edge. Nodes are arranged along the y axis according to how far away from the starting node they are (measured as shortest path). The starting node (representing a community containing only the worst-performing phenotype) is the lowest node on the y axis. This portion of the graph contains only one true sink node (outlined in red), which represents the optimal solution.
  • Figure 5: Community assembly graph of the 100 most accessible communities for the FizzBuzz problem. Node colors and sizes indicate the PageRank of each node. Edge colors indicate the probability of choosing each edge. Nodes are arranged along the y axis according to how far away from the starting node they are (measured as shortest path). The starting node (representing a community containing only the worst-performing phenotype) is the lowest node on the y axis. Note that this portion of the graph contains only one true sink node (outlined in red). In this case, that node represents the optimal solution, indicating that the solution for this problem is indeed reachable. However, this node has very low PageRank, indicating that reaching it is relatively unlikely.
  • ...and 1 more figure
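The PageRank-style node scores described in the captions for Figures 3 to 5 can be approximated by power iteration. This is one plausible instantiation, assumed rather than taken from the paper: outgoing edge weights are normalized into transition probabilities, a sink behaves as a self-loop (so walks that reach it stay there), and with probability `1 - damping` the walker restarts at the starting community. The function name `walk_rank`, the damping value, and the toy graph with its weights are hypothetical.

```python
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]


def walk_rank(edges: Graph, start: Hashable, damping: float = 0.85,
              iters: int = 100) -> Dict[Hashable, float]:
    """PageRank-style power iteration with restarts at `start` (assumed variant).

    Sinks act as self-loops, so probability mass that reaches them stays there
    except for the restart share.
    """
    nodes = set(edges) | {v for out in edges.values() for v in out}
    rank = {n: (1.0 if n == start else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: 0.0 for n in nodes}
        for n, r in rank.items():
            out = edges.get(n) or {n: 1.0}  # sink: stay put
            total = sum(out.values())
            for v, w in out.items():
                new[v] += damping * r * w / total  # follow a weighted edge
            new[start] += (1.0 - damping) * r  # restart the walk
        rank = new
    return rank


# Toy graph: A is the starting community, A <-> B is a cycle, C is a sink.
toy = {"A": {"B": 0.7, "C": 0.3}, "B": {"A": 1.0}, "C": {}}
scores = walk_rank(toy, "A")
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

In this toy graph the sink C ends up with the largest share (about 0.52) while the A-B cycle holds the rest, so the cycle is summarized by a stationary distribution instead of by enumerating paths around it.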

Theorems & Definitions (1)

  • Definition 1