Function Class Learning with Genetic Programming: Towards Explainable Meta Learning for Tumor Growth Functionals

E. M. C. Sijben, J. C. Jansen, P. A. N. Bosman, T. Alderliesten

TL;DR

This work introduces FC-GOMEA, a GP-based framework that learns a set of function classes representing overarching growth patterns across multiple tumor data sets, with local per-dataset constants $c^{\mathrm{FC}}$ to tailor predictions. By coupling GP-GOMEA with RV-GOMEA and employing a multi-modal, multi-objective search (MM-GP-GOMEA), FC-GOMEA discovers diverse, interpretable function classes that explain tumor growth while enabling dataset-specific refinements. The method is validated on both synthetic data that reflect known logistic and Gompertz growth forms and a real-world paraganglioma dataset, demonstrating the ability to recover meaningful classes and to provide per-tumor predictions via learned constants. Although computationally intensive, the approach yields explainable meta-learning outputs that can be used to assess growth patterns and uncertainty across related patient data, potentially informing treatment timing decisions.
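For reference, the two growth forms named above are commonly written as follows. This is the standard textbook parameterization, not necessarily the one used for the synthetic data; the symbols $V_0$ (initial volume), $K$ (carrying capacity), and $r$ (growth rate) are assumptions introduced here for illustration.

$$V_{\mathrm{logistic}}(t) = \frac{K}{1 + \frac{K - V_0}{V_0}\, e^{-rt}}, \qquad V_{\mathrm{Gompertz}}(t) = K \exp\!\left(\ln\!\left(\frac{V_0}{K}\right) e^{-rt}\right)$$

Both curves start at $V_0$ for $t = 0$ and saturate at $K$ as $t \to \infty$. Read as function classes, the functional form would be shared across tumors while constants such as $K$ and $r$ would be refit for each one.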

Abstract

Paragangliomas are rare, primarily slow-growing tumors for which the underlying growth pattern is unknown. Therefore, determining the best care for a patient is hard. Currently, if no significant tumor growth is observed, treatment is often delayed, as treatment itself is not without risk. However, by doing so, the risk of (irreversible) adverse effects due to tumor growth may increase. Being able to predict the growth accurately could assist in determining whether a patient will need treatment during their lifetime and, if so, the timing of this treatment. The aim of this work is to learn the general underlying growth pattern of paragangliomas from multiple tumor growth data sets, in which each data set contains a tumor's volume over time. To do so, we propose a novel approach based on genetic programming to learn a function class, i.e., a parameterized function that can be fit anew for each tumor. We do so in a unique, multi-modal, multi-objective fashion to find multiple potentially interesting function classes in a single run. We evaluate our approach on a synthetic and a real-world data set. By analyzing the resulting function classes, we can effectively explain the general patterns in the data.
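The idea of a function class that is "fit anew" for each local data set can be sketched in a few lines. The example below uses the toy class $f(x, c^{\mathrm{FC}}) = c^{\mathrm{FC}} \cdot \sin(x)$ from the Figure 1 caption. It is a minimal illustration under stated assumptions: the constant is fitted with closed-form least squares rather than RV-GOMEA, and the names `fit_constant`, `mse`, and `local_data_sets` are hypothetical.

```python
import math


def fit_constant(xs, ys):
    """Least-squares constant c for f(x, c) = c * sin(x) on one local data set.

    Assumption: closed-form least squares stands in for the real-valued
    optimizer (RV-GOMEA) that the paper uses to tune constants.
    """
    numerator = sum(y * math.sin(x) for x, y in zip(xs, ys))
    denominator = sum(math.sin(x) ** 2 for x in xs)
    return numerator / denominator


def mse(xs, ys, c):
    """Mean squared error of f(x, c) = c * sin(x) on one local data set."""
    return sum((c * math.sin(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Two hypothetical, noise-free local data sets that share the same form
# but were generated with different constants (2.0 and -0.5).
xs = [0.5 * i for i in range(1, 13)]
local_data_sets = {
    "A": (xs, [2.0 * math.sin(x) for x in xs]),
    "B": (xs, [-0.5 * math.sin(x) for x in xs]),
}

# One shared function class, one fitted constant per local data set.
fitted = {name: fit_constant(x, y) for name, (x, y) in local_data_sets.items()}
errors = {name: mse(*local_data_sets[name], fitted[name]) for name in fitted}

for name in fitted:
    print(f"{name}: c = {fitted[name]:.3f}, MSE = {errors[name]:.2e}")
```

The shared expression carries the general pattern, while the per-data-set constant absorbs the differences between individual data sets.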

Paper Structure (17 sections, 5 equations, 6 figures, 1 table, 3 algorithms)

Figures (6)

  • Figure 1: Visualization of function class learning. The global data set (black dots) consists of multiple local data sets (colored dots). A function class $f(x, c^{\mathrm{FC}}) = c^{\mathrm{FC}} \cdot \sin(x)$ is learned that fits well with each local data set using a different value for function class constant $c^{\mathrm{FC}}$.
  • Figure 2: The Function Class GOMEA learning cycle. First, we initialize the population of function classes. Then, we calculate the fitness for each individual by tuning the function class constants (in orange) to each data subset by using RV-GOMEA. Next, we perform variation and selection in the typical optimal mixing way of GOMEA (illustrated in blue), and calculate the fitness again to test whether changes should be accepted.
  • Figure 3: Visualization of FC-GOMEA. Each local data set is either exponential or linear. The top scatter plot shows the approximation front with the trade-off between the $\mathrm{MSE}_\mathrm{global}$ and the $\mathrm{DMSE}_\mathrm{global}$. Individual I has the lowest $\mathrm{MSE}_\mathrm{global}$, so its individual function classes fit best on all local data sets, but there is no gain in using the classes together, because they are the same. Individual III has the lowest $\mathrm{DMSE}_\mathrm{global}$, so it offers the most gain from using the function classes together. Individual II is somewhere in the middle: there is merit to using the function classes together, but at the same time, they fit relatively well on all local data sets. By utilizing multi-class learning we recover both function classes.
  • Figure 4: Histogram of the number of times the correct function classes were found within any multi-tree of the full archive.
  • Figure 5: Convergence plots for optimization using FC-GOMEA. Each color is a different run (different seed). The plots show the hypervolume (HV) of the global data set as a function of the number of generations. A line that stops before 30 generations indicates that the run was terminated because the time budget was exhausted. Each row shows the convergence for a different global data set. The columns represent the different batch sizes and numbers of data points used for learning the $c^{\mathrm{FC}}$ constants.
  • ...and 1 more figures
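
The captions for Figures 3 and 5 refer to a two-objective approximation front and to its hypervolume (HV). As a rough illustration of what that indicator measures, the sketch below computes the hypervolume of a two-objective front under minimization. It is a generic textbook computation, not the authors' implementation; the function name `hypervolume_2d`, the example points, and the reference point are all assumptions made for this example.

```python
def hypervolume_2d(points, ref):
    """Area dominated by a set of 2-objective points (both minimized),
    bounded by the reference point `ref`.

    Assumption: the reference point is chosen by the user; the source
    does not state which reference point is used.
    """
    # Only points that are strictly better than the reference contribute.
    candidates = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])

    hv = 0.0
    best_f2 = ref[1]  # lowest second objective seen so far
    for f1, f2 in candidates:
        if f2 < best_f2:
            # Add the new horizontal strip this point contributes.
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
        # Otherwise the point is dominated by an earlier one and adds nothing.
    return hv


# Hypothetical front of (objective 1, objective 2) pairs; (3.0, 3.0) is dominated.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
reference = (4.0, 4.0)
print(hypervolume_2d(front, reference))  # 6.0
```

A larger hypervolume means the front covers more of the objective space below the reference point, which is why it is a convenient single number to track across generations.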