
Genetic Algorithms in Regression

Mo Li, QiQi Lu, Robert Lund, Xueheng Shi

Abstract

Many statistical problems involve optimization over a discrete parameter space of unknown dimension. In such settings, gradient-based methods often fail because the objective function is non-differentiable, or because the search space is non-convex or massive and the objective function has many local maxima or minima. This paper presents GAReg, a unified genetic algorithm package for discrete optimization problems in regression that works well where standard algorithms are not justified. GAReg provides a compact chromosome representation that supports optimal knot placement for regression splines, best-subset variable selection in regression, and related problems. The package supports uniform initialization, constraint-preserving crossover and mutation, steady-state replacement, and optional island-model parallelization. GAReg efficiently searches high-dimensional model spaces, providing near-optimal solutions in settings where exhaustive enumeration, integer programming, or dynamic programming approaches are infeasible.
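The abstract lists the ingredients of a steady-state genetic algorithm for best-subset selection. The Python sketch below is a minimal, hypothetical illustration of how those pieces can fit together; it is not GAReg's API or implementation. It assumes a binary chromosome (one gene per candidate predictor), BIC as the fitness to minimize, binary tournament selection, uniform crossover, bit-flip mutation at rate 1/d, and replace-the-worst steady-state updating. All function names are illustrative, and the island model is omitted.

```python
import math
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bic(X, y, genes):
    """BIC of the OLS fit of y on an intercept plus the columns whose gene is 1 (assumed fitness)."""
    Z = [[1.0] + [row[j] for j, g in enumerate(genes) if g] for row in X]
    p, n = len(Z[0]), len(y)
    ztz = [[sum(z[i] * z[k] for z in Z) for k in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve(ztz, zty)
    rss = sum((yi - sum(b * zi for b, zi in zip(beta, z))) ** 2 for z, yi in zip(Z, y))
    return n * math.log(rss / n) + p * math.log(n)


def steady_state_ga(X, y, pop_size=30, children=600, seed=1):
    """Hypothetical steady-state GA over binary inclusion vectors; lower BIC is better."""
    rng = random.Random(seed)
    d = len(X[0])
    p_mut = 1.0 / d
    # Uniform initialization: every gene is 0 or 1 with probability 1/2.
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    fit = [bic(X, y, g) for g in pop]

    def tournament():
        # Binary tournament: the parent with the lower BIC wins.
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[i] if fit[i] < fit[j] else pop[j]

    for _ in range(children):
        a, b = tournament(), tournament()
        # Uniform crossover, then independent bit-flip mutation.
        child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
        child = [1 - g if rng.random() < p_mut else g for g in child]
        if child in pop:
            continue  # discard duplicates to preserve diversity
        f = bic(X, y, child)
        worst = max(range(pop_size), key=fit.__getitem__)
        # Steady-state replacement: one child per step replaces the worst member if better.
        if f < fit[worst]:
            pop[worst], fit[worst] = child, f
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# Synthetic data: only x0 and x3 affect y.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
y = [1 + 2 * r[0] - 1.5 * r[3] + rng.gauss(0, 0.5) for r in X]
genes, score = steady_state_ga(X, y)
print(genes, round(score, 2))
```

Because a child enters the population only when it beats the current worst member, the best chromosome found is never lost. With eight predictors the result can be checked against an exhaustive search over all 2^8 = 256 subsets; the GA is meant for cases where that enumeration is out of reach.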

Paper Structure (11 sections, 14 equations, 2 figures)

Figures (2)

  • Figure 1: The red dashed curve is a B-spline fit with knots optimized by GAReg using the single-population GA model (four knots were chosen); the blue solid curve is a spline fit based on five equal-quantile knots.
  • Figure 2: A joinpoint analysis of the Earth's temperatures over the last two thousand years. The red curve depicts a GAReg-optimized fit with 11 knots; the blue curve is a fit using 10 equally spaced knots.
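Both figures compare optimized knots against fixed equal-quantile or equally spaced knots. The Python sketch below is again hypothetical and shows one way a knot chromosome and its fitness could be set up for a joinpoint (continuous piecewise-linear) model like the one in Figure 2. It assumes that knots are restricted to observed x values and encoded as increasing indices at least `min_gap` apart, that the penalty charges log(n) per parameter with each knot location counted as a parameter, and that mutation moves one knot within the gap left by its neighbours, so every mutated chromosome stays feasible. These choices, and all names used, are illustrative rather than GAReg's.

```python
import math
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fitness(x, y, chrom):
    """Penalized fit of a joinpoint model whose knots sit at x[i] for i in chrom (lower is better)."""
    knots = [x[i] for i in chrom]
    # Continuous piecewise-linear basis: 1, x and a hinge (x - t)_+ for every knot t.
    Z = [[1.0, xi] + [max(xi - t, 0.0) for t in knots] for xi in x]
    p, n = len(Z[0]), len(y)
    ztz = [[sum(z[i] * z[k] for z in Z) for k in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve(ztz, zty)
    rss = sum((yi - sum(b * zi for b, zi in zip(beta, z))) ** 2 for z, yi in zip(Z, y))
    # Assumed penalty: log(n) per coefficient plus log(n) per knot location.
    return n * math.log(rss / n) + (p + len(knots)) * math.log(n)


def is_valid(chrom, n, min_gap):
    """Indices increase, stay min_gap away from both ends and at least min_gap from each other."""
    pts = [0] + list(chrom) + [n - 1]
    return all(b - a >= min_gap for a, b in zip(pts, pts[1:]))


def mutate(chrom, n, min_gap, rng):
    """Move one knot uniformly within the interval allowed by its neighbours (keeps validity)."""
    k = rng.randrange(len(chrom))
    lo = (chrom[k - 1] if k > 0 else 0) + min_gap
    hi = (chrom[k + 1] if k < len(chrom) - 1 else n - 1) - min_gap
    new = list(chrom)
    new[k] = rng.randint(lo, hi)
    return tuple(new)


# Synthetic joinpoint data on [0, 10] with true knots at x = 2 and x = 5.
rng = random.Random(0)
x = [i / 10 for i in range(101)]
y = [t - 3 * max(t - 2, 0) + 3.5 * max(t - 5, 0) + rng.gauss(0, 0.1) for t in x]
true_chrom = (20, 50)   # x[20] = 2.0, x[50] = 5.0
equal_chrom = (33, 67)  # two roughly equally spaced knots
print(fitness(x, y, true_chrom), fitness(x, y, equal_chrom))
```

On these simulated data the true knots score lower than the equally spaced pair. In a full GA, this mutation would be combined with selection and crossover; because it never leaves the feasible set, no repair step is needed after mutating.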