scikit-fda: A Python Package for Functional Data Analysis

Carlos Ramos-Carreño, José Luis Torrecilla, Miguel Carbajo-Berrocal, Pablo Marcos, Alberto Suárez

TL;DR

scikit-fda is a Python toolkit for functional data analysis (FDA) built around two complementary representations of functional data, discretized grids and basis expansions, which share a unified FData interface. The library supports interpolation, differentiation, and regularization; preprocessing (smoothing, registration, FPCA, variable selection); and exploratory analysis (depth measures, robust statistics, functional boxplots), all integrated with scikit-learn pipelines. It also provides synthetic and real-world datasets, interactive visualization, and extensive documentation and tests, which supports reproducible research and eases adoption. Because it is embedded in the Python scientific ecosystem and released under a 3-Clause BSD license, scikit-fda makes functional data analysis accessible and interoperable in scientific computing and machine learning contexts.

Abstract

The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated in Python's scientific ecosystem. In particular, it conforms to the scikit-learn application programming interface so as to take advantage of the functionality for machine learning provided by this package: pipelines, model selection, and hyperparameter tuning, among others. The scikit-fda package has been released as free and open-source software under a 3-Clause BSD license and is open to contributions from the FDA community. The library's extensive documentation includes step-by-step tutorials and detailed examples of use.
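
The two representations mentioned above, values on a discretized grid and coefficients in a basis expansion, can be illustrated with a minimal sketch. It uses only the Python standard library and does not call scikit-fda; the helper names `fourier_basis`, `to_basis`, and `evaluate` are hypothetical, and the sketch assumes a uniform grid on [0, 1) so that a discrete inner product recovers the coefficients exactly.

```python
import math


def fourier_basis(n_basis, t):
    """Evaluate the first n_basis orthonormal Fourier functions on [0, 1] at t.

    Order: constant, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), cos(4*pi*t), ...
    (hypothetical helper, not scikit-fda's API).
    """
    values = [1.0]
    k = 1
    while len(values) < n_basis:
        values.append(math.sqrt(2) * math.sin(2 * math.pi * k * t))
        if len(values) < n_basis:
            values.append(math.sqrt(2) * math.cos(2 * math.pi * k * t))
        k += 1
    return values


def to_basis(samples, grid, n_basis):
    """Project grid values onto the basis with a discrete inner product.

    Assumes a uniform grid on [0, 1) with more than twice as many points
    as the highest frequency, so the basis is orthonormal on the grid.
    """
    n = len(grid)
    coeffs = [0.0] * n_basis
    for x, t in zip(samples, grid):
        for k, phi in enumerate(fourier_basis(n_basis, t)):
            coeffs[k] += x * phi / n
    return coeffs


def evaluate(coeffs, t):
    """Evaluate the basis expansion at an arbitrary point t."""
    basis_values = fourier_basis(len(coeffs), t)
    return sum(c * phi for c, phi in zip(coeffs, basis_values))


# Discretized representation: one trajectory observed on a uniform grid.
n_points = 64
grid = [j / n_points for j in range(n_points)]
samples = [
    2.0 + math.sin(2 * math.pi * t) + 0.5 * math.cos(4 * math.pi * t)
    for t in grid
]

# Basis representation: five coefficients replace the 64 grid values.
coeffs = to_basis(samples, grid, n_basis=5)
```

The basis form stores five numbers instead of 64 and can be evaluated at any point, for example `evaluate(coeffs, 0.3)`, not only at the original grid points.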

Paper Structure (24 sections, 22 equations, 16 figures)

Figures (16)

  • Figure 1: Functional observations in discretized form. The quantity $x_n(t_j)$ represents the value of the $n$-th trajectory at $t_j$.
  • Figure 2: First five elements of the bases available in scikit-fda: Monomial (left), B-splines (center), and Fourier (right).
  • Figure 3: Different representations of the first ten trajectories of the Phoneme dataset. From left to right: original trajectories, B-spline basis representation, and Fourier basis representation. Both basis representations use $5$ basis functions.
  • Figure 4: Smoothed representation of the first ten trajectories of the Phoneme dataset in a B-spline basis with $40$ basis functions for different values of the regularization parameter $\lambda \ge 0$. From left to right: $\lambda = 0$ (no regularization), $\lambda = 1$, and $\lambda = 10$.
  • Figure 5: Trajectories sampled from Gaussian processes with different covariance functions. From left to right: standard Brownian, exponential ($l=1$), and RBF ($l=0.1$).
  • ...and 11 more figures
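
The captions of Figures 1 and 5 describe trajectories stored in discretized form, with $x_n(t_j)$ the value of the $n$-th trajectory at $t_j$, and trajectories sampled from a Gaussian process. A minimal sketch of both ideas for the standard Brownian case follows. It uses only the Python standard library rather than scikit-fda, and the function name `sample_brownian` is hypothetical. It assumes the grid starts at $t = 0$ and relies on the fact that Brownian increments over disjoint intervals are independent Gaussians with variance equal to the interval length.

```python
import math
import random


def sample_brownian(n_samples, grid, seed=0):
    """Sample standard Brownian trajectories on an increasing grid.

    Returns a list of rows such that data[n][j] is x_n(t_j).
    Assumes grid[0] == 0, where every trajectory starts at 0
    (hypothetical helper, not scikit-fda's API).
    """
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_samples):
        value = 0.0
        previous_t = grid[0]
        row = [value]
        for t in grid[1:]:
            # Increment over [previous_t, t] has variance t - previous_t.
            value += rng.gauss(0.0, math.sqrt(t - previous_t))
            row.append(value)
            previous_t = t
        trajectories.append(row)
    return trajectories


# Common grid t_0, ..., t_100 on [0, 1] shared by all trajectories.
grid = [j / 100 for j in range(101)]

# Discretized data matrix: 3 trajectories by 101 grid points.
data = sample_brownian(3, grid, seed=0)
```

Each row of `data` is one functional observation and each column corresponds to one grid point, which is the layout the Figure 1 caption describes.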