scikit-fda: A Python Package for Functional Data Analysis

Carlos Ramos-Carreño; José Luis Torrecilla; Miguel Carbajo-Berrocal; Pablo Marcos; Alberto Suárez

TL;DR

scikit-fda is a Python toolkit for functional data analysis (FDA). It offers two complementary representations of functional data, discretized grids (FDataGrid) and basis expansions (FDataBasis), behind a common FData interface that supports interpolation, extrapolation, derivatives, and regularization. On top of these representations, the library provides preprocessing (smoothing, registration, FPCA, variable selection) and exploratory analysis (depth measures, robust statistics, functional boxplots). It follows the scikit-learn API, so these tools can be combined in scikit-learn pipelines. The package also includes synthetic data generators, real-world datasets, and interactive visualization, and it is documented and tested to support reproducible research. Released under a 3-Clause BSD license, scikit-fda makes FDA methods accessible from, and interoperable with, the wider Python scientific computing and machine learning ecosystem.

Abstract

The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for the representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated into Python's scientific ecosystem. In particular, it conforms to the scikit-learn application programming interface so as to take advantage of the machine learning functionality provided by that package: pipelines, model selection, and hyperparameter tuning, among others. The scikit-fda package has been released as free and open-source software under a 3-Clause BSD license and is open to contributions from the FDA community. The library's extensive documentation includes step-by-step tutorials and detailed examples of use.
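To make the idea of "representation" concrete, the following is a minimal sketch of the two views of a functional observation that the paper's outline distinguishes: values on a grid and coefficients in a basis. It uses only the Python standard library and is not scikit-fda's API; the helper names `fourier_basis`, `to_basis`, and `evaluate` are hypothetical and chosen for illustration. It also assumes a uniform grid on [0, 1), where the Fourier functions are orthonormal under the discrete mean inner product, so the least-squares coefficients reduce to simple averages.

```python
import math


def fourier_basis(k, t):
    """Evaluate the k-th orthonormal Fourier basis function on [0, 1] at t."""
    if k == 0:
        return 1.0
    freq = (k + 1) // 2
    angle = 2.0 * math.pi * freq * t
    # Odd indices are sines, even indices are cosines of the same frequency.
    trig = math.sin(angle) if k % 2 == 1 else math.cos(angle)
    return math.sqrt(2.0) * trig


def to_basis(values, grid, n_basis):
    """Project grid values onto the first n_basis Fourier functions.

    Assumes a uniform grid on [0, 1) with more than n_basis points, so that
    the basis is orthonormal under the discrete mean inner product.
    """
    n = len(grid)
    return [
        sum(x * fourier_basis(k, t) for x, t in zip(values, grid)) / n
        for k in range(n_basis)
    ]


def evaluate(coefficients, t):
    """Evaluate the basis expansion with the given coefficients at t."""
    return sum(c * fourier_basis(k, t) for k, c in enumerate(coefficients))


def signal(t):
    """Example trajectory that lies in the span of the first five functions."""
    return 1.0 + 2.0 * math.sin(2 * math.pi * t) - 0.5 * math.cos(4 * math.pi * t)


# Discretized representation: one trajectory observed at 64 grid points.
n_points = 64
grid = [j / n_points for j in range(n_points)]
values = [signal(t) for t in grid]

# Basis representation: the same trajectory as five coefficients.
coefficients = to_basis(values, grid, n_basis=5)

# The expansion can be evaluated anywhere, including between grid points.
off_grid_value = evaluate(coefficients, 0.3)
```

The grid form stores 64 numbers per trajectory while the basis form stores 5, and the basis form can be evaluated or differentiated at any point of the domain. For non-uniform grids or non-orthogonal bases such as B-splines, the coefficients would instead come from solving a least-squares problem.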

Paper Structure (24 sections, 22 equations, 16 figures)

Introduction
Representation of functional data in scikit-fda
The class FData
Discretized representation: The class FDataGrid
Basis expansion representation: The class FDataBasis
Interpolation and extrapolation
Derivatives
Regularization
Functionality of scikit-fda
Generation of synthetic data
Real-world data
Preprocessing
Smoothing
Registration
Dimensionality reduction
...and 9 more sections

Figures (16)

Figure 1: Functional observations in discretized form. The quantity $x_n(t_j)$ represents the value of the $n$-th trajectory at $t_j$.
Figure 2: First five elements of the bases available in scikit-fda: Monomial (left), B-splines (center), and Fourier (right).
Figure 3: Different representations of the first ten trajectories of the Phoneme dataset. From left to right: original trajectories, B-spline basis representation, and Fourier basis representation. Both basis representations use $5$ basis functions.
Figure 4: Smoothed representation of the first ten trajectories of the Phoneme dataset in a B-spline basis with $40$ basis functions for different values for the regularization parameter $\lambda \ge 0$; from left to right: $\lambda = 0$ (no regularization), $\lambda = 1$, and $\lambda = 10$.
Figure 5: Trajectories sampled from Gaussian processes with different covariance functions. From left to right, standard Brownian, Exponential ($l=1$) and RBF ($l=0.1$).
...and 11 more figures
