TabINR: An Implicit Neural Representation Framework for Tabular Data Imputation
Vincent Ochs, Florentin Bieder, Sidaty el Hadramy, Paul Friedrich, Stephanie Taha-Mehlitz, Anas Taha, Philippe C. Cattin
TL;DR
Missing values in tabular data hamper predictive modeling, especially with heterogeneous feature types and limited samples. The authors propose TabINR, an implicit neural representation framework that treats the table as a neural function $\,\hat{D}_{ij}=f_\theta(\lambda_i,c_j)$ parameterized by learnable row and feature embeddings, with test-time latent optimization to personalize imputations for unseen rows. Compared against classical and deep baselines across twelve real-world datasets and multiple missingness mechanisms, TabINR achieves competitive or superior reconstruction accuracy, particularly in high-dimensional settings, while offering fast inference and a simple, memory-efficient architecture. The work demonstrates INR-based representations as a unified paradigm for tabular learning and points to future extensions for non-random missingness, larger scales, and multimodal data integration, enabling broader applicability in real-world decision-making pipelines.
Abstract
Tabular data builds the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. As missing values degrade the performance or hinder the applicability of downstream models, and while simple imputing strategies tend to introduce bias or distort the underlying data distribution, we require imputers that provide high-quality imputations, are robust across dataset sizes and yield fast inference. We therefore introduce TabINR, an auto-decoder based Implicit Neural Representation (INR) framework that models tables as neural functions. Building on recent advances in generalizable INRs, we introduce learnable row and feature embeddings that effectively deal with the discrete structure of tabular data and can be inferred from partial observations, enabling instance adaptive imputations without modifying the trained model. We evaluate our framework across a diverse range of twelve real-world datasets and multiple missingness mechanisms, demonstrating consistently strong imputation accuracy, mostly matching or outperforming classical (KNN, MICE, MissForest) and deep learning based models (GAIN, ReMasker), with the clearest gains on high-dimensional datasets.
