RecTable: Fast Modeling Tabular Data with Rectified Flow

Masane Fuchi, Tomohiro Takagi

TL;DR

RecTable addresses the high training cost of diffusion-based and LLM-based tabular data generation by combining rectified flow with a lightweight GLU-based network. It introduces a mixed-type noise model for numerical and categorical features and uses a logit-normal timestep distribution, while avoiding the reflow step to speed up generation. Across six real-world datasets, RecTable achieves competitive fidelity and superior machine-learning efficiency, often with substantially faster training than state-of-the-art baselines. The results suggest that rectified flow, combined with targeted architectural and training choices, is a viable path toward high-quality, time-efficient tabular data synthesis, and that with further improvements it could surpass diffusion-based approaches.
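To make the "GLU-based network" concrete, here is a minimal sketch of a single gated linear unit in its generic form: a value branch scaled elementwise by a sigmoid gate. It is an illustration only. The function names, weight layout, and toy values are assumptions, and the exact block used in RecTable (normalization, residual connections, timestep conditioning) is not described in this summary.

```python
import math


def linear(x, weight, bias):
    """Affine map; weight holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def glu(x, w_value, b_value, w_gate, b_gate):
    """Generic gated linear unit: value branch times a sigmoid gate."""
    value = linear(x, w_value, b_value)
    gate = [1.0 / (1.0 + math.exp(-g)) for g in linear(x, w_gate, b_gate)]
    return [v * g for v, g in zip(value, gate)]


# Toy input and weights (hypothetical values, chosen for easy checking).
x = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A zero gate pre-activation gives sigmoid(0) = 0.5, so the output is half the value branch.
out = glu(x, identity, [0.0, 0.0], zeros, [0.0, 0.0])
```

Stacking a few such units, each followed by a projection back to the hidden width, would give a small network of the kind the summary describes, although the actual layer count and widths are not stated here.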

Abstract

Score-based and diffusion models generate high-quality tabular data, surpassing GAN-based and VAE-based models. However, these methods require substantial training time. In this paper, we introduce RecTable, which uses rectified flow modeling, an approach already applied in areas such as text-to-image and text-to-video generation. RecTable features a simple architecture consisting of a few stacked gated linear unit blocks. Our training strategy is also simple, incorporating a mixed-type noise distribution and a logit-normal timestep distribution. Our experiments demonstrate that RecTable achieves performance competitive with several state-of-the-art diffusion and score-based models while reducing the required training time. Our code is available at https://github.com/fmp453/rectable.
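The two training ingredients named in the abstract can be sketched for numerical features as follows. This is a hedged illustration, not the authors' implementation. It assumes the common rectified flow convention in which t = 0 is noise and t = 1 is data, with a straight-line path between them, and it assumes a standard Gaussian noise source. The function names and the default mean and standard deviation of the logit-normal sampler are hypothetical. Categorical features, which the mixed-type noise distribution handles, are not covered.

```python
import math
import random


def sample_logit_normal(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) by passing a Gaussian sample through a sigmoid."""
    z = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))


def interpolate(noise, data, t):
    """Point on the straight line from noise (t = 0) to data (t = 1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def velocity_target(noise, data):
    """Constant velocity of the straight path: data minus noise."""
    return [d - n for n, d in zip(noise, data)]


def flow_loss(predicted, noise, data):
    """Mean squared error between a predicted velocity and the target."""
    target = velocity_target(noise, data)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)


rng = random.Random(0)
data = [0.5, -1.2, 2.0]  # one toy numerical row (hypothetical values)
noise = [rng.gauss(0.0, 1.0) for _ in data]
t = sample_logit_normal(rng)
x_t = interpolate(noise, data, t)

# A perfect velocity predictor would reach zero loss on this sample.
loss = flow_loss(velocity_target(noise, data), noise, data)
```

In a real training loop, a network would receive `x_t` and `t` and its output would replace the perfect predictor above. With mean 0, the logit-normal sampler draws timesteps near 0.5 more often than a uniform sampler would, so intermediate points on the path receive more training signal.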

Paper Structure

This paper contains 33 sections, 12 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Training time and Machine Learning Efficiency score on the adult dataset. Our proposed method, RecTable, maintains high performance on the downstream task while shortening training time.
  • Figure 2: Visualizations of the generated and real adult dataset.
  • Figure 3: Visualizations of the generated and real default dataset.
  • Figure 4: Visualizations of the generated and real shoppers dataset.
  • Figure 5: Visualizations of the generated and real magic dataset.
  • ...and 2 more figures