Embedding Digital Signature into CSV Files Using Data Hiding

Akinori Ito

TL;DR

This work addresses the integrity of open-data CSV files by embedding a digital signature directly in the file, using a data hiding approach that exploits the quoting redundancy of CSV fields. It defines a formal embedding scheme with the functions $J_{simp}$ and $J_{emb}$, and a signing process $s = E(H(J_{simp}(strip(Y))), priv)$ that produces a signed CSV verifiable with the public key. Experiments on 10 government CSV files show that a 512-bit signature can be embedded and verified, while simple tampering (quote removal) breaks validation. The method offers a practical path to in-file authenticity for plain-text CSVs, but its limits on payload and metadata motivate future work on larger signatures and on attaching signer information, timestamps, and certificates.
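The signing formula can be read as a short pipeline. The sketch below is one possible interpretation, not the paper's implementation: it assumes `strip` parses the CSV text into raw field values, `J_simp` re-serializes them with quotes only where required, and SHA-256 stands in for $H$. The signing primitive $E$ is not named in this summary, so it is passed in as a hypothetical callable `E(digest, priv)`.

```python
import csv
import hashlib
import io


def strip(text):
    """Assumed reading of strip(Y): parse CSV text into rows of field values."""
    return list(csv.reader(io.StringIO(text, newline="")))


def j_simp(rows):
    """Assumed reading of J_simp: serialize with quotes only where required."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()


def digest(text):
    """H(J_simp(strip(Y))), with SHA-256 standing in for H (an assumption)."""
    canonical = j_simp(strip(text))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def sign(text, priv, E):
    """s = E(H(J_simp(strip(Y))), priv); E is supplied by the caller."""
    return E(digest(text), priv)


# Hypothetical data: the same table, with and without optional quotes.
plain = 'id,city\r\n1,Sendai\r\n'
quoted = '"id",city\r\n1,"Sendai"\r\n'
edited = 'id,city\r\n2,Sendai\r\n'
```

Under these assumptions, `digest(plain)` and `digest(quoted)` are equal because optional quotes vanish in the canonical form, while `digest(edited)` differs. That invariance is presumably what leaves the quoting free to carry the signature bits.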

Abstract

Open data is an important basis for open science and evidence-based policymaking. Governments of many countries disclose government-related statistics as open data, and some of these data are provided as CSV files. However, since CSV files are plain text, the integrity of a downloaded CSV file cannot be ensured. A popular way to prove data integrity is a digital signature, but it is difficult to embed a signature into a CSV file. This paper proposes a method for embedding a digital signature into a CSV file using a data hiding technique. The proposed method exploits a redundancy of the CSV format related to the use of double quotes. Experiments showed that a 512-bit signature can be embedded into actual open-data CSV files.

Paper Structure

This paper contains 10 sections, 18 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: An example of a CSV file.
  • Figure 2: The syntax description of the CSV file format (Shafranovich, 2005).
  • Figure 3: An example of a CSV file with embedded information. The payload is 8 bits, and the message 10110011 is embedded.
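
The idea behind Figure 3 can be illustrated with a minimal sketch. It does not reproduce the figure: the table contents are hypothetical, the convention that a quoted field encodes 1 and an unquoted field encodes 0 is an assumption, and the code only handles fields for which quoting is optional (no commas, double quotes, or line breaks inside a field).

```python
import csv
import io


def embed(rows, bits):
    """Return CSV text in which the k-th field is quoted iff bits[k] == '1'.

    Assumption: no field contains a comma, double quote, or line break,
    so quoting is optional for every field.
    """
    fields = [f for row in rows for f in row]
    if any(ch in f for f in fields for ch in ',"\r\n'):
        raise ValueError("sketch only handles fields where quoting is optional")
    if len(bits) > len(fields):
        raise ValueError("payload exceeds the number of fields")
    # Fields beyond the payload stay unquoted (bit 0).
    padded = iter(bits.ljust(len(fields), "0"))
    lines = []
    for row in rows:
        cells = [f'"{f}"' if next(padded) == "1" else f for f in row]
        lines.append(",".join(cells))
    return "\r\n".join(lines) + "\r\n"


def extract(text, n_bits):
    """Read one bit per field: '1' if the field is quoted, else '0'."""
    bits = []
    for line in text.splitlines():
        for cell in line.split(","):
            bits.append("1" if cell.startswith('"') else "0")
    return "".join(bits)[:n_bits]


def parse(text):
    """Parse with the standard csv module, which ignores optional quotes."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical 2x4 table: eight fields give an 8-bit payload.
rows = [["id", "city", "year", "count"], ["1", "Sendai", "2020", "42"]]
message = "10110011"
signed = embed(rows, message)
```

Here `extract(signed, 8)` recovers the message, and `parse(signed)` returns the original rows, since a standard CSV reader treats `Sendai` and `"Sendai"` as the same value. Removing the quotes would leave the data intact but erase the payload.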