
OmniZip: Learning a Unified and Lightweight Lossless Compressor for Multi-Modal Data

Yan Zhao, Zhengxue Cheng, Junxuan Zhang, Dajiang Zhou, Qunshan Gu, Qi Wang, Li Song

TL;DR

OmniZip is a unified, lightweight lossless compressor for multi-modal data (e.g., image, text, speech, tactile, database, and gene sequence). Built on a lightweight backbone, it outperforms or matches other state-of-the-art compressors on multiple modalities.

Abstract

Lossless compression is essential for efficient data storage and transmission. Although learning-based lossless compressors achieve strong results, most are designed for a single modality, which leads to redundant compressor deployments in multi-modal settings. Designing a unified multi-modal compressor is critical yet challenging, because data types differ widely in format, dimensionality, and statistics. Multi-modal large language models offer a promising solution but remain too complex for practical use. We therefore propose OmniZip, a unified and lightweight lossless compressor for multi-modal data (e.g., image, text, speech, tactile, database, and gene sequence). Built on a lightweight backbone, OmniZip incorporates three key components for efficient multi-modal lossless compression: a modality-unified tokenizer that reversibly transforms diverse data into tokens, a modality-routing context learning mechanism that enables flexible multi-modal context modeling, and a modality-routing feedforward design that further enhances the model's nonlinear representation flexibility. A reparameterization training strategy is used to enhance model capacity. OmniZip outperforms or matches other state-of-the-art compressors on multiple modalities, achieving 42%, 57%, 62%, 42%, and 53% higher compression efficiency than gzip on the CLIC-M, TouchandGo, enwik9, LibriSpeech, and WikiSQL datasets, respectively. It also supports near real-time inference on resource-constrained edge devices, reaching about 1 MB/s on MacBook CPUs and iPhone NPUs. Our code is released at https://github.com/adminasmi/OmniZip-CVPR2026.
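To make the idea of a reversible, modality-unified token space concrete, here is a minimal Python sketch. It is hypothetical and not OmniZip's actual tokenizer, whose details are not given in this summary: it assumes a shared vocabulary of 256 byte tokens plus one tag token per modality (the names `MODALITIES`, `tokenize`, and `detokenize` are illustrative), and shows that prefixing a modality tag to the raw bytes yields a single token stream that can be inverted exactly.

```python
from __future__ import annotations

# Hypothetical modality tags; the real OmniZip vocabulary is not specified here.
MODALITIES = ("image", "text", "speech", "tactile", "database", "gene")
BYTE_VOCAB = 256                           # token ids 0..255 carry raw byte values
VOCAB_SIZE = BYTE_VOCAB + len(MODALITIES)  # plus one id per modality tag


def tokenize(modality: str, payload: bytes) -> list[int]:
    """Map (modality, raw bytes) into one shared token sequence."""
    tag = BYTE_VOCAB + MODALITIES.index(modality)  # raises ValueError if unknown
    return [tag] + list(payload)


def detokenize(tokens: list[int]) -> tuple[str, bytes]:
    """Exact inverse of tokenize, so no information is lost."""
    tag, *body = tokens
    if tag < BYTE_VOCAB:
        raise ValueError("sequence must start with a modality tag")
    return MODALITIES[tag - BYTE_VOCAB], bytes(body)


if __name__ == "__main__":
    samples = [("text", b"hello"), ("gene", b"ACGT"), ("image", bytes([0, 128, 255]))]
    for modality, raw in samples:
        tokens = tokenize(modality, raw)
        assert detokenize(tokens) == (modality, raw)  # round trip is lossless
        print(modality, tokens)
```

Because the tag is an ordinary token in the same vocabulary, a downstream predictor could in principle read it to select modality-specific parameters; treating this as the link to the paper's modality routing is an assumption, not a description of OmniZip.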

Paper Structure (35 sections, 9 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: Left: The world is multi-modal, motivating the need for unified lossless compression. Multi-modal compressors' performance is shown across modalities, where points closer to the edge indicate better efficiency (lower bits/Byte). Right: Lightweight design ensures broad usability across platforms. OmniZip achieves near real-time inference (hundreds of KB/s to MB/s) even on edge devices.
  • Figure 2: Overview of the proposed OmniZip framework. Diverse data is first converted into a unified, fully reversible token space. A predictive model then estimates each token’s contextual probability, followed by arithmetic coding to generate the bitstream.
  • Figure 3: One block of OmniZip’s predictive model. The model stacks N such blocks in total. Built on a lightweight RWKV7 backbone, it incorporates two modality-routing MoE modules for contextual learning and feedforward processing.
  • Figure 4: Comparison of learning-based lossless compressors across multi-modal datasets. The x-axis shows the model size (in millions of parameters), and the y-axis indicates compression efficiency (bits/Byte, lower is better). Models closer to the lower-left corner achieve better compression with fewer parameters. The dashed orange line represents the performance baseline of gzip.
  • Figure 5: OmniZip's inference speed across platforms (MacBook Pro CPU, iPhone 17 Pro NPU, and NVIDIA A100 GPU) and batch sizes (1, 16, 128, 512, 1024).
  • ...and 3 more figures
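The pipeline in Figure 2 (reversible tokens, per-token probabilities from a predictive model, then arithmetic coding) can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not OmniZip's implementation: a hypothetical adaptive order-0 count model (`AdaptiveCounts`) stands in for the RWKV7-based predictor, the coder is an exact, non-streaming arithmetic coder built on `fractions.Fraction` (practical coders use fixed-precision integer arithmetic with renormalization), and the token count is assumed to be transmitted separately. The code length n equals the ceiling of -log2 of the product of the predicted probabilities, so bits/Byte (the metric in Figure 4) is n divided by the input size in bytes.

```python
import gzip
import math
from fractions import Fraction


class AdaptiveCounts:
    """Hypothetical stand-in for a learned predictor: adaptive order-0 counts."""

    def __init__(self, alphabet_size):
        self.counts = [1] * alphabet_size  # add-one (Laplace) initial counts

    def interval(self, symbol):
        """Return (count of symbols below `symbol`, count of `symbol`, total)."""
        return sum(self.counts[:symbol]), self.counts[symbol], sum(self.counts)

    def update(self, symbol):
        self.counts[symbol] += 1


def encode(tokens, alphabet_size):
    """Exact arithmetic coding; returns (k, n), i.e. the n-bit code k / 2**n."""
    model = AdaptiveCounts(alphabet_size)
    low, width = Fraction(0), Fraction(1)
    for t in tokens:
        cum, c, total = model.interval(t)
        low += width * Fraction(cum, total)  # shift into the token's sub-interval
        width *= Fraction(c, total)          # shrink by the predicted probability
        model.update(t)
    # Smallest n with 2**-n <= width; then some k / 2**n lies in [low, low + width).
    n = 0
    while Fraction(1, 2 ** n) > width:
        n += 1
    return math.ceil(low * 2 ** n), n


def decode(k, n, length, alphabet_size):
    """Replay the same model and pick the sub-interval containing k / 2**n."""
    model = AdaptiveCounts(alphabet_size)
    value = Fraction(k, 2 ** n)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        total = sum(model.counts)
        target = (value - low) / width * total  # value's position in count units
        cum = 0
        for s, c in enumerate(model.counts):
            if cum <= target < cum + c:
                break
            cum += c
        out.append(s)
        low += width * Fraction(cum, total)
        width *= Fraction(c, total)
        model.update(s)
    return out


if __name__ == "__main__":
    data = b"abracadabra " * 20
    k, n = encode(list(data), 256)
    assert decode(k, n, len(data), 256) == list(data)
    print(f"adaptive model: {n / len(data):.2f} bits/Byte")
    print(f"gzip baseline:  {8 * len(gzip.compress(data)) / len(data):.2f} bits/Byte")
```

A stronger predictor, such as a neural network conditioned on context, plugs into the same encode/decode loop as long as the encoder and decoder compute identical probabilities. On tiny inputs gzip's fixed 18-byte header and trailer dominate, so the printed baseline is only indicative.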