Authorize-on-Demand: Dynamic Authorization with Legality-Aware Intellectual Property Protection for VLMs

Lianyu Wang; Meng Wang; Huazhu Fu; Daoqiang Zhang

TL;DR

The paper proposes AoD-IP, a dynamic authorization framework with legality-aware intellectual property protection for VLMs. It supports authorize-on-demand and legality-aware assessment, and provides substantially greater extensibility than existing static-domain approaches.

Abstract

The rapid adoption of vision-language models (VLMs) has heightened the demand for robust intellectual property (IP) protection of these high-value pretrained models. Effective IP protection should proactively confine model deployment to authorized domains and prevent unauthorized transfers. However, existing methods rely on authorized domains that are fixed at training time, which limits flexibility in dynamic environments and often produces opaque responses to unauthorized inputs. To address these limitations, we propose AoD-IP, a framework for dynamic authorization with legality-aware intellectual property protection for VLMs that supports authorize-on-demand and legality-aware assessment. AoD-IP introduces a lightweight dynamic authorization module that enables flexible, user-controlled authorization, allowing users to actively specify or switch authorized domains on demand at deployment time. This enables the model to adapt seamlessly as application scenarios evolve and provides substantially greater extensibility than existing static-domain approaches. In addition, AoD-IP incorporates a dual-path inference mechanism that jointly predicts the legality of the input and the task-specific output. Comprehensive experimental results on multiple cross-domain benchmarks demonstrate that AoD-IP maintains strong authorized-domain performance and reliable detection of unauthorized inputs, while supporting user-controlled authorization for adaptive deployment in dynamic environments.

Paper Structure (16 sections, 10 equations, 3 figures, 52 tables)

Introduction
Related Work
Vision-Language Models and Parameter-Efficient Tuning
Model Intellectual Property (IP) Protection
Method
Problem Definition and Authorize-on-Demand Formulation for IP Protection
Overview of the AoD-IP Architecture
Design of Extended Domain
Dynamic Authorization Module
Dual-path Output
Training and Inference
Experiment
Implementation Details
Target-Specified Model IP Protection
Authorization Application Model IP Protection
...and 1 more sections

Figures (3)

Figure 1: (a) Classical VLMs without IP protection; (b) existing IP protection strategies (e.g., CUTI-Domain, CUPI-Domain) with a static authorized domain; (c) the proposed AoD-IP, which allows users to actively specify or switch authorized domains on demand, with legality-aware output.
Figure 2: (a) During training, authorized data ($x_a$), extended data ($x_e$), and unauthorized data ($x_u$) are simultaneously processed by the frozen CLIP visual encoder $E^v$ to extract visual features ($f_a^v$, $f_e^v$, $f_u^v$). The image projector $P_\mathrm{img}$ and domain projector $P_\mathrm{dom}$ generate image tokens ($\tau_a^g, \tau_e^g, \tau_u^g$) and domain-discriminative tokens ($\tau_a^d, \tau_e^d, \tau_u^d$) for the three domains, respectively. In parallel, an encryption projector $P_\mathrm{enc}$ produces a credential token $\tau_a^c$ for authorized data. These tokens are concatenated and fed into the frozen text encoder $E^t$, producing the corresponding textual features ($f_a^t$, $f_e^t$, $f_u^t$). The final prediction $p$ is derived from the similarity between visual features ($f^v$) and textual features ($f^t$), while an auxiliary output path verifies the legitimacy of each prediction. Frozen modules are indicated by snowflakes, and trainable modules by spark markers. (b) During inference, users may request additional credential tokens from the model owner, which function as “domain keys.” By selecting or switching these keys, users can dynamically control which domain is activated, thereby obtaining valid predictions accordingly. (c) Several inference cases are shown: only matching credentials and inputs lead to valid outputs (e.g., Cases A-C), while mismatched inputs result in invalid predictions and trigger security alerts (e.g., Cases D-F).
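
The data flow described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: the projectors are random matrices, the frozen text encoder $E^t$ is replaced by a simple token average, the encryption projector $P_\mathrm{enc}$ is omitted (the credential is passed in as a ready-made vector), and the legality rule (cosine similarity between the domain token and the credential against a threshold) is a placeholder assumption, since the caption does not specify how the auxiliary path decides. All names and the width `DIM` are hypothetical.

```python
import math
import random

DIM = 8  # hypothetical token/feature width


def linear(weights, vec):
    """Apply a projector, modeled here as a plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_encoder(tokens):
    """Stand-in for the frozen E^t: averages the token sequence."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def forward(f_v, class_tokens, p_img, p_dom, credential, threshold=0.5):
    tau_g = linear(p_img, f_v)  # image token from P_img
    tau_d = linear(p_dom, f_v)  # domain-discriminative token from P_dom
    # Task path: one textual feature per class from the concatenated tokens.
    f_t = [text_encoder([tau_g, tau_d, credential, c]) for c in class_tokens]
    probs = softmax([cosine(f_v, t) for t in f_t])
    # Legality path (placeholder rule, NOT from the paper): the input counts
    # as authorized when its domain token aligns with the credential.
    is_legal = cosine(tau_d, credential) > threshold
    return probs, is_legal


rng = random.Random(0)
p_img = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
p_dom = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
f_v = [rng.gauss(0, 1) for _ in range(DIM)]  # stand-in for E^v output
class_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]

# A credential that matches the input's domain, and one that does not.
matching_key = linear(p_dom, f_v)
wrong_key = [-x for x in matching_key]

probs_ok, legal_ok = forward(f_v, class_tokens, p_img, p_dom, matching_key)
probs_bad, legal_bad = forward(f_v, class_tokens, p_img, p_dom, wrong_key)
```

Switching the credential passed to `forward` plays the role of selecting a different "domain key" at inference time: the trained projectors stay fixed, and only the key changes which inputs are treated as authorized.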
