Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval

Junkai Yang; Qirui Wang; Yaoqing Jin; Shuai Ma; Minghan Xu; Shanmin Pang

Abstract

Retrieving partially relevant segments from untrimmed videos remains difficult because of two persistent challenges: the mismatch in information density between text queries and video segments, and attention mechanisms that overlook both semantic focus and correlations between events. We present KDC-Net, a Knowledge-Refined Dual Context-Aware Network that addresses these issues from both the textual and the visual side. On the text side, a Hierarchical Semantic Aggregation module captures multi-scale phrase cues and adaptively fuses them to enrich query semantics. On the video side, a Dynamic Temporal Attention mechanism uses relative positional encoding and adaptive temporal windows to highlight key events while preserving local temporal coherence. In addition, a dynamic CLIP-based distillation strategy with temporal-continuity-aware refinement provides segment-aware, objective-aligned knowledge transfer. Experiments on partially relevant video retrieval (PRVR) benchmarks show that KDC-Net consistently outperforms state-of-the-art methods, especially at low moment-to-video ratios.
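
The abstract names the two ingredients of Dynamic Temporal Attention (relative positional encoding and adaptive temporal windows) but not their exact form. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: single-head self-attention over clip features, a linear distance penalty `-alpha*|i-j|` standing in for the relative positional encoding, and a per-clip window half-width predicted by a hypothetical sigmoid gate (`gate_vec`) between `w_min` and `w_max`. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_temporal_attention(clips, w_q, w_k, w_v, gate_vec,
                               alpha=0.1, w_min=1, w_max=4):
    """Hypothetical single-head sketch of windowed attention over clip features.

    clips: (T, d) clip features; w_q, w_k, w_v: (d, d) projections;
    gate_vec: (d,) hypothetical vector that predicts each clip's window size.
    """
    T = clips.shape[0]
    q, k, v = clips @ w_q, clips @ w_k, clips @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])                  # (T, T) content term
    offsets = np.arange(T)[:, None] - np.arange(T)[None, :]  # relative offset i - j
    scores = scores - alpha * np.abs(offsets)                # distance penalty as relative bias

    # Adaptive window: a sigmoid gate maps each clip to a half-width in [w_min, w_max].
    gate = 1.0 / (1.0 + np.exp(-(clips @ gate_vec)))
    half_width = np.rint(w_min + (w_max - w_min) * gate).astype(int)

    # Keep only keys inside each query clip's window (the diagonal is always kept).
    mask = np.abs(offsets) <= half_width[:, None]
    scores = np.where(mask, scores, -np.inf)
    attn = softmax(scores, axis=-1)
    return attn @ v, attn, half_width


# Usage on random features: 8 clips with 16-dimensional features.
rng = np.random.default_rng(0)
T, d = 8, 16
clips = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
gate_vec = rng.standard_normal(d) / np.sqrt(d)
out, attn, hw = dynamic_temporal_attention(clips, w_q, w_k, w_v, gate_vec)
print(out.shape, hw)
```

Because every clip's window contains at least itself, each row of `attn` has a finite maximum and the masked softmax stays well defined; clips outside the window get exactly zero weight, which is one simple way to enforce the local temporal coherence the abstract describes.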

Paper Structure (21 sections, 11 equations, 3 figures, 3 tables)

Abstract
Introduction
Related Work
  Partially Relevant Video Retrieval
  Knowledge Distillation
Methodology
  Hierarchical Semantic Aggregation for Text
  Dynamic Temporal Attention for Video
  Knowledge Refinement Distillation
  Learning and Inference
Experiment
  Datasets
  Evaluation Metrics
  Implementation Details
  Comparison with the State-of-the-Art
...and 6 more sections

Figures (3)

Figure 1: Overview of the PRVR task objective. The figure also shows how the proposed Knowledge Refined Distillation strategy refines the distillation signals obtained from the teacher model.
Figure 2: Illustration of KDC-Net. KDC-Net adopts a distillation framework in which the student model consists of two independent branches that share no parameters.
Figure 3: Ablation studies: (a) KRD window size ablation; (b) DTA parameters ablation; (c) $\delta$ and $\lambda$ parameters ablation.
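
The abstract states that CLIP-based distillation is refined with temporal-continuity awareness, and Figure 3(a) ablates a KRD window size, but the refinement itself is not specified here. The sketch below is one plausible reading under explicit assumptions, not the paper's method: the teacher's clip-level text-to-clip scores are smoothed with a centered moving average whose length `window` is assumed to play the role of that window size, and the student is trained with standard temperature-scaled KL soft-label distillation, a swapped-in objective. The "dynamic" weighting and the roles of $\delta$ and $\lambda$ are not modeled; `refine_teacher_scores` and `distill_loss` are hypothetical names.

```python
import numpy as np


def log_softmax(x, tau=1.0):
    # Temperature-scaled, numerically stable log-softmax over a 1-D score vector.
    z = np.asarray(x, dtype=float) / tau
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def refine_teacher_scores(scores, window=3):
    """Hypothetical temporal-continuity refinement: centered moving average
    over neighbouring clips, with edge padding so the length is unchanged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    padded = np.pad(np.asarray(scores, dtype=float), pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def distill_loss(student_scores, teacher_scores, window=3, tau=1.0):
    """KL(p_teacher || p_student) over the clips of one video, where p_teacher is
    the softmax of the refined teacher scores (standard soft-label distillation)."""
    log_p_t = log_softmax(refine_teacher_scores(teacher_scores, window), tau)
    log_p_s = log_softmax(student_scores, tau)
    return float(np.sum(np.exp(log_p_t) * (log_p_t - log_p_s)))


# Illustrative teacher similarities for one query over 8 clips:
# clip 1 is an isolated spike, clips 3-5 form a contiguous high-scoring event.
teacher = np.array([0.1, 0.9, 0.1, 0.8, 0.85, 0.9, 0.2, 0.1])
refined = refine_teacher_scores(teacher, window=3)
print(np.round(refined, 3))
```

Under this assumption, an isolated high score (clip 1) is pulled down by its low-scoring neighbours, while a clip in the middle of a contiguous event (clip 4) keeps its score, so the student is pushed toward temporally coherent segments rather than single-clip spikes.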