Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection

Taichi Nishimura, Shota Nakada, Hokuto Munakata, Tatsuya Komatsu

TL;DR

Lighthouse tackles reproducibility and usability gaps in MR-HD by providing a unified codebase that spans $6 \times 3 \times 5 = 90$ configurations across six methods, three video-text features, and five datasets, plus an end-to-end inference API and web demo. It standardizes training and evaluation with YAML configs and releases features, pretrained weights, and logs to enable exact replication. Empirical results show Lighthouse largely reproduces reported scores and enables fair cross-configuration comparisons, while revealing that newer MR-HD methods are not consistently superior across different datasets and features. Overall, the work lowers the barrier to rigorous evaluation and accelerates development and benchmarking in MR-HD.
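The configuration count is simply the size of the Cartesian product of methods, features, and datasets. A minimal sketch of that arithmetic follows; the labels are hypothetical placeholders, since this summary gives only the counts and not the names.

```python
from itertools import product

# Hypothetical placeholder labels: the summary states only how many methods,
# features, and datasets are covered, not what they are called.
methods = [f"method_{i}" for i in range(1, 7)]     # six methods
features = [f"feature_{i}" for i in range(1, 4)]   # three video-text features
datasets = [f"dataset_{i}" for i in range(1, 6)]   # five datasets

# Every (method, feature, dataset) triple is one training/evaluation setting.
configurations = list(product(methods, features, datasets))

print(len(configurations))  # 6 * 3 * 5 = 90
```

Enumerating the grid this way is one plausible way to drive a sweep over all settings, with each triple mapped to its own configuration file.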

Abstract

We propose Lighthouse, a user-friendly library for reproducible video moment retrieval and highlight detection (MR-HD). Although researchers have proposed various MR-HD approaches, the research community faces two main issues. The first is a lack of comprehensive and reproducible experiments across various methods, datasets, and video-text features, because no unified training and evaluation codebase covers multiple settings. The second is user-unfriendly design. Because previous works use different libraries, researchers must set up a separate environment for each. In addition, most works release only the training code, requiring users to implement the whole MR-HD inference process themselves. Lighthouse addresses these issues by implementing a unified reproducible codebase that includes six models, three features, and five datasets. In addition, it provides an inference API and web demo to make these methods easily accessible for researchers and developers. Our experiments demonstrate that Lighthouse generally reproduces the reported scores in the reference papers. The code is available at https://github.com/line/lighthouse.

Paper Structure (15 sections, 4 figures, 4 tables)

This paper contains 15 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of MR-HD and Lighthouse. Given a video and query, the model predicts relevant moments for MR and saliency scores for HD. Lighthouse achieves reproducible MR-HD by supporting multiple settings. In addition, it aims at a user-friendly design with an easy-to-setup environment, inference API, and web demo.
  • Figure 2: YAML configuration example.
  • Figure 3: A screenshot of the web demo. In the web demo, you can select a model and feature in the model selection pane. Then, in the video and query pane, you can upload a video and input a text query. Clicking the 'Retrieve Moment & Highlight Detection' button displays the retrieved moments and highlighted frames in the right panes. Hugging Face Spaces: https://huggingface.co/spaces/awkrail/lighthouse_demo.
  • Figure 4: Overview of Lighthouse architecture for MR-HD training and evaluation. It consists of four components: datasets, video-text feature extractor, models, and evaluation metrics.