
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge

Yingzhe Peng, Yixiao Yuan, Zitian Ao, Huapeng Zhou, Kangqi Wang, Qipeng Zhu, Xu Yang

TL;DR

This report presents the first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge, a complex video understanding task that requires models to accurately comprehend and answer questions about video content.

Abstract

In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge. This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content. To address this challenge, we leveraged the powerful QwenVL2 (7B) model and fine-tuned it on the provided training set. We also employed model ensemble strategies and test-time augmentation (TTA) to boost performance. Through continuous optimization, our approach achieved a Top-1 Accuracy of 0.7647 on the leaderboard.
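The abstract does not say how ensemble and TTA outputs are combined. The sketch below shows one common option and makes no claim to be the authors' method. It assumes each fine-tuned model and each augmented view of a video (for example, a different frame sampling, which is hypothetical here) produces a probability distribution over the answer options. These distributions are averaged, and the highest-scoring option is chosen. The sketch also shows how Top-1 Accuracy is computed over a set of questions. The function names `ensemble_predict` and `top1_accuracy` and the input format are hypothetical.

```python
from typing import List, Sequence


def ensemble_predict(prob_vectors: Sequence[Sequence[float]]) -> int:
    """Average per-option probabilities across (model, augmented view) pairs.

    Assumption: every vector scores the same answer options in the same order.
    Simple averaging is an illustrative choice, not the report's stated rule.
    Returns the index of the option with the highest mean probability.
    """
    if not prob_vectors:
        raise ValueError("need at least one prediction vector")
    num_options = len(prob_vectors[0])
    if any(len(p) != num_options for p in prob_vectors):
        raise ValueError("all prediction vectors must cover the same options")
    # Mean probability of each option over all models/views.
    avg = [sum(p[i] for p in prob_vectors) / len(prob_vectors) for i in range(num_options)]
    # Ties resolve to the lowest option index.
    return max(range(num_options), key=lambda i: avg[i])


def top1_accuracy(predictions: List[int], labels: List[int]) -> float:
    """Fraction of questions whose predicted option index equals the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Three hypothetical (model, view) outputs for one 3-option question.
    views = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]]
    print(ensemble_predict(views))                        # option 1 has the highest mean
    print(top1_accuracy([1, 0, 2, 2], [1, 0, 1, 2]))      # 3 of 4 correct
```

Averaging probabilities lets a confident view outweigh several uncertain ones. Majority voting over each view's argmax is a simpler alternative that ignores confidence.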

Paper Structure (9 sections, 4 tables)