Estimating Difficulty Levels of Programming Problems with Pre-trained Model
Zhiyuan Wang, Wei Zhang, Jun Wang
TL;DR
This work tackles automatic difficulty level estimation for programming problems on Programming Online Judge (POJ) platforms by treating it as a multi-modal task over problem statements and code solutions. It introduces C-BERT, a coupled architecture that uses BERT for text and CodeBERT for code, fuses the two modalities through CLS-based cross-modal interaction, and uses the result together with explicit features to predict difficulty via a softmax classifier. Experiments on Codeforces and CodeChef datasets with 5-fold cross-validation show that C-BERT outperforms strong baselines, and the ablation study confirms the importance of each component, especially the code modality and the cross-modal coupling. The approach offers a practical, objective way to guide adaptive learning and problem selection on POJ platforms, reducing reliance on expert annotations and on waiting for student solution statistics to accumulate.
Abstract
As the demand for programming skills grows across industry and academia, students often turn to Programming Online Judge (POJ) platforms for coding practice and competition. The difficulty level of each programming problem serves as an essential reference for guiding students' adaptive learning. However, current methods of determining difficulty levels either require extensive expert annotation or take a long time to accumulate enough student solutions for each problem. To address this issue, we formulate the problem of automatically estimating the difficulty level of a programming problem given its textual description and an example code solution. To tackle this problem, we propose coupling two pre-trained models, one for the text modality and the other for the code modality, into a unified model. We build two POJ datasets for the task, and the results demonstrate the effectiveness of the proposed approach and the contributions of both modalities.
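To make the task formulation concrete, the following is a minimal sketch of the final prediction step only: two modality vectors and a few explicit features are fused and passed through a softmax classifier over difficulty levels. It is an illustration under stated assumptions, not the C-BERT implementation. The random vectors stand in for the CLS outputs of the text and code encoders, plain concatenation stands in for the cross-modal coupling, and the explicit feature values, dimensions, number of levels, and function names are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_difficulty(text_cls, code_cls, explicit, weights, bias):
    """Fuse both modality vectors with explicit features and classify.

    Assumption: concatenation is a simplified stand-in for the
    cross-modal interaction; it is not the mechanism used in the paper.
    """
    fused = list(text_cls) + list(code_cls) + list(explicit)
    probs = softmax(linear(weights, bias, fused))
    level = max(range(len(probs)), key=probs.__getitem__)
    return level, probs


rng = random.Random(0)
NUM_LEVELS = 3  # hypothetical number of difficulty levels

# Placeholder vectors standing in for encoder CLS outputs (assumption).
text_cls = [rng.uniform(-1, 1) for _ in range(4)]
code_cls = [rng.uniform(-1, 1) for _ in range(4)]
explicit = [0.5, 0.2]  # hypothetical hand-crafted features

dim = len(text_cls) + len(code_cls) + len(explicit)
# Untrained random classifier weights, for illustration only.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
           for _ in range(NUM_LEVELS)]
bias = [0.0] * NUM_LEVELS

level, probs = predict_difficulty(text_cls, code_cls, explicit, weights, bias)
print("predicted level:", level)
print("class probabilities:", [round(p, 3) for p in probs])
```

In a real system the placeholder vectors would come from the two pre-trained encoders and the classifier weights would be learned from labeled problems; the sketch only shows how the inputs named in the abstract map to a distribution over difficulty levels.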
