Can LLMs Assist Computer Education? An Empirical Case Study of DeepSeek

Dongfu Xiao, Chen Gao, Zhengquan Luo, Chi Liu, Sheng Shen

TL;DR

This study investigates the practicality and reliability of DeepSeek-V3 for computer networking education by evaluating it on CCNA simulation questions and the Chinese Network Engineer examination. It applies a multidimensional framework (role dependency, cross-linguistic reliability, and answer reproducibility) together with statistical analysis, revealing strong recall-based capabilities but clear limits in higher-order reasoning. Accuracy is consistent across the original and translated question sets, and explicit role prompts bring no significant benefit. Answer reproducibility correlates with accuracy, which suggests it could serve as a reliability metric. The work offers guidance for deploying and refining LLMs in professional education and outlines avenues for improving multimodal support and complex-domain reasoning.

Abstract

This study presents an empirical case study assessing the efficacy and reliability of DeepSeek-V3, an emerging large language model, in the context of computer education. The evaluation uses both CCNA simulation questions and real-world inquiries concerning computer network security posed by Chinese network engineers. To ensure a thorough evaluation, several dimensions are considered, namely role dependency, cross-linguistic proficiency, and answer reproducibility, accompanied by statistical analysis. The findings show that the model performs consistently whether or not prompts include a role definition. In addition, its adaptability across languages is confirmed by stable accuracy on both the original and translated datasets. A distinct contrast emerges between its performance on lower-order factual recall tasks and higher-order reasoning exercises, which underscores its strengths in retrieving information and its limitations in complex analytical tasks. Although DeepSeek-V3 offers considerable practical value for network security education, challenges remain in its ability to process multimodal data and address highly intricate topics. These results provide insights for the future refinement of large language models in specialized professional environments.

Paper Structure

This paper contains 25 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Difficulty Assessment Prompt Engineering
  • Figure 2: Prompt Engineering
  • Figure 3: Impact of Prompt Design on Exam Answers
  • Figure 4: Accuracy Comparison: Higher-order vs Lower-order
  • Figure 5: DeepSeek-V3's Performance across Diverse Topics