Automated discovery of symbolic laws governing skill acquisition from naturally occurring data

Sannyuya Liu; Qing Li; Xiaoxuan Shen; Jianwen Sun; Zongkai Yang

TL;DR

The paper tackles understanding skill acquisition by learning governing laws from large-scale, naturally occurring training data. It introduces a two-stage Auto-Discovered Model (ADM) that first uses a Transformer-like deep regressor to estimate latent cognitive states and then applies symbolic distillation to extract algebraic governing laws via symbolic regression. Across simulated and Lumosity real-world data, the approach accurately recovers preset laws under noise, yields interpretable laws that often outperform classical fits, and reveals novel patterns such as logarithmic and inverse relationships. This data-driven, interpretable framework advances cognitive science by providing scalable, evidence-based rules for skill learning with potential applications in education and personalized training.

Abstract

Skill acquisition is a key area of research in cognitive psychology because it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to uncover the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to address two obstacles: cognitive states that cannot be observed directly, and the combinatorial explosion of the search over candidate equations. First, a deep learning model is employed to estimate the learner's cognitive state and assess feature importance. Then, symbolic regression algorithms are used to parse the neural network model into algebraic equations. Experimental results show that the algorithm can accurately restore preset laws within a certain noise range in continuous feedback settings. When applied to Lumosity training data, the method outperforms traditional and recent models in terms of fitness. The study reveals two new forms of skill acquisition laws and reaffirms some previous findings.
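
As a rough illustration of the second stage only, the sketch below picks the algebraic form that best matches a set of predicted scores. It is not the paper's algorithm: the candidate library, the function names `fit_candidates` and `best_form`, and the synthetic "teacher" scores (which stand in for the outputs of a trained deep regressor) are all assumptions made for this example. Real symbolic regression searches a far larger expression space.

```python
import numpy as np

# Hypothetical candidate library (an assumption of this sketch). Each form is
# linear in (a, b) once the practice count n is transformed, so ordinary
# least squares is enough to fit it.
CANDIDATES = {
    "a + b*log(n)": np.log,
    "a + b/n": lambda n: 1.0 / n,
    "a + b*n": lambda n: n,
    "a + b*sqrt(n)": np.sqrt,
}

def fit_candidates(n, y):
    """Fit every candidate form to (n, y); return {form: (a, b, mae)}."""
    results = {}
    for name, transform in CANDIDATES.items():
        X = np.column_stack([np.ones_like(n), transform(n)])
        (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
        mae = float(np.mean(np.abs(X @ np.array([a, b]) - y)))
        results[name] = (float(a), float(b), mae)
    return results

def best_form(results):
    """Return the candidate form with the lowest mean absolute error."""
    return min(results, key=lambda form: results[form][2])

# Synthetic stand-in for a trained regressor's predicted scores: a preset
# logarithmic law plus Gaussian noise (values chosen only for illustration).
rng = np.random.default_rng(0)
n = np.arange(1, 201, dtype=float)
teacher = 40.0 + 12.0 * np.log(n) + rng.normal(0.0, 0.5, n.size)

results = fit_candidates(n, teacher)
winner = best_form(results)
a_hat, b_hat, mae = results[winner]
print(winner, round(a_hat, 2), round(b_hat, 2), round(mae, 3))
```

Selecting by mean absolute error mirrors the fitness measure reported in the figure captions; restricting the library to two-parameter forms keeps the search trivially small, which is exactly the simplification a real symbolic regression engine does not make.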

Paper Structure (18 sections, 7 equations, 9 figures, 3 tables)

Introduction
Results
Model validation on simulated data
Model application on large-scale real-world cognitive training data
Exploring auto-discovered patterns in real-world datasets
Discussion
Method
Deep learning regression for training log data
Problem formulation
Feature encoding
Mastery inference
Score prediction
Model learning
Symbolic law extraction from deep learning regressor
Symbolic distillation
...and 3 more sections

Figures (9)

Figure 1: Overall model architecture diagram. A two-stage model is proposed for the automated discovery of symbolic laws that govern skill acquisition. (a) A toy example of training log data, showing nine practice sessions for a specific learner; each record contains five main elements: learner, practice, skill, score, and time. (b) The constructed deep regressor, which consists of three main modules: feature encoding, mastery inference, and score prediction. The model takes the output of the previous n practice sessions and predicts the score of the next practice session, so its optimization objective follows the autoregressive paradigm. (c) The process of extracting symbolic governing laws from the trained deep regressor. The proposed symbolic distillation method approximates a black-box neural network with its closest symbolic representation. Because the trained deep regressor has already embedded hidden patterns from the data, symbolic distillation is applied to interpret each of its modules, and the results are fused to obtain the symbolic governing laws.
Figure 1: Partial results on the Lumosity dataset. (a) The mean absolute fitting error of the deep learning regressor over the training iterations; (b) the value of the regularization term over the training iterations; (c) the prediction error distribution of 1000 randomly selected records from the trained H1000+R1 model. The box plot displays the interquartile range (IQR) with the median line, while the whiskers extend to the minimum and maximum values or a multiple of the IQR from the quartiles; outliers are depicted as individual points beyond the whiskers. (d) The number of practices for each skill as a proportion of the total number of practices; (e) the distribution of feature importance for each skill in H1000+R1.
Figure 2: Results of the simulated data experiment. (a) The goodness of fit of the deep learning regressor under different parameter settings; (b) the numerical analysis of regularization terms under different parameter settings; (c) the accuracy with which governing laws are reproduced. Feature, Structure, and Value are three dimensions used to evaluate the degree of formula restoration. Feature measures whether the variables in the restored formula match those in the assumed formula. Structure indicates whether the form of the restored formula aligns with that of the assumed formula. Value indicates whether the parameters in the restored formula match those in the assumed formula. If Feature, Structure, and Value are all correctly restored, the formula is considered completely and accurately restored. "Yes" indicates that the algorithm successfully restored the preset pattern, whereas "No" indicates that it failed to do so.
Figure 2: Schematic representation of the simulation experiment. (a) Flowchart depicting the process of generating simulated data and the data format. (b) Schematic diagram illustrating the evaluation procedure of the simulation experiment. (c) Sample representation of the simulated data.
Figure 3: The skill acquisition patterns discovered by the proposed method on the Lumosity dataset. Symbolic regression results for six related skills, with their corresponding complexities (CX. $< 15$), are presented, including the algebraic equations and their mean absolute error (MAE). Among equations with the same number of features, the best-fitting one is marked in red.
...and 4 more figures
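
The Feature, Structure, and Value criteria described in the caption of the simulated-data results figure can be made concrete with a small check. The sketch below is one possible reading, not the authors' evaluation code: the `Law` record, the `compare` function, the string encoding of a formula's structure, and the 10% relative tolerance are all assumptions introduced here. It also assumes that Value can only be judged once Structure matches, since parameters of different forms are not comparable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Law:
    """Hypothetical record describing one formula."""
    features: frozenset   # variable names used, e.g. frozenset({"n"})
    structure: str        # canonical form with parameters left symbolic
    params: tuple         # parameter values in order of appearance

def compare(preset, restored, rel_tol=0.1):
    """Score a restored law against the preset law on three dimensions."""
    feature_ok = preset.features == restored.features
    structure_ok = preset.structure == restored.structure
    # Assumption: parameters are only comparable when the structure matches.
    value_ok = (
        structure_ok
        and len(preset.params) == len(restored.params)
        and all(
            abs(p - r) <= rel_tol * abs(p)
            for p, r in zip(preset.params, restored.params)
        )
    )
    return {"Feature": feature_ok, "Structure": structure_ok, "Value": value_ok}

preset = Law(frozenset({"n"}), "a + b*log(n)", (40.0, 12.0))
close_match = Law(frozenset({"n"}), "a + b*log(n)", (39.8, 12.1))
wrong_form = Law(frozenset({"n"}), "a + b/n", (90.0, -50.0))

print(compare(preset, close_match))
print(compare(preset, wrong_form))
```

A formula counts as fully restored only when all three entries are true, which corresponds to a "Yes" on every dimension in the caption's terms.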
