Operations Research and Management Science ›› 2020, Vol. 29 ›› Issue (2): 129-143. DOI: 10.12005/orms.2020.0043

• Application Research •

A Multi Knowledge Points Labeling Method for Test Questions Based on Ensemble Learning

GUO Chong-hui, LV Zheng-da   

  1. Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
  • Received: 2018-05-25 Online: 2020-02-25

  • CLC number: TP391; Document code: A; Article ID: 1007-3221(2020)02-0129
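  • About the authors: GUO Chong-hui (b. 1973), male, Ph.D., professor and doctoral supervisor; main research interests: data mining and knowledge discovery, decision theory and methods. LV Zheng-da (b. 1993), male, master's student; research interests: educational data mining, natural language processing.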
  • Supported by:
    National Natural Science Foundation of China (71771034, 71421001); Dalian Science and Technology Innovation Fund Project (2018J11CY009)

Abstract: Educational data mining tasks such as personalized test question recommendation, test question difficulty prediction, and learner modeling require student response data and test questions labeled with knowledge points. At present, knowledge points are labeled manually, so labeling them automatically with machine learning is an urgent need. To address automatic knowledge point labeling for massive test question resources, this paper proposes a multi-knowledge-point labeling method based on ensemble learning. First, the knowledge point labeling problem is formally defined, and a knowledge graph of knowledge points, built from textbook tables of contents and domain knowledge, serves as the set of category labels. Second, multiple support vector machines are trained as base classifiers using an ensemble learning approach, and the best-performing base classifiers are selected and combined to build the multi-knowledge-point labeling model. Finally, high school mathematics test questions from the database of an online education platform are used as the experimental dataset; the proposed method is applied to predict the knowledge points examined by each test question and achieves good results.

Key words: educational data mining, knowledge points labeling, text classification, multi-label learning, ensemble learning
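The pipeline outlined in the abstract (train several support vector machines, keep the best-performing ones, combine them into a multi-label predictor) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how base classifiers are diversified, how they are scored, or how their outputs are combined, so the bootstrap resampling, validation-accuracy ranking, per-label (binary relevance) decomposition, and majority vote used here are all assumptions. The linear SVM trained by Pegasos-style subgradient descent, the function names, and the toy questions are likewise hypothetical.

```python
import random

def featurize(text, vocab):
    """Bag-of-words counts over vocab, plus a constant bias feature."""
    tokens = text.lower().split()
    return [float(tokens.count(w)) for w in vocab] + [1.0]

def train_linear_svm(X, y, rng, epochs=30, lam=0.01):
    """Linear SVM via Pegasos-style stochastic subgradient descent; y in {-1, +1}."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1:                             # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict_sign(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1

def accuracy(w, X, y):
    return sum(predict_sign(w, x) == yi for x, yi in zip(X, y)) / len(y)

def fit_ensemble(X_tr, Y_tr, X_val, Y_val, labels, n_base=7, keep=3, seed=0):
    """Per label: train n_base SVMs on bootstrap samples, keep the `keep` best
    by validation accuracy (assumed selection rule)."""
    rng = random.Random(seed)
    model = {}
    for label in labels:
        y_tr = [1 if label in s else -1 for s in Y_tr]
        y_val = [1 if label in s else -1 for s in Y_val]
        candidates = []
        for _ in range(n_base):
            idx = [rng.randrange(len(X_tr)) for _ in X_tr]  # bootstrap sample
            w = train_linear_svm([X_tr[i] for i in idx], [y_tr[i] for i in idx], rng)
            candidates.append((accuracy(w, X_val, y_val), w))
        candidates.sort(key=lambda c: c[0], reverse=True)
        model[label] = [w for _, w in candidates[:keep]]
    return model

def predict_labels(model, x):
    """Assign a label when the majority of its selected SVMs vote for it."""
    return {label for label, ws in model.items()
            if sum(predict_sign(w, x) for w in ws) > 0}

# Made-up toy questions and knowledge points, for illustration only.
train = [
    ("find the derivative of the function", {"derivative", "function"}),
    ("probability of drawing a red ball", {"probability"}),
    ("monotonic interval of the function", {"function"}),
    ("derivative and tangent line slope", {"derivative"}),
    ("probability that the function value is positive", {"probability", "function"}),
    ("expected value and probability of a coin toss", {"probability"}),
]
val = [
    ("derivative of a quadratic function", {"derivative", "function"}),
    ("probability of two heads", {"probability"}),
]
labels = ["derivative", "function", "probability"]
vocab = sorted({w for text, _ in train for w in text.lower().split()})

model = fit_ensemble(
    [featurize(t, vocab) for t, _ in train], [s for _, s in train],
    [featurize(t, vocab) for t, _ in val], [s for _, s in val],
    labels,
)
predicted = predict_labels(model, featurize("probability of rolling a six", vocab))
print(predicted)
```

Keeping an odd number of classifiers per label avoids tied votes. In practice the category labels would come from the knowledge graph the paper describes, and the features from a proper text representation of the test questions rather than raw word counts.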

