2023, 12, No.348 124-139
An Ensemble Learning Model for Early Dropout Prediction of MOOC Courses
Foundation: Supported by the National Natural Science Foundation of China (No. 61772231); the Natural Science Foundation of Shandong Province (Nos. ZR2022LZH016 and ZR2017MF025); the Project of Shandong Provincial Social Science Program (No. 18CHLJ39); the Shandong Provincial Key R&D Program of China (No. 2021CXGC010103); the Shandong Provincial Teaching Research Project of Graduate Education (Nos. SDYAL2022102 and SDYJG21034); the Teaching Research Project of University of Jinan (No. JZ2212)
Email: ise_mak@ujn.edu.cn
DOI: 10.16512/j.cnki.jsjjy.2023.12.041
Release date: 2023-12-10
Publication date: 2023-12-10
Abstract:

Massive open online courses (MOOCs) have become a widespread form of online learning across the world in the past few years. However, their extremely high dropout rate poses many challenges to the development of online learning. Most current methods show low accuracy and poor generalization ability when dealing with high-dimensional dropout features. They focus on analyzing the learning scores and check results of online courses, but neglect students' phased behaviors. Moreover, a student's participation status at a given moment is necessarily affected by the prior learning status. To address these issues, this paper proposes an ensemble learning model for early dropout prediction (ELM-EDP) that integrates attention-based document representation as a vector (A-Doc2vec), feature learning of course difficulty, and a weighted soft voting ensemble with heterogeneous classifiers (WSV-HC). First, A-Doc2vec learns sequence features of student behaviors in watching lecture videos and completing course assignments, and also captures the relationship between courses and videos. Then, a feature learning method reduces the interference that differences in course difficulty cause in dropout prediction. Finally, WSV-HC combines the benefits of the boosting and bagging integration strategies. Experiments on the MOOCCube2020 dataset show that ELM-EDP achieves better results on Accuracy, Precision, Recall, and F1.
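
The weighted soft voting step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's weighting scheme or its choice of classifiers, so everything below is an assumption made for illustration: the function names, the three hypothetical classifiers, their probability outputs, and the weights. The sketch only shows the general technique, in which each classifier's class-probability vector is averaged with a per-classifier weight and the class with the highest averaged probability is predicted.

```python
from typing import List, Sequence


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Return the weighted average of per-classifier probability vectors.

    probas[i] is the class-probability vector from classifier i;
    weights[i] is the (hypothetical) weight assigned to classifier i.
    """
    if not probas or len(probas) != len(weights):
        raise ValueError("need one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]


def predict_label(probas: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = weighted_soft_vote(probas, weights)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of three heterogeneous classifiers for one student.
# Each row is [P(stay), P(dropout)]; the values are invented for illustration.
probas = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
# Hypothetical weights, e.g. derived from each model's validation score.
weights = [1.0, 2.0, 2.0]

avg = weighted_soft_vote(probas, weights)
label = predict_label(probas, weights)
print(avg, label)
```

With these made-up numbers, the first classifier leans toward "stay", but the two more heavily weighted classifiers lean toward "dropout", so the ensemble predicts class 1 (dropout). Unlike hard voting, soft voting uses the full probability vectors, so a confident classifier influences the result more than an uncertain one.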

Basic information:

DOI: 10.16512/j.cnki.jsjjy.2023.12.041

CLC classification number: G434; TP18

Citation:

[1] Kun Ma, Jiaxuan Zhang, Yongwei Shao, et al. An Ensemble Learning Model for Early Dropout Prediction of MOOC Courses[J]. Computer Education (计算机教育), 2023, No.348(12): 124-139. DOI: 10.16512/j.cnki.jsjjy.2023.12.041.
