Autosoft Journal


Improving Performance Prediction On Education Data With Noise And Class Imbalance



This paper applies machine learning techniques to predict students’ performance on two real-world educational data-sets. The first data-set is used to predict the response of students with autism while they learn a specific task; the second is used to predict student failure at a secondary school. Both data-sets suffer from two major problems that can impair a classification model's ability to predict the correct label: class imbalance and class noise. A series of experiments was carried out to improve the quality of the training data and hence the prediction results. We propose two noise-filter methods that eliminate noisy majority-class instances located inside the borderline area. Our methods combine the SMOTE over-sampling technique with a thresholding technique to balance the training data and choose the best boundary between classes, and then apply a noise detection approach to identify the noisy instances. We used the two data-sets to assess the efficacy of existing class-imbalance approaches as well as both proposed methods. Results for different classifiers show that AUC scores improved significantly when the two proposed methods were combined with existing class-imbalance techniques.
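The abstract does not give the details of the two proposed filters, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two-dimensional toy data, a SMOTE-style interpolation step written from scratch, and a simple nearest-neighbour vote as a stand-in for the noise detection step. The function names (`smote`, `minority_share`, `flag_noisy_majority`), the fixed `threshold` of 0.5, and the sample points are all hypothetical; in the paper the threshold is chosen rather than fixed.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points, each interpolated between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def minority_share(idx, points, labels, k=3):
    """Fraction of the k nearest other points that carry the minority label 1."""
    nearest = sorted(
        (j for j in range(len(points)) if j != idx),
        key=lambda j: math.dist(points[idx], points[j]),
    )[:k]
    return sum(labels[j] for j in nearest) / k


def flag_noisy_majority(points, labels, threshold=0.5, k=3):
    """Indices of majority points (label 0) whose neighbourhood is mostly
    minority, i.e. candidates for removal as class noise (assumed rule)."""
    return [
        i for i, y in enumerate(labels)
        if y == 0 and minority_share(i, points, labels, k) >= threshold
    ]


# Hypothetical toy data: the last majority point sits inside the minority region.
majority = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (1.0, 0.8),
            (0.8, 0.1), (0.3, 1.0), (5.0, 5.1)]
minority = [(5.0, 5.0), (5.5, 5.2), (4.8, 5.6), (5.3, 4.7)]

# Step 1: balance the classes with synthetic minority points.
synthetic = smote(minority, n_new=len(majority) - len(minority))

# Step 2: flag majority points that fall on the minority side of the boundary.
points = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
flagged = flag_noisy_majority(points, labels)
print(flagged)
```

On this toy data only the majority point at (5.0, 5.1) is flagged, because all of its nearest neighbours in the balanced set belong to the minority class. A real pipeline would tune `threshold` and `k` on validation data and evaluate the filtered training set by AUC, as the abstract describes.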



Total Pages: 8
Pages: 777-783




Volume: 24
Issue: 4
Year: 2018



ISSN PRINT: 1079-8587
ISSN ONLINE: 2326-005X
DOI PREFIX: 10.31209
10.1080/10798587 with T&F
IMPACT FACTOR: 0.652 (2017/2018)
Journal: 1995-Present


TSI Press
18015 Bullis Hill
San Antonio, TX 78258 USA
PH: 210 479 1022
FAX: 210 479 1048