Binary Classification of Academic Outcomes Using Ensemble Learning and Neural Networks: A Case Study on OULAD
Keywords:
Educational Data Mining (EDM); OULAD; Feature Selection; Dense Neural Networks (DNN); Machine Learning
Abstract
Academic classification in online learning platforms is increasingly recognized as important because it helps assess student performance, detect problems early, and identify the factors that influence academic success. This study uses the Open University Learning Analytics Dataset (OULAD) to predict students' academic outcomes across several binary classification tasks, including Distinction vs Non-Distinction, Withdrawn vs Non-Withdrawn, Pass vs Non-Pass, and Pass vs Fail. The aim is to compare ensemble machine learning techniques (Random Forest, Gradient Boosting, AdaBoost, LightGBM, and a Voting Classifier) with a deep learning model based on Dense Neural Networks (DNN) and to determine which produces the best predictions. Relevant features are selected using feature selection and dimensionality reduction strategies, including Recursive Feature Elimination (RFE) and autoencoders. The results show that LightGBM and Gradient Boosting perform best in several of the classification tasks, with an accuracy of 75.47% for Pass vs Fail. The DNN, by contrast, requires further refinement but shows potential for handling the more complex classification tasks. Beyond identifying students at risk of failing, this approach provides a deeper understanding of the variables that affect academic success in online learning environments.
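As an illustration of how the binary tasks named above can be derived from a single outcome column, the following is a minimal sketch and not the authors' code. It assumes the four `final_result` values found in OULAD's studentInfo table (Distinction, Pass, Fail, Withdrawn). The task names and the helper functions `binary_label` and `build_labels` are hypothetical, and the choice to drop Distinction and Withdrawn rows for Pass vs Fail is an assumption, since the abstract does not say how that subset was formed.

```python
from typing import List, Optional

# The four final_result values in OULAD's studentInfo table.
OUTCOMES = ("Distinction", "Pass", "Fail", "Withdrawn")

# Hypothetical task names; the abstract only names the class pairs.
TASKS = ("distinction_vs_rest", "withdrawn_vs_rest", "pass_vs_rest", "pass_vs_fail")


def binary_label(final_result: str, task: str) -> Optional[int]:
    """Return 1 or 0 for the given task, or None if the row is excluded."""
    if final_result not in OUTCOMES:
        raise ValueError(f"unknown final_result: {final_result!r}")
    if task == "distinction_vs_rest":
        return int(final_result == "Distinction")
    if task == "withdrawn_vs_rest":
        return int(final_result == "Withdrawn")
    if task == "pass_vs_rest":
        return int(final_result == "Pass")
    if task == "pass_vs_fail":
        # Assumption: only Pass and Fail rows take part in this task.
        if final_result in ("Pass", "Fail"):
            return int(final_result == "Pass")
        return None
    raise ValueError(f"unknown task: {task!r}")


def build_labels(final_results: List[str], task: str) -> List[int]:
    """Label every row for one task, skipping rows the task excludes."""
    labels = []
    for result in final_results:
        label = binary_label(result, task)
        if label is not None:
            labels.append(label)
    return labels


if __name__ == "__main__":
    sample = ["Pass", "Fail", "Withdrawn", "Distinction", "Pass"]
    for task in TASKS:
        print(task, build_labels(sample, task))
```

Each resulting label vector would then be paired with the feature matrix (filtered to the same rows) before training any of the classifiers listed in the abstract.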