Patterns to Identify Dropout University Students with Educational Data Mining

DOI:

https://doi.org/10.24320/redie.2021.23.e29.3918

Keywords:

dropping out, dropout characteristics, decision making

Supporting Agencies:

UPAEP-Universidad, Universidad Tecnológica de la Mixteca

Abstract

This paper applies educational data mining algorithms to analyze the most relevant characteristics of students at risk of dropping out. The study used a dataset of 10,635 instances, collected between 2014 and 2019 from 53 bachelor’s degree programs at a private university in the state of Puebla (Mexico). The results show that the decision tree model performs better than the other algorithms and is easy to interpret through its decision rules. The model also performs better than related models in the literature that have been applied to the same problem. The feature selection methods identified the most important attributes for detecting potential dropouts, such as the period, last semester completed, credits completed, attendance, courses failed, and program. These attributes and decision rules can be used to create mechanisms that help prevent dropout.
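The idea of ranking attributes and reading off decision rules can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say which selection methods or tree algorithm were used, so information gain stands in here as one common choice. The records, the attribute values, and the function names are all hypothetical and invented for illustration; only the attribute names echo the abstract.

```python
from collections import Counter
from math import log2

# Hypothetical toy records (invented values, NOT the study's data).
RECORDS = [
    {"courses_failed": "high", "attendance": "low",  "program": "A", "dropout": "yes"},
    {"courses_failed": "high", "attendance": "low",  "program": "B", "dropout": "yes"},
    {"courses_failed": "high", "attendance": "high", "program": "A", "dropout": "yes"},
    {"courses_failed": "high", "attendance": "low",  "program": "B", "dropout": "yes"},
    {"courses_failed": "low",  "attendance": "high", "program": "A", "dropout": "no"},
    {"courses_failed": "low",  "attendance": "high", "program": "B", "dropout": "no"},
    {"courses_failed": "low",  "attendance": "low",  "program": "A", "dropout": "no"},
    {"courses_failed": "low",  "attendance": "high", "program": "B", "dropout": "no"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target="dropout"):
    """Reduction in target entropy obtained by splitting on one attribute."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def rank_attributes(records, target="dropout"):
    """Attributes ordered from most to least informative about the target."""
    attributes = [name for name in records[0] if name != target]
    return sorted(
        attributes,
        key=lambda name: information_gain(records, name, target),
        reverse=True,
    )


def one_level_rules(records, attribute, target="dropout"):
    """One-level decision rules: majority target class per attribute value."""
    rules = {}
    for value in sorted({r[attribute] for r in records}):
        labels = [r[target] for r in records if r[attribute] == value]
        rules[value] = Counter(labels).most_common(1)[0][0]
    return rules


if __name__ == "__main__":
    best = rank_attributes(RECORDS)[0]
    print("ranking:", rank_attributes(RECORDS))
    for value, label in one_level_rules(RECORDS, best).items():
        print(f"IF {best} = {value} THEN dropout = {label}")
```

A full decision tree repeats this split recursively on each subset. The single-level version is kept here only to show how an attribute ranking turns into readable rules of the kind the abstract mentions.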

Published

2021-12-20