<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "https://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.1" specific-use="sps-1.9" xml:lang="en"
    xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
    <front>
        <journal-meta>
            <journal-id journal-id-type="publisher-id">redie</journal-id>
            <journal-title-group>
                <journal-title>Revista electrónica de investigación educativa</journal-title>
                <abbrev-journal-title abbrev-type="publisher">REDIE</abbrev-journal-title>
            </journal-title-group>
            <issn pub-type="epub">1607-4041</issn>
            <publisher>
                <publisher-name>Universidad Autónoma de Baja California, Instituto de Investigación
                    y Desarrollo Educativo</publisher-name>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.24320/redie.2023.25.e13.5398</article-id>
            <article-id pub-id-type="other">00113</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Artículos</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Predictive Model to Identify College Students with High Dropout
                    Rates</article-title>
                <trans-title-group xml:lang="es">
                    <trans-title>Modelo predictivo para identificar estudiantes universitarios con
                        alto grado de deserción</trans-title>
                </trans-title-group>
                <trans-title-group xml:lang="pt">
                    <trans-title>Modelo preditivo para identificar estudantes universitários com
                        alto risco de evasão</trans-title>
                </trans-title-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <contrib-id contrib-id-type="orcid">0000-0002-7169-7963</contrib-id>
                    <name>
                        <surname>Hoyos Osorio</surname>
                        <given-names>Jhoan Keider</given-names>
                    </name>
                    <xref ref-type="aff" rid="aff1"><sup>*</sup></xref>
                </contrib>
                <contrib contrib-type="author">
                    <contrib-id contrib-id-type="orcid">0000-0002-1429-5925</contrib-id>
                    <name>
                        <surname>Daza Santacoloma</surname>
                        <given-names>Genaro</given-names>
                    </name>
                    <xref ref-type="aff" rid="aff1"><sup>*</sup></xref>
                </contrib>
                <aff id="aff1">
                    <label>*</label>
                    <institution content-type="original">Universidad Tecnológica de Pereira,
                        Colombia</institution>
                    <institution content-type="normalized">Universidad Tecnológica de
                        Pereira</institution>
                    <institution content-type="orgname">Universidad Tecnológica de
                        Pereira</institution>
                    <country country="CO">Colombia</country>
                </aff>
            </contrib-group>
            <pub-date date-type="pub" publication-format="electronic">
                <day>03</day>
                <month>05</month>
                <year>2023</year>
            </pub-date>
            <pub-date date-type="collection" publication-format="electronic">
                <year>2023</year>
            </pub-date>
            <volume>25</volume>
            <elocation-id>e13</elocation-id>
            <history>
                <date date-type="received">
                    <day>27</day>
                    <month>07</month>
                    <year>2021</year>
                </date>
                <date date-type="accepted">
                    <day>11</day>
                    <month>01</month>
                    <year>2022</year>
                </date>
            </history>
            <permissions>
                <license license-type="open-access"
                    xlink:href="https://creativecommons.org/licenses/by-nc/4.0/" xml:lang="en">
                    <license-p>This is an open-access article distributed under the terms of the
                        Creative Commons Attribution License</license-p>
                </license>
            </permissions>
            <abstract>
                <title>Abstract</title>
                <p>Decreasing student attrition rates is one of the main objectives of most higher
                    education institutions. However, to achieve this goal, universities need to
                    accurately identify and focus their efforts on students most likely to quit
                    their studies before they graduate. This has given rise to a need to implement
                    forecasting models to predict which students will eventually drop out. In this
                    paper, we present an early warning system to automatically identify
                    first-semester students at high risk of dropping out. The system is based on a
                    machine learning model trained from historical data on first-semester students.
                    The results show that the system can predict “at-risk” students with a
                    sensitivity of 61.97%, which allows early intervention for those students,
                    thereby reducing the student attrition rate.</p>
            </abstract>
            <trans-abstract xml:lang="es">
                <title>Resumen</title>
                <p>Disminuir la tasa de deserción estudiantil es uno de los principales objetivos de
                    las instituciones de educación superior; para lograrlo, las universidades deben
                    identificar con precisión a los estudiantes con mayor riesgo de abandonar los
                    estudios antes de graduarse y centrar sus esfuerzos en ellos. De ahí surge la
                    necesidad de implementar modelos predictivos capaces de identificar a los
                    estudiantes que finalmente desertarán. En este trabajo se presenta un sistema de
                    alerta temprana para identificar a los estudiantes de primer semestre con alto
                    riesgo de deserción; el sistema se basa en un modelo de aprendizaje automático
                    entrenado a partir de datos históricos de estudiantes de primer semestre. Los
                    resultados muestran que el sistema puede identificar a los estudiantes “en
                    riesgo” con una sensibilidad del 61.97%, lo que permite ofrecerles atención
                    temprana y reducir la tasa de abandono.</p>
            </trans-abstract>
            <trans-abstract xml:lang="pt">
                <title>Resumo</title>
                <p>Reduzir a taxa de evasão estudantil é um dos principais objetivos das
                    instituições de ensino superior; para conseguir isso, as universidades devem
                    identificar com precisão os alunos com maior risco de abandonar os estudos antes
                    da conclusão do curso e concentrar seus esforços neles. Daí surge a necessidade
                    de implementar modelos preditivos capazes de identificar os alunos que acabarão
                    por desistir. Este artigo apresenta um sistema de alerta precoce para
                    identificar alunos do primeiro semestre com alto risco de evasão; o sistema é
baseado em um modelo de aprendizagem automática treinado a partir de dados
                    históricos de alunos do primeiro semestre. Os resultados mostram que o sistema
                    pode identificar os alunos “em risco” com uma sensibilidade de 61.97%, o que
                    possibilita oferecer-lhes atendimento precoce e reduzir o índice de evasão.</p>
            </trans-abstract>
            <kwd-group xml:lang="en">
                <title><italic>Keywords:</italic></title>
                <kwd>dropping out</kwd>
                <kwd>college students</kwd>
                <kwd>forecasting</kwd>
                <kwd>regression analysis</kwd>
            </kwd-group>
            <kwd-group xml:lang="es">
                <title><italic>Palabras clave:</italic></title>
                <kwd>deserción escolar</kwd>
                <kwd>estudiante universitario</kwd>
                <kwd>previsión</kwd>
                <kwd>análisis de regresión</kwd>
            </kwd-group>
            <kwd-group xml:lang="pt">
                <title><italic>Palavras-chave:</italic></title>
                <kwd>evasão escolar</kwd>
                <kwd>estudante universitário</kwd>
                <kwd>previsão</kwd>
                <kwd>análise de regressão</kwd>
            </kwd-group>
            <counts>
                <fig-count count="1"/>
                <table-count count="2"/>
                <equation-count count="0"/>
                <ref-count count="27"/>
                <page-count count="10"/>
            </counts>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>I. Introduction</title>
<p>One of the biggest challenges facing higher education institutions (HEIs) in most
                education systems worldwide is student dropout. Dropout is a complex phenomenon in
                higher education that cannot be easily defined (<xref ref-type="bibr" rid="B26"
                    >Tinto, 1982</xref>). One of the earliest dropout models defined it as the
                failure of a student enrolled at a particular university in the spring to enroll in
                that same university the next fall semester (<xref ref-type="bibr" rid="B2">Bean,
                    1985</xref>). According to <xref ref-type="bibr" rid="B26">Tinto (1982)</xref>,
                dropout must be defined from different points of view. From an individual
                standpoint, dropping out refers to the failure to complete a given course of action
                or attain the desired objective that led a student to enroll in a particular higher
                education institution. University dropout is also defined as the premature
                abandonment of a study program due to factors arising within the educational system
                or relating to society, family, and environment, considering sufficient time to rule
                out the possibility of student reincorporation (<xref ref-type="bibr" rid="B15"
                    >Himmel, 2002</xref>). However, in practice, most universities define dropping
                out as abandoning a degree without graduating, provided that the student does not
                re-enroll during the next two semesters in the same degree course.</p>
            <p>According to the report “Education at a Glance”, the average dropout rate in HEIs
                reaches 31% among OECD countries. The countries with the highest dropout rates are
                Hungary, New Zealand, and the United States, with New Zealand reaching 46%.
                Meanwhile, the lowest dropout rates are found in Japan, Germany, France, and
                Belgium. In Latin America, according to a bulletin by the Higher Education
                Observatory (<xref ref-type="bibr" rid="B19">ODES, 2017</xref>), attrition ranges
                between 40% and 75%. According to a study published by the World Bank (<xref
                    ref-type="bibr" rid="B8">Ferreyra et al., 2017</xref>), on average 50% of Latin
                American students graduate on time, and the remaining 50% either drop out of the
                system or continue studying. Bolivia and Colombia have the highest dropout rates in
                Latin America. Specifically, in Colombia around 37% of students who enroll in a
                university program drop out without finishing their degree (<xref ref-type="bibr"
                    rid="B27">Urzúa, 2017</xref>). Moreover, about 36% of students who drop out in
                Colombia do so at the end of the first year, making early dropout a critical problem
                in the Colombian education system.</p>
            <p>According to the System for the Prevention and Analysis of Dropout in HEIs (<xref
                    ref-type="bibr" rid="B24">SPADIES, 2016</xref>), the causes of dropout in higher
education in Colombia are classified into four major categories: i) Individual:
                starting age of the students, monetary and time costs of studying in another city,
                unfulfilled expectations, pregnancy, etc.; ii) Academic: lack of preparation from
                secondary education in general skills, insufficient professional and vocational
                guidance before college admission, and low academic performance, among other causes;
                iii) Socioeconomic: low social class, low family income and parental unemployment,
                financial self-reliance, etc.; and iv) Institutional: lack of financial support from
                the institution for tuition and maintenance, instability in the academic rhythm in
                public universities, etc. Some other common factors have recently been identified as
                causes of college dropout, such as depression, anxiety, and weak family structure
                    (<xref ref-type="bibr" rid="B7">Daley, 2010</xref>). Other studies have
                suggested that a proportion of college attrition may result from drug use (<xref
    ref-type="bibr" rid="B20">Patrick et al., 2016</xref>). That study found that
                students who used cigarettes, marijuana, and other illicit drugs in high school were
                more likely to drop out of college.</p>
            <p>The advantages of improving student retention are countless. In Latin America,
                education has the main purpose of reducing inequality and the gap between social
                classes. Therefore, ensuring that students complete their degrees gives them a
                better chance of securing a higher standard of living and a better career
                    (<xref ref-type="bibr" rid="B25">Thomas, 2002</xref>). Tertiary education for
                the most vulnerable population contributes fundamentally to equalizing opportunities
                to access the most highly desired positions on the social ladder, triggering
                processes of upward social mobility. The lack of educational development of the most
                vulnerable classes, resulting from a scarcity of development opportunities, is a
                factor in the increase in violence and insecurity. Dropping out represents a major
                problem not only for students themselves but also for universities and governments,
                due to the waste of resources invested in students who do not finish their studies.
                Consequently, reducing student attrition would help to ensure state resources are
                used more effectively. Additionally, if the student dropout rate is low, a
                university is more likely to achieve a higher ranking, thus securing more government
                funds and gaining an easier path to program accreditations (<xref ref-type="bibr"
                    rid="B1">Ameri et al., 2016</xref>). Accordingly, universities are increasingly
                implementing strategies to decrease student attrition. These require adequate
                planning for interventions and a full knowledge of the causes behind the student
                attrition problem.</p>
            <p>Latin American governments have developed methodologies to measure and study dropout.
                In particular, the Colombian Ministry of National Education has set up the SPADIES
                platform, which collects socioeconomic and academic information on students from
                different HEIs and makes it possible to establish links between data on dropout.
                This tool enables observation of students according to indicators of risk of
                abandonment. However, beyond merely investigating the causes of dropout, action
                should be taken for better understanding and intervention by monitoring, recording,
                and analyzing risk factors, and in particular, by identifying students at increased
                risk of abandonment. Universities, for their part, have implemented many strategies
                at students’ disposal to encourage them to stay in the HEI. These strategies include
                monitoring, tutoring, advising, and offering workshops and courses that support and
                promote students’ academic success by addressing their particular needs. For
                instance, Universidad Tecnológica de Pereira (UTP) in Colombia has deployed the
                Integral Support Program (Programa de Acompañamiento Integral, PAI), which is an
                institutional strategy aimed at tackling the issues of students dropping out or
                failing to complete their degree on time through multiple institutional efforts to
                respond to the biopsychosocial, academic, economic and policy needs of students.
                However, the success of these personalized support programs depends on the
                universities’ ability to properly recognize and prioritize students who need
                assistance and support. Therefore, in order to address dropout and improve retention
                rates, universities need to focus their efforts on students most at risk of dropping
                out.</p>
            <p>This backdrop clearly gives rise to a need to implement predictive models to identify
                students liable to drop out. In this sense, some explanatory models have been
                developed to help HEIs detect potential dropouts (<xref ref-type="bibr" rid="B1"
                    >Ameri et al., 2016</xref>). Traditional pattern recognition methods have also
                been used to identify at-risk students (<xref ref-type="bibr" rid="B18">Lin et al.,
                    2009</xref>), and recently, data mining and machine learning communities have
                given special attention to student dropout prediction (<xref ref-type="bibr"
                    rid="B17">Lakkaraju et al., 2015</xref>; <xref ref-type="bibr" rid="B21">Pérez
                    et al., 2018</xref>; <xref ref-type="bibr" rid="B23">Sandoval-Palis et al.,
                    2020</xref>). Nonetheless, some authors agree that training prediction models
                for dropout students remains a tough task. In addition, despite several years of
                work, further research is needed to improve the methods employed to find patterns in
                student attrition. In this context, we have developed an early warning system able
                to monitor students at considerable risk of dropping out, which is integrated into
                the PAI of the UTP to bring help to at-risk students and encourage them to stay in
                the university. This system is based on processing historical data relating to
                individual, academic, and socioeconomic variables for first-year students from the
                UTP, with the goal of training a machine learning algorithm to recognize patterns in
                students with a high likelihood of dropping out.</p>
        </sec>
        <sec sec-type="methods">
            <title>II. Methods</title>
            <p>Recently, the problem of classifying students at risk of dropping out of college has
                become relevant. In this context, data analytics and machine learning methods are
                particularly useful because of their ability to detect patterns in historical data
                sets that allow for predictions of future data. Machine learning is a branch of
                artificial intelligence dedicated to the study of methods to provide artificial
                agents with the ability to learn from examples. Machine learning methods can
                generate models of complex problems through specific instances, finding patterns of
                behavior. These methods, in turn, can generalize and/or adapt to new situations and
                predict new cases based on past experience. Most universities have information
                systems in which all students are registered and characterized, providing insight
                into exactly which students are dropping out. This is ideal for predicting students
                at risk of dropping out, and makes it possible to train algorithms based on the
                characteristics of first-semester students from previous years, find common patterns
                among them, and assess new students by means of these algorithms, enabling us to
                identify which students are most likely to drop out.</p>
            <p>Machine learning workflows are based on three fundamental stages: i) data
                preprocessing, ii) model training, and iii) system validation. For an early
                prediction of the students at risk of dropping out, we have combined different
                techniques as part of the three stages mentioned above, which are illustrated in the
                flow chart presented in <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
            <p>
                <fig id="f1">
                    <label>Figure 1</label>
                    <caption>
                        <title>Flow chart of the proposed methodology for the automatic prediction
                            of students at high risk of dropout</title>
                    </caption>
                    <graphic xlink:href="1607-4041-redie-25-e13-gf1.jpg"/>
                </fig>
            </p>
            <sec>
                <title>2.1 Database</title>
                <p>In order to predict first-year students at high risk of dropping out, we created
                    a database that compiles information relating to first-semester students from
                    five different semesters (2017-1 to 2019-1). The dataset consists of 6617
                    participants (2845 female and 3772 male), with an average age of 20 years. The
                    database is made up of a set of student features prior to their entry to the
                    university such as age, sex, social stratum, state test score (Saber 11 score in
                    Colombia), type of school (public or private), and cost of tuition paid.
                    Additionally, the students take two kinds of tests when they start their
                    degrees. Initially, the PAI test measures students’ level of academic, economic,
                    family, and psychosocial risk, as well as their level of depression and anxiety,
                    and individual learning style. In addition, they take the Alcohol, Smoking, and
                    Substance Involvement Screening Test (ASSIST) developed for the World Health
                    Organization (WHO). This test measures the level of consumption of different
                    substances on a numerical scale from 0 to 39. Later, this feature matrix is
                    complemented with inter-annual information on dropout from the university. This
                    table reports students who remained in their degree program, changed programs,
                    graduated, or dropped out of the university. Ultimately, this process yields a
                    database of 6617 students and 26 features, and a vector of binary labels: not
                    enrolled (for two consecutive semesters) and enrolled.</p>
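                <p>As an illustrative sketch, the binary label vector described above can be
                    derived from the inter-annual enrollment table along the following lines. The
                    column names and status values here are assumptions for illustration, not the
                    study's actual schema.</p>
                <p>
```python
# Hypothetical sketch: deriving the binary dropout label from an
# enrollment-status table. Column names and status values are assumed.
import pandas as pd

status = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "outcome": ["enrolled", "dropped out", "graduated", "changed program"],
})

# "dropped out" (not enrolled for two consecutive semesters) maps to 1;
# every other outcome maps to 0
status["label"] = (status["outcome"] == "dropped out").astype(int)
```
                </p>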
            </sec>
            <sec>
                <title>2.2 Data preprocessing</title>
                <p>Data preprocessing is a crucial stage in machine learning applications: it
                    enhances data quality and makes meaningful patterns easier to recognize.</p>
                <p>Data preprocessing refers to the techniques of “cleaning” the original data to
                    make it suitable for training machine learning models. Data preprocessing
                    includes data preparation, which includes integration, cleaning, normalization
                    and transformation of data, and data reduction tasks such as feature selection,
                    instance selection, discretization, etc. (<xref ref-type="bibr" rid="B10">García
                        et al., 2015</xref>). Some preprocessing techniques used in this study, such
                    as categorical variable encoding, outlier removal, oversampling, and feature
                    selection, are explained below.</p>
                <p>Categorical variable encoding. Most machine learning techniques cannot deal with
                    categorical variables unless they are first encoded as numerical values.
                    Categorical variables break down into two categories: nominal (no particular
                    order) and ordinal (ordered). A nominal variable may be, for example, a color or
                    a city, and an ordinal variable could be, for example, the level of satisfaction
                    with a service, which could range from very dissatisfied to very satisfied.
                    Before training a machine learning model, it is necessary to define how to
                    encode the categorical variables. Dichotomous variables like sex can easily be
                    encoded as a binary variable by making one of the two categories equal to one
                    and the other equal to zero. Ordinal categorical variables can be assigned to
                    numbers in their respective order, e.g. LOW = 1, MID = 2, and HIGH = 3. For
                    nominal variables, it is impossible to employ the same procedure because there
                    is no specific order for each category. For this reason, a well-known encoding
                    technique known as one-hot encoding is used. This technique converts a
                    categorical variable into several new binary variables, where 1 indicates the
                    presence of a specific category and 0 indicates its absence.</p>
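                <p>The three encodings described above can be sketched as follows; the column
                    names and category values are illustrative, not the study's actual
                    variables.</p>
                <p>
```python
# Sketch of binary, ordinal, and one-hot encoding with pandas.
# "sex", "risk_level", and "school_type" are hypothetical columns.
import pandas as pd

df = pd.DataFrame({
    "sex": ["F", "M", "F"],
    "risk_level": ["LOW", "HIGH", "MID"],
    "school_type": ["public", "private", "public"],
})

# Dichotomous variable: one category becomes 1, the other 0
df["sex"] = (df["sex"] == "F").astype(int)

# Ordinal variable: categories mapped to numbers in their natural order
df["risk_level"] = df["risk_level"].map({"LOW": 1, "MID": 2, "HIGH": 3})

# Nominal variable: one-hot encoded into one indicator column per category
df = pd.get_dummies(df, columns=["school_type"])
```
                </p>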
                <p>Outlier detection. An outlier is an observation whose value differs from the
                    general pattern of a sample, affecting the analysis of a given dataset. Outliers
                    can have a range of distinct causes such as data entry errors, errors while
                    designing the experiment, errors in the processing stage or just natural
                    abnormalities in data. For classification and prediction purposes, the quality
                    of data is essential, and there are several methods that allow us to detect and
                    remove outliers from a dataset. In this research, we have employed a method
                    based on decision trees called isolation forests (<xref ref-type="bibr"
                        rid="B13">Hariri et al., 2019</xref>).</p>
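                <p>A minimal sketch of this step, using scikit-learn's IsolationForest on
                    synthetic data standing in for the student feature matrix:</p>
                <p>
```python
# Outlier removal with an isolation forest (scikit-learn).
# X is synthetic data standing in for the real feature matrix.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 5))
X[0] = 10.0  # inject one obvious outlier

iso = IsolationForest(contamination=0.01, random_state=0)
labels = iso.fit_predict(X)  # +1 for inliers, -1 for outliers
X_clean = X[labels == 1]     # keep only the inliers
```
                </p>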
                <p>Minority class oversampling. Due to the nature of the event that we are trying to
                    predict, it is common to find in our datasets many more students labeled as
                    “enrolled” than “dropped out.” This phenomenon is known as the class imbalance
                    problem. This issue is challenging to handle since most classifiers often expect
                    evenly distributed training samples among classes. Without consideration of the
                    imbalance problem, the classification algorithms can be overwhelmed by the
                    majority class and ignore the minority one (<xref ref-type="bibr" rid="B12">Guo
                        et al., 2008</xref>).</p>
                <p>There are different alternatives to deal with the imbalanced data classification
                    issue. One option, known as oversampling, augments the number of minority-class
                    samples until it matches the size of the majority class. Specifically, we
                    employed a well-known method named the Synthetic Minority Oversampling Technique
                    (SMOTE) (<xref ref-type="bibr" rid="B5">Chawla et al., 2002</xref>). The SMOTE
                    algorithm creates synthetic data between samples of the minority class. To
                    create a sample, this algorithm randomly selects one of the nearest neighbors to
                    a specific sample, then computes the difference vector between the two samples,
                    and this vector is then multiplied by a random number between 0 and 1. Finally,
                    this vector is added to the sample under consideration, creating a new
                    synthetic sample.</p>
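                <p>The interpolation step just described can be sketched by hand with NumPy. This
                    is a simplified illustration; in practice, a library implementation such as the
                    one in imbalanced-learn would normally be used.</p>
                <p>
```python
# Hand-rolled sketch of the SMOTE interpolation step described above.
import numpy as np

def smote_samples(minority, k=5, n_new=1, seed=None):
    """Create n_new synthetic samples from the minority-class matrix."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))            # pick a minority sample
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)   # distances to all samples
        neighbors = np.argsort(d)[1:k + 1]         # k nearest, skipping itself
        j = rng.choice(neighbors)                  # random nearest neighbor
        gap = rng.random()                         # random number in [0, 1)
        synthetic.append(x + gap * (minority[j] - x))  # interpolate
    return np.array(synthetic)
```
                </p>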
                <p>Relevance analysis. The number of variables used to measure the observations is
                    known as the dimension of the feature space. One problem with many data sets is
                    that, in many cases, not all the measured variables are important for
                    understanding the phenomenon under analysis (<xref ref-type="bibr" rid="B9"
                        >Fodor, 2002</xref>), that is, some variables are relevant for pattern
                    recognition but others are not. Additionally, there could be redundant variables
                    providing the same information to the model, and some of these may therefore be
                    discarded. One common way to identify relevant features is to employ feature
                    selection methodologies. Feature selection is the process by which researchers
                    select the most relevant features that contribute to predicting the phenomenon
                    of interest.</p>
                <p>Specifically, in this research, we implemented a methodology known as recursive
                    feature elimination (RFE), which is a feature selection method that fits a model
                    and removes the weakest feature (or features) until a specified number of
                    features is reached (<xref ref-type="bibr" rid="B6">Chen &amp; Jeong,
                        2007</xref>). Features are classified according to each feature’s
                    importance, obtained from a relevance model. Then, RFE recursively removes one
                    feature per cycle (the lowest ranked feature according to the relevance model).
                    The optimal number of features to achieve the best result is determined through
                    five-fold cross-validation.</p>
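                <p>A sketch of this procedure with scikit-learn's RFECV, using a logistic
                    regression base model and synthetic data in place of the study's
                    database:</p>
                <p>
```python
# Recursive feature elimination with five-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the student feature matrix and dropout labels
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)

selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
selector.fit(X, y)                 # removes one feature per cycle
X_reduced = selector.transform(X)  # keep only the selected features
```
                </p>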
            </sec>
            <sec>
                <title>2.3 Training the prediction model</title>
                <p>Once the data has been preprocessed, we can train a machine learning classifier
                    to recognize patterns in data that allow us to predict students at high risk of
                    dropping out. In this case we used the well-known logistic regressor (<xref
                        ref-type="bibr" rid="B16">Hosmer et al., 2013</xref>), which maps the output
                    of a linear regression model to probabilities between 0 and 1 through a logistic
                    function defined as:</p>
                <p>
                    <inline-formula>
                        <alternatives>
                            <inline-graphic xlink:href="1607-4041-redie-25-e13-i002.png"/>
                            <mml:math>
                                <mml:mi>P</mml:mi>
                                <mml:mfenced separators="|">
                                    <mml:mrow>
                                        <mml:msup>
                                            <mml:mrow>
                                                <mml:mi>y</mml:mi>
                                            </mml:mrow>
                                            <mml:mrow>
                                                <mml:mfenced separators="|">
                                                <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                </mml:mrow>
                                                </mml:mfenced>
                                            </mml:mrow>
                                        </mml:msup>
                                        <mml:mo>=</mml:mo>
                                        <mml:mn>1</mml:mn>
                                    </mml:mrow>
                                </mml:mfenced>
                                <mml:mo>=</mml:mo>
                                <mml:mfrac>
                                    <mml:mrow>
                                        <mml:mn>1</mml:mn>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:mn>1</mml:mn>
                                        <mml:mo>+</mml:mo>
                                        <mml:mi>exp</mml:mi>
                                        <mml:mfenced separators="|">
                                            <mml:mrow>
                                                <mml:mo>-</mml:mo>
                                                <mml:mfenced separators="|">
                                                <mml:mrow>
                                                <mml:msub>
                                                <mml:mrow>
                                                <mml:mi>β</mml:mi>
                                                </mml:mrow>
                                                <mml:mrow>
                                                <mml:mn>0</mml:mn>
                                                </mml:mrow>
                                                </mml:msub>
                                                <mml:mo>+</mml:mo>
                                                <mml:msub>
                                                <mml:mrow>
                                                <mml:mi>β</mml:mi>
                                                </mml:mrow>
                                                <mml:mrow>
                                                <mml:mn>1</mml:mn>
                                                </mml:mrow>
                                                </mml:msub>
                                                <mml:msub>
                                                <mml:mrow>
                                                <mml:mi>x</mml:mi>
                                                </mml:mrow>
                                                <mml:mrow>
                                                <mml:mn>1</mml:mn>
                                                </mml:mrow>
                                                </mml:msub>
                                                <mml:mfenced separators="|">
                                                <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                </mml:mrow>
                                                </mml:mfenced>
                                                <mml:mo>+</mml:mo>
                                                <mml:mo>⋯</mml:mo>
                                                <mml:mo>+</mml:mo>
                                                <mml:msub>
                                                <mml:mrow>
                                                <mml:mi>β</mml:mi>
                                                </mml:mrow>
                                                <mml:mrow>
                                                <mml:mi>p</mml:mi>
                                                </mml:mrow>
                                                </mml:msub>
                                                <mml:msub>
                                                <mml:mrow>
                                                <mml:mi>x</mml:mi>
                                                </mml:mrow>
                                                <mml:mrow>
                                                <mml:mi>p</mml:mi>
                                                </mml:mrow>
                                                </mml:msub>
                                                <mml:mfenced separators="|">
                                                <mml:mrow>
                                                <mml:mi>i</mml:mi>
                                                </mml:mrow>
                                                </mml:mfenced>
                                                </mml:mrow>
                                                </mml:mfenced>
                                            </mml:mrow>
                                        </mml:mfenced>
                                    </mml:mrow>
                                </mml:mfrac>
                            </mml:math>
                        </alternatives>
                    </inline-formula>
                </p>
                <p>where x represents the features of the training data, β the model
                    coefficients, and y the class label.</p>
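<p>As a minimal numeric illustration of the logistic function above (the coefficients and feature values are invented for the example):</p>

```python
import numpy as np

def logistic_probability(x, beta0, beta):
    """P(y = 1 | x) = 1 / (1 + exp(-(beta0 + beta_1*x_1 + ... + beta_p*x_p)))."""
    z = beta0 + np.dot(beta, x)
    return 1.0 / (1.0 + np.exp(-z))

# Invented coefficients and features; output is always in (0, 1).
p = logistic_probability(np.array([0.5, -1.2]), 0.1, np.array([2.0, 0.3]))
print(p)
```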
            </sec>
            <sec>
                <title>2.4 Model validation</title>
                <p>To assess the performance of a classifier, some data must be reserved for
                    testing the trained model. This process is known as cross-validation: the
                    classifier is trained on a subset of the data and then tested on the
                    remaining input data (<xref ref-type="bibr" rid="B22">Raschka, 2018</xref>).
                    There are three main variants of cross-validation for classifiers; in this
                    research, we employ the holdout technique, which divides the database into
                    training and testing sets. The model is trained on the training samples and
                    then assessed by predicting the labels of the testing set, which the model
                    has never seen before.</p>
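<p>The holdout split can be sketched with scikit-learn's <monospace>train_test_split</monospace>; synthetic data and an illustrative 80/20 split are used here, not the study's actual dataset:</p>

```python
# Holdout sketch: 80% of the samples for training, 20% held out for testing.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(len(X_train), len(X_test))  # training vs. held-out sample counts
```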
                <p>Some metrics exist to measure the performance of a classifier. Most
                    classification assessments are carried out by measuring the overall
                    classification error rate; however, when handling imbalanced data,
                    classification accuracy is not sufficient. As suggested by <xref ref-type="bibr"
                        rid="B14">He and Ma (2013)</xref>, class-specific metrics, such as
                    sensitivity and specificity, and a combination of both, like the geometric mean,
                    provide a more complete assessment of imbalanced learning. Therefore, in this
                    research we use these three metrics. We quantify the effectiveness of the
                    classification system to detect students who will drop out (sensitivity) and to
                    correctly classify students not at risk of quitting university (specificity),
                    and we also calculate the geometric mean (G-mean), which is defined as <mml:math>
                        <mml:mi>g</mml:mi>
                        <mml:mo>=</mml:mo>
                        <mml:msqrt>
                            <mml:mi>sensitivity</mml:mi>
                            <mml:mo>·</mml:mo>
                            <mml:mi>specificity</mml:mi>
                        </mml:msqrt>
                    </mml:math>.</p>
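<p>The three metrics can be computed from a confusion matrix; a minimal sketch with invented labels follows (the data here is purely illustrative):</p>

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Invented ground truth (1 = dropout) and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)            # dropouts correctly detected
specificity = tn / (tn + fp)            # non-dropouts correctly classified
g_mean = np.sqrt(sensitivity * specificity)

print(sensitivity, specificity, g_mean)
```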
            </sec>
        </sec>
        <sec sec-type="results">
            <title>III. Experiments and results</title>
            <sec>
                <title>3.1 Experimental framework</title>
                <p>All the experiments in this research were performed in Python 3. The
                    experiment was set up as follows. First, the dataset was divided into
                    training, testing, and validation sets. To validate the system, 20% of the
                    data was randomly chosen and held out to evaluate the performance of the
                    final classifier. The remaining 80% of the data was split according to the
                    semester in which it was collected. Finally, we trained the classifier on
                    the data from all semesters except one, which was used as the testing
                    set.</p>
                <p>Before the model was trained, the variables were normalized using the Z-score
                    method to ensure that all variables were in a similar range. Then SMOTE was
                    implemented to balance the dataset, which was highly imbalanced. As explained
                    above, SMOTE creates synthetic samples of the class with the lowest number of
                    samples - in this case, the dropout class.</p>
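<p>SMOTE's core idea, generating a synthetic minority sample on the segment between a minority point and a near minority neighbor, can be sketched as follows; this is a minimal self-contained illustration of the interpolation step only (random data, not the library implementation the study likely used):</p>

```python
import numpy as np

def smote_like(X, n_new, rng):
    """Minimal SMOTE-style interpolation: each synthetic point lies on the
    segment between a minority sample and its nearest minority neighbor."""
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                     # exclude the point itself
        j = int(np.argmin(d))             # nearest neighbor within the class
        u = rng.random()                  # interpolation factor in [0, 1)
        new.append(X[i] + u * (X[j] - X[i]))
    return np.vstack(new)

rng = np.random.default_rng(0)
minority = rng.normal(size=(10, 3))       # invented minority-class samples
synthetic = smote_like(minority, 5, rng)  # 5 new synthetic samples
```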
                <p>Then, the system goes through a feature selection stage using RFE, which
                    allows us to determine the most relevant features characterizing freshman
                    students who drop out. As previously outlined, we started with an initial
                    set of 26 features, and for each period tested, the RFE method selected the
                    number of relevant features that led to the best performance. However, the
                    selected features are not identical across periods. Therefore, to train the
                    final classifier, we chose those features that were selected in at least
                    three of the five periods validated. Thus, the final set of features was
                    reduced to 20. These are shown in
                        <xref ref-type="table" rid="t1">Table 1</xref>.</p>
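<p>The "selected in at least three of the five periods" rule amounts to a simple vote count over the per-period RFE outputs; a minimal sketch follows (the feature names and per-period selections are invented for illustration):</p>

```python
from collections import Counter

# Hypothetical per-period RFE selections (names invented, not the study's).
selected_per_period = [
    {"sex", "stratum", "depression"},
    {"sex", "depression", "state_test"},
    {"sex", "stratum", "state_test"},
    {"stratum", "depression", "state_test"},
    {"sex", "stratum", "depression"},
]

# Count in how many periods each feature was selected; keep those with >= 3.
counts = Counter(f for period in selected_per_period for f in period)
final_features = sorted(f for f, c in counts.items() if c >= 3)
print(final_features)
```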
                <p>
                    <table-wrap id="t1">
                        <label>Table 1</label>
                        <caption>
                            <title>Set of features collected and selected for training the dropout
                                prediction model</title>
                        </caption>
                        <table>
                            <colgroup>
                                <col/>
                                <col/>
                                <col/>
                                <col/>
                            </colgroup>
                            <thead>
                                <tr>
                                    <th align="left"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Feature</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Selected by RFE</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Feature</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Selected by RFE</th>
                                </tr>
                            </thead>
                            <tbody>
                                <tr>
                                    <td align="left">Age</td>
                                    <td align="center">No</td>
                                    <td align="left">ASSIST alcohol value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Sex</td>
                                    <td align="center">Yes</td>
                                    <td align="left">ASSIST cannabis value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Stratum</td>
                                    <td align="center">Yes</td>
                                    <td align="left">ASSIST cocaine value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Type of school</td>
                                    <td align="center">Yes</td>
                                    <td align="left">ASSIST amphetamine value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">State test score</td>
                                    <td align="center">Yes</td>
                                    <td align="left">ASSIST inhalants value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Tuition cost</td>
                                    <td align="center">No</td>
                                    <td align="left">ASSIST sedatives value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Academic risk</td>
                                    <td align="center">Yes</td>
                                    <td align="left">ASSIST hallucinogens value</td>
                                    <td align="center">No</td>
                                </tr>
                                <tr>
                                    <td align="left">Family risk</td>
                                    <td align="center">Yes</td>
                                    <td align="left">ASSIST opioids value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Economic risk</td>
                                    <td align="center">No</td>
                                    <td align="left">ASSIST other drugs value</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Psychosocial risk</td>
                                    <td align="center">No</td>
                                    <td align="left">Learning style: Converger</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Depression level</td>
                                    <td align="center">Yes</td>
                                    <td align="left">Learning style: Diverger</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left">Anxiety level</td>
                                    <td align="center">No</td>
                                    <td align="left">Learning style: Assimilator</td>
                                    <td align="center">Yes</td>
                                </tr>
                                <tr>
                                    <td align="left" style="border-bottom: 1px solid black">ASSIST
                                        tobacco value</td>
                                    <td align="center" style="border-bottom: 1px solid black"
                                        >Yes</td>
                                    <td align="left" style="border-bottom: 1px solid black">Learning
                                        style: Accommodator</td>
                                    <td align="center" style="border-bottom: 1px solid black"
                                        >Yes</td>
                                </tr>
                            </tbody>
                        </table>
                    </table-wrap>
                </p>
                <p>Next, a logistic regressor was trained on the selected features. Once the
                    classifier was trained, it was tested on the validation data, and the
                    sensitivity, specificity, and geometric mean (G-mean) of the prediction were
                    calculated.</p>
                <p>Finally, since the objective is to assign predictive risks, the
                    probabilities or “scores” produced by the classifier for each test sample
                    are grouped into five ranges, as follows: i) 0 - 0.2, low risk; ii) 0.2 -
                    0.4, medium-low risk; iii) 0.4 - 0.6, medium risk; iv) 0.6 - 0.8,
                    medium-high risk; v) 0.8 - 1, high risk. The performance of the classifier
                    is then measured in the two end ranges, which are of greatest interest. For
                    the sake of clarity, <bold>high-risk accuracy</bold> refers to the
                    percentage of students labeled as high-risk who did indeed drop out, and,
                    conversely, <bold>low-risk accuracy</bold> is the proportion of students
                    who remained enrolled and who were correctly classified by the system as
                    such.</p>
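<p>The grouping of classifier scores into the five risk ranges can be sketched as a binning step; the score values below are invented for illustration:</p>

```python
import numpy as np

# Invented classifier probabilities for six hypothetical test samples.
scores = np.array([0.05, 0.15, 0.35, 0.55, 0.72, 0.91])

labels = ["low", "medium-low", "medium", "medium-high", "high"]
bins = np.array([0.2, 0.4, 0.6, 0.8])     # upper edges of the first 4 ranges

# np.digitize maps each score to the index of the range it falls into.
risk = [labels[i] for i in np.digitize(scores, bins)]
print(risk)
```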
            </sec>
            <sec>
                <title>3.2 Results and discussion</title>
                <p>The obtained results are shown in <xref ref-type="table" rid="t2">Table 2</xref>.
                    The testing periods are presented by row, and the evaluation metrics are given
                    in the columns. The main purpose of these metrics is to provide clarity about
                    how accurate the trained classifier is. The closer these metrics come to 1, the
                    more effective the classifier is.</p>
                <p>
                    <table-wrap id="t2">
                        <label>Table 2</label>
                        <caption>
                            <title>Classification results achieved by the early warning
                                system</title>
                        </caption>
                        <table>
                            <colgroup>
                                <col/>
                                <col/>
                                <col/>
                                <col/>
                                <col/>
                                <col/>
                            </colgroup>
                            <thead>
                                <tr>
                                    <th align="left"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Testing period</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >G-mean</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Sensitivity</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Specificity</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >High-risk accuracy</th>
                                    <th align="center"
                                        style="border-top: 1px solid black; border-bottom: 1px solid black"
                                        >Low-risk accuracy</th>
                                </tr>
                            </thead>
                            <tbody>
                                <tr>
                                    <td align="left">2017-1</td>
                                    <td align="center">0.5706</td>
                                    <td align="center">0.7225</td>
                                    <td align="center">0.4507</td>
                                    <td align="center">0.8</td>
                                    <td align="center">0.8333</td>
                                </tr>
                                <tr>
                                    <td align="left">2017-2</td>
                                    <td align="center">0.5360</td>
                                    <td align="center">0.7105</td>
                                    <td align="center">0.4043</td>
                                    <td align="center">0.375</td>
                                    <td align="center">0.75</td>
                                </tr>
                                <tr>
                                    <td align="left">2018-1</td>
                                    <td align="center">0.5808</td>
                                    <td align="center">0.5681</td>
                                    <td align="center">0.5938</td>
                                    <td align="center">0.625</td>
                                    <td align="center">0.9622</td>
                                </tr>
                                <tr>
                                    <td align="left">2018-2</td>
                                    <td align="center">0.6114</td>
                                    <td align="center">0.7172</td>
                                    <td align="center">0.5212</td>
                                    <td align="center">0.8077</td>
                                    <td align="center">0.6428</td>
                                </tr>
                                <tr>
                                    <td align="left">2019-1</td>
                                    <td align="center">0.6258</td>
                                    <td align="center">0.5164</td>
                                    <td align="center">0.7584</td>
                                    <td align="center">0.3382</td>
                                    <td align="center">0.9426</td>
                                </tr>
                                <tr>
                                    <td align="left" style="border-bottom: 1px solid black"
                                        >Validation test</td>
                                    <td align="center" style="border-bottom: 1px solid black"
                                        >0.6563</td>
                                    <td align="center" style="border-bottom: 1px solid black"
                                        >0.6197</td>
                                    <td align="center" style="border-bottom: 1px solid black"
                                        >0.6951</td>
                                    <td align="center" style="border-bottom: 1px solid black"
                                        >0.6226</td>
                                    <td align="center" style="border-bottom: 1px solid black"
                                        >0.9450</td>
                                </tr>
                            </tbody>
                        </table>
                    </table-wrap>
                </p>
                <p>Since the aim of this prediction model is to detect students at high risk of
                    dropping out, we focus on the metrics of high-risk accuracy and low-risk
                    accuracy. In some periods the model achieves outstanding high-risk accuracy, as
                    in the 2017-1 and 2018-2 periods. However, in 2017-2 and 2019-1, the system
                    performs more poorly. Additionally, in the validation test, the system achieves
                    a high-risk accuracy of 0.6226. Nonetheless, these results are not as
                    problematic as they may seem, because this means that the system is labeling
                    some students as at high risk of dropout, but ultimately they continue studying.
                    This can often happen because the system is trained from some features measured
                    before the start of their first semester, and does not consider the university’s
                    social intervention program, the Integral Support Program (PAI), which provides
                    economic, academic, and biopsychosocial support. Since this assistance is
                    focused on the student population facing the greatest risk and greatest
                    difficulties in college, the support received may encourage them to remain
                    enrolled, yielding a poor high-risk accuracy in the system. By contrast, the
                    system exhibits adequate low-risk accuracy in most cases, ranging from 75% to
                    96%, which means that low-risk predictions are fairly accurate. It is worth
                    stressing that it is more burdensome to have a low low-risk accuracy than a low
                    high-risk accuracy because a poor low-risk accuracy means that many dropout
                    students are not being detected by the algorithm, and the system can ill afford
                    not to detect students at high risk of quitting their studies.</p>
                <p>There are also recent machine learning methodologies for early detection of
                    students at risk of dropping out (<xref ref-type="bibr" rid="B3">Berens et al.,
                        2018</xref>; <xref ref-type="bibr" rid="B11">González &amp; Arismendi,
                        2018</xref>; <xref ref-type="bibr" rid="B21">Pérez et al., 2018</xref>),
                    just one of which focuses on first-year students. <xref ref-type="bibr"
                        rid="B21">Pérez et al. (2018)</xref> modeled a predictive system for
                    retention of first-year students at Bernardo O’Higgins University, which
                    obtained 86.4% accuracy for the student retention variable. Although this result
                    seems quite high, it can be misleading, since the study did not consider the
                    imbalance of the dataset. This may mean that all students in the dataset were
                    classified as “retention students,” even those who actually dropped out. Our
                    methodology instead focuses on this specific group of potential dropout
                    students, since it is those students who require special attention from the
                    university. Compared with the other approaches, the predictor variables used in
                    each one of these studies vary from country to country, and even though some may
                    be similar, each national education system - and, indeed, each individual
                    university - may collect different information, making direct comparisons
                    unfair. In fact, this methodology is, to the best of our knowledge, the first
                    dropout prediction approach that includes the Alcohol, Smoking, and Substance
                    Involvement Screening Test and learning styles as predictor variables.</p>
                <p>The results obtained by our model match several studies that identify drug use as
                    one potential cause of college dropout (<xref ref-type="bibr" rid="B20">Patrick
                        et al., 2016</xref>), since the feature selection technique chose most of
                    the ASSIST test variables as key variables for dropout prediction. The
                    depression variable was also selected by the model, which corroborates some
                    studies that have shown that depression is related to a lower grade point
                    average and therefore dropout (<xref ref-type="bibr" rid="B4">Bruffaerts et al.,
                        2018</xref>).</p>
            </sec>
        </sec>
        <sec sec-type="conclusions">
            <title>IV. Conclusions</title>
            <p>This paper presents an early dropout prediction system to detect students at high
                risk of leaving college. This system is based on the processing and analysis of
                several variables influencing university dropout. The analysis focused on
                first-semester students, both because this allows higher education institutions
                to identify at-risk students early and because this group is the most likely to
                drop out of college. Specifically, we trained a machine learning
                algorithm to recognize patterns in first-semester dropout students from previous
                years, enabling the system to learn to detect the characteristics of a student at
                high risk of dropping out. Thus, the system ties in with the Integral Support
                Program of the Universidad Tecnológica de Pereira, which is responsible for
                providing these students with economic and psychological support and responding to
                the biopsychosocial, academic, economic, and policy needs of high-risk students, to
                encourage them to stay in college. The results show that the system can discriminate
                between students at risk and not at risk of dropping out, establishing that at least
                62.26% of students labeled as high-risk will indeed drop out. This will allow these
                students to receive prompt attention, thereby reducing the student attrition
                rate.</p>
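            <p>The 62.26% figure above is the precision of the high-risk (dropout) class, i.e. the fraction of students flagged as high-risk who actually drop out. A minimal sketch of that computation, using hypothetical counts rather than the study's data:</p>

```python
# Precision of the positive (dropout) class: TP / (TP + FP).
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted high-risk students who actually dropped out."""
    return tp / (tp + fp)

# Hypothetical counts: 33 true dropouts among 53 students flagged high-risk.
p = precision(tp=33, fp=20)
print(round(p * 100, 2))  # -> 62.26
```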
        </sec>
    </body>
    <back>
        <ref-list>
            <title>References</title>
            <ref id="B1">
                <mixed-citation>Ameri, S., Fard, M. J., Chinnam, R. B., &amp; Reddy, C. K. (2016).
                        <italic>Survival analysis based framework for early prediction of student
                        dropouts</italic>. In Proceedings of the 25th ACM International on
                    Conference on Information and Knowledge Management (pp. 903-912).
                    https://doi.org/10.1145/2983323.2983351</mixed-citation>
                <element-citation publication-type="confproc">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Ameri</surname>
                            <given-names>S.</given-names>
                        </name>
                        <name>
                            <surname>Fard</surname>
                            <given-names>M. J.</given-names>
                        </name>
                        <name>
                            <surname>Chinnam</surname>
                            <given-names>R. B.</given-names>
                        </name>
                        <name>
                            <surname>Reddy</surname>
                            <given-names>C. K.</given-names>
                        </name>
                    </person-group>
                    <year>2016</year>
                    <source>Survival analysis based framework for early prediction of student
                        dropouts</source>
                    <conf-name>25th ACM International on Conference on Information and Knowledge
                        Management</conf-name>
                    <fpage>903</fpage>
                    <lpage>912</lpage>
                    <pub-id pub-id-type="doi">10.1145/2983323.2983351</pub-id>
                </element-citation>
            </ref>
            <ref id="B2">
                <mixed-citation>Bean, J. P. (1985). Interaction effects based on class level in an
                    explanatory model of college student dropout syndrome. <italic>American
                        Educational Research Journal</italic>, <italic>22</italic>(1), 35-64.
                    https://doi.org/10.3102/00028312022001035</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Bean</surname>
                            <given-names>J. P.</given-names>
                        </name>
                    </person-group>
                    <year>1985</year>
                    <article-title>Interaction effects based on class level in an explanatory model
                        of college student dropout syndrome</article-title>
                    <source>American Educational Research Journal</source>
                    <volume>22</volume>
                    <issue>1</issue>
                    <fpage>35</fpage>
                    <lpage>64</lpage>
                    <pub-id pub-id-type="doi">10.3102/00028312022001035</pub-id>
                </element-citation>
            </ref>
            <ref id="B3">
                <mixed-citation>Berens, J., Schneider, K., Görtz, S., Oster, S., &amp; Burghoff, J.
                    (2018). <italic>Early detection of students at risk: Predicting student dropouts
                        using administrative student data and machine learning methods</italic>
                    (Working paper No. 7259). Center for Economic Studies &amp; Ifo Institute.
                    http://dx.doi.org/10.2139/ssrn.3275433</mixed-citation>
                <element-citation publication-type="other">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Berens</surname>
                            <given-names>J.</given-names>
                        </name>
                        <name>
                            <surname>Schneider</surname>
                            <given-names>K.</given-names>
                        </name>
                        <name>
                            <surname>Görtz</surname>
                            <given-names>S.</given-names>
                        </name>
                        <name>
                            <surname>Oster</surname>
                            <given-names>S.</given-names>
                        </name>
                        <name>
                            <surname>Burghoff</surname>
                            <given-names>J.</given-names>
                        </name>
                    </person-group>
                    <year>2018</year>
                    <source>Early detection of students at risk: Predicting student dropouts using
                        administrative student data and machine learning methods</source>
                    <comment>Working paper No. 7259</comment>
                    <publisher-name>Center for Economic Studies &amp; Ifo Institute</publisher-name>
                    <pub-id pub-id-type="doi">10.2139/ssrn.3275433</pub-id>
                </element-citation>
            </ref>
            <ref id="B4">
                <mixed-citation>Bruffaerts, R., Mortier, P., Kiekens, G., Auerbach, R. P., Cuijpers,
                    P., Demyttenaere, K., Green, G., Nock, M., &amp; Kessler, R. C. (2018). Mental
                    health problems in college freshmen: Prevalence and academic functioning.
                        <italic>Journal of Affective Disorders</italic>, <italic>225</italic>, 97-103.
                    https://doi.org/10.1016/j.jad.2017.07.044</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Bruffaerts</surname>
                            <given-names>R.</given-names>
                        </name>
                        <name>
                            <surname>Mortier</surname>
                            <given-names>P.</given-names>
                        </name>
                        <name>
                            <surname>Kiekens</surname>
                            <given-names>G.</given-names>
                        </name>
                        <name>
                            <surname>Auerbach</surname>
                            <given-names>R. P.</given-names>
                        </name>
                        <name>
                            <surname>Cuijpers</surname>
                            <given-names>P.</given-names>
                        </name>
                        <name>
                            <surname>Demyttenaere</surname>
                            <given-names>K.</given-names>
                        </name>
                        <name>
                            <surname>Green</surname>
                            <given-names>G.</given-names>
                        </name>
                        <name>
                            <surname>Nock</surname>
                            <given-names>M.</given-names>
                        </name>
                        <name>
                            <surname>Kessler</surname>
                            <given-names>R. C.</given-names>
                        </name>
                    </person-group>
                    <year>2018</year>
                    <article-title>Mental health problems in college freshmen: Prevalence and
                        academic functioning</article-title>
                    <source>Journal of Affective Disorders</source>
                    <volume>225</volume>
                    <fpage>97</fpage>
                    <lpage>103</lpage>
                    <pub-id pub-id-type="doi">10.1016/j.jad.2017.07.044</pub-id>
                </element-citation>
            </ref>
            <ref id="B5">
                <mixed-citation>Chawla, N. V., Bowyer, K. W., Hall, L. O., &amp; Kegelmeyer, W. P.
                    (2002). SMOTE: Synthetic Minority Over-Sampling Technique. <italic>The Journal
                        of Artificial Intelligence Research</italic>, <italic>16</italic>, 321-357.
                    https://doi.org/10.1613/jair.953</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Chawla</surname>
                            <given-names>N. V.</given-names>
                        </name>
                        <name>
                            <surname>Bowyer</surname>
                            <given-names>K. W.</given-names>
                        </name>
                        <name>
                            <surname>Hall</surname>
                            <given-names>L. O.</given-names>
                        </name>
                        <name>
                            <surname>Kegelmeyer</surname>
                            <given-names>W. P.</given-names>
                        </name>
                    </person-group>
                    <year>2002</year>
                    <article-title>SMOTE: Synthetic Minority Over-Sampling Technique</article-title>
                    <source>The Journal of Artificial Intelligence Research</source>
                    <volume>16</volume>
                    <fpage>321</fpage>
                    <lpage>357</lpage>
                    <pub-id pub-id-type="doi">10.1613/jair.953</pub-id>
                </element-citation>
            </ref>
            <ref id="B6">
                <mixed-citation>Chen, X. W., &amp; Jeong, J. C. (2007, December). <italic>Enhanced
                        recursive feature elimination</italic>. In Sixth International Conference on
                    Machine Learning and Applications (ICMLA) (pp. 429-435). IEEE.
                    https://doi.org/10.1109/ICMLA.2007.35</mixed-citation>
                <element-citation publication-type="confproc">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Chen</surname>
                            <given-names>X. W.</given-names>
                        </name>
                        <name>
                            <surname>Jeong</surname>
                            <given-names>J. C.</given-names>
                        </name>
                    </person-group>
                    <month>12</month>
                    <year>2007</year>
                    <source>Enhanced recursive feature elimination</source>
                    <conf-name>Sixth International Conference on Machine Learning and Applications
                        (ICMLA)</conf-name>
                    <fpage>429</fpage>
                    <lpage>435</lpage>
                    <publisher-name>IEEE</publisher-name>
                    <pub-id pub-id-type="doi">10.1109/ICMLA.2007.35</pub-id>
                </element-citation>
            </ref>
            <ref id="B7">
                <mixed-citation>Daley, F. (2010). Why college students drop out and what we do about
                    it. <italic>College Quarterly</italic>, <italic>13</italic>(3), 1-5. <ext-link
                        ext-link-type="uri" xlink:href="https://eric.ed.gov/?id=EJ930391"
                        >https://eric.ed.gov/?id=EJ930391</ext-link>
                </mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Daley</surname>
                            <given-names>F.</given-names>
                        </name>
                    </person-group>
                    <year>2010</year>
                    <article-title>Why college students drop out and what we do about
                        it</article-title>
                    <source>College Quarterly</source>
                    <volume>13</volume>
                    <issue>3</issue>
                    <fpage>1</fpage>
                    <lpage>5</lpage>
                    <ext-link ext-link-type="uri" xlink:href="https://eric.ed.gov/?id=EJ930391"
                        >https://eric.ed.gov/?id=EJ930391</ext-link>
                </element-citation>
            </ref>
            <ref id="B8">
                <mixed-citation>Ferreyra, M. M., Avitabile, C., Botero Álvarez, J., Haimovich Paz,
                    F., &amp; Urzúa, S. (2017). <italic>At a crossroads: Higher education in Latin
                        America and the Caribbean</italic>. World Bank.</mixed-citation>
                <element-citation publication-type="book">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Ferreyra</surname>
                            <given-names>M. M.</given-names>
                        </name>
                        <name>
                            <surname>Avitabile</surname>
                            <given-names>C.</given-names>
                        </name>
                        <name>
                            <surname>Botero Álvarez</surname>
                            <given-names>J.</given-names>
                        </name>
                        <name>
                            <surname>Haimovich Paz</surname>
                            <given-names>F.</given-names>
                        </name>
                        <name>
                            <surname>Urzúa</surname>
                            <given-names>S.</given-names>
                        </name>
                    </person-group>
                    <year>2017</year>
                    <source>At a crossroads: Higher education in Latin America and the
                        Caribbean</source>
                    <publisher-name>World Bank</publisher-name>
                </element-citation>
            </ref>
            <ref id="B9">
                <mixed-citation>Fodor, I. K. (2002). <italic>A survey of dimension reduction
                        techniques</italic> (Technical Report No. UCRL-ID-148494). Lawrence
                    Livermore National Lab. <ext-link ext-link-type="uri"
                        xlink:href="https://www.osti.gov/biblio/15002155"
                        >https://www.osti.gov/biblio/15002155</ext-link>
                </mixed-citation>
                <element-citation publication-type="report">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Fodor</surname>
                            <given-names>I. K.</given-names>
                        </name>
                    </person-group>
                    <year>2002</year>
                    <source>A survey of dimension reduction techniques</source>
                    <pub-id pub-id-type="other">UCRL-ID-148494</pub-id>
                    <publisher-name>Lawrence Livermore National Lab</publisher-name>
                    <ext-link ext-link-type="uri" xlink:href="https://www.osti.gov/biblio/15002155"
                        >https://www.osti.gov/biblio/15002155</ext-link>
                </element-citation>
            </ref>
            <ref id="B10">
                <mixed-citation>García, S., Luengo, J., &amp; Herrera, F. (2015). <italic>Feature
                        selection</italic>. In Data preprocessing in data mining (pp. 163-193).
                    Springer International Publishing.
                    https://doi.org/10.1007/978-3-319-10247-4_7</mixed-citation>
                <element-citation publication-type="book">
                    <person-group person-group-type="author">
                        <name>
                            <surname>García</surname>
                            <given-names>S.</given-names>
                        </name>
                        <name>
                            <surname>Luengo</surname>
                            <given-names>J.</given-names>
                        </name>
                        <name>
                            <surname>Herrera</surname>
                            <given-names>F.</given-names>
                        </name>
                    </person-group>
                    <year>2015</year>
                    <source>Feature selection</source>
                    <comment>Data preprocessing in data mining</comment>
                    <fpage>163</fpage>
                    <lpage>193</lpage>
                    <publisher-name>Springer International Publishing</publisher-name>
                    <pub-id pub-id-type="doi">10.1007/978-3-319-10247-4_7</pub-id>
                </element-citation>
            </ref>
            <ref id="B11">
                <mixed-citation>González, F. I., &amp; Arismendi, K. J. (2018). Deserción
                    estudiantil en la educación superior técnico-profesional: explorando los
                    factores que inciden en alumnos de primer año [Student dropout in technical and
                    vocational higher education: Exploring factors that influence freshmen].
                        <italic>Revista de la Educación Superior</italic>, <italic>47</italic>(188),
                    109-137. https://doi.org/10.36857/resu.2018.188.510</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>González</surname>
                            <given-names>F. I.</given-names>
                        </name>
                        <name>
                            <surname>Arismendi</surname>
                            <given-names>K. J.</given-names>
                        </name>
                    </person-group>
                    <year>2018</year>
                    <article-title>Deserción estudiantil en la educación superior
                        técnico-profesional: explorando los factores que inciden en alumnos de
                        primer año [Student dropout in technical and vocational higher education:
                        Exploring factors that influence freshmen]</article-title>
                    <source>Revista de la Educación Superior</source>
                    <volume>47</volume>
                    <issue>188</issue>
                    <fpage>109</fpage>
                    <lpage>137</lpage>
                    <pub-id pub-id-type="doi">10.36857/resu.2018.188.510</pub-id>
                </element-citation>
            </ref>
            <ref id="B12">
                <mixed-citation>Guo, X., Yin, Y., Dong, C., Yang, G., &amp; Zhou, G. (2008,
                    October). <italic>On the class imbalance problem</italic>. In 2008 Fourth
                    international conference on natural computation (Vol. 4, pp. 192-201). IEEE.
                    https://doi.org/10.1109/ICNC.2008.871</mixed-citation>
                <element-citation publication-type="confproc">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Guo</surname>
                            <given-names>X.</given-names>
                        </name>
                        <name>
                            <surname>Yin</surname>
                            <given-names>Y.</given-names>
                        </name>
                        <name>
                            <surname>Dong</surname>
                            <given-names>C.</given-names>
                        </name>
                        <name>
                            <surname>Yang</surname>
                            <given-names>G.</given-names>
                        </name>
                        <name>
                            <surname>Zhou</surname>
                            <given-names>G.</given-names>
                        </name>
                    </person-group>
                    <month>10</month>
                    <year>2008</year>
                    <source>On the class imbalance problem</source>
                    <conf-name>Fourth International Conference on Natural Computation</conf-name>
                    <volume>4</volume>
                    <fpage>192</fpage>
                    <lpage>201</lpage>
                    <publisher-name>IEEE</publisher-name>
                    <pub-id pub-id-type="doi">10.1109/ICNC.2008.871</pub-id>
                </element-citation>
            </ref>
            <ref id="B13">
                <mixed-citation>Hariri, S., Kind, M. C., &amp; Brunner, R. J. (2019). Extended
                    isolation forest. <italic>IEEE Transactions on Knowledge and Data
                        Engineering</italic>, <italic>33</italic>(4), 1479-1489.
                    https://doi.org/10.1109/TKDE.2019.2947676</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Hariri</surname>
                            <given-names>S.</given-names>
                        </name>
                        <name>
                            <surname>Kind</surname>
                            <given-names>M. C.</given-names>
                        </name>
                        <name>
                            <surname>Brunner</surname>
                            <given-names>R. J.</given-names>
                        </name>
                    </person-group>
                    <year>2019</year>
                    <article-title>Extended isolation forest</article-title>
                    <source>IEEE Transactions on Knowledge and Data Engineering</source>
                    <volume>33</volume>
                    <issue>4</issue>
                    <fpage>1479</fpage>
                    <lpage>1489</lpage>
                    <pub-id pub-id-type="doi">10.1109/TKDE.2019.2947676</pub-id>
                </element-citation>
            </ref>
            <ref id="B14">
                <mixed-citation>He, H., &amp; Ma, Y. (2013). <italic>Imbalanced learning:
                        Foundations, algorithms, and applications</italic>. John Wiley &amp;
                    Sons.</mixed-citation>
                <element-citation publication-type="book">
                    <person-group person-group-type="author">
                        <name>
                            <surname>He</surname>
                            <given-names>H.</given-names>
                        </name>
                        <name>
                            <surname>Ma</surname>
                            <given-names>Y.</given-names>
                        </name>
                    </person-group>
                    <year>2013</year>
                    <source>Imbalanced learning: Foundations, algorithms, and applications</source>
                    <publisher-name>John Wiley &amp; Sons</publisher-name>
                </element-citation>
            </ref>
            <ref id="B15">
                <mixed-citation>Himmel, E. (2002). Modelo de análisis de la deserción estudiantil en
                    la educación superior [Higher education student dropout analysis model].
                        <italic>Calidad en la Educación</italic>, (17), 91-108.
                    http://dx.doi.org/10.31619/caledu.n17.409</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Himmel</surname>
                            <given-names>E.</given-names>
                        </name>
                    </person-group>
                    <year>2002</year>
                    <article-title>Modelo de análisis de la deserción estudiantil en la educación
                        superior [Higher education student dropout analysis model]</article-title>
                    <source>Calidad en la Educación</source>
                    <issue>17</issue>
                    <fpage>91</fpage>
                    <lpage>108</lpage>
                    <pub-id pub-id-type="doi">10.31619/caledu.n17.409</pub-id>
                </element-citation>
            </ref>
            <ref id="B16">
                <mixed-citation>Hosmer Jr, D. W., Lemeshow, S., &amp; Sturdivant, R. X. (2013).
                        <italic>Applied logistic regression</italic> (Vol. 398). John Wiley &amp;
                    Sons.</mixed-citation>
                <element-citation publication-type="book">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Hosmer</surname>
                            <given-names>D. W.</given-names>
                            <suffix>Jr</suffix>
                        </name>
                        <name>
                            <surname>Lemeshow</surname>
                            <given-names>S.</given-names>
                        </name>
                        <name>
                            <surname>Sturdivant</surname>
                            <given-names>R. X.</given-names>
                        </name>
                    </person-group>
                    <year>2013</year>
                    <source>Applied logistic regression</source>
                    <volume>398</volume>
                    <publisher-name>John Wiley &amp; Sons</publisher-name>
                </element-citation>
            </ref>
            <ref id="B17">
                <mixed-citation>Lakkaraju, H., Aguiar, E., Shan, C., Miller, D., Bhanpuri, N.,
                    Ghani, R., &amp; Addison, K. L. (2015, August). <italic>A machine learning
                        framework to identify students at risk of adverse academic
                    outcomes</italic>. In Proceedings of the 21st ACM SIGKDD International
                    Conference on Knowledge Discovery and Data Mining (pp. 1909-1918), Sydney,
                    NSW, Australia. https://doi.org/10.1145/2783258.2788620</mixed-citation>
                <element-citation publication-type="confproc">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Lakkaraju</surname>
                            <given-names>H.</given-names>
                        </name>
                        <name>
                            <surname>Aguiar</surname>
                            <given-names>E.</given-names>
                        </name>
                        <name>
                            <surname>Shan</surname>
                            <given-names>C.</given-names>
                        </name>
                        <name>
                            <surname>Miller</surname>
                            <given-names>D.</given-names>
                        </name>
                        <name>
                            <surname>Bhanpuri</surname>
                            <given-names>N.</given-names>
                        </name>
                        <name>
                            <surname>Ghani</surname>
                            <given-names>R.</given-names>
                        </name>
                        <name>
                            <surname>Addison</surname>
                            <given-names>K. L.</given-names>
                        </name>
                    </person-group>
                    <month>08</month>
                    <year>2015</year>
                    <source>A machine learning framework to identify students at risk of adverse
                        academic outcomes</source>
                    <conf-name>21st ACM SIGKDD International Conference on Knowledge Discovery and Data
                        Mining</conf-name>
                    <fpage>1909</fpage>
                    <lpage>1918</lpage>
                    <publisher-loc>Sydney, NSW, Australia</publisher-loc>
                    <pub-id pub-id-type="doi">10.1145/2783258.2788620</pub-id>
                </element-citation>
            </ref>
            <ref id="B18">
                <mixed-citation>Lin, J. J., Imbrie, P. K., &amp; Reid, K. J. (2009, July).
                        <italic>Student retention modelling: An evaluation of different methods and
                        their impact on prediction results</italic>. Proceedings of the Research in
                    Engineering Education Symposium (REES), Palm Cove, Australia. <ext-link
                        ext-link-type="uri"
                        xlink:href="https://www.proceedings.com/content/023/023353webtoc.pdf"
                        >https://www.proceedings.com/content/023/023353webtoc.pdf</ext-link>
                </mixed-citation>
                <element-citation publication-type="confproc">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Lin</surname>
                            <given-names>J. J.</given-names>
                        </name>
                        <name>
                            <surname>Imbrie</surname>
                            <given-names>P. K.</given-names>
                        </name>
                        <name>
                            <surname>Reid</surname>
                            <given-names>K. J.</given-names>
                        </name>
                    </person-group>
                    <month>07</month>
                    <year>2009</year>
                    <source>Student retention modelling: An evaluation of different methods and
                        their impact on prediction results</source>
                    <conf-name>Research in Engineering Education Symposium (REES)</conf-name>
                    <conf-loc>Palm Cove, Australia</conf-loc>
                    <ext-link ext-link-type="uri"
                        xlink:href="https://www.proceedings.com/content/023/023353webtoc.pdf"
                        >https://www.proceedings.com/content/023/023353webtoc.pdf</ext-link>
                </element-citation>
            </ref>
            <ref id="B19">
                <mixed-citation>Observatorio de Educación Superior. (2017, July 1). Deserción en la
                    educación superior [Dropout in higher education]. ODES Boletín (5). <ext-link
                        ext-link-type="uri"
                        xlink:href="https://www.sapiencia.gov.co/wp-content/uploads/2017/11/5_JULIO_BOLETIN_ODES_DESERCION_EN_LA_EDUCACION_SUPERIOR.pdf"
                        >https://www.sapiencia.gov.co/wp-content/uploads/2017/11/5_JULIO_BOLETIN_ODES_DESERCION_EN_LA_EDUCACION_SUPERIOR.pdf</ext-link>
                </mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <collab>Observatorio de Educación Superior</collab>
                    </person-group>
                    <day>01</day>
                    <month>07</month>
                    <year>2017</year>
                    <article-title>Deserción en la educación superior [Dropout in higher
                        education]</article-title>
                    <source>ODES Boletín</source>
                    <issue>5</issue>
                    <ext-link ext-link-type="uri"
                        xlink:href="https://www.sapiencia.gov.co/wp-content/uploads/2017/11/5_JULIO_BOLETIN_ODES_DESERCION_EN_LA_EDUCACION_SUPERIOR.pdf"
                        >https://www.sapiencia.gov.co/wp-content/uploads/2017/11/5_JULIO_BOLETIN_ODES_DESERCION_EN_LA_EDUCACION_SUPERIOR.pdf</ext-link>
                </element-citation>
            </ref>
            <ref id="B20">
                <mixed-citation>Patrick, M. E., Schulenberg, J. E., &amp; O’Malley, P. M. (2016).
                    High school substance use as a predictor of college attendance, completion, and
                    dropout: A national multicohort longitudinal study. <italic>Youth &amp;
                        Society</italic>, <italic>48</italic>(3), 425-447.
                    https://doi.org/10.1177/0044118X13508961</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Patrick</surname>
                            <given-names>M. E.</given-names>
                        </name>
                        <name>
                            <surname>Schulenberg</surname>
                            <given-names>J. E.</given-names>
                        </name>
                        <name>
                            <surname>O’Malley</surname>
                            <given-names>P. M.</given-names>
                        </name>
                    </person-group>
                    <year>2016</year>
                    <article-title>High school substance use as a predictor of college attendance,
                        completion, and dropout: A national multicohort longitudinal
                        study</article-title>
                    <source>Youth &amp; Society</source>
                    <volume>48</volume>
                    <issue>3</issue>
                    <fpage>425</fpage>
                    <lpage>447</lpage>
                    <pub-id pub-id-type="doi">10.1177/0044118X13508961</pub-id>
                </element-citation>
            </ref>
            <ref id="B21">
                <mixed-citation>Pérez, A. M., Escobar, C. R., Toledo, M. R., Gutierrez, L. B., &amp;
                    Reyes, G. M. (2018). Prediction model of first-year student desertion at
                    Universidad Bernardo O’Higgins (UBO). <italic>Educação e Pesquisa</italic>,
                        <italic>44</italic>.
                    https://doi.org/10.1590/S1678-4634201844172094</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Pérez</surname>
                            <given-names>A. M.</given-names>
                        </name>
                        <name>
                            <surname>Escobar</surname>
                            <given-names>C. R.</given-names>
                        </name>
                        <name>
                            <surname>Toledo</surname>
                            <given-names>M. R.</given-names>
                        </name>
                        <name>
                            <surname>Gutierrez</surname>
                            <given-names>L. B.</given-names>
                        </name>
                        <name>
                            <surname>Reyes</surname>
                            <given-names>G. M.</given-names>
                        </name>
                    </person-group>
                    <year>2018</year>
                    <article-title>Prediction model of first-year student desertion at Universidad
                        Bernardo O’Higgins (UBO)</article-title>
                    <source>Educação e Pesquisa</source>
                    <volume>44</volume>
                    <pub-id pub-id-type="doi">10.1590/S1678-4634201844172094</pub-id>
                </element-citation>
            </ref>
            <ref id="B22">
                <mixed-citation>Raschka, S. (2018). Model evaluation, model selection, and algorithm
                    selection in machine learning. <italic>arXiv</italic>. Cornell University.
                        <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1811.12808"
                        >http://arxiv.org/abs/1811.12808</ext-link>
                </mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Raschka</surname>
                            <given-names>S.</given-names>
                        </name>
                    </person-group>
                    <year>2018</year>
                    <article-title>Model evaluation, model selection, and algorithm selection in
                        machine learning</article-title>
                    <source>arXiv</source>
                    <publisher-name>Cornell University</publisher-name>
                    <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1811.12808"
                        >http://arxiv.org/abs/1811.12808</ext-link>
                </element-citation>
            </ref>
            <ref id="B23">
                <mixed-citation>Sandoval-Palis, I., Naranjo, D., Vidal, J., &amp; Gilar-Corbi, R.
                    (2020). Early dropout prediction model: A case study of university leveling
                    course students. <italic>Sustainability</italic>, <italic>12</italic>(22), 2-17.
                    https://doi.org/10.3390/su12229314</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Sandoval-Palis</surname>
                            <given-names>I.</given-names>
                        </name>
                        <name>
                            <surname>Naranjo</surname>
                            <given-names>D.</given-names>
                        </name>
                        <name>
                            <surname>Vidal</surname>
                            <given-names>J.</given-names>
                        </name>
                        <name>
                            <surname>Gilar-Corbi</surname>
                            <given-names>R.</given-names>
                        </name>
                    </person-group>
                    <year>2020</year>
                    <article-title>Early dropout prediction model: A case study of university
                        leveling course students</article-title>
                    <source>Sustainability</source>
                    <volume>12</volume>
                    <issue>22</issue>
                    <fpage>2</fpage>
                    <lpage>17</lpage>
                    <pub-id pub-id-type="doi">10.3390/su12229314</pub-id>
                </element-citation>
            </ref>
            <ref id="B24">
                <mixed-citation>Sistema para la Prevención de la Deserción en las Instituciones de
                    Educación Superior-SPADIES. (2016). <italic>Reporte sobre deserción y graduación
                        en educación superior año 2016</italic> [Report on dropout and graduation in
                    higher education, year 2016]. <ext-link ext-link-type="uri"
                        xlink:href="https://bit.ly/3K0RQmc">https://bit.ly/3K0RQmc</ext-link>
                </mixed-citation>
                <element-citation publication-type="report">
                    <person-group person-group-type="author">
                        <collab>Sistema para la Prevención de la Deserción en las Instituciones de
                            Educación Superior-SPADIES</collab>
                    </person-group>
                    <year>2016</year>
                    <source>Reporte sobre deserción y graduación en educación superior año 2016
                        [Report on dropout and graduation in higher education, year
                        2016]</source>
                    <ext-link ext-link-type="uri" xlink:href="https://bit.ly/3K0RQmc"
                        >https://bit.ly/3K0RQmc</ext-link>
                </element-citation>
            </ref>
            <ref id="B25">
                <mixed-citation>Thomas, L. (2002). Student retention in higher education: the role
                    of institutional habitus. <italic>Journal of Education Policy</italic>,
                        <italic>17</italic>(4), 423-442.
                    https://doi.org/10.1080/02680930210140257</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Thomas</surname>
                            <given-names>L.</given-names>
                        </name>
                    </person-group>
                    <year>2002</year>
                    <article-title>Student retention in higher education: the role of institutional
                        habitus</article-title>
                    <source>Journal of Education Policy</source>
                    <volume>17</volume>
                    <issue>4</issue>
                    <fpage>423</fpage>
                    <lpage>442</lpage>
                    <pub-id pub-id-type="doi">10.1080/02680930210140257</pub-id>
                </element-citation>
            </ref>
            <ref id="B26">
                <mixed-citation>Tinto, V. (1982). Defining dropout: A matter of perspective.
                        <italic>New Directions for Institutional Research</italic>, (36), 3-15.
                    https://doi.org/10.1002/ir.37019823603</mixed-citation>
                <element-citation publication-type="journal">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Tinto</surname>
                            <given-names>V.</given-names>
                        </name>
                    </person-group>
                    <year>1982</year>
                    <article-title>Defining dropout: A matter of perspective</article-title>
                    <source>New Directions for Institutional Research</source>
                    <issue>36</issue>
                    <fpage>3</fpage>
                    <lpage>15</lpage>
                    <pub-id pub-id-type="doi">10.1002/ir.37019823603</pub-id>
                </element-citation>
            </ref>
            <ref id="B27">
                <mixed-citation>Urzúa, S. (2017). The economic impact of higher education. In M. M.
                    Ferreyra, C. Avitabile, J. Botero, F. Haimovich, &amp; S. Urzúa (Eds.),
                        <italic>At a crossroads: Higher education in Latin America and the
                        Caribbean</italic> (pp. 115-148). World Bank.
                    https://doi.org/10.1596/978-1-4648-1014-5_ch3</mixed-citation>
                <element-citation publication-type="book">
                    <person-group person-group-type="author">
                        <name>
                            <surname>Urzúa</surname>
                            <given-names>S.</given-names>
                        </name>
                    </person-group>
                    <year>2017</year>
                    <chapter-title>The economic impact of higher education</chapter-title>
                    <person-group person-group-type="editor">
                        <name>
                            <surname>Ferreyra</surname>
                            <given-names>M. M.</given-names>
                        </name>
                        <name>
                            <surname>Avitabile</surname>
                            <given-names>C.</given-names>
                        </name>
                        <name>
                            <surname>Botero</surname>
                            <given-names>J.</given-names>
                        </name>
                        <name>
                            <surname>Haimovich</surname>
                            <given-names>F.</given-names>
                        </name>
                        <name>
                            <surname>Urzúa</surname>
                            <given-names>S.</given-names>
                        </name>
                    </person-group>
                    <source>At a crossroads: Higher education in Latin America and the
                        Caribbean</source>
                    <fpage>115</fpage>
                    <lpage>148</lpage>
                    <publisher-name>World Bank</publisher-name>
                    <pub-id pub-id-type="doi">10.1596/978-1-4648-1014-5_ch3</pub-id>
                </element-citation>
            </ref>
        </ref-list>
        <fn-group>
            <fn fn-type="other" id="fn1">
                <p><bold>How to cite:</bold> Hoyos, J. K. and Daza, G. (2023). Predictive model to
                    identify college students with high dropout rates. <italic>Revista Electrónica
                        de Investigación Educativa, 25</italic>, e13, 1-10. <ext-link
                        ext-link-type="uri"
                        xlink:href="https://doi.org/10.24320/redie.2023.25.e13.5398"
                        >https://doi.org/10.24320/redie.2023.25.e13.5398</ext-link>
                </p>
            </fn>
        </fn-group>
    </back>
</article>
