Predicting who will graduate from a university is a
difficult challenge, especially for US public
universities whose missions serve diverse populations
under relaxed admission criteria. Building predictive
models for entering freshmen poses many problems:
some students receive financial aid, others do not;
some enter with SAT scores, others with ACT scores;
some students stop out and then return. And with the
advent of the modern data warehouse, a dizzying array
of data exists that might or might not help build
predictive models. This doctoral study examines the
work required to build four predictive models for
entering freshmen: logistic regression, automatic
cluster detection, neural network, and decision tree.
Practical problems are addressed squarely. Cleaning
institutional data, handling missing data, adjusting
model parameters, recognizing model drift, grouping
students into prediction bands, and evaluating
disparate model types are among the tasks for which
this work shares practical solutions.