Centre for Scholarship and Innovation
Most research on identifying at-risk students and predicting their performance using Machine Learning focuses on developing the most accurate model. Despite recognising the importance of transparency and understanding of these models, little effort has been made to investigate the errors they make. In this project, we address this gap by analysing large errors in predicting students at risk of not submitting their assignments: cases where even a sophisticated machine learning model was confident about a student's outcome, yet the actual result was different.
The underlying predictions are part of OUAnalyse and are available to all tutors in most undergraduate OU modules via the Early Alert Indicators (EAI) Dashboard. The models are updated weekly to capture dynamic changes in student learning behaviour.
We analysed both groups of errors: students predicted to submit their assignment who did not (False Negatives) and students predicted not to submit who did (False Positives). We conducted a mixed-methods analysis, combining quantitative analysis of predictions for more than 25,000 students with follow-up online interviews with 27 of them, analysed thematically. We focused on undergraduate Level 1 modules in the STEM faculty and analysed the predictions for the first Tutor Marked Assignment (TMA).
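For clarity, the error categories above treat a prediction of non-submission (flagging a student as at risk) as the positive class. The following minimal Python sketch illustrates this convention only; the function and variable names are hypothetical and are not part of OUAnalyse:

    # Illustrative sketch: counting False Positives and False Negatives when
    # "will not submit" (at risk) is treated as the positive class.
    # All names and data below are hypothetical, not OUAnalyse internals.

    def error_counts(predicted_not_submit, actually_submitted):
        """predicted_not_submit[i]: model flagged student i as at risk (positive).
        actually_submitted[i]: student i submitted the TMA."""
        fp = fn = 0
        for flagged, submitted in zip(predicted_not_submit, actually_submitted):
            if flagged and submitted:            # predicted not to submit, yet did
                fp += 1
            elif not flagged and not submitted:  # predicted to submit, yet did not
                fn += 1
        return fp, fn

    # Example with four students:
    fp, fn = error_counts([True, False, True, False], [True, True, False, False])
    print(fp, fn)  # 1 False Positive, 1 False Negative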
The quantitative analysis revealed that the most prevalent factor in False Positives was an immediate increase in student activity after the predictions were generated. Interviews revealed that, among these students, the most prevalent themes were students working at the last minute who were able to overcome last-minute problems, students with a high study workload who dropped some of their other modules, and students who either already had the knowledge required for the TMA or studied outside the VLE. In False Negatives, non-submission of assignments was explained mostly by financial reasons, family responsibilities, or deferral of the module because of a high study workload.
Overall, the factors explaining the different outcomes were not related to any of the student data currently captured by the model. As a result of this study, data related to student finance will become part of the OUAnalyse model. We proposed that this missing data can be captured either by giving students an initial questionnaire or by informing tutors so that they can gather it before the module starts. We also suggest intervention strategies based on student recommendations, together with considerations that we will make available to tutors in the OUA training materials; these might lead to a better understanding of the capabilities of Predictive Learning Analytics and, subsequently, to its better use.