How can we leverage machine learning to reduce the high school dropout rate?

Project Summary

The Policy Lab partnered with the Rhode Island Department of Education (RIDE) and DataSpark to develop an early warning system (EWS) that predicts which students are most at risk of dropping out, giving educators time to act on this information. Using six indicators, the dashboard generates color-coded risk bands to inform a multi-tiered system of support and keep students in school and on track to graduate. We are currently exploring how the changes and challenges of the COVID-19 pandemic are affecting the underlying data and how they might affect model performance.

Why is this issue important?

One of the main goals of a K-12 education is simple: completion. We want students to pass their classes and cross the stage at their high school graduation. Unfortunately, completion of a K-12 education is not a given: 15% of public school students nationwide do not graduate high school within four years [1]. Early intervention and adequate support are essential for students who are at risk of dropping out.

To address this problem, the Rhode Island Department of Education (RIDE) partnered with The Policy Lab and DataSpark to develop an Early Warning System (EWS) that helps school administrators and teachers connect 9th through 12th grade students who are at risk of dropping out with the resources they need to be successful. The system features an educator information portal that displays the latest available data, updated monthly, on six student performance indicators:

  • Attendance percentages
  • Grade retention
  • Suspensions
  • Math proficiency
  • English Language Arts (ELA) proficiency
  • A risk indicator of dropping out. At the beginning of each school year, the risk indicator is color-coded red, yellow, or green to indicate high, moderate, or low risk of dropping out, respectively.
Figure 1. The Early Warning System Risk Indicators
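
As a rough illustration of how a predicted risk could be turned into the red, yellow, and green bands described above, here is a minimal Python sketch. The function name `risk_band` and both cutoff values are hypothetical and chosen only for illustration; the cutoffs actually used in the EWS are not stated here.

```python
def risk_band(probability, moderate_cutoff=0.10, high_cutoff=0.30):
    """Map a predicted dropout probability to a color band.

    The cutoffs are illustrative assumptions, not the values used by the EWS.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_cutoff:
        return "red"      # high risk
    if probability >= moderate_cutoff:
        return "yellow"   # moderate risk
    return "green"        # low risk


# Example: band three hypothetical students by their predicted probability.
for student_id, p in [("A", 0.04), ("B", 0.18), ("C", 0.55)]:
    print(student_id, risk_band(p))
```

In practice, cutoffs like these would be chosen together with educators, since they determine how many students fall into each tier of support.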

What did we do?

To build a tool that Rhode Island educators would find useful, The Policy Lab and DataSpark conducted interviews and brainstorming sessions with many educators at RIDE and in schools across Rhode Island. After the design was finalized and a prototype was built by RIDE’s Office of Data and Technology Services, we conducted a round of user testing and a small-scale pilot.

Additionally, The Policy Lab led the development of the risk prediction model that powers the risk indicators. This machine learning model was trained on historical outcome data for all public high school students in Rhode Island from School Years 2007-08 through 2015-16. In building this model, The Policy Lab held several sessions with educators to explain various aspects of the model and to incorporate their subject matter expertise into modeling decisions, such as what data to include and how to weight data from different eras of RIDE policy regimes.
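
One common way to weight data from different eras is to give each historical record a sample weight based on the policy regime it comes from. The Python sketch below illustrates that idea only; it is not the weighting scheme the project used. The names `era_weights` and `weighted_rate`, the regime start year, and the weight of 0.5 for older records are all assumptions made for the example.

```python
def era_weights(school_years, regime_start_year, older_weight=0.5):
    """Return one weight per record.

    Records from the current policy regime get weight 1.0; earlier records
    get `older_weight`. Both the regime boundary and the down-weighting
    factor are hypothetical values for illustration.
    """
    if not 0.0 <= older_weight <= 1.0:
        raise ValueError("older_weight must be between 0 and 1")
    return [1.0 if year >= regime_start_year else older_weight
            for year in school_years]


def weighted_rate(outcomes, weights):
    """Weighted share of records with outcome 1 (for example, dropped out)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(o * w for o, w in zip(outcomes, weights)) / total


# Made-up records: the school year each began in, and a 0/1 dropout outcome.
years = [2008, 2009, 2014, 2015]
dropped_out = [1, 0, 0, 1]
weights = era_weights(years, regime_start_year=2012)
print(weights)
print(weighted_rate(dropped_out, weights))
```

Weights like these can typically be passed to a model during training; for example, many scikit-learn estimators accept a `sample_weight` argument in `fit`.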

What have we learned (so far)?

After one month of user testing and small-scale pilot tests, the EWS was successfully released to all teachers and administrators in February 2020, one month before the COVID-19 pandemic closed schools and instruction started moving online. Because of uncertainty about model performance under the changed circumstances, RIDE decided to temporarily disable the predictive model but retain the other performance indicators on its information portal. To relaunch the model, RIDE is working with The Policy Lab and DataSpark to explore how several pandemic-related challenges to the EWS's underlying data might affect model performance.

What happens next?

Among the issues resulting from the pandemic are increases in unemployment, rises in evictions and food insecurity, and changes to policies around in-person schooling. As a result, many students are being educated in a very different context than they were before the pandemic. These contextual differences affect what we can reliably say about which students may be at risk of dropping out, a phenomenon known as concept drift. In this context, concept drift means that the massive disruptions to everyday instruction caused by the pandemic have changed the data used for dropout prediction and may have affected the performance of the predictive model. The Policy Lab will work with RIDE to decide whether and how to relaunch the dropout prediction models for School Year 2020-21 and beyond.
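
One simple way to look for changes in the underlying data is to compare the distribution of an input, such as attendance, before and during the pandemic. The Python sketch below computes the Population Stability Index (PSI), a widely used measure of distribution shift. It is offered as an illustration under stated assumptions, not as the analysis the project performed: the function name, the bin edges, and the sample values are all hypothetical. PSI only detects shifts in an input's distribution; it does not by itself show that the relationship between inputs and dropout has changed.

```python
import math


def population_stability_index(baseline, recent, edges, eps=1e-6):
    """Compare two samples of one indicator using shared bin edges.

    Returns 0.0 when the binned distributions match and grows as they diverge.
    `eps` replaces empty-bin shares so the logarithm stays defined.
    """
    def shares(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                # Bins are half-open, except the last, which includes its upper edge.
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        if total == 0:
            raise ValueError("no values fall inside the bin edges")
        return [max(c / total, eps) for c in counts]

    base, new = shares(baseline), shares(recent)
    return sum((n - b) * math.log(n / b) for b, n in zip(base, new))


# Made-up attendance rates before and during the pandemic.
pre = [0.95, 0.92, 0.88, 0.97]
post = [0.70, 0.65, 0.92, 0.60]
bins = [0.0, 0.8, 0.9, 1.0]
print(population_stability_index(pre, pre, bins))   # identical samples give 0.0
print(population_stability_index(pre, post, bins))  # a shifted sample gives a positive value
```

A check like this could be run for each input indicator; a large value would suggest the model is seeing data unlike what it was trained on and should be re-evaluated before relaunch.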

  1. National Center for Education Statistics. (n.d.). NCES Fast Facts: High School Graduate Rates. Retrieved March 10, 2021, from

How to cite this Project: The Policy Lab. (2021, May 17). How can we leverage machine learning to reduce the high school dropout rate?. The Policy Lab.


An official website of Brown University