
Oscar Jardey Suarez
Edier Hernán Bustos Velazco
Jaime Duván Reyes Roncancio




Scores on standardized tests are one component of how educational quality is measured. This study aims to build a predictive model of Saber 11 test scores by applying linear regression and multilevel linear regression to statistically significant factors drawn from surveys administered to test takers in Bogotá, Colombia, between 2019 and 2022. The methodological approach is quantitative, and the data were organized following the Cross-Industry Standard Process for Data Mining (CRISP-DM). The models were built on the open database of the Colombian Institute for the Promotion of Higher Education (ICFES), comprising 129,087 records (over 81% of the total); women account for 53% of these records. The intraclass correlation coefficient (ICC) of the multilevel linear regression model across Bogotá's 20 localities is 1.54. The mean absolute percentage error (MAPE) of the linear regression model is 11.79% on the entire dataset and 11.81% when 70% of the data is used for training. The statistically significant factors include gender, socioeconomic status, resources for studying at home, nutrition, student employment, and the institution attended. In conclusion, possession of and access to technological resources (hardware and software), together with the urban location of the institution, had a positive impact on scores during the COVID-19 pandemic. This provides empirical evidence of a wider educational gap among populations with limited technological access or living in rural areas.
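The abstract reports two evaluation metrics, the ICC and the MAPE. The following is a minimal, standard-library sketch of how each is commonly computed. It is not the authors' code. The function names `mape` and `icc1_anova` are hypothetical. The ICC here uses the one-way ANOVA estimator for unbalanced groups, which is a swapped-in technique. A multilevel (random-intercept) model usually derives the ICC from its variance components instead, as τ₀² / (τ₀² + σ²).

```python
from collections import defaultdict


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (assumes no actual value is 0)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


def icc1_anova(scores, groups):
    """One-way ANOVA estimate of ICC(1); groups (e.g. localities) may be unbalanced.

    Hypothetical helper for illustration, not the study's multilevel estimate.
    """
    by_group = defaultdict(list)
    for y, g in zip(scores, groups):
        by_group[g].append(y)
    n_total, n_groups = len(scores), len(by_group)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(scores) / n_total
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    # Between-group and within-group sums of squares
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in by_group.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in by_group.items() for y in v)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Effective group size k0 for unbalanced designs
    k0 = (n_total - sum(len(v) ** 2 for v in by_group.values()) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


if __name__ == "__main__":
    # Toy data, not from the study
    print(mape([200, 250, 300], [180, 260, 330]))  # about 8.0
    print(icc1_anova([1, 2, 3, 11, 12, 13], ["a", "a", "a", "b", "b", "b"]))
```

At the population level the ICC is a proportion between 0 and 1 (often reported as a percentage), giving the share of score variance attributable to differences between groups. The ANOVA estimator can come out slightly negative in small samples when there is no real group effect.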

