
Machine Learning Study Looks at Cancer Mortality in Rural…
Colorectal cancer (CRC) is the second leading cause of cancer death in the U.S., and rural Appalachia has the highest CRC incidence and mortality rates. Several non-clinical factors, known as social determinants of health (SDOH), are associated with cancer mortality.
This study in Scientific Reports (Nature Portfolio), “Machine learning to evaluate the effects of non-clinical social determinant features in predicting colorectal cancer mortality in a medically underserved Appalachian population,” describes a novel machine learning (ML) approach to predictive modeling. It uses demographic, clinical, and SDOH features from the health records of Appalachian community cancer centers to predict 5-year CRC survival.
The authors trained, validated, and tested four gradient-boosted tree ensemble (XGBoost) ML models, each developed from a selected combination of the available features.
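The study's models were built with XGBoost. The sketch below is neither that library nor the authors' pipeline. It is a minimal pure-Python illustration of how a single gradient-boosting routine can be trained on different feature subsets and then compared. It assumes depth-1 trees (stumps), binary log loss, and XGBoost-style second-order leaf weights of -G/(H + λ). The function names, the toy data, and the meaning of each column are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stump: (feature index, threshold, left weight, right weight); x[j] <= threshold goes left
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X: Sequence[Sequence[float]], grad: List[float], hess: List[float],
              features: Sequence[int], lam: float = 1.0) -> Stump:
    """Choose the depth-1 split (over allowed features) with the largest second-order gain."""
    G, H = sum(grad), sum(hess)
    # Fallback when no split helps: every row gets the same leaf weight -G / (H + lambda)
    best_gain, best = 0.0, (features[0], math.inf, -G / (H + lam), 0.0)
    for j in features:
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        gl = hl = 0.0
        for k in range(len(order) - 1):
            i = order[k]
            gl += grad[i]
            hl += hess[i]
            if X[i][j] == X[order[k + 1]][j]:
                continue  # only split between distinct feature values
            gr, hr = G - gl, H - hl
            gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
            if gain > best_gain:
                best_gain = gain
                best = (j, X[i][j], -gl / (hl + lam), -gr / (hr + lam))
    return best


def predict_stump(s: Stump, x: Sequence[float]) -> float:
    j, t, w_left, w_right = s
    return w_left if x[j] <= t else w_right


def fit_boosted(X, y, features, rounds=30, eta=0.3, lam=1.0) -> List[Stump]:
    """Gradient boosting for binary log loss, restricted to the given feature columns."""
    margin = [0.0] * len(X)
    model: List[Stump] = []
    for _ in range(rounds):
        p = [sigmoid(m) for m in margin]
        grad = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss w.r.t. margin
        hess = [pi * (1.0 - pi) for pi in p]       # second derivative of log loss
        j, t, wl, wr = fit_stump(X, grad, hess, features, lam)
        s = (j, t, eta * wl, eta * wr)             # shrink each stump by the learning rate
        model.append(s)
        margin = [m + predict_stump(s, x) for m, x in zip(margin, X)]
    return model


def predict_proba(model: List[Stump], x: Sequence[float]) -> float:
    return sigmoid(sum(predict_stump(s, x) for s in model))


def log_loss(y, probs) -> float:
    eps = 1e-12
    return -sum(yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1 - p, eps))
                for yi, p in zip(y, probs)) / len(y)


# Hypothetical toy data: column 0 stands in for a clinical feature, column 1 for a
# binary SDOH-style flag. By construction, the labels depend only on column 1.
X = [[i % 5, i % 2] for i in range(40)]
y = [i % 2 for i in range(40)]
for name, cols in [("clinical only", [0]), ("clinical + SDOH", [0, 1])]:
    model = fit_boosted(X, y, cols)
    print(name, round(log_loss(y, [predict_proba(model, x) for x in X]), 3))
```

Only the `features` argument changes between runs, which mirrors the idea of comparing models built from different feature combinations. A real analysis would use the XGBoost library itself with deeper trees and separate training, validation, and test sets, as the authors describe. This sketch reports in-sample loss on toy data only.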
They found that the area under the receiver operating characteristic curve (AUC) was highest for the model that combined SDOH features with demographic and clinical features (0.79; P < 0.0001). Feature stratification identified rurality as the top SDOH feature. The study demonstrated that the ML model performs better when SDOH features are included and that rurality significantly affects CRC survival in Appalachia.
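The reported AUC of 0.79 can be read concretely as the probability that the model scores a randomly chosen positive case above a randomly chosen negative one, with ties counted as half. Below is a minimal sketch of that pairwise (Mann-Whitney) definition. It assumes binary labels coded 0/1, and the function name is hypothetical. It does not reproduce the paper's P value or any significance test.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ordered correctly. The pairwise loop costs O(n_pos × n_neg). For large cohorts, a rank-based version that sorts the scores once is more efficient.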
The study also provides preliminary indications that further collection and evaluation of SDOH data would strengthen our understanding of how these factors affect cancer survival in Appalachia and in other underserved populations, and could inform the development of care-delivery strategies.
________
Montgomery, A., Vadapalli, R., Dinenno, F.A. et al. Machine learning to evaluate the effects of non-clinical social determinant features in predicting colorectal cancer mortality in a medically underserved Appalachian population. Sci Rep 15, 25781 (2025). https://doi.org/10.1038/s41598-025-11074-y