Machine Learning For Tuning, Selection, And Ensemble Of Multiple Risk Scores For Predicting Type 2 Diabetes
Received 2 August 2019
Accepted for publication 8 October 2019
Published 5 November 2019 Volume 2019:12 Pages 189–198
Checked for plagiarism Yes
Review by Single anonymous peer review
Peer reviewer comments 2
Editor who approved publication: Professor Marco Carotenuto
Yujia Liu,1 Shangyuan Ye,2 Xianchao Xiao,1 Chenglin Sun,1 Gang Wang,1 Guixia Wang,1 Bo Zhang3
1Department of Endocrinology and Metabolism, The First Hospital of Jilin University, Changchun, Jilin 130021, People’s Republic of China; 2Department of Population Medicine, Harvard Pilgrim Health Care and Harvard Medical School, Boston, MA, USA; 3Department of Neurology and ICCTR Biostatistics and Research Design Center, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
Correspondence: Bo Zhang
Department of Neurology and ICCTR Biostatistics and Research Design Center, Boston Children’s Hospital and Harvard Medical School, 21 Autumn Street, Boston, MA 02115, USA
Department of Endocrinology and Metabolism, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, People’s Republic of China
Background: This study proposes the use of machine learning algorithms to improve the accuracy of type 2 diabetes predictions using non-invasive risk score systems.
Methods: We evaluated and compared the prediction accuracies of existing non-invasive risk score systems using data from the REACTION study (Risk Evaluation of Cancers in Chinese Diabetic Individuals: A Longitudinal Study). Two simple risk scores were established on the basis of logistic regression. Machine learning techniques (ensemble methods) were used to improve prediction accuracy by combining the individual score systems.
Results: Existing score systems derived from Western populations generally performed worse than those derived from Eastern populations. The two newly established score systems performed better than most existing score systems but slightly worse than the Chinese score system. Using ensemble methods with model selection algorithms yielded better prediction accuracy than any of the simple score systems.
Conclusion: Our proposed machine learning methods can be used to improve the accuracy of screening for undiagnosed type 2 diabetes and of identifying high-risk patients.
Keywords: type 2 diabetes, risk score, machine learning, voting, stacking, prediction
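Illustrative sketch: The Methods combine individual risk score systems with ensemble methods, and the keywords name voting and stacking. The Python sketch below shows one way such a combination might look. It is not the authors' implementation. The data are synthetic, the three "score systems" are hypothetical noisy measurements of a latent risk, and the helper names (`soft_vote`, `fit_stacker`, `auc`), the min-max rescaling, the learning rate, and the 300/100 split are all assumptions made for illustration.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def min_max(column):
    """Rescale one score system to [0, 1] so systems are comparable."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]


def soft_vote(score_columns):
    """Voting: average the rescaled scores across score systems."""
    scaled = [min_max(c) for c in score_columns]
    return [sum(vals) / len(vals) for vals in zip(*scaled)]


def fit_stacker(score_columns, labels, lr=0.1, epochs=300):
    """Stacking: logistic regression on the scores, fit by gradient descent."""
    rows = list(zip(*score_columns))
    w, b, n = [0.0] * len(score_columns), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic data (hypothetical): a latent risk drives the outcome, and
# three score systems observe it with increasing amounts of noise.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(400)]
labels = [1 if random.random() < 1 / (1 + math.exp(-(2 * z - 1))) else 0
          for z in latent]
scores = [[z + random.gauss(0, sd) for z in latent] for sd in (0.5, 1.0, 2.0)]

# Fit the stacker on the first 300 subjects, evaluate on the last 100.
train_cols = [c[:300] for c in scores]
test_cols = [c[300:] for c in scores]
test_labels = labels[300:]

weights, bias = fit_stacker(train_cols, labels[:300])
stack_pred = [bias + sum(w * x for w, x in zip(weights, row))
              for row in zip(*test_cols)]

vote_auc = auc(test_labels, soft_vote(test_cols))
stack_auc = auc(test_labels, stack_pred)
single_aucs = [auc(test_labels, c) for c in test_cols]
print("single scores:", single_aucs)
print("voting:", vote_auc, "stacking:", stack_auc)
```

Voting treats every score system equally, whereas stacking learns a weight for each one, so a noisier score can be down-weighted. Whether either beats the best single score depends on the data, which is why the sketch evaluates on held-out subjects.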
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.