International Journal of Nephrology and Renovascular Disease, Volume 12

Prediction of cardiovascular outcomes with machine learning techniques: application to the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study

Authors: Chen T, Brewster P, Tuttle KR, Dworkin LD, Henrich W, Greco BA, Steffes M, Tobe S, Jamerson K, Pencina K, Massaro JM, D'Agostino RB Sr, Cutlip DE, Murphy TP, Cooper CJ, Shapiro JI

Received 15 November 2018

Accepted for publication 6 February 2019

Published 21 March 2019; Volume 2019:12, Pages 49–58


Checked for plagiarism: Yes

Peer review: Single-blind


Editor who approved publication: Professor Pravin Singhal

Tian Chen,1 Pamela Brewster,1 Katherine R Tuttle,2 Lance D Dworkin,1 William Henrich,3 Barbara A Greco,4 Michael Steffes,5 Sheldon Tobe,6 Kenneth Jamerson,7 Karol Pencina,8 Joseph M Massaro,8 Ralph B D’Agostino Sr,8 Donald E Cutlip,9 Timothy P Murphy,10 Christopher J Cooper,1 Joseph I Shapiro11

1University of Toledo, Toledo, OH, USA; 2Providence Health Care, University of Washington, Spokane, WA, USA; 3University of Texas Health Science Center, San Antonio, TX, USA; 4Baystate Health, Springfield, MA, USA; 5University of Minnesota, Minneapolis, MN, USA; 6University of Toronto, Toronto, ON, Canada; 7University of Michigan, Ann Arbor, MI, USA; 8Harvard Clinical Research Institute, Boston University, Boston, MA, USA; 9Beth Israel Deaconess Medical Center, Boston, MA, USA; 10Brown University, Providence, RI, USA; 11Marshall University, Huntington, WV, USA

Background: Data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study were analyzed with machine learning methods in an effort to predict the composite endpoint defined in the original study.
Methods: We identified 573 CORAL subjects with complete baseline data and a known status (presence or absence) for the study's composite endpoint. These data were analyzed with several models, including a generalized linear (logistic-linear) model, a support vector machine, a decision tree, a feed-forward neural network, and a random forest, to predict the composite endpoint. The subjects were randomly divided into training and testing subsets in an 80%:20% ratio, using several different random seeds. Prediction models were optimized with the caret package in R.
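The Methods describe repeated random 80%:20% partitions of the 573 subjects, each controlled by a seed. A minimal Python sketch of that splitting step follows, assuming a simple unstratified random split; the abstract does not say whether the split was stratified by outcome, and the function name `split_train_test` and the seed values 1 to 10 are hypothetical.

```python
import random


def split_train_test(n_subjects, seed, train_fraction=0.8):
    """Hypothetical helper: randomly partition subject indices 0..n_subjects-1.

    Assumes a simple, unstratified random split (the abstract does not specify).
    Returns (training indices, testing indices), each sorted.
    """
    rng = random.Random(seed)  # dedicated generator, so each seed is reproducible
    indices = list(range(n_subjects))
    rng.shuffle(indices)
    n_train = int(train_fraction * n_subjects)  # 80% of 573 -> 458 training subjects
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Ten different (hypothetical) seeds give ten different 80%:20% partitions.
splits = {seed: split_train_test(573, seed) for seed in range(1, 11)}
for seed, (train, test) in splits.items():
    print(seed, len(train), len(test))
```

Each model would then be fitted on the training indices and evaluated on the held-out testing indices for every seed, so that performance can be reported as a mean and spread across splits.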
Results: Of the machine learning techniques, the random forest method performed best, yielding an area under the receiver operating characteristic (ROC) curve of 68.1% ± 4.2% (mean ± SD) on the testing subset across ten different seed values used to separate the training and testing subsets. The four most important variables in the random forest model were systolic blood pressure (SBP), serum creatinine, glycosylated hemoglobin, and diastolic blood pressure (DBP). Each of these variables was also important in at least some of the other methods. Treatment assignment was not consistently an important determinant in any of the models.
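The Results summarize discrimination as the area under the ROC curve (AUC), reported as a mean ± SD over ten seeds. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so 68.1% indicates modest discrimination. The plain-Python sketch below computes AUC with the Mann-Whitney pairwise formulation and then summarizes per-seed values; the helper names `roc_auc` and `summarize` are hypothetical, and using the sample standard deviation is an assumption, since the abstract does not state which SD was computed.

```python
import statistics


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    AUC = probability that a randomly chosen subject with the endpoint (label 1)
    receives a higher predicted score than one without it (label 0);
    tied scores count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each outcome class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample SD (an assumption) of per-seed test-set AUCs, in percent."""
    pct = [100.0 * a for a in aucs]
    return statistics.mean(pct), statistics.stdev(pct)


# Toy example: two subjects with the endpoint (scores 0.9, 0.3), two without (0.8, 0.1).
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

In use, `roc_auc` would be applied to each seed's test-set predictions, and the ten resulting values passed to `summarize` to obtain a mean ± SD of the kind reported above.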
Conclusion: Prediction of a composite cardiovascular outcome was difficult in the CORAL population, even with machine learning methods. Assignment to either the stenting or the best medical therapy group was not an important predictor of the composite outcome.
Clinical Trial Registration: NCT00081731

Keywords: chronic kidney disease, cardiovascular disease, glomerular filtration rate, hypertension, ischemic renal disease, renal artery stenosis

This work is published and licensed by Dove Medical Press Limited. The full terms of this license incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
