Clinical Epidemiology, Volume 10

A phenotyping algorithm to identify acute ischemic stroke accurately from a national biobank: the Million Veteran Program

Authors Imran TF, Posner D, Honerlaw J, Vassy JL, Song RJ, Ho YL, Kittner SJ, Liao KP, Cai T, O'Donnell CJ, Djousse L, Gagnon DR, Gaziano JM, Wilson PWF, Cho K

Received 24 December 2017

Accepted for publication 18 May 2018

Published 16 October 2018 Volume 2018:10 Pages 1509–1521


Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 4

Editor who approved publication: Professor Henrik Toft Sørensen

Tasnim F Imran,1–3,* Daniel Posner,1,4,* Jacqueline Honerlaw,1 Jason L Vassy,1,2 Rebecca J Song,1 Yuk-Lam Ho,1 Steven J Kittner,5 Katherine P Liao,1,2 Tianxi Cai,1,6 Christopher J O’Donnell,1,2 Luc Djousse,1,2 David R Gagnon,1,4 J Michael Gaziano,1,2 Peter WF Wilson,7,8 Kelly Cho1,2

On behalf of the VA Million Veteran Program

1Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Cooperative Studies Program, VA Boston Healthcare System, Boston, MA, USA; 2Department of Medicine, Division of Aging, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA; 3Department of Medicine, Cardiology Section, Boston Medical Center, Boston University School of Medicine, Boston, MA, USA; 4Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; 5Department of Neurology, Baltimore VA Medical Center and University of Maryland School of Medicine, Baltimore, MD, USA; 6Harvard T. H. Chan School of Public Health, Boston, MA, USA; 7Atlanta VA Medical Center, Decatur, GA, USA; 8Department of Medicine, Division of Cardiovascular Disease, Emory University School of Medicine, Atlanta, GA, USA

*These authors contributed equally to this work

Background: Large databases provide an efficient way to analyze patient data. A challenge with these databases is the inconsistency of ICD codes and a potential for inaccurate ascertainment of cases. The purpose of this study was to develop and validate a reliable protocol to identify cases of acute ischemic stroke (AIS) from a large national database.
Methods: Using the national Veterans Affairs electronic health-record system, Centers for Medicare & Medicaid Services, and National Death Index data, we developed an algorithm to identify cases of AIS. Using a combination of inpatient and outpatient ICD9 codes, we selected cases of AIS and controls from 1992 to 2014. Diagnoses determined after medical-chart review were considered the gold standard. We used a machine-learning algorithm and a neural network approach to identify AIS from ICD9 codes and electronic health-record information and compared it with a previous rule-based stroke-classification algorithm.
Results: We reviewed administrative hospital data, ICD9 codes, and medical records of 268 patients in detail. Compared with the gold standard, this AIS algorithm had a sensitivity of 91%, specificity of 95%, and positive predictive value of 88%. A total of 80,508 highly likely cases of AIS were identified using the algorithm in the Veterans Affairs national cardiovascular disease-risk cohort (n=2,114,458).
Conclusion: Our algorithm had high specificity for identifying AIS in a nationwide electronic health-record system. This approach may be utilized in other electronic health databases to accurately identify patients with AIS.
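
For readers less familiar with the validation measures reported in the Results, the following is a minimal sketch of how sensitivity, specificity, and positive predictive value are computed when an algorithm's labels are compared with a chart-review gold standard. The function name `validation_metrics` and the toy labels are hypothetical and for illustration only; they are not the study's code or data.

```python
import math
from typing import Dict, Sequence


def validation_metrics(predicted: Sequence[bool],
                       reference: Sequence[bool]) -> Dict[str, float]:
    """Compare algorithm labels with gold-standard labels (True = AIS case).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a measure is undefined.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-cases correctly excluded
        "ppv": ratio(tp, tp + fp),          # share of flagged cases that are true cases
    }


# Toy labels (made up, NOT the study data): 10 reviewed charts.
toy_predicted = [True, True, True, False, False, False, False, True, False, False]
toy_reference = [True, True, False, True, False, False, False, True, False, False]
toy_metrics = validation_metrics(toy_predicted, toy_reference)
```

In this toy example there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so sensitivity and positive predictive value are both 3/4 and specificity is 5/6.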

Keywords: acute ischemic stroke, algorithm, large databases, big data, administrative health data, cerebrovascular accident

Creative Commons License This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
