Clinical Pharmacology: Advances and Applications, Volume 9

An integrated epidemiological and neural net model of the warfarin effect in managed care patients

Authors Jacobs DM, Stefanovic F, Wilton G, Gomez-Caminero A, Schentag JJ

Received 6 March 2017

Accepted for publication 13 April 2017

Published 18 May 2017; Volume 2017:9, Pages 55–64


Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Professor Arthur Frankel


David M Jacobs,1,2,* Filip Stefanovic,3,* Greg Wilton,2 Andres Gomez-Caminero,4 Jerome J Schentag1,2

1Department of Pharmacy Practice, University at Buffalo School of Pharmacy and Pharmaceutical Sciences, 2CPL Associates LLC, 3Department of Biomedical Engineering, University at Buffalo School of Engineering and Applied Sciences, Buffalo, NY, 4Global Pharmacovigilance and Epidemiology, Bristol Myers Squibb, Princeton, NJ, USA

*These authors contributed equally to this work.

Introduction: Risk assessment tools are used to estimate stroke risk and the need for anticoagulation therapy in patients with atrial fibrillation (AF). These risk stratification scores are limited by the information entered into them and by their reliance on time-independent variables. The objective of this study was to develop a time-dependent neural net model to identify AF populations at high risk of poor clinical outcomes and to evaluate the discriminatory ability of the model in a managed care population.
Methods: We performed a longitudinal cohort study within a health-maintenance organization from 1997 to 2008. Participants with incident AF were identified irrespective of warfarin status and followed for their duration within the database. Three clinical outcome measures were evaluated: stroke, myocardial infarction, and hemorrhage. A neural net model was developed to identify patients at high risk of clinical events, who were defined as “enriched.” The model defines enrichment based on the top 10 minimum mean square error output parameters that describe the three clinical outcomes. Cox proportional hazards models were used to evaluate the outcome measures.
Results: Among 285 patients, the mean age was 74±12 years, the mean follow-up was 4.3±2.6 years, and 154 (54%) were treated with warfarin. After propensity score adjustment, warfarin use was associated with a slightly increased risk of adverse outcomes (stroke, myocardial infarction, or hemorrhage), although this did not attain statistical significance (adjusted hazard ratio [aHR]=1.22; 95% confidence interval [CI], 0.75–1.97; p=0.42). Within the neural net model, subjects at high risk of adverse outcomes were identified and labeled “enriched.” After propensity score adjustment, enriched subjects had an 81% higher risk of adverse outcomes than nonenriched subjects (aHR=1.81; 95% CI, 1.15–2.88; p=0.01).
Conclusion: Enrichment methodology improves the statistical discrimination of meaningful endpoints when used in a health records-based analysis.

Keywords: atrial fibrillation, neural net model, warfarin, epidemiology


Atrial fibrillation (AF) is the most common cardiac arrhythmia, and its incidence is expected to continue to increase during the next decade.1 In the USA, the prevalence of AF was estimated at 3 to 6 million people in 2010 and is expected to rise to as high as 12 million by 2050.2,3 AF is associated with cardiovascular risk factors and structural heart disease, and with significant morbidity and mortality, including stroke and heart failure.4–6

Oral vitamin K antagonists, such as warfarin, have been used for decades to reduce the risk of stroke in patients with nonvalvular AF.7 Multiple randomized controlled trials have shown a two-thirds reduction in the risk of stroke for patients on warfarin compared with placebo.8,9 Despite this benefit, warfarin raises numerous concerns, including a narrow therapeutic window that requires close monitoring of the international normalized ratio (INR).10 Furthermore, warfarin has a highly variable dose response attributable to genetic, clinical, pharmaceutical, and dietary factors.10 A patient’s CYP2C9 and VKORC1 genotypes can influence the optimal starting dose of warfarin.11 Variant CYP2C9 alleles are associated with reduced warfarin metabolism and therefore higher warfarin concentrations, whereas VKORC1 variants are associated with increased sensitivity to warfarin. The dose of warfarin must be tailored to each patient according to the patient’s INR response, which may be influenced by these genotypes. Frequent monitoring is a major administrative burden and has led to underuse of warfarin in a substantial portion of AF patients.12 For those treated with warfarin, multiple and prolonged interruptions in therapy lead to an increased risk of stroke and further complications.13

Traditionally, risk assessment tools such as the CHADS2 risk score and, more recently, the CHA2DS2-VASc scoring system have been used to estimate stroke risk and the need for anticoagulation therapy in patients with AF.14 These risk stratification scores are limited by the information entered into them and by their reliance on time-independent variables. Moreover, these scores evaluate only stroke risk and do not take into account hemorrhagic risk or other cardiac events, such as myocardial infarction (MI). For these reasons, we sought to develop a time-dependent model to identify an enriched AF population at risk of adverse clinical outcomes, including stroke, MI, and hemorrhage. The primary objective of this study was therefore to develop a model to identify an enriched population at high risk of poor outcomes. The secondary objective was to evaluate cardiac endpoints (stroke and MI) and safety (hemorrhage) in an AF sample using both standard-of-care and neural net models. We first analyzed these endpoints with a standard-of-care model (warfarin users vs nonusers), and then identified subjects as enriched through a neural net model to determine whether we could better discriminate subjects at risk of adverse clinical outcomes. Our working hypothesis was that the neural net model would identify an enriched population and that enrichment would be associated with a higher risk of adverse clinical outcomes.


Study design

This was a longitudinal, retrospective cohort study of patients enrolled in a local health-maintenance organization from January 1, 1997, to December 31, 2008. This database comprises outpatient and inpatient health care claims, laboratory data, and pharmacy data. The laboratory data include all tests performed and their results. Pharmacy data include all pertinent patient demographics and medication dosing information regardless of pharmacy location or affiliation. The Institutional Review Board of the University at Buffalo approved this study and granted a waiver of patient informed consent. All efforts were made to protect patient identity and any sensitive and confidential information as required by the Health Insurance Portability and Accountability Act of 1996.

Patient population

This study included all patients aged 18 years or older who were enrolled during the study period and who met the following criteria: 1) diagnosis of AF after January 1, 1997, with at least one office visit following diagnosis; 2) for patients on warfarin therapy, more than five INR values recorded within the database; and 3) at least 90 days of clinical data before and after the index AF diagnosis date.

Two groups of patients were excluded from the study. First, we excluded patients with a history of valvular disease, including mitral stenosis or artificial heart valves, because this analysis focuses on nonvalvular AF. These patients were identified by the following International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes: 35.10 to 35.14, 35.20, 35.28, 394.0, 394.2, 396.0, 396.8, V43.3, and V42.2. Second, we excluded patients with hypothyroidism because it is a risk factor for AF. These patients were identified based on ICD-9-CM code 244.9 or on pharmacy records showing administration of a thyroid replacement medication.
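The inclusion and exclusion rules above can be sketched as a filter over patient records. This is a minimal illustration, not the study's code: the `Patient` fields, the date handling, measuring age at diagnosis, and reading "following diagnosis" as strictly after the diagnosis date are all assumptions. Only the ICD-9-CM codes, the age limit, the INR count, and the 90-day windows come from the text.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical record layout; the real database schema is not described.
@dataclass
class Patient:
    age_at_diagnosis: int
    af_diagnosis: date
    office_visits: list            # dates of office visits
    inr_count: int                 # number of INR values in the database
    on_warfarin: bool
    first_record: date             # earliest clinical data
    last_record: date              # latest clinical data
    icd9_codes: set = field(default_factory=set)
    thyroid_replacement: bool = False

VALVULAR_CODES = {"35.10", "35.11", "35.12", "35.13", "35.14", "35.20", "35.28",
                  "394.0", "394.2", "396.0", "396.8", "V43.3", "V42.2"}
HYPOTHYROID_CODE = "244.9"
STUDY_START = date(1997, 1, 1)
MIN_WINDOW = timedelta(days=90)

def meets_inclusion(p: Patient) -> bool:
    return (p.age_at_diagnosis >= 18
            and p.af_diagnosis > STUDY_START
            and any(v > p.af_diagnosis for v in p.office_visits)
            and (not p.on_warfarin or p.inr_count > 5)   # >5 INR values if on warfarin
            and p.af_diagnosis - p.first_record >= MIN_WINDOW
            and p.last_record - p.af_diagnosis >= MIN_WINDOW)

def meets_exclusion(p: Patient) -> bool:
    valvular = bool(p.icd9_codes & VALVULAR_CODES)
    hypothyroid = HYPOTHYROID_CODE in p.icd9_codes or p.thyroid_replacement
    return valvular or hypothyroid

def eligible(patients):
    return [p for p in patients if meets_inclusion(p) and not meets_exclusion(p)]
```

Nonusers pass criterion 2 automatically because the INR requirement applies only to patients on warfarin.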

Treatment groups

Patients were divided into two groups based on warfarin exposure at index diagnosis. Warfarin users were defined as patients who were prescribed and filled warfarin for at least 90 days from index diagnosis. Nonusers were defined as patients who did not receive warfarin during the study period.
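A minimal sketch of the exposure definitions follows. The fill format `(fill_date, days_supply)` is hypothetical, and reading "at least 90 days from index diagnosis" as at least 90 distinct days of supply on or after the index date is an assumption. Patients who received some warfarin but fewer than 90 covered days fit neither definition; the sketch returns `None` for them rather than guessing how the study handled them.

```python
from datetime import date, timedelta

def classify_exposure(index_date, warfarin_fills, min_days=90):
    """Return "user", "nonuser", or None (fits neither definition).

    warfarin_fills: hypothetical list of (fill_date, days_supply) tuples
    covering the whole study period.
    """
    if not warfarin_fills:
        return "nonuser"                      # no warfarin at any time
    covered = set()                           # distinct days with warfarin on hand
    for fill_date, days_supply in warfarin_fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if day >= index_date:
                covered.add(day)
    # Assumption: ">=90 days from index" means >=90 distinct covered days
    # on or after the index diagnosis date.
    return "user" if len(covered) >= min_days else None
```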

Patient outcomes

An outcome event was defined as any occurrence of a relevant ICD-9-CM code during any inpatient or outpatient health care encounter in the study follow-up period. The ICD-9-CM codes used to identify stroke, MI, and hemorrhage are described in Table 1. Outcome assessment began at the same time for warfarin users and nonusers.

Table 1 The International Classification of Diseases, 9th Revision, Clinical Modification codes that were used in this study to define diagnosis of various conditions


Information on demographics, medication use, and clinical variables, including congestive heart failure, hypertension, and diabetes, was obtained from outpatient and inpatient health care claims (ICD-9-CM codes) and pharmacy records (Table 1). Health care claims were also used to identify episodes of stroke, heart failure, and hemorrhage occurring before the diagnosis of AF. The CHADS2 stroke risk score was calculated for each patient at baseline.15 Laboratory values, including fasting plasma glucose, glycated hemoglobin, systolic blood pressure (BPS) and diastolic blood pressure (BPD), albumin, liver function tests (alanine aminotransferase and aspartate aminotransferase), and cholesterol markers (low-density lipoprotein-cholesterol [LDL-C], high-density lipoprotein-cholesterol [HDL-C], triglycerides [TG], and total cholesterol), were obtained from the outpatient laboratory system.
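For reference, the CHADS2 score follows the standard published point system: one point each for congestive heart failure, hypertension, age ≥75 years, and diabetes, and two points for prior stroke or transient ischemic attack. The sketch below uses function and argument names of our own choosing; in this study the inputs would be derived from ICD-9-CM codes and pharmacy records rather than passed in as booleans.

```python
def chads2(chf: bool, hypertension: bool, age: int, diabetes: bool,
           prior_stroke_or_tia: bool) -> int:
    """CHADS2 stroke risk score (range 0-6)."""
    return (int(chf)
            + int(hypertension)
            + int(age >= 75)
            + int(diabetes)
            + 2 * int(prior_stroke_or_tia))

def high_stroke_risk(score: int) -> bool:
    # Cut-off used when describing the cohort (CHADS2 >= 2)
    return score >= 2
```

Because no patient in this cohort had a prior stroke or transient ischemic attack, the two-point term contributed nothing at baseline.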

Data and statistical analysis

Patient demographics and medication use were characterized at index diagnosis of AF. Normality of continuous variables was assessed using the Shapiro–Wilk test. Normally distributed continuous variables were presented as mean±standard deviation (SD) and compared using Student’s t-test; non-normally distributed variables were presented as median (interquartile range, IQR) and compared using the Mann–Whitney U test. Categorical variables were described as number (percentage) and compared using the chi-square test.

Crude incidence rates of first event were calculated for each group, estimated using the Kaplan–Meier method, and compared using log-rank tests. We used Cox proportional hazards models to determine the association between warfarin exposure and incidence of events. The events were modeled separately (stroke/MI and hemorrhage) as well as combined into an overall event. Associations are presented as hazard ratios (HRs) with 95% confidence intervals (CIs). The covariates evaluated for adjustment included age, sex, body mass index (BMI) (continuous), baseline CHADS2 score, fasting blood glucose, glycated hemoglobin, liver function tests, cholesterol markers, systolic blood pressure (BPS), BPD, and albumin. For our multivariable models, covariates were added to the model one at a time, and those that changed the primary effect estimate by more than 10% were retained. We also developed a propensity score-based model because patients on warfarin may be at higher baseline risk and consequently experience more frequent events. The propensity score was used to balance patient characteristics between those treated with warfarin and those who were untreated. A multivariable logistic regression model including the listed covariates was built to predict the probability of being a warfarin user, and the Cox models were then adjusted for this propensity score. A similar approach was used for the enriched population to identify differences in incidence of first event between enriched and nonenriched subjects. Comparisons with a p-value <0.05 were considered statistically significant. All analyses were two-tailed and were performed using SAS Version 9.4 (SAS Institute, Cary, NC, USA).
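The 10% change-in-estimate rule can be sketched as follows. `fit_hr` is a hypothetical callback that would fit the Cox model with the exposure plus the given covariates and return the exposure HR. Two details are assumptions: each candidate is tested by adding it alone to the crude model, and the relative change is measured on the HR scale rather than the log-HR scale. The toy HRs in the usage are illustrative only, not study results.

```python
def select_confounders(fit_hr, candidates, threshold=0.10):
    """Change-in-estimate screening of candidate covariates.

    fit_hr(covariates) -> exposure HR from a model including those covariates
    (hypothetical callback). Returns the crude HR and the retained covariates.
    """
    crude_hr = fit_hr(())
    retained = []
    for cov in candidates:
        adjusted_hr = fit_hr((cov,))
        relative_change = abs(adjusted_hr - crude_hr) / crude_hr
        if relative_change > threshold:      # changed the estimate by >10%
            retained.append(cov)
    return crude_hr, retained

# Toy lookup standing in for real model fits (illustrative numbers).
toy = {(): 2.0, ("albumin",): 1.7, ("age",): 1.95, ("chads2",): 2.5}
crude, kept = select_confounders(toy.__getitem__, ["albumin", "age", "chads2"])
```

Here `albumin` (15% change) and `chads2` (25% change) are retained, while `age` (2.5% change) is not.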

Identifying enriched patients

A neural net model was developed to identify patients at high risk of adverse clinical events, who were defined as “enriched.” Here, we describe the enrichment process for the MI output; a similar process was completed for the stroke and hemorrhage outputs. Patients are identified as enriched using the combined and windowed MI output from the model. Specifically, the data are normalized and the SD is computed. For each data entry, the distance from the norm is calculated in SDs. Using these data, patterns of enrichment are characterized by the triangular function defined as:


where τ is the rise and fall time of the deviation from normal in days, t is the time displacement from max deviation, μ is the minimum SD, m is the maximum SD, and α is a scaling factor.

The simulation first identifies the maximum deviation (in SDs) from normal and the adjacent minimum SDs to compute the relation. The corresponding time values are identified from the available data, and Λ(t) is populated. Λ(t) is overlaid with the normalized, SD-based windowed MI data for qualitative review, and the two signals are cross-correlated to quantify their relationship. The correlation coefficient CP is used as the measure of similarity between the two signals, where 0≤|CP|≤1. Coefficient values >0.7 are considered strong, and patients with such values are defined as enriched. Visual inspection of the graphic output is always used to verify the population included.
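Because the equation for Λ(t) is not reproduced in this text, the sketch below assumes one plausible triangular form: a linear rise from μ to the peak m over τ days and a symmetric fall back to μ. It also assumes that CP is the zero-lag Pearson correlation between the template and the SD-deviation series, and that the "adjacent minimum" is the smallest deviation within ±τ of the peak. Only the |CP| > 0.7 threshold comes from the text. Pearson correlation is unchanged by positive scaling, so α does not affect CP and is not passed to the scoring function.

```python
import numpy as np

def triangle_template(t, tau, mu, m, alpha=1.0):
    """ASSUMED form of Lambda(t): linear rise from mu to the peak m over tau
    days, symmetric fall back to mu, and flat at mu for |t| > tau."""
    t = np.asarray(t, dtype=float)
    return alpha * (mu + (m - mu) * np.clip(1.0 - np.abs(t) / tau, 0.0, None))

def enrichment_score(days, sd_deviation, tau, threshold=0.7):
    """Correlate an SD-deviation series with the triangular template.

    days: measurement times; sd_deviation: distance from the patient's norm
    in SDs. Returns (CP, enriched_flag).
    """
    days = np.asarray(days, dtype=float)
    dev = np.asarray(sd_deviation, dtype=float)
    peak = int(np.argmax(dev))                 # maximum deviation from normal
    t = days - days[peak]                      # time displacement from the peak
    mu = dev[np.abs(t) <= tau].min()           # minimum SD within the rise/fall window
    template = triangle_template(t, tau, mu, dev[peak])
    cp = np.corrcoef(template, dev)[0, 1]      # zero-lag normalized cross-correlation
    return cp, bool(abs(cp) > threshold)
```

A series shaped exactly like the template scores CP ≈ 1; a series with no single dominant excursion scores near 0. A flat series makes the template constant, so CP is undefined in that case.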

Modeling the systems

Modeling proceeds as follows. First, data are loaded into the MatLab environment from data files that include all of the parametric patient data. The data are then organized so that, for each patient, all recorded factors (e.g., blood pressure and albumin levels) and their corresponding values are part of a structured time-based array. Each patient’s data array thus includes multiple factors, typically >25. Patients are assigned unique ID numbers so that duplicate entries are not possible. As the data are loaded, input data are verified against the unique IDs and stored sequentially according to their time stamps. In addition to structuring the data, patients are also sorted into “control” and “load” groups. The populations are presorted in the loaded data, and no sorting algorithm is applied during the loading process.
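The loading step can be sketched in Python as a dictionary keyed by patient ID, which makes duplicate patient entries impossible by construction. The flat row layout `(patient_id, group, factor, day, value)` is a hypothetical stand-in for the study's data files. The "control"/"load" label is read from the presorted data rather than computed.

```python
from collections import defaultdict

def load_records(rows):
    """Organize flat rows into per-patient, time-ordered factor series.

    rows: iterable of (patient_id, group, factor, day, value); this layout
    is hypothetical. group is the presorted "control" or "load" label.
    """
    patients = {}
    for pid, group, factor, day, value in rows:
        rec = patients.setdefault(pid, {"group": group, "factors": defaultdict(list)})
        if rec["group"] != group:              # one ID must map to one group
            raise ValueError(f"patient {pid} appears in two groups")
        rec["factors"][factor].append((day, value))
    for rec in patients.values():
        for series in rec["factors"].values():
            series.sort(key=lambda point: point[0])   # chronological order
    return patients
```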

After the data are loaded, patient lists may be modified to include or remove patients with or without certain factors. For example, if blood pressure is an essential factor for analysis, patients without blood pressure data may be removed. This is accomplished by a simple loop over all patients and their parameter lists. For this study, filtering retained only patients whose records included, at a minimum, the following factors: TG, HDL-C, BMI, BPS, and BPD.
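The required-factor filter used in this study reduces to one comprehension. The nested-dictionary layout is a hypothetical choice, and "has the factor" is taken here to mean at least one recorded value.

```python
REQUIRED_FACTORS = ("TG", "HDL-C", "BMI", "BPS", "BPD")

def filter_patients(patients, required=REQUIRED_FACTORS):
    """Keep patients with at least one value for every required factor.

    patients: dict of patient ID -> dict of factor name -> list of
    (day, value) pairs (hypothetical layout).
    """
    return {pid: factors for pid, factors in patients.items()
            if all(factors.get(name) for name in required)}
```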

Following this preliminary filtering, patient data are analyzed to determine the first and last dates available for all listed factors. Each patient is passed separately into a looping function, which examines all factors in the structured array and determines the earliest time, TMin, and the latest time, TMax. The purpose is to determine time scales for events, measurement times, and time constants for the simulation, as described later. For this analysis, discrete time intervals of δTime = 30 days are created between TMin and TMax. The number of data points for each patient is, therefore, defined by Equation 1.
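Equation 1 is not reproduced in this text. The sketch below assumes the grid holds ⌊(TMax − TMin)/δTime⌋ + 1 points starting at TMin, with times expressed as integer days. Under this reading, the last grid point can fall short of TMax when the span is not a multiple of 30 days.

```python
def time_grid(factor_series, delta_days=30):
    """Return (t_min, t_max, grid) for one patient.

    factor_series: dict of factor name -> list of (day, value) pairs.
    n = floor((t_max - t_min) / delta) + 1 is an ASSUMED reading of Equation 1.
    """
    days = [day for series in factor_series.values() for day, _ in series]
    t_min, t_max = min(days), max(days)        # earliest and latest times
    n_points = int((t_max - t_min) // delta_days) + 1
    grid = [t_min + k * delta_days for k in range(n_points)]
    return t_min, t_max, grid
```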




Note that this represents a much larger set of possible values than is present in the loaded data. To handle this, we populate arrays of time data for each patient’s factors by first entering the existing data that fall within the time limits and the given intervals, and then filling the remaining cells in the array with zero values to mark them as empty. The simulation searches for all zero values and populates these cells by linear interpolation, as described by Equation 2, where Fi is the value of the factor data in the ith cell of the time data, Ti is the time at which that factor data occur in 30-day discrete intervals, F1 and F2 are the most proximal real data before and after the empty time data cell, respectively, and T1 and T2 are the times of those most proximal real data points.
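Using the variables defined above, the linear interpolant is Fi = F1 + (F2 − F1)(Ti − T1)/(T2 − T1). The sketch interpolates directly from the real measurements instead of first writing zero placeholders. This design choice, not taken from the paper, avoids mistaking a genuine zero measurement for an empty cell. How grid cells before the first or after the last measurement were filled is not stated, so holding the nearest real value there is an assumption.

```python
from bisect import bisect_right

def fill_grid(series, grid):
    """Linearly interpolate one factor onto the time grid.

    series: time-sorted list of (day, value) real measurements; grid: sorted days.
    Inside the data range: F_i = F1 + (F2 - F1) * (T_i - T1) / (T2 - T1).
    """
    days = [d for d, _ in series]
    values = [v for _, v in series]
    filled = []
    for t in grid:
        if t <= days[0]:
            filled.append(values[0])          # ASSUMPTION: hold first value
        elif t >= days[-1]:
            filled.append(values[-1])         # ASSUMPTION: hold last value
        else:
            j = bisect_right(days, t)         # days[j-1] <= t < days[j]
            t1, t2 = days[j - 1], days[j]
            f1, f2 = values[j - 1], values[j]
            filled.append(f1 + (f2 - f1) * (t - t1) / (t2 - t1))
    return filled
```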

These time data are then kept in two forms: original data, DPFO, as presented above, and normalized data, DPFN, as defined in Equation 3, where P denotes the patient and F the factor. Each patient has a different mean, μ, and SD, σ, of the time data array for each factor. These normalized data are used to define models that represent the input/output characteristics of each patient. Specifically, the loaded factor lists are patient inputs, while conditions such as MI, stroke, and hemorrhage are the outputs we wish to identify.
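Since each patient and factor has its own μ and σ, the normalization reads as a per-series z-score, (x − μ)/σ. The sketch below assumes this form, uses the population SD, and maps a constant series to zeros to avoid dividing by zero. All three choices are assumptions, because Equation 3 is not reproduced here.

```python
from statistics import mean, pstdev

def normalize_series(values):
    """z-score one patient's factor series: (x - mu) / sigma.

    ASSUMPTIONS: population SD (pstdev) and a flat series mapped to zeros.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def normalize_patient(factor_grids):
    """Normalize each factor with its own mu and sigma."""
    return {name: normalize_series(vals) for name, vals in factor_grids.items()}
```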





The model aims to identify the relationship between these inputs and a predefined output. We accomplish this by determining transfer functions for all factors simultaneously, as well as for each factor individually. The transfer functions are determined by first taking the fast Fourier transform of each input signal (i.e., each factor) and each output signal (i.e., MI, stroke, hemorrhage) and rearranging the spectra so that the zero frequency is in the middle. All possible transfer functions are then determined for each patient by deconvolution of the output and inputs as per Equation 4, where x indexes the set of all possible transfer functions, whose size is defined by the number of factors for each patient. Before deconvolution, any empty data set (i.e., zero signal) must be removed to eliminate the possibility of dividing by zero. Then, we create output estimates using each input and each transfer function, as per Equation 5. This creates a matrix of output estimates, Gp, for each patient, p. By combining all Gp’s into a single matrix, G = {Gp}, we can solve for the simple optimized transfer function. This is accomplished by first taking the Moore–Penrose pseudo-inverse of G, denoted G⁺, and convolving the two to find a matrix of weights, W, as per Equation 6. We then solve for the simple optimum transfer function for each patient by solving Equation 7.
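The mechanics of this step can be sketched with NumPy for a single patient. Equations 4 to 7 are not reproduced here, so several choices are assumptions:

- Deconvolution is done as element-wise spectral division, H = Y/X, with near-zero bins left at zero.
- Every transfer function is applied to every input.
- The weight step is read as an ordinary least-squares solve, W = G⁺y.

Stacking several patients would require matching column layouts, which this sketch does not attempt. Note that applying an input's own transfer function to that same input reproduces the output by construction, so the sketch shows mechanics rather than predictive value.

```python
import numpy as np

def transfer_functions(X, y, eps=1e-12):
    """Frequency-domain deconvolution H = Y / X for each input signal.

    X: (n_factors, T) input signals; y: (T,) output signal.
    Spectra are shifted so the zero frequency sits in the middle.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X[np.any(X != 0, axis=1)]                  # drop empty (all-zero) inputs
    Xf = np.fft.fftshift(np.fft.fft(X, axis=1), axes=1)
    Yf = np.fft.fftshift(np.fft.fft(y))
    # Divide only where |X| is non-negligible; other bins stay at zero.
    H = np.divide(Yf, Xf, out=np.zeros_like(Xf), where=np.abs(Xf) > eps)
    return Xf, H

def output_estimates(Xf, H):
    """Apply every transfer function to every input; one column per pair."""
    columns = [np.fft.ifft(np.fft.ifftshift(h * xf)).real for h in H for xf in Xf]
    return np.column_stack(columns)                # shape (T, n_tf * n_inputs)

def pinv_weights(G, y):
    """Least-squares weights W = pinv(G) @ y and the fitted output."""
    W = np.linalg.pinv(G) @ y
    return W, G @ W
```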

When the simple optimized transfer function has been determined, we rerun the simulations using a sequential optimization and a rank-ordered optimization. The sequential optimization runs each input parameter individually, and the best model is selected as a residual based on the minimum calculated error (see the following section). The residual model is then used to rerun all the inputs again to find the next best model, and so on. The process is repeated until all models are solved; the models are then rank ordered sequentially and assigned minimum mean squared error scores as described in the following section.

This list provides a learned ranking of the models according to how well each factor serves as a predictive marker for the output of interest. To complete the optimization, however, the learned sequential optimization is rerun one more time through an ordered optimization to create the final rank-ordered list of models. The ordered optimization takes the top five factor indicators and groups them into a single model; each factor is then rerun individually to identify the lowest error (as described in the next section), and the factors are ranked in that order.
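Read as a greedy forward selection followed by an individual re-ranking of the top five, the two optimization passes can be sketched as below. The greedy interpretation of "residual model" is our assumption. `error_of` is a hypothetical callback that would fit a transfer-function model on the selected factors and return its error. The gains in the usage are toy values.

```python
def sequential_rank(factors, error_of):
    """Greedy forward ranking: repeatedly add the factor that minimizes error.

    error_of(selected) -> float, lower is better (hypothetical callback).
    """
    selected, ranking = [], []
    remaining = list(factors)
    while remaining:
        best = min(remaining, key=lambda f: error_of(tuple(selected) + (f,)))
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, error_of(tuple(selected))))
    return ranking

def ordered_rank(ranking, error_of, top=5):
    """Group the top factors into one model, then re-rank them individually."""
    top_factors = [name for name, _ in ranking[:top]]
    combined_error = error_of(tuple(top_factors))
    individual = sorted(((name, error_of((name,))) for name in top_factors),
                        key=lambda pair: pair[1])
    return combined_error, individual

# Toy error model: each factor removes a fixed share of a unit error.
gain = {"albumin": 0.1, "mygap": 0.3, "tgRatH": 0.2, "BMI": 0.5}
err = lambda sel: 1.0 - sum(gain[f] for f in sel)
ranking = sequential_rank(["albumin", "mygap", "tgRatH", "BMI"], err)
combined, individual = ordered_rank(ranking, err, top=3)
```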

Model error calculations

For any step in the analysis that involves a selected model (i.e., transfer function), the error is calculated as follows. First, an estimated output is calculated as per Equation 5. Then, a root mean square (RMS) error is computed by Equation 8. The error is then normalized by the RMS of the signal, Equation 9.



When the same transfer function is used to determine errors for multiple factors or patients, we compute the cumulative sum of errors to define the overall error of the transfer function across all systems, as depicted in the results.
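A minimal sketch of the error calculation follows. It assumes Equation 8 is the RMS of the residual (observed minus estimated output) and Equation 9 divides that by the RMS of the observed signal. The cumulative error is the plain sum over systems described above. Zero signals are removed earlier in the pipeline, so the denominator is assumed nonzero.

```python
import math

def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def normalized_error(y, y_hat):
    """RMS of the residual divided by the RMS of the observed signal."""
    residual = [a - b for a, b in zip(y, y_hat)]
    return rms(residual) / rms(y)

def cumulative_error(pairs):
    """Overall error of one transfer function across several (y, y_hat) systems."""
    return sum(normalized_error(y, y_hat) for y, y_hat in pairs)
```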

Additional calculated factors

Several additional factors included in the simulations are calculated as combinations of other factors in the existing database. Those of interest are mymap, mygap, lipidrat, and tgRatH, given by the following equations:
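
$$\mathrm{mymap} = \mathrm{BPD} + \tfrac{1}{3}\left(\mathrm{BPS} - \mathrm{BPD}\right) \tag{10}$$
$$\mathrm{mygap} = \mathrm{BPS} - \mathrm{BPD} \tag{11}$$
$$\mathrm{lipidrat} = \frac{\text{LDL-C}}{\text{HDL-C}} \tag{12}$$
$$\mathrm{tgRatH} = \frac{\text{LDL-C} + \mathrm{TG}}{\text{HDL-C}} \tag{13}$$
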
where it is assumed that any factor combined has matching time data so that, for example in Equation 12, LDL-C and HDL-C have data taken at the same time, so that lipidrat becomes an array of data of the same size as LDL-C and HDL-C.
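The derived factors can be computed element-wise from time-matched series. The formulas are taken from the notes to Table 6. The function name and the length check, which enforces the matched-time assumption stated above, are our own.

```python
def calculated_factors(bps, bpd, ldl, hdl, tg):
    """Derived factors from time-matched series (one value per shared time point)."""
    if len(bps) != len(bpd) or not (len(ldl) == len(hdl) == len(tg)):
        raise ValueError("combined factors must share the same time points")
    return {
        "mymap": [d + (s - d) / 3 for s, d in zip(bps, bpd)],          # BPD + 1/3 (BPS - BPD)
        "mygap": [s - d for s, d in zip(bps, bpd)],                    # BPS - BPD
        "lipidrat": [lo / hi for lo, hi in zip(ldl, hdl)],             # LDL-C / HDL-C
        "tgRatH": [(lo + t) / hi for lo, t, hi in zip(ldl, tg, hdl)],  # (LDL-C + TG) / HDL-C
    }
```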


Baseline characteristics

The initial population within the database was 194,268 subjects. There were 1,944 subjects with a diagnosis of AF who were older than 18 years. After applying the additional inclusion and exclusion criteria, 285 patients were identified (154 warfarin users and 131 nonusers) (Figure 1). This population accumulated 1,217 person-years of follow-up, with a mean of 4.27 years (SD 2.64; median 4.24, IQR 2.02–6.42) following their diagnosis of AF. The mean age at diagnosis was 70 years (SD 13 years), more than 42% were aged 75 years or older, 51% were female, and 14% were smokers. There was a prior history of heart failure in 17%, hypertension in 33%, and diabetes mellitus in 43%, and no patient had a prior history of stroke or transient ischemic attack. There were 142 (49.8%) patients with a CHADS2 score ≥2, and at baseline no patients were being treated with aspirin or other antiplatelet medications.

Figure 1 Study cohort.

Abbreviations: AF, atrial fibrillation; EMR, electronic medical record; INR, international normalized ratio.

Warfarin population

Warfarin was used at the time of AF diagnosis in 54% (n=154) of the population. Overall, patients receiving warfarin were older, more likely to be male, and followed for a shorter period than those not receiving warfarin (Table 2).

Table 2 Baseline and clinical characteristics of warfarin users and nonusersa

Notes: aContinuous data are presented as median (IQR), unless otherwise noted, and categorical data are presented as N (%).

Abbreviations: ALT, alanine transaminase; AST, aspartate transaminase; HDL-C, high-density lipoprotein-cholesterol; IQR, interquartile range; LDL-C, low-density lipoprotein-cholesterol; SD, standard deviation.

Stroke, myocardial infarction, and hemorrhage in warfarin population

In the period following AF diagnosis, there were 61 first stroke or MI events. Thirty-two strokes or MIs occurred in warfarin users compared to 29 in nonwarfarin users (incidence density: 7.0% vs 3.8%; p=0.06) (Table 3). In the bivariate analysis, warfarin users had an increased risk of stroke and MI, which trended toward statistical significance (HR=1.64; 95% CI, 0.98–2.75; p=0.06). However, following the multivariable and propensity score models, this association attenuated (propensity score model: adjusted hazard ratio [aHR]=1.12; 95% CI, 0.64–1.98; p=0.69) (Table 4).

Table 3 Incidence rate of first stroke, myocardial infarction, and hemorrhagic events among warfarin users and enriched subjects

Note: aAssessed by log-rank test.

Table 4 HRs for clinical events among warfarin users and enriched subjects

Notes: aThe HRs were adjusted for albumin (continuous) and CHADS2 (tertiles). These variables were found to affect the univariate estimate by >10% and were included in the multivariable adjustment. bThe HRs were estimated with propensity score adjustment. The following covariates were included in the propensity score model: age, sex, body mass index (continuous), fasting plasma glucose, albumin, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, total cholesterol, and CHADS2 (tertiles). cOverall event includes a combination of the following clinical events: hemorrhage, stroke, and myocardial infarction.

Abbreviations: CI, confidence intervals; aHR, adjusted hazard ratio; HR, hazard ratio; MI, myocardial infarction.

There were 24 first hemorrhagic events after AF diagnosis. The incidence of hemorrhagic events did not differ significantly between warfarin users and nonusers (incidence density: 2.8% vs 1.4%; p=0.12). In the bivariate analysis, there was no significant difference in hemorrhagic events between warfarin users and nonusers (HR=1.92; 95% CI, 0.83–4.2; p=0.13), and similar results were observed in the multivariable and propensity score models (propensity score model: aHR=1.50; 95% CI, 0.60–3.76; p=0.39).

When the clinical outcomes were combined as an overall event, patients on warfarin had a significantly higher risk of a first adverse clinical outcome (stroke, MI, or hemorrhage) than the nonwarfarin group (HR=1.71; 95% CI, 1.10–2.66; p=0.02). After adjustment, patients on warfarin still showed a higher risk of an event, although this was not statistically significant (propensity score model: aHR=1.22; 95% CI, 0.75–1.97; p=0.42).

Demographics of enriched population

An enriched patient population was identified among the 285 AF patients. Overall, enriched patients were older, more likely to be a warfarin user and at high risk of stroke (CHADS2≥2) (Table 5).

Table 5 Baseline and clinical characteristics of enriched and nonenriched subjects

Notes: aContinuous data are presented as median (IQR), unless otherwise noted, and categorical data are presented as n (%).

Abbreviations: ALT, alanine transaminase; AST, aspartate transaminase; HDL-C, high-density lipoprotein-cholesterol; IQR, interquartile range; LDL-C, low-density lipoprotein-cholesterol; SD, standard deviation.

Predictors of clinical outcomes identified within the enriched population

Table 6 gives the results of the modeling simulation described above for the enriched population. The average modeling error across all patients is 14.8%. Thus, in the table, an added error δE of 0 corresponds to the average error of 14.8%, and any positive δE is added to this baseline. For an MI output, the table shows albumin, tgRatH, and mygap as the top three predictive markers for all patients. However, when only the enriched population is modeled, the albumin factor has a substantially lower added error than the other factors: 4.5% for the all-patient solution and 3.2% for the enriched population. By contrast, the added error for tgRatH rises from 9.4% for all patients to 27.1% for the enriched population. This suggests that albumin is the dominant contributor to MI detection with the model and is the primary predictive marker.

Table 6 Simulation results for myocardial infarction, hemorrhage, and stroke outputs

Notes: Calculated factors: mymap = BPD + 1/3 (BPS – BPD). mygap = BPS – BPD. lipidRat = LDL-C/HDL-C. tgRatH = (LDL-C + TG)/HDL-C.

Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; BPD, diastolic blood pressure; BPS, systolic blood pressure; FPG, fasting plasma glucose; HDL-C, high-density lipoprotein-cholesterol; LDL-C, low-density lipoprotein-cholesterol; SCr, serum creatinine.

A similar pattern is evident for the stroke output. The added error for albumin is 3.7% for both the all-patient and enriched population simulations, while the added error for mygap increases from 6.3% to 16.7%. Again, this suggests that the predictive ability of the model decreases when non-albumin factors are used and that albumin is the primary predictive factor for stroke risk. The hemorrhage output shows some differences in the rank order of the factors’ predictive ability; however, many of the same factors appear in the list.

Stroke, myocardial infarction, and hemorrhage in enriched population

In the enriched population, there were 31 first strokes or MIs following AF diagnosis. The incidence of stroke or MI in the enriched group was 8.2% compared with 3.6% in the nonenriched group (p=0.002) (Table 3). In the bivariate analysis, subjects identified as enriched had a twofold higher risk of stroke or MI than nonenriched subjects (HR=2.20; 95% CI, 1.33–3.64; p=0.002) (Table 4). This relationship attenuated in both the multivariable model (aHR=2.13; 95% CI, 1.16–3.94; p=0.02) and the propensity score model (aHR=1.67; 95% CI, 0.97–2.87; p=0.06).

There were 13 first hemorrhagic events in the enriched group with a higher incidence of hemorrhage among enriched subjects vs nonenriched subjects (3.5% vs 1.3%; p=0.014). In the bivariate analysis, enriched subjects had an increased risk of hemorrhage as compared to the nonenriched (HR=2.65; 95% CI, 1.18–5.99; p=0.02). Following propensity score adjustment, the enriched group had a 2.25 times higher risk of a hemorrhagic event (aHR=2.25; 95% CI, 0.94–5.41; p=0.06).

When all adverse clinical events were evaluated collectively, enriched subjects had a higher risk than nonenriched subjects (HR=2.32; 95% CI, 1.51–3.56; p<0.0001). After adjustment in the multivariable and propensity score models, enriched subjects remained at significantly higher risk of an overall adverse clinical event (propensity score model: aHR=1.81; 95% CI, 1.15–2.88; p=0.01).


Treatment of AF with warfarin comes with numerous problems that put patients at increased risk of adverse clinical outcomes.10,16 Furthermore, interruptions in warfarin therapy, both short and long term, can increase a patient’s risk of stroke.13,17 We identified a cohort of AF patients and followed them over the course of their disease, including those treated with and without warfarin. We found that patients on warfarin still had a higher risk of adverse clinical outcomes, including stroke, MI, and hemorrhage, although this difference was not statistically significant after adjustment. This may be due to interruptions in warfarin therapy or poor adherence, but irrespective of the reason, it suggests that AF patients are a distinct population in which complex underlying disease progression alters the risk of adverse outcomes. Warfarin may not be appropriate for all subpopulations treated for AF. For this reason, we developed a model that takes into account multiple time-dependent factors in order to identify enriched patients at high risk of adverse clinical outcomes. Subjects identified as enriched may be candidates for an alternative treatment regimen that is less likely to fail or cause adverse events.

Within our analysis, we found that patients treated with warfarin were at higher risk of poor clinical outcomes than those in the nonwarfarin group. Even after multivariable and propensity score adjustment, their overall risk remained elevated, although this did not reach statistical significance. Previous literature has shown the benefit of warfarin in AF populations, and our findings do not dispute this effect.8,9 However, we hypothesize that the AF population is more complex than originally believed. For that reason, we developed a neural net model that takes into account numerous patient-specific factors to identify “enriched” patients. Once enriched subjects were identified, we observed differences between groups in stroke/MI, hemorrhage, and overall risk of first clinical event; these differences were statistically significant in the bivariate analyses, and the overall-event difference remained significant after propensity score adjustment. As patients become more complex, identifying subjects through a more complex model may be the best way to find those at high risk of poor clinical outcomes.

Our model identified individuals within our AF population as enriched and at high risk of adverse clinical outcomes. Furthermore, the model identified primary predictive factors for the risk of MI, stroke, and hemorrhage. We observed an increased incidence of any first clinical event (MI, stroke, or hemorrhage) in the enriched group, and after multiple adjustments the enriched group remained at higher risk of adverse clinical outcomes. Identifying enriched subjects within the AF population thus improved the discriminatory power of our analysis. The AF population comprises several subpopulations with an array of comorbidities, which makes it distinctive. Among these patients, it is difficult to distinguish those at risk of poor outcomes from those at minimal risk using standard statistical techniques. We developed this MatLab model to take into account multiple time-dependent and time-independent variables in order to better discriminate within an AF population.


This study has limitations. First, it used a retrospective design based on secondary data from a health-maintenance organization. We relied on ICD-9-CM coding to identify strokes, hemorrhages, and MIs, which is not equivalent to medical chart review or the adjudicated case reports of clinical trials; our results are therefore subject to possible misclassification. However, a health-maintenance organization database offers multiple years of data on a diverse population, including demographics, health care claims, laboratory values, and medications. Second, the relatively stringent exclusion criteria limited our sample size; nonetheless, we were able to show a difference between the enriched and nonenriched groups. Third, the model was subject to some error arising from its specific iteration instructions, and this error may have been amplified by including factors that take only yes or no (binary) values. Further studies are being planned to address this limitation.


We developed a novel model to define an enriched group among AF patients and showed that enrichment was associated with a higher incidence of adverse clinical outcomes. Our study indicates the importance of defining an enriched population: this step was crucial to a meaningful analysis, because the analysis after enrichment was highly discriminatory. In essence, enrichment improves the signal-to-noise ratio in a dataset, especially when the outcomes are low-frequency events. This model could be used in other disease states to define patients at high risk of adverse outcomes. A potential next step in this work is co-modeling of clinical trial data with real-world data from electronic health records. This would extend the translational reach of the traditional approach, in which data are collected in precisely defined randomized trials but must then be extrapolated to real-world populations.
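One way to see why low-frequency outcomes are noisy is a standard approximation for comparing event rates: the standard error of the log rate ratio is roughly sqrt(1/d_a + 1/d_b), where d_a and d_b are the event counts in the two groups. Precision is therefore driven by the number of events rather than the number of patients. The sketch below computes a Wald 95% interval for a Poisson rate ratio; it is a simplified illustration with hypothetical counts and a hypothetical function name (`rate_ratio_ci`), not the Cox proportional hazards analysis used in the study.

```python
import math
from typing import Tuple


def rate_ratio_ci(events_a: int, persontime_a: float,
                  events_b: int, persontime_b: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Rate ratio (a vs b) with a Wald confidence interval on the log scale."""
    if min(events_a, events_b) == 0:
        raise ValueError("each group needs at least one event for a log-scale interval")
    rr = (events_a / persontime_a) / (events_b / persontime_b)
    # Approximate SE of log(rate ratio) under a Poisson model.
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


if __name__ == "__main__":
    # Hypothetical: the same twofold rate ratio observed with few vs many events.
    print(rate_ratio_ci(10, 1000.0, 5, 1000.0))
    print(rate_ratio_ci(100, 1000.0, 50, 1000.0))
```

With 10 versus 5 events the interval around a rate ratio of 2.0 includes 1, whereas with 100 versus 50 events the interval around the same ratio excludes 1. A subgroup in which events are more frequent can therefore yield a clearer signal.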


JJS has received research funding from Bristol-Myers Squibb and AGC is an employee of Bristol-Myers Squibb. DMJ, FS, and GW report no conflicts of interest in this work.



Chugh SS, Blackshear JL, Shen WK, Hammill SC, Gersh BJ. Epidemiology and natural history of atrial fibrillation: clinical implications. J Am Coll Cardiol. 2001;37(2):371–378.


Miyasaka Y, Barnes ME, Gersh BJ, et al. Secular trends in incidence of atrial fibrillation in Olmsted County, Minnesota, 1980 to 2000, and implications on the projections for future prevalence. Circulation. 2006;114(2):119–125.


Go AS, Mozaffarian D, Roger VL, et al. Heart disease and stroke statistics–2013 update: a report from the American Heart Association. Circulation. 2013;127(1):e6–e245.


Benjamin EJ, Wolf PA, D’Agostino RB, Silbershatz H, Kannel WB, Levy D. Impact of atrial fibrillation on the risk of death: the Framingham Heart Study. Circulation. 1998;98(10):946–952.


Conen D, Chae CU, Glynn RJ, et al. Risk of death and cardiovascular events in initially healthy women with new-onset atrial fibrillation. JAMA. 2011;305(20):2080–2087.


Jabre P, Roger VL, Murad MH, et al. Mortality associated with atrial fibrillation in patients with myocardial infarction: a systematic review and meta-analysis. Circulation. 2011;123(15):1587–1593.


Risk factors for stroke and efficacy of antithrombotic therapy in atrial fibrillation. Analysis of pooled data from five randomized controlled trials. Arch Intern Med. 1994;154(13):1449–1457.


Petersen P, Boysen G, Godtfredsen J, Andersen ED, Andersen B. Placebo-controlled, randomised trial of warfarin and aspirin for prevention of thromboembolic complications in chronic atrial fibrillation. The Copenhagen AFASAK study. Lancet. 1989;1(8631):175–179.


Ezekowitz MD, Bridgers SL, James KE, et al. Warfarin in the prevention of stroke associated with nonrheumatic atrial fibrillation. Veterans Affairs Stroke Prevention in Nonrheumatic Atrial Fibrillation Investigators. N Engl J Med. 1992;327(20):1406–1412.


Bristol-Myers Squibb. Coumadin (prescribing information). Available from: Accessed December 30, 2015.


Dean L. Warfarin Therapy and the Genotypes CYP2C9 and VKORC1; 2012 [updated June 8, 2016]. In: Pratt V, McLeod H, Dean L, et al., editors. Medical Genetics Summaries [Internet]. Bethesda, MD: National Center for Biotechnology Information, US; 2012.


Ogilvie IM, Newton N, Welner SA, Cowell W, Lip GY. Underuse of oral anticoagulants in atrial fibrillation: a systematic review. Am J Med. 2010;123(7):638–645.e4.


Ewen E, Zhang Z, Simon TA, Kolm P, Liu X, Weintraub WS. Patterns of warfarin use and subsequent outcomes in atrial fibrillation in primary care practices. Vasc Health Risk Manag. 2012;8:587–598.


January CT, Wann LS, Alpert JS, et al. 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the Heart Rhythm Society. J Am Coll Cardiol. 2014;64(21):e1–e76.


Gage BF, Waterman AD, Shannon W, Boechler M, Rich MW, Radford MJ. Validation of clinical classification schemes for predicting stroke: results from the National Registry of Atrial Fibrillation. JAMA. 2001;285(22):2864–2870.


Hart RG, Boop BS, Anderson DC. Oral anticoagulants and intracranial hemorrhage. Facts and hypotheses. Stroke. 1995;26(8):1471–1477.


Gage BF, Boechler M, Doggette AL, et al. Adverse outcomes and predictors of underuse of antithrombotic therapy in Medicare beneficiaries with chronic atrial fibrillation. Stroke. 2000;31(4):822–827.

This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
