International Journal of Chronic Obstructive Pulmonary Disease, Volume 13

Development and validation of a predictive model to identify patients at risk of severe COPD exacerbations using administrative claims data

Authors Annavarapu S, Goldfarb S, Gelb M, Moretz C, Renda A, Kaila S

Received 1 November 2017

Accepted for publication 5 April 2018

Published 11 July 2018 Volume 2018:13 Pages 2121–2130



Review by Single anonymous peer review


Editor who approved publication: Dr Richard Russell


Srinivas Annavarapu,1 Seth Goldfarb,1 Melissa Gelb,2 Chad Moretz,1 Andrew Renda,3 Shuchita Kaila2

1Comprehensive Health Insights, Louisville, KY, 2Boehringer Ingelheim, Ridgefield, CT, 3Humana, Louisville, KY, USA

Background: Patients with COPD often experience severe exacerbations involving hospitalization, which accelerate lung function decline and reduce quality of life. This study aimed to develop and validate a predictive model to identify patients at risk of developing severe COPD exacerbations using administrative claims data, to facilitate appropriate disease management programs.
Methods: A predictive model was developed using a retrospective cohort of COPD patients aged 55–89 years identified between July 1, 2010 and June 30, 2013 using Humana’s claims data. The baseline period was the 12 months postdiagnosis, and the prediction period covered months 12–24. Patients with and without severe exacerbations in the prediction period were compared to identify characteristics associated with severe COPD exacerbations. Models were developed using stepwise logistic regression, and a final model was chosen to optimize sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
Results: Of 45,722 patients, 5,317 had severe exacerbations in the prediction period. Patients with severe exacerbations had significantly higher comorbidity burden, use of respiratory medications, and tobacco-cessation counseling compared to those without severe exacerbations in the baseline period. The predictive model included 29 variables that were significantly associated with severe exacerbations. The strongest predictors were prior severe exacerbations and higher Deyo–Charlson comorbidity score (OR 1.50 and 1.47, respectively). The best-performing predictive model had an area under the curve of 0.77. A receiver operating characteristic cutoff of 0.4 was chosen to optimize PPV, and the model had sensitivity of 17%, specificity of 98%, PPV of 48%, and NPV of 90%.
Conclusion: This study found that of every two patients identified by the predictive model to be at risk of severe exacerbation, one patient may have a severe exacerbation. Once at-risk patients are identified, appropriate maintenance medication, implementation of disease-management programs, and education may prevent future exacerbations.

Keywords: Medicare, observational study, COPD risk factors


Introduction

COPD is a progressive disorder characterized by persistent airflow limitation to the lungs.1 Key symptoms of COPD include chronic and progressive dyspnea, cough, and sputum production.1 In the USA, COPD is estimated to affect approximately 27 million adults, of which 12 million remain undiagnosed.2 Chronic lower respiratory diseases, including COPD, are the third-leading cause of death in the USA.3 COPD also poses a substantial economic burden: in the USA, the annual cost of COPD was estimated to be $36 billion in 2010, of which $32.1 billion was direct cost.4

Patients with COPD often experience exacerbations: worsening of the typical COPD symptoms.5 The American Thoracic Society and European Respiratory Society’s 2004 guidelines for the diagnosis and treatment of COPD defined a COPD exacerbation (hereafter referred to as “exacerbation”) as “an event in the natural course of the disease characterized by a change in the patient’s baseline dyspnea, cough and/or sputum beyond day-to-day variability sufficient to warrant a change in management”.6 Exacerbations accelerate the decline in lung function and lower quality of life.7–9 Exacerbation frequency is also considered to be an indicator of COPD stage, with higher frequency of exacerbations indicating more severe disease.1,10 Correspondingly, exacerbations impose a significant economic burden by accounting for 50%–75% of the total COPD burden.1,6 There were more than 1.2 million hospitalizations due to acute exacerbations of COPD in the USA in 2006, associated with costs of approximately $14 billion.11

Prevention, early detection, and prompt treatment of exacerbations are important to reduce this burden.1 Predictive models to identify individuals likely to have COPD have been developed previously.12,13 Similarly, observational and retrospective claim-based studies have attempted to identify factors associated with a risk of future exacerbations. These studies suggest that patients with a history of one or more exacerbations leading to hospitalizations have a high risk of future exacerbations.1,14 This study aims to develop and cross-validate a predictive model to identify patients likely to have severe COPD exacerbations using an administrative claims database. Administrative data collected by health plans include demographic information, health care claims, and encounter records. If patient characteristics predictive of severe COPD exacerbations (leading to a hospitalization) can be determined, it may enable the identification of patients at high risk of severe exacerbations. Once “high-risk” patients are identified, appropriate treatment with COPD maintenance medications and implementation of disease-management and education programs may help to prevent future exacerbations.15,16


Methods

Study design and data source

A noninterventional observational study was conducted using the Humana administrative claims database. This database contains integrated medical claims, pharmacy claims, and enrollment data, representing more than 12 million current and former Humana members enrolled in commercial, Medicare Advantage, and prescription drug plans. The data have national coverage, with a high proportion of people residing in Texas, Florida, and Ohio. For this study, Medicare Advantage and commercially insured populations were examined. Approval for this research was provided by Schulman IRB, Research Triangle Park, NC, USA.

Study population

Patients aged 55–89 years with COPD were identified during the study period (January 1, 2010 to June 30, 2015; Figure 1). Patients were considered to have COPD if they had two or more medical claims on distinct dates with a COPD diagnosis code, ie, International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code of 491.xx (chronic bronchitis), 492.xx (emphysema), or 496.xx (COPD, unspecified) in the primary position.12 The second medical claim with COPD diagnosis was required to be within 90 days of the first claim. The date of the second medical claim with a COPD diagnosis code was termed “diagnosis date”. This date was required to occur during the identification period from July 1, 2010 to June 30, 2013. Patients with a diagnosis of malignant neoplasms (ICD-9-CM 140.xx-172.xx, 174.xx-209.3x, or 209.7x), cystic fibrosis (ICD-9-CM 277.0x), fibrosis due to tuberculosis (ICD-9-CM 011.4x), bronchiectasis (ICD-9-CM 494.xx), pneumoconiosis (ICD-9-CM 500.xx, 501.xx, 502.xx, 503.xx, 505.xx), pulmonary fibrosis (ICD-9-CM 516.3x, 515.xx), pulmonary tuberculosis (ICD-9-CM 011.xx), sarcoidosis (ICD-9-CM 135.xx), or asthma (ICD-9-CM 493.xx) during the study period were excluded. Patients were required to have continuous enrollment in Medicare Part D or commercial health plans for a minimum of 6 months before and 2 years after the COPD diagnosis date. The index date was defined as 1 year after the diagnosis date. The 1-year period prior to the index date was the baseline period, and the year following the index date was the prediction period (Figure 1).

Figure 1 Patient-selection timeline.
Note: Patients were required to be enrolled continuously for 6 months prediagnosis and 24 months postdiagnosis.
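
The timeline rules above can be sketched in a few lines of Python. This is only an illustration, not the study's actual code: the claim layout (a date paired with a primary diagnosis code), the function names, and the choice to compare consecutive distinct claim dates are all assumptions, and the exclusion diagnoses and enrollment checks are left out.

```python
from datetime import date

# ICD-9-CM code families named in the text (chronic bronchitis, emphysema, COPD).
COPD_PREFIXES = ("491", "492", "496")
# Identification period stated in the text.
ID_START, ID_END = date(2010, 7, 1), date(2013, 6, 30)


def add_one_year(d):
    """Same calendar day one year later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def find_diagnosis_date(claims, window_days=90):
    """Date of the second COPD claim in the first qualifying pair, else None.

    `claims` is an iterable of (service_date, primary_dx_code) pairs.
    Assumption: a pair qualifies when two consecutive distinct COPD claim
    dates are at most `window_days` apart.
    """
    copd_dates = sorted({d for d, code in claims if code.startswith(COPD_PREFIXES)})
    for first, second in zip(copd_dates, copd_dates[1:]):
        if (second - first).days <= window_days:
            return second
    return None


def study_windows(claims):
    """Return diagnosis, index, and prediction-end dates, or None if ineligible."""
    dx = find_diagnosis_date(claims)
    if dx is None or not (ID_START <= dx <= ID_END):
        return None
    index = add_one_year(dx)
    return {"diagnosis": dx, "index": index, "prediction_end": add_one_year(index)}
```

Under these assumptions, the baseline period runs from the diagnosis date up to the index date, and the prediction period from the index date up to `prediction_end`.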

Exacerbations can be classified as severe and not severe.1,17 In the current study, severe exacerbations were identified using medical claims for inpatient hospitalizations with any of the following: (i) a COPD diagnosis code in the primary position; (ii) a diagnosis code for acute exacerbation in the primary position and a COPD diagnosis code in a secondary position; or (iii) a respiratory failure diagnosis code in the primary position and a COPD diagnosis code in a secondary position. Occurrences of COPD exacerbations (severe and not severe) were separately evaluated in both the baseline and the prediction periods. Claim-based definitions were used to identify the COPD exacerbation type (Table 1). Based on the occurrence of severe COPD exacerbations in the prediction period, two cohorts were created: patients with a severe COPD exacerbation and patients without a severe COPD exacerbation.

Table 1 COPD-exacerbation definitions
Abbreviations: ER, emergency room; ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification.
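
As a rough sketch, the three-part inpatient rule for a severe exacerbation could be written as a single predicate. The claim dictionary layout and the function names are hypothetical. The acute-exacerbation and respiratory-failure code sets come from Table 1, which is not reproduced here, so they are passed in as parameters rather than hard-coded.

```python
# ICD-9-CM COPD code families named in the text.
COPD_PREFIXES = ("491", "492", "496")


def is_copd(code):
    """True when a diagnosis code belongs to one of the COPD code families."""
    return code.startswith(COPD_PREFIXES)


def is_severe_exacerbation(claim, acute_exacerbation_codes, respiratory_failure_codes):
    """Apply the inpatient rule described in the text to one claim.

    `claim` is a dict with keys 'setting', 'primary_dx', and 'secondary_dx'
    (a list of codes). This layout is an assumption made for illustration.
    """
    if claim["setting"] != "inpatient":
        return False
    primary = claim["primary_dx"]
    # (i) COPD diagnosis in the primary position.
    if is_copd(primary):
        return True
    # (ii) or (iii): qualifying primary code plus COPD in a secondary position.
    copd_secondary = any(is_copd(code) for code in claim["secondary_dx"])
    qualifying_primary = (
        primary in acute_exacerbation_codes or primary in respiratory_failure_codes
    )
    return copd_secondary and qualifying_primary
```

A patient would then fall into the severe-exacerbation cohort if any inpatient claim in the prediction period satisfies this predicate.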

Patient characteristics

Patient characteristics that may have been associated with the occurrence of a severe COPD exacerbation in the prediction period were evaluated in the 1-year baseline period for both cohorts: baseline COPD exacerbations and demographic, clinical, and other resource-use-related characteristics. Demographic characteristics included age, sex, race/ethnicity, line of business (Medicare or commercial), and geographical location (Northeast, Midwest, South, or West). Clinical characteristics included measures of disease burden (presence of comorbidities and Deyo–Charlson Comorbidity Index score), COPD-medication use (long-acting bronchodilators, short-acting bronchodilators, inhaled corticosteroids, systemic corticosteroids, phosphodiesterase 4 inhibitors, methylxanthines, and respiratory antibiotics), oxygen-therapy use, smoking-cessation medication use, smoking-cessation counseling, influenza vaccination, and pneumococcal vaccination. All-cause and COPD-related resource use (hospitalizations, outpatient visits, and emergency-room [ER] visits) and month of exacerbation were also evaluated. Variable definitions are provided in the Supplementary materials; Tables S1–S12. Patient characteristics were compared between the two cohorts of interest, where applicable and necessary, using Student’s t-test and χ2 tests based on the nature of the variable.


Severe COPD exacerbation leading to a hospitalization during the prediction period was the key study outcome (Table 1). Patients with medical claims for one or more severe exacerbations were classified as having severe COPD exacerbations.

Model development

An analytic data set was assembled from Humana’s inpatient, outpatient, and pharmacy data and consisted of demographic, geographic, diagnostic, treatment, pharmacy, and utilization variables. The entire cross-sectional cohort was used to inform the predictive model. Each of the two study cohorts was randomly partitioned into two data sets: development data set and validation data set, with 50% of observations in each set.
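
Partitioning each cohort separately keeps the outcome rate roughly equal in the two data sets. A minimal sketch of such a 50/50 split, using only the standard library, is shown here. The function names and the fixed seed are illustrative assumptions, not details reported by the study.

```python
import random


def split_in_half(patient_ids, seed):
    """Shuffle one cohort and cut it into two halves."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def partition_cohorts(with_event, without_event, seed=0):
    """Split each outcome cohort separately, then pool the halves.

    Returns (development_ids, validation_ids).
    """
    dev_pos, val_pos = split_in_half(with_event, seed)
    dev_neg, val_neg = split_in_half(without_event, seed + 1)
    return dev_pos + dev_neg, val_pos + val_neg
```

With the cohort sizes reported in the Results (5,317 patients with and 40,405 without a severe exacerbation), this yields development and validation sets of 22,860 and 22,862 patients.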

Preliminary models were developed using the development data set. Stepwise logistic regression (SLR) was employed to predict the probability of severe COPD exacerbation as a function of one or more independent inputs. For each study population, preliminary parameters were identified. The “best” model, ie, the model with the highest area under the curve (AUC) value based on receiver operating characteristic (ROC) curves, was selected as the more accurate prediction tool (the best discriminating model will have the highest AUC). Multicollinearity was checked using the Pearson correlation coefficient (multicollinear factors could remain in the optimal model for discrimination purposes). Goodness-of-fit tests, such as deviance, Hosmer–Lemeshow, and log likelihood, were conducted to ensure model fit. The Wald test and CI were used to test the significance of the variables of the model.
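
The AUC used for model selection can be read as the probability that a randomly chosen patient with a severe exacerbation receives a higher predicted risk than a randomly chosen patient without one. The sketch below computes it directly from that definition. It is a plain-Python illustration rather than the statistical software used in the study, and its pairwise loop is only practical for small samples.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half).

    `scores` are predicted probabilities; `labels` are 1 (event) or 0 (no event).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two cohorts.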

The preliminary models were then applied to the validation data set comprising patients with and without severe exacerbations in the prediction period. Sensitivity, specificity, negative predictive value (NPV), and positive PV (PPV) were measured, and the model with the smallest validation error was deemed optimal and selected as the final model. Then, scoring was performed, and a cutoff point chosen to optimize PPV and number of predicted positive patients.
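
The four validation measures follow from the 2×2 table obtained by applying a probability cutoff to the predicted risks. A minimal sketch, assuming scores at or above the cutoff are flagged as high risk (the function name and that tie-handling choice are assumptions):

```python
def classification_metrics(scores, labels, cutoff):
    """Sensitivity, specificity, PPV, and NPV at a given probability cutoff."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= cutoff
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "flagged": tp + fp,
    }
```

Raising the cutoff generally trades sensitivity for specificity and PPV while shrinking the number of flagged patients, which is the trade-off the cutoff selection has to balance.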


Results

Sample characteristics

A total of 45,722 patients with COPD met the inclusion and exclusion criteria (Figure 2). Of these, 5,317 patients had experienced severe COPD exacerbations during the prediction period (Table 2). All patients were used to inform the predictive model. A comparison of the baseline demographic characteristics between patients experiencing and not experiencing severe COPD exacerbations in the prediction period revealed no statistically significant difference in age (Table 2). A higher proportion of patients experiencing severe exacerbations compared to those not experiencing exacerbations were male (41.8% vs 39.4%, P=0.0042). Lower proportions of patients experiencing exacerbations compared to those not experiencing exacerbations were white or black (67.9% vs 68.7% and 5.8% vs 8.0%, respectively; P<0.0001). Among patients experiencing exacerbations, a lower proportion resided in the South or West compared to those not experiencing exacerbations (66.9% vs 68.3% and 4.1% vs 6.0%, respectively; P<0.0001). Conversely, a higher proportion of patients experiencing exacerbations resided in the Northeast or Midwest compared to those not experiencing exacerbations (2.2% vs 1.8% and 26.7% vs 23.9%, respectively; P<0.0001). Lower proportions of patients experiencing exacerbations compared to those not experiencing exacerbations were dual-eligible or low-income-subsidy recipients (2.5% vs 3.9%, P<0.0001 and 5.3% vs 7.0%, P<0.0001, respectively). A higher proportion of patients experiencing exacerbations were enrolled in Medicare plans compared to those not experiencing exacerbations (98.4% vs 97.6%, P=0.0001).

Figure 2 Patient attrition.
Abbreviation: ICD-9, International Classification of Diseases, Ninth Revision.

Table 2 Baseline demographic characteristics of study population
Note: aStudent’s t-test; bWilcoxon rank sum; cχ2; ddenominators Medicare patients only.
Abbreviation: IQR, interquartile range.

Comparison of the baseline clinical characteristics (Table 3) revealed significantly higher baseline COPD exacerbations (57.9% vs 32.1%, P<0.0001), baseline severe COPD exacerbations (35.0% vs 14.3%, P<0.0001), Deyo–Charlson Comorbidity Index score (mean 4.2 vs 3.1, P<0.0001), COPD-medication use, oxygen-therapy use (52.2% vs 27.3%, P<0.0001), smoking-cessation medication use (3.2% vs 1.9%, P<0.0001), and smoking-cessation counseling (42.3% vs 29.2%, P<0.0001) among patients who experienced severe COPD exacerbations during the prediction period compared to those who did not. Most comorbidities (except obesity) were more frequently found in patients who experienced severe COPD exacerbations compared to those who did not (Table 3). There was no difference in pneumococcal vaccinations or influenza vaccinations between patients who experienced exacerbations and those who did not (Table 3).

Table 3 Baseline clinical characteristics of study population
Note: aStudent’s t-test; bWilcoxon rank sum; cχ2.
Abbreviation: IQR, interquartile range.

Patients who experienced severe COPD exacerbations were more likely to have all-cause hospitalizations (57.5% vs 35.5%, P<0.0001), all-cause ER visits (51.6% vs 38.8%, P<0.0001), COPD-related resource use (88.4% vs 72.9%, P<0.0001), COPD-related outpatient visits (71.5% vs 57.0%, P<0.0001), COPD-related hospitalizations (51.6% vs 25.5%, P<0.0001), and COPD-related ER visits (34.9% vs 19.6%, P<0.0001) compared to those who did not (Table 4). There was no difference in all-cause resource use or all-cause outpatient visits (Table 4).

Table 4 Baseline health care-resource utilization of study population
Note: aStudent’s t-test; bχ2.

Predictive model

This was a complete-case analysis where 21% of cases had a race classified as unknown. The cohort was split equally between development and validation data sets. The AUC of the best-performing SLR model was 0.77. A cutoff value of 0.4 was chosen to maximize PPV without sacrificing sensitivity and specificity. Performance parameters for this model were sensitivity 17.3% (95% CI 15.84%–18.75%), specificity 97.5% (95% CI 97.32%–97.75%), PPV 48.1% (95% CI 45.07%–51.07%), and NPV 90.0% (95% CI 89.80%–90.11%). Odds ratios (ORs) and 95% CIs for individual parameters in the SLR predictive model are provided in Table 5. The complete set of model parameters is provided in Table S13. After adjustment for covariates, the strongest predictors of severe COPD exacerbations were history of severe exacerbations during baseline (OR 1.498, 95% CI 1.365–1.645), Deyo–Charlson comorbidity score (OR 1.471, 95% CI 1.429–1.515), COPD-related inpatient stays during baseline period (OR 1.389, 95% CI 1.263–1.529), and oxygen use in baseline period (OR 1.376, 95% CI 1.312–1.442). The study population was categorized by risk score for COPD exacerbations. The risk categories with the highest proportions of patients were 0.10−<0.15 (19.1%), 0.15−<0.20 (15.9%), and 0.05−<0.10 (12.3%).

Table 5 Predictive model: stepwise logistic regression
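
For readers less familiar with logistic regression output, an OR is the exponential of a fitted coefficient, and a Wald-type 95% CI is the exponential of the coefficient plus or minus 1.96 standard errors. The sketch below converts in both directions. It assumes the reported intervals are Wald intervals on the log-odds scale, which the Methods suggest but do not spell out, and the function names are illustrative.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald-type CI from a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover an approximate coefficient and SE from a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Example with the reported OR for prior severe exacerbations.
beta, se = coefficient_from_or(1.498, 1.365, 1.645)
```

Because the reported figures are rounded, the recovered coefficient and standard error are approximations only.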


Discussion

This study describes a predictive model to identify patients with COPD at risk of severe exacerbations using administrative claims data. The optimal model selected had a PPV of 48.1%, implying that for every two patients identified by the model as being at risk of exacerbations, approximately one will have an exacerbation. The model had sensitivity of 17.3%, specificity of 97.5%, and NPV of 90.0%. The predictor with the strongest association with severe COPD exacerbations was history of severe exacerbations during baseline. This confirms the finding from studies by Hurst et al10 and Santibáñez et al14 that suggested a history of exacerbations is the best predictor of future exacerbations. The other predictors of severe exacerbations in our study were Deyo–Charlson Comorbidity Index score, COPD-related inpatient stays during the baseline period, and oxygen use in the baseline period. These predictors are representative of increased COPD severity, which has been suggested to be associated with increased exacerbation frequency.10,14 Similarly, Santibáñez et al found that comorbid heart failure, atrial fibrillation, any severe heart disease, diabetes, and lung cancer were significantly associated with exacerbations leading to hospitalizations.14 However, the current study found a significant association between presence of select comorbidities, including chronic kidney disease, type 2 diabetes mellitus, and cerebrovascular disease, in the baseline period and lower risk of severe exacerbations (OR 0.70, 0.76, and 0.86, respectively). Patients with chronic comorbidities may visit their providers more frequently, resulting in improved diagnosis and management of COPD and fewer exacerbations.
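
As a rough consistency check on the reported PPV, it can be approximated from sensitivity (Se), specificity (Sp), and outcome prevalence (p) using Bayes' rule. The calculation below uses the rounded values from the Results and the cohort-wide prevalence of 5,317/45,722 ≈ 0.116. The validation-set prevalence is assumed to be similar, so the result is approximate.

```latex
\[
\mathrm{PPV}
= \frac{\mathrm{Se}\cdot p}{\mathrm{Se}\cdot p + (1-\mathrm{Sp})(1-p)}
\approx \frac{0.173 \times 0.116}{0.173 \times 0.116 + 0.025 \times 0.884}
\approx \frac{0.0201}{0.0201 + 0.0221}
\approx 0.48
\]
```

This agrees with the reported 48.1% to within rounding. It also shows that PPV depends on how common the outcome is in the population scored.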

This study describes a predictive model for severe COPD exacerbations utilizing administrative claims data in a primarily US Medicare population. Some studies have developed predictive models to identify patients with undiagnosed COPD using administrative claims.12,13 A COPD-predictive model described by Mapel et al12 had a PPV of 23% and NPV of 95.4%, while a model developed by Moretz et al13 had a PPV of 73% and NPV of 66%. The current model had a PPV of 48.1%, between values reported by Mapel et al and Moretz et al, suggesting that approximately one in two patients identified will have a severe COPD exacerbation. The high PPV and NPV of the current model suggest that patients can be identified with a high level of accuracy as likely or not likely to have severe exacerbations. Previous studies have shown efficacy of self-management action plans in improving exercise capacity and reduction of exacerbation duration and hospitalizations.18,19 Individuals determined to be at high risk by the current model can be targeted for similar clinical communications or directed to their primary-care physicians for further evaluation.

The COPD exacerbation-predictive model may enable early identification of patients at risk of developing severe COPD exacerbations, which will allow Humana and other health insurers to target clinical interventions, such as messaging, self-care, and disease-management programs to optimize treatment and control disease progression. Early intervention and treatment are expected to reduce morbidity and mortality and improve quality of life.

Limitations


The following limitations should be considered when interpreting the results of this study. The results of this study are based on administrative claims data from a large national health plan. Retrospective database studies using administrative claims are prone to coding errors of omission and commission and incomplete claim information. The Humana claims lack some clinical parameters, such as smoking status and COPD severity, that could influence COPD exacerbations. COPD diagnosis was determined using claims with a COPD-diagnosis code. This operational classification may have resulted in misclassification in some cases, since airflow testing (eg, forced expiratory volume in 1 second) results were not available to confirm COPD diagnosis.

Predictive models developed as part of this study may have limited generalizability outside the Medicare population. However, approximately 5.3 million patients with COPD receive Medicare benefits.20,21 The predictive models may not perform as well in other clinical settings, when the available data are substantially different than the medical, pharmacy, and enrollment data used to develop these models. Furthermore, predictive models developed as part of this study were rather complex: 103 different variables were assessed, of which 29 were included in the final model. Clinician perception of the utility of the predictive models and uptake may be enhanced if the models were limited to a smaller number of variables that have a large impact on the results.

Conclusion


This study describes a predictive model to identify patients at risk of severe COPD exacerbations. Of every two patients identified by the model to be at risk of severe exacerbations, one may have a severe exacerbation. This model may provide an efficient method of using claims data to identify patients with COPD who are at risk of future severe exacerbations. Once at-risk patients are identified, targeted and timely support may be provided to improve lung function and quality of life and reduce risk of exacerbations. Disease management and education programs, such as pharmacologic interventions, transition-of-care programs, and smoking-cessation counseling, may be implemented to prevent future exacerbations.

Acknowledgments


This study was funded by Boehringer Ingelheim. This paper was presented at the AMCP Managed Care and Specialty Pharmacy Annual Meeting, Denver, Colorado, March 27–30, 2017 and was published in the Journal of Managed Care and Specialty Pharmacy in March 2017.

Disclosure


MG and SK are employees of Boehringer Ingelheim. SA and SG are employees of Comprehensive Health Insights, which conducted the study. CM was an employee of Comprehensive Health Insights at the time of this study and is now employed by GlaxoSmithKline. AR is an employee of Humana and provided project consultation. The authors report no other conflicts of interest in this work.



References

1. Global Initiative for Chronic Obstructive Lung Disease. Global Strategy for the Diagnosis, Management, and Prevention of COPD. Bethesda (MD): GOLD; 2015.

2. National Heart, Lung, and Blood Institute. Morbidity and mortality: 2012 chart book on cardiovascular, lung, and blood diseases. 2012. Available from: Accessed September 29, 2015.

3. Hoyert DL, Xu J. Deaths: preliminary data for 2011. Natl Vital Stat Rep. 2012;61(6):1–51.

4. Ford ES, Murphy LB, Khavjou O, Giles WH, Holt JB, Croft JB. Total and state-specific medical and absenteeism costs of COPD among adults aged ≥18 years in the United States for 2010 and projections through 2020. Chest. 2015;147(1):31–45.

5. American Thoracic Society. Exacerbation of COPD. 2014. Available from: Accessed April 30, 2018.

6. American Thoracic Society, European Respiratory Society. Standards for the diagnosis and management of patients with COPD. 2004. Available from: Accessed June 04, 2018.

7. Halpin DM, Decramer M, Celli B, Kesten S, Liu D, Tashkin DP. Exacerbation frequency and course of COPD. Int J Chron Obstruct Pulmon Dis. 2012;7:653–661.

8. Makris D, Moschandreas J, Damianaki A, et al. Exacerbations and lung function decline in COPD: new insights in current and ex-smokers. Respir Med. 2007;101(6):1305–1312.

9. Hoogendoorn M, Feenstra TL, Hoogenveen RT, Al M, Mölken MR. Association between lung function and exacerbation frequency in patients with COPD. Int J Chron Obstruct Pulmon Dis. 2010;5:435–444.

10. Hurst JR, Vestbo J, Anzueto A, et al. Susceptibility to exacerbation in chronic obstructive pulmonary disease. N Engl J Med. 2010;363(12):1128–1138.

11. Perera PN, Armstrong EP, Sherrill DL, Skrepnek GH. Acute exacerbations of COPD in the United States: inpatient burden and predictors of costs and mortality. COPD. 2012;9(2):131–141.

12. Mapel DW, Frost FJ, Hurley JS, et al. An algorithm for the identification of undiagnosed COPD cases using administrative claims data. J Manag Care Pharm. 2006;12(6):457–465.

13. Moretz C, Zhou Y, Dhamane AD, et al. Development and validation of a predictive model to identify individuals likely to have undiagnosed chronic obstructive pulmonary disease using an administrative claims database. J Manag Care Pharm. 2015;21(12):1149–1159.

14. Santibáñez M, Garrastazu R, Ruiz-Nuñez M, et al. Predictors of hospitalized exacerbations and mortality in chronic obstructive pulmonary disease. PLoS One. 2016;11(6):e0158727.

15. Spiriva [prescribing information]. 2014. Available from: Accessed April 30, 2018.

16. Advair Diskus [prescribing information]. 2014. Available from: Accessed April 30, 2018.

17. Dhamane AD, Moretz C, Zhou Y, et al. COPD exacerbation frequency and its association with health care resource utilization and costs. Int J Chron Obstruct Pulmon Dis. 2015;10:2609–2618.

18. Effing T, Zielhuis G, Kerstjens H, van der Valk P, van der Palen J. Community based physiotherapeutic exercise in COPD self-management: a randomised controlled trial. Respir Med. 2011;105(3):418–426.

19. Lenferink A, Frith P, van der Valk P, et al. A self-management approach using self-initiated action plans for symptoms with ongoing nurse support in patients with chronic obstructive pulmonary disease (COPD) and comorbidities: the COPE-III study protocol. Contemp Clin Trials. 2013;36(1):81–89.

20. National Public Radio. Many COPD patients struggle to pay for each breath. 2017. Available from: Accessed April 30, 2018.

21. Alliance for Retired Americans. Social Security and Medicare current facts and figures. 2016. Available from: Accessed April 30, 2018.

Creative Commons License This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
