External Validation Of The Updated ADO Score In COPD Patients From The Birmingham COPD Cohort

Background: Reviews suggest that the ADO score is the most discriminatory prognostic score for predicting mortality among chronic obstructive pulmonary disease (COPD) patients, but a full evaluation and external validation within primary care settings is critical before implementation.
Objectives: To validate the ADO score in prevalent and screen-detected primary care COPD cases at 3 years and at shorter time periods.
Patients and methods: One thousand eight hundred and ninety-two COPD cases were recruited between 2012 and 2014 from 71 United Kingdom general practices as part of the Birmingham COPD Cohort study. Cases were either on the practice COPD register or screen-detected. We validated the ADO score for predicting 3-year mortality, with 1-year and 2-year mortality as secondary endpoints, using discrimination (area-under-the-curve (AUC)) and calibration plots.
Results: One hundred and fifty-four deaths occurred within 3 years. The ADO score was discriminatory for predicting 3-year mortality (AUC = 0.74; 95% CI: 0.69–0.79). Similar performance was found for 1-year (AUC = 0.73; 0.66–0.80) and 2-year mortality (AUC = 0.72; 0.67–0.76). The ADO score showed reasonable calibration for predicting 3-year mortality (calibration slope 0.95; 0.70–1.19) but over-predicted in cases with higher predicted risks of mortality at 1 year (0.79; 0.45–1.13) and 2 years (0.79; 0.57–1.01).
Discussion: The ADO score showed promising discrimination in predicting 3-year mortality in a primary care population including screen-detected cases. It may need to be recalibrated if it is used to provide risk predictions for 1- or 2-year mortality since, in these time periods, over-prediction was evident, especially in cases with higher predicted mortality risks.


Plain Language Summary
Prediction models are tools that can be used to provide estimates of likely outcomes, such as death, over a specified time period in individual patients. This information can then be used to inform treatment decisions. For example, the intensity of treatment (or monitoring) may be increased for those with higher individual risks. These tools are usually developed using data from one group of people. However, because other groups of people may have different characteristics, the accuracy of the tool needs to be checked in these other groups. The ADO (age, dyspnoea (i.e. breathlessness), and obstructed airways) score was developed to predict death within 3 years in people with COPD. Our aim was to check whether the ADO score is accurate in predicting the risk of death in a group of people with COPD identified in general practices in the UK. We also wanted to determine whether it was accurate for predicting the risk of death at time periods shorter than 3 years. Previous studies have shown that the ADO score distinguishes well between people who are likely to die and those who are likely to remain alive (i.e. the discrimination of a model). In our sample of people with newly diagnosed and existing COPD in primary care, we confirmed these results. However, previous studies have not properly assessed the degree of agreement between the expected and observed individual risk of death (i.e. the calibration of a model). It is essential to report calibration in prognostic models because it tells you how accurate mortality predictions are likely to be for individuals with a particular disease. We found that the ADO score over-predicts individual risk of death for periods shorter than 3 years. Unless adjusted, this reduces its usefulness for clinical decision-making. In addition, this has implications for other COPD prognostic scores that have been tested and used at shorter time periods than they were developed for.

Introduction
Chronic obstructive pulmonary disease (COPD) is the third leading cause of mortality worldwide. 1,2 Prognostic scores to predict mortality risk in people with COPD are useful in order to assess disease severity, define intervention options, and facilitate consultations with patients about their prognosis. 3 Knowledge of the risk of mortality also allows the benefits of treatments for COPD to be weighed against potential harms, such as side effects, costs, and inconvenience, 2 in order to enable informed clinical decision-making. The extent of airflow obstruction, usually assessed by forced expiratory volume in the first second (FEV1), has long been recognised as an important measure of prognosis and is used for disease staging. 2 However, the complex and multifaceted nature of COPD 4,5 has led to the identification of other important predictors of mortality and the recognition that combining these in multicomponent indices 6-10 improves prognostic ability. Before implementation in clinical practice, it is important to evaluate the predictive ability of a prognostic index in different populations. 3 There are two important aspects to such evaluation: assessment of how well the index can differentiate between those who die and those who remain alive (i.e. discrimination) and the extent of agreement between predicted and observed mortality (i.e. calibration). The latter is particularly important for prognostication. 11 Amongst prognostic indices, the ADO (age, dyspnoea, airflow obstruction) score has wide applicability as it is made up of only three easily measured components, 9 overcoming the limitation of many other indices. 12 The original ADO score was developed in 2009 9 to predict 3-year mortality in patients with moderate-to-severe COPD from secondary care and was updated in 2012 in an international cohort from a variety of healthcare settings to improve its generalisability. 13 The updated ADO has been externally validated several times. 13-16 However, only two validation studies were in primary care populations, 14,16 where most people with COPD are cared for. 17 In one of these studies, calibration was not assessed. 14 The other study only considered 2-year mortality as the outcome and adjusted the intercept of the ADO score. 16 A further two studies used populations across primary, secondary and tertiary settings. 13,15 However, no analyses were undertaken to assess the differential performance of the ADO score in each setting.
Our aim was to validate the updated ADO score in COPD cases from a large primary care research cohort (the Birmingham COPD cohort) which included both previously and newly diagnosed cases and where dyspnoea and lung function were measured under standardized conditions.

Methods
This paper was written in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. 18

Design
External validation study of a published prognostic score.

Source And Study Population
The characteristics of the Birmingham COPD cohort, which is part of the Birmingham Lung Improvement Studies (BLISS), have been summarized in a previous publication. 19 Briefly, COPD cases were recruited from 71 general practices across the West Midlands, United Kingdom. For this analysis, cohort cases with diagnosed COPD (aged 40 and over) on practice Quality and Outcomes Framework COPD registers (i.e. prevalent cases) and those with newly detected COPD identified through a case-finding trial (i.e. screen-detected incident cases) 20 were included. The definition of COPD in incident cases was based on reporting of relevant symptoms in those with airflow obstruction (forced expiratory volume in the first second (FEV1)/forced vital capacity (FVC) <0.7, according to recommendations in UK guidelines). Baseline assessments took place at cohort entry from 31 May 2012 to 25 June 2014.

Exposure And Outcome Measurements
The ADO score (0-14) was computed from three variables taken at baseline: age, dyspnoea (modified MRC score), and obstruction (FEV1 % predicted). Age was calculated from patient-reported date of birth, and dyspnoea was assessed by a questionnaire using the British Medical Research Council guidelines. 21 A researcher trained to international standards to measure FEV1 administered the ndd Easy One Spirometer (ndd, Switzerland) before (max eight blows) and after (max six blows) 400 µg salbutamol, aiming for three blows within 100 mL. FEV1 and FVC recordings were considered useable if they met ATS acceptability criteria and were within 200 mL. The highest recording was then taken. 19 Quality assurance was maintained using real-time quality assessment, with over-reading of spirometry measurements. FEV1 % predicted was estimated using the Global Lung Function Initiative equations. 22 Linked mortality data from the Office for National Statistics were obtained through NHS Digital for the period from recruitment until 31 March 2016. 23 Other patient characteristics including ethnicity, level of deprivation (using the Index of Multiple Deprivation derived from home postcode), smoking status, quality of life, and medical history (including self-reported comorbidities and previous exacerbations) were obtained by patient self-report through standardized questionnaires. Body mass index (BMI, derived from height and weight measurements) and exercise capacity (using the sit-to-stand test) were obtained by trained researchers using standardised protocols at the baseline visit. 19

Patient Selection Criteria
The ADO score was developed for participants 40 years and older, so cohort cases younger than 40 years at baseline were excluded. Missing baseline mMRC scores or FEV1 % predicted observations were imputed using multiple imputation (MI) by chained equations so that all remaining incident and prevalent cases (N = 1892) could be included in the final analyses (baseline tables show data prior to imputation). Additional auxiliary variables (cardiovascular disease history, cardiovascular disease medication, chronic cough, chronic phlegm, ethnicity, and gender) were used to aid the imputation. The number of imputed datasets used was based on the fraction of missing data for all variables (11%, so 11 MI datasets were used). 24 Death data were complete for all cases under the assumption that patients without a date of death remained alive.

Analysis
Baseline characteristics were compared between prevalent and incident cases, and between those who died within 3 years of study entry and those who did not. Chi-square and Student's t-tests were used for categorical and continuous variables, respectively.
The updated ADO score regression coefficients and intercept 13 were used to compute the predicted probability of 3-year mortality for each eligible cohort participant (Supplementary Table 1). To assess discrimination, area-under-the-curve (AUC) was estimated with a 95% confidence interval (95% CI) and plotted using AUC-ROC plots. 25 Calibration was assessed by comparing the predicted probability to the observed probability of mortality and examined with a calibration plot and calibration slope with 95% CI. Calibration plots (STATA function: pmcalplot) displayed observed risk by deciles of the predicted risk and also examined risk at the individual level using Locally Weighted Scatterplot Smoothing algorithms. 26 An estimate of the Calibration-in-the-large (CITL) was used to indicate whether the predictions were systematically too high or too low. 26 As MI datasets were used, the AUC and calibration slope were estimated in each individual MI dataset before Rubin's rules were used to combine estimates. 27 A Kaplan-Meier plot was created according to ADO score group (0 to 5, 6 and 7, 8 and 9, and 10 to 14). Scores were grouped based on the number of patients. Wider separation of the Kaplan-Meier curves for ADO score groups indicates better discriminative performance.
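The performance measures described above can be illustrated with a minimal Python sketch. This is not the study's Stata code: the function names (`auc`, `calibration_slope`, `calibration_in_the_large`, `rubin_pool`) are hypothetical, the toy data are invented for illustration, and the sketch assumes predicted risks strictly between 0 and 1 and a binary death indicator. The calibration slope is taken here as the coefficient from a logistic regression of the outcome on the logit of the predicted risk, and Rubin's rules are applied to per-dataset estimates and their variances.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def auc(pred, died):
    """Probability that a random death has a higher predicted risk
    than a random survivor (ties count one half)."""
    pos = [p for p, d in zip(pred, died) if d == 1]
    neg = [p for p, d in zip(pred, died) if d == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_slope(pred, died, iters=25):
    """Slope b of the logistic model died ~ a + b * logit(pred),
    fitted by Newton-Raphson. b < 1 suggests predictions are too extreme."""
    x = [logit(p) for p in pred]
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, died):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

def calibration_in_the_large(pred, died, iters=25):
    """Intercept a of died ~ a + offset(logit(pred)).
    a < 0 means predictions are too high on average."""
    x = [logit(p) for p in pred]
    a = 0.0
    for _ in range(iters):
        mus = [expit(a + xi) for xi in x]
        grad = sum(yi - mu for yi, mu in zip(died, mus))
        hess = sum(mu * (1.0 - mu) for mu in mus)
        a += grad / hess
    return a

def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance across m imputed datasets."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    return q_bar, within + (1.0 + 1.0 / m) * between

# Invented toy data: observed death proportions match the predicted risks exactly.
pred = [0.2] * 10 + [0.5] * 10
died = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5
print(auc(pred, died), calibration_slope(pred, died), calibration_in_the_large(pred, died))
```

In an analysis with multiply imputed data, each measure would be computed within every imputed dataset and the resulting estimates passed to `rubin_pool`. Whether the AUC is pooled on its original or a transformed scale is not stated in the text, so that choice is left open here.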
In secondary analyses (using the same discrimination and calibration methods as above), we evaluated the ability of the ADO score to predict mortality at 1 and 2 years. The period end dates for each case were 1, 2, and 3 years after study entry. If the end date for the period fell after 31 March 2016, then the case was excluded from that period. Period exclusions were made regardless of whether and at what time cases died to ensure that dead and alive cases were treated the same. However, a sensitivity analysis was performed by re-introducing cases that died within a certain period despite a period end date that fell after 31 March 2016. Two additional sensitivity analyses were conducted, in which we estimated discrimination and calibration 1) for prevalent cases alone and 2) for complete cases (non-missing obstruction and dyspnoea). Prevalent cases were studied alone because the accuracy of the ADO score may be affected by the inclusion of screen-detected cases (which might not reflect usual primary care populations). All analyses were undertaken using STATA (StataCorp, College Station TX, USA).
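The period exclusion rule described above can be sketched as follows. This is an illustrative Python reading of the text, not the authors' code: `period_outcome` and `add_years` are hypothetical names, and treating a death on the period end date as falling within the period is an assumption.

```python
from datetime import date

# End of linked mortality follow-up, as stated in the text.
CENSOR_DATE = date(2016, 3, 31)

def add_years(d, years):
    """Same calendar day `years` later (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def period_outcome(entry, death, years, reintroduce_deaths=False):
    """Return 1 (died within the period), 0 (alive at period end),
    or None (case excluded from this period).

    `death` is a date, or None when no death was registered.
    `reintroduce_deaths=True` mimics the sensitivity analysis in which
    cases that died within the period are kept even though the period
    end date falls after the end of follow-up.
    """
    end = add_years(entry, years)
    died_in_period = death is not None and death <= end
    if end > CENSOR_DATE:
        if reintroduce_deaths and died_in_period:
            return 1
        return None  # excluded whether the case died or not
    return int(died_in_period)

# A case entering on the last recruitment day has complete 1-year follow-up only.
last_entry = date(2014, 6, 25)
print([period_outcome(last_entry, None, y) for y in (1, 2, 3)])
```

Excluding on the period end date alone, before looking at vital status, is what keeps dead and alive cases treated the same; the optional flag shows how the sensitivity analysis departs from that rule.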

Results
Out of 1894 cases in the cohort, two were younger than 40 years of age at baseline, 111 (5.9%) had a missing mMRC score, and 102 (5.4%) had missing FEV1 % predicted values (22 (1.2%) were missing both) (Figure 1). Before imputing missing mMRC and FEV1 % predicted, there were 1392 prevalent and 309 incident cases (total 1701). The median observation time was 3.02 years (minimum 1.77 and maximum 3.83 years). The average age was 68.4 years and 651 (38.3%) cases were female. The majority (79.5%) had mild-to-moderate airflow obstruction (50.6% with GOLD stage II) and the mean ADO score at baseline was 7.0 (SD 2.4). One hundred and twenty-four (7.3%) deaths occurred within 3 years of observation time, 116 (94%) of which occurred in the prevalent cases.
When compared to incident cases, prevalent cases tended to have a worse baseline ADO score (older age, more severe obstruction and worse breathlessness), lower exercise capacity scores, more comorbidities, were more likely to report a worse health-related quality of life score, had more former smokers, and were more likely to report a respiratory hospitalisation and at least one exacerbation in the previous 12 months (Table 1). Table 2 shows a comparison of characteristics of cases according to whether or not they died within 3 years of observation time. Those who died were older and had more severe obstruction and dyspnoea (all P<0.001) which resulted in a higher baseline ADO score (mean (SD) score 8.98 (2.14)) compared to those who remained alive (6.85 (2.39)). They were also less likely to be female, had poorer exercise capacity, lower BMI, were more likely to have a severe impact of COPD on health-related quality of life, were more likely to have cardiovascular comorbidity, and were more likely to report respiratory hospitalisation in the 12 months before baseline compared to those who remained alive.
Figure 2 shows a Kaplan-Meier plot of the survival of cases according to their ADO score at baseline. The survival curves are well separated, which indicates good discrimination. Cases with an ADO score of 10 or higher had nearly 12 times the rate of death when compared to patients with an ADO score of 0 to 5.
Figure 3 shows AUC and calibration plots for prevalent and incident cases. One thousand eight hundred and ninety-two cases were available after imputing missing mMRC and FEV1 % predicted observations, which added 30 more deaths (total equal to 154 deaths) within 3 years of observation time (Figure 1). For 3-year mortality    Table 2). In the complete cases, the calibration slope for 1-year mortality decreased to 0.73 when compared to the analysis that included all cases. For 3-year mortality, the calibration slope increased to 1.08 while discrimination (AUC) increased to 0.77.

Discussion
In this external validation study in a primary care COPD population which included screen-detected and prevalent cases, we found that the updated ADO score 13 was discriminatory with an AUC of 0.74 for predicting 3-year mortality. Discrimination remained stable when predicting 1- and 2-year mortality. However, we found that the ADO score tended to over-predict mortality, especially among the few patients with higher predicted risks of mortality at 1- and 2-year time frames.
Our finding of an AUC of 0.74 is lower than that of the development model (AUC = 0.85) 13 but consistent with estimates from two other studies that validated the ADO score for predicting 3-year mortality, one in primary care (AUC = 0.724, 95% CI: 0.719-0.730; mean FEV1 % predicted of participants: 59.5) 14 and the other across multiple healthcare settings (AUC = 0.73, 95% CI: 0.70-0.76; FEV1 % predicted 65.9). 13 However, a third study used a network meta-analysis to pool data on patients across many healthcare settings and found that the discriminative performance of the ADO score was below 0.70 but still better than nine other prognostic scores. 15 Our findings are consistent with the results of one primary care study for 1-year (AUC = 0.720; 95% CI: 0.710-0.729) and 2-year (AUC = 0.725; 95% CI: 0.718-0.731) mortality, 14 but our discrimination was slightly lower than that of a second study for 2-year mortality (AUC = 0.78; 95% CI: 0.71-0.84), 16 since the upper confidence limit of our 2-year AUC estimate was slightly lower than 0.78.
Accurate calibration is particularly important for evaluating prognostic models because predicted and observed risk need to closely match for predictions to be clinically useful. 11 This is the first study that reports the calibration slope of the ADO score when predicting 3-year mortality. In addition to 3-year mortality, predictions using shorter time frames are important because clinicians rely on multicomponent prediction models to identify patients nearing the end of life who may benefit from palliative care. 29 No other studies have assessed calibration for shorter time periods without adjusting the model. We have shown that over-prediction was more pronounced in patients with higher predicted risks of mortality for these time periods. Thus, our findings suggest that recalibration, for example, by using statistical shrinkage techniques, 30 is needed, in order for the ADO score to better predict mortality over a shorter time frame.
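To make the idea of recalibration concrete, the following minimal Python sketch applies a logistic recalibration, in which the logit of the original predicted risk is rescaled by a slope and shifted by an intercept. This is one common approach used here for illustration only and is not necessarily the shrinkage technique the cited reference describes. The function name `recalibrate` is hypothetical, the slope of 0.79 is the 1-year calibration slope reported in this study, and the intercept of 0 is a placeholder assumption; in practice both would be re-estimated together in suitable data.

```python
import math

def recalibrate(pred, intercept, slope):
    """Logistic recalibration: new risk = expit(intercept + slope * logit(pred)).
    Assumes 0 < pred < 1."""
    lp = math.log(pred / (1.0 - pred))
    return 1.0 / (1.0 + math.exp(-(intercept + slope * lp)))

# Illustrative only: slope from the reported 1-year calibration, intercept assumed 0.
for risk in (0.5, 0.6, 0.8):
    print(risk, round(recalibrate(risk, 0.0, 0.79), 3))
```

With a slope below 1, predicted risks above 0.5 are pulled down on the logit scale, which is the direction of adjustment suggested by over-prediction among cases with higher predicted risks.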
Our study overcomes several limitations found in previous validation studies. For example, we used recommended statistical approaches for predicting mortality in a validation study. 26 Using a research dataset, such as the Birmingham COPD cohort, had the advantage of more accurate and higher quality measurements at prescribed time points, particularly for spirometry. On the other hand, the Birmingham COPD cohort is not completely representative of all primary care patients with COPD. Ethnic diversity was limited. Additionally, patients needed to be mobile to take part in the cohort study and, therefore, patients with more severe disease who were housebound were more likely to be excluded. Since we used a fixed ratio (based on UK guideline recommendations) instead of a lower limit of normal of FEV1/FVC to define COPD, overdiagnosis may have occurred in older patients. 31
Figure 3 Area-under-the-curve and calibration plots comparing observed and predicted mortality for incident and prevalent cases.
However, the ADO score was developed in a population where COPD was defined using the fixed ratio 9 and using the lower limit of normal could lead to underdiagnosis compared to expert opinion. 32 Furthermore, in a study of 24,207 US adults from 4 cohorts, COPD-related hospitalization and mortality were not significantly different when using the fixed ratio of FEV1/FVC < 0.70 compared to the lower limit of normal to define COPD. 33 This indicates that our results would not be very different if we had used a lower limit of normal to define our cohort. We included screen-detected COPD cases who, predictably, had very few deaths. However, other studies have not included screen-detected cases despite at least 50% of the COPD population remaining undiagnosed worldwide. 34 It is important to assess the validity of prognostic indices to predict mortality in this population to inform treatment decisions. Finally, a very small number of deceased patients may have had delayed death registration due to a variety of reasons such as suspicious, unexpected, or accidental deaths. 35 In addition to the loss of power (i.e. fewer deaths), if patients were considered alive when they were truly dead, then this would result in weaker prognostic accuracy.

Conclusion
It is well known that prognostic scores are rarely used in clinical practice for managing people with COPD, especially in primary care. 3 Although the ADO score is attractive because of its accurate discriminative ability and ease of measurement and calculation in a primary care setting, recalibration is needed to improve risk prediction for shorter time frames. Currently, when predicting 1- and 2-year mortality, the ADO score may not be accurate in primary care populations because over-prediction was evident, especially in those with higher predicted risks of mortality. As a result, people with COPD may be given treatment that is not needed.

Ethics Approval
The cohort received ethical approval from the National Research Ethics Service Committee West Midlands, Solihull (ref.: 11/WM/0304).

Data Availability
STATA code used for data manipulation and analyses can be provided upon request.

Disclaimer
The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The Birmingham COPD Cohort study is part of The Birmingham Lung Improvement StudieS (BLISS).

Funding
This work was supported by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (grant number: RP-PG-0109-10061). NIHR had no involvement in the study design, the data collection, analysis, interpretation, writing of the manuscript, or in the decision to submit the article for publication.

Disclosure
FV currently supervises two Ph.D. students who are employed with F. Hoffmann-La Roche Ltd., Basel, Switzerland. He has not received any reimbursements for this and the Ph.D. topics are not related to this paper. FMEF reports personal fees from AstraZeneca, personal fees from Boehringer Ingelheim, personal fees from Chiesi, personal fees from GlaxoSmithKline, grants and personal fees from Novartis, grants and personal fees from MedImmune, personal fees from TEVA, outside the submitted work. AS reports grants from AstraZeneca, outside the submitted work, grants from NIHR, during the conduct of the study. AMT reports grants from Linde REAL fund, grants from Alpha 1 Foundation, non-financial support from GSK, personal fees and non-financial support from Boehringer Ingelheim, grants, personal fees and non-financial support from Chiesi, grants, personal fees and non-financial support from AstraZeneca, grants from Grifols Biotherapeutics, outside of submitted work, personal fees from CSL Behring, personal fees from Pfizer, during the conduct of the study. PA, DF, AS, and REJ hold a grant from NIHR (Programme Grant, 2010-2018) that supported the development of the Birmingham COPD Cohort analysed in the submitted paper. PA holds other NIHR grants and is Deputy chair of the NIHR PHR Funding committee. The authors report no other conflicts of interest in this work.