Validity of COPD diagnoses reported through nationwide health insurance systems in the People’s Republic of China

Background COPD is the fourth leading cause of death worldwide, with particularly high rates in the People’s Republic of China, even among never smokers. Large population-based cohort studies should allow for reliable assessment of the determinants of diseases, which is dependent on the quality of disease diagnoses. We assessed the validity of COPD diagnoses collected through electronic health records in the People’s Republic of China.
Methods The CKB study recruited 0.5 million adults aged 30–79 years from ten diverse regions in the People’s Republic of China during the period 2004–2008. During 7 years of follow-up, 11,800 COPD cases were identified by linkage with mortality registries and the national health insurance system. We randomly selected ~10% of the reported COPD cases and then undertook an independent adjudication of retrieved hospital medical records in 1,069 cases.
Results Overall, these 1,069 cases were accrued over a 9-year period (2004–2013) involving 153 hospitals across ten regions. A diagnosis of COPD was confirmed in 911 (85%) cases, corresponding to a positive predictive value of 85% (95% confidence interval [CI]: 83%–87%), even though spirometry testing was not widely used (14%) in routine hospital care. The positive predictive value for COPD did not vary significantly by hospital ranking or calendar period, but was higher in men than women (89% vs 79%), at age ≥70 years than in younger people (88%, 95% CI: 85%–91%), and when the cases were reported from both death registry and health insurance systems (97%, 95% CI: 94%–100%). Among the remaining cases, 87 (8.1%) had other respiratory diseases (chiefly pneumonia and asthma; n=85) and 71 (6.6%) cases showed no evidence of any respiratory disease on their clinical records.
Conclusion In the People’s Republic of China, COPD diagnoses obtained from electronic health records are of good quality and suitable for large population-based studies and do not warrant systematic adjudication of all the reported cases.


Introduction
COPD is the fourth leading cause of death worldwide. 1 In the People's Republic of China, COPD is the third leading cause of mortality and morbidity after cerebrovascular and ischemic heart diseases, but the disease rates vary substantially between different regions. 2 Although the Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria for COPD, 3 an international endeavor designed to indicate best clinical practice, 4 have recently been adopted in the People's Republic of China, adherence to GOLD guidelines may be suboptimal because of established patterns of clinical practice and the unequal distribution of health resources. 4 For example, spirometry, now required to diagnose COPD, 3 is carried out in less than one-third of COPD cases in the People's Republic of China and is rarely available in rural areas. 5,6 Such variations in clinical practice may hamper the validity of COPD diagnoses obtained from routine clinical care, which may adversely affect observational analyses of COPD in large cohort studies such as the China Kadoorie Biobank (CKB) study. 7
Population-based prospective cohort studies are essential to investigate the relevance of lifestyle, environmental, and genetic factors for a wide range of disease outcomes. To enable efficient and cost-effective collection and ascertainment of large numbers of disease outcomes, many studies are increasingly using routinely collected electronic medical records during follow-up. However, the quality of such disease outcome data may vary greatly between countries, and between settings within the same country, and hence needs to be carefully assessed, for example through independent review and adjudication of a sample of the reported disease cases to inform strategies for analyses. 8-10 In many previous studies, the validity of COPD cases has rarely been assessed properly, and even when it was, the assessment often relied on questionnaires, self-reported diagnoses, or medical records from primary care rather than hospital settings.
The CKB is a nationwide prospective cohort study of 0.5 million adults from ten diverse Chinese regions, 11 in which incident cases of disease outcomes (including COPD) are collected periodically through death registries, disease registries, and a newly established national health insurance (HI) system. The aims of the present report were: 1) to examine the validity of incident cases of COPD in a subset of the reported cases and 2) to identify clinical, socioeconomic, and health care system-related factors that may affect the validity of diagnosis of COPD.

Study design
Details of the CKB study design, procedures, and study participants have been previously described. 7,11 Briefly, the baseline survey was conducted in ten geographical regions (Figure S1) chosen to include a range of behavioral, lifestyle, and environmental risk factors and disease patterns. In each region, temporary assessment clinics were set up within various local residential centers during the period 2004-2008. Individuals aged 35-74 years from 100 to 150 administrative units (rural villages or urban residential committees) in each region were invited to attend the survey clinics. Approximately 30% responded, and a total of 512,891 participants were enrolled, including a few volunteers just outside the specified age range. All participants provided written informed consent. Approvals from international (Oxford Tropical Research Ethics Committee), national (Chinese Academy of Medical Sciences), and local (the Centers for Disease Control and Prevention [CDC] of each of the ten regions) ethics committees were obtained before the start of the study.

Follow-up for mortality and morbidity
The morbidity and mortality of each participant were monitored regularly through the People's Republic of China's CDC Disease Surveillance Points (DSP) system, with annual checks against local residential records and HI records and active confirmation through street committees or village administrators. Causes of death from official death certificates were reported to the local CDC and coded using the tenth International Classification of Diseases (ICD-10) by trained staff, blinded to baseline information. If necessary, information from death certificates was supplemented by a review of medical records. For four major diseases (stroke, ischemic heart diseases, diabetes, and cancer), information on incidence was also collected through linkage with existing disease registries. In addition, electronic record linkage was established with the HI system that records details of all hospital admissions (including description of diagnoses, procedures, and ICD-10 codes). All records for COPD from any source were checked and standardized. By January 1, 2014, a total of 11,799 COPD (ICD-10: J41-J44) cases were identified from various sources (Figure S2), with 87% obtained from HI records and the remainder from death registries.
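As an illustration of the case definition, the following minimal Python sketch flags records whose ICD-10 code falls in the J41-J44 block. Only the code range comes from the text; the helper name `is_copd_code`, the input format, and the example codes are assumptions made for illustration and do not describe the CKB linkage software.

```python
# Hypothetical helper: only the J41-J44 range is taken from the text.
COPD_PREFIXES = ("J41", "J42", "J43", "J44")


def is_copd_code(icd10_code: str) -> bool:
    """Return True if an ICD-10 code belongs to the J41-J44 block."""
    # Normalize spacing, case, and the optional dot (e.g. " j44.1 " -> "J441").
    normalized = icd10_code.strip().upper().replace(".", "")
    return normalized.startswith(COPD_PREFIXES)


# Made-up example codes, not study data.
records = ["J44.1", "j42", "J45.0", "I25.1", " J43.9 "]
copd_records = [code for code in records if is_copd_code(code)]
print(copd_records)
```

In this sketch, asthma (J45) and ischemic heart disease (I25) codes are excluded, while differently formatted J41-J44 codes are retained.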

Collection of clinical information for COPD
Among the 11,799 reported cases of COPD during ~7 years of follow-up, we randomly selected ~10% for retrieval of medical records. In the event that the relevant medical records could not be retrieved for certain cases, especially those who were admitted to hospital many years ago, a backup list of cases was provided to ensure that at least 1,000 cases (ie, 100 cases in each of the ten regions) were adjudicated. Based on the information generated and provided centrally by the CKB coordinating centers, the medical notes were collected by trained CKB staff who visited the hospitals following formal approval from local health authorities and the relevant hospital administrations. Electronic photographs of all relevant sections of the medical records were collected and sent to the National Coordinating Centre for review of the data completeness. Although a total of 1,138 medical records were retrieved, 69 cases were subsequently excluded as they were duplicates, leaving 1,069 cases with relevant medical records for adjudication.
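The selection procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text does not say whether sampling was stratified by region, how large the backup list was, or how it was drawn, so the per-region stratification, the `backup_per_region` parameter, the seed, and the function name are all hypothetical.

```python
import random


def select_for_adjudication(case_ids_by_region, fraction=0.10,
                            backup_per_region=20, seed=2014):
    """Draw a primary random sample and a backup list within each region.

    Assumptions (not from the source): sampling is stratified by region and
    the backup list is the next block of the same random ordering.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    plan = {}
    for region, case_ids in sorted(case_ids_by_region.items()):
        shuffled = list(case_ids)
        rng.shuffle(shuffled)
        n_primary = round(len(shuffled) * fraction)
        plan[region] = {
            "primary": shuffled[:n_primary],
            # Used only when a primary record cannot be retrieved.
            "backup": shuffled[n_primary:n_primary + backup_per_region],
        }
    return plan


# Toy input with made-up identifiers, not CKB data.
toy_cases = {
    "region_A": [f"A{i}" for i in range(1000)],
    "region_B": [f"B{i}" for i in range(500)],
}
plan = select_for_adjudication(toy_cases)
print({region: len(lists["primary"]) for region, lists in plan.items()})
```

Drawing the backup cases from the same shuffled order keeps the replacements random, so that substituting a backup case for an unretrievable record does not depend on any characteristic of the case.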

Adjudication of COPD
Following verification of completeness of data by the National Coordinating Centre, the collected medical records were sent for independent adjudication to five physicians with a working knowledge of respiratory diseases, who, in turn, were supervised by a senior consultant with specialist accreditation in respiratory diseases. Based on the medical records, the physicians entered the extracted information into a purpose-designed electronic database and completed a disease validation form (Figure S3) that included sections on sociodemographic information, clinical information, and the adjudicated outcome for each case.
Although multiple medical and other related criteria help inform the diagnosis of COPD, the disease remains a clinical diagnosis and no single test result is, on its own, diagnostic for COPD. COPD cases were thus adjudicated on the basis of the clinical judgment of the respiratory physicians, blinded to any other study-related information collected. Each case was independently reviewed by one respiratory physician taking account of information collected from the following sources, where available: 1) medical history (including risk factors and respiratory symptoms such as chronic phlegm and breathlessness); 2) radiological examinations; and 3) spirometry (prebronchodilator forced expiratory volume in 1 second [FEV1]/forced vital capacity [FVC] <70%). In addition, based on the medical records, confirmed COPD cases were classified into the following subcategories: 1) chronic bronchitis, 2) emphysema, and 3) mixture of chronic bronchitis and emphysema. Similarly, the adjudication aimed to identify the actual medical condition(s) in misdiagnosed COPD cases (absence of COPD according to medical records). Finally, to ascertain the completeness of the electronic database generated by the adjudicators, ~10% of the adjudicated cases were randomly selected for central review at the Clinical Trial Service Unit (CTSU), Oxford, UK. Following the review, we observed that completeness of data acquisition was high, with 95% of the cases meeting the requirements of the adjudication process and consensus reached on the remainder following discussion.

Statistical analysis
Baseline characteristics were compared between individuals with and without COPD events, standardized by 5-year age group, region, and sex of the overall baseline population. Positive predictive value (PPV), defined as the proportion of participants with an original diagnosis of COPD that was confirmed, was used as a direct measure of the validity of COPD diagnoses. We used SAS 9.3 (SAS Institute Inc., Cary, NC, USA) for all the statistical analyses.
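The PPV defined above can be illustrated with a short Python sketch. The counts (911 confirmed among 1,069 adjudicated cases) are those reported in this study; the use of a normal-approximation (Wald) confidence interval is an assumption made for illustration, since the text does not state which interval method was used in SAS, and the function name is hypothetical.

```python
import math


def ppv_with_ci(confirmed: int, reported: int, z: float = 1.96):
    """Return the PPV and a Wald (normal-approximation) 95% CI.

    The Wald interval is an assumption; the source does not name its method.
    """
    if reported <= 0 or not 0 <= confirmed <= reported:
        raise ValueError("need 0 <= confirmed <= reported and reported > 0")
    ppv = confirmed / reported
    half_width = z * math.sqrt(ppv * (1 - ppv) / reported)
    # Clip to [0, 1] because the approximation can overshoot near the bounds.
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


ppv, lower, upper = ppv_with_ci(confirmed=911, reported=1069)
print(f"PPV = {ppv:.1%} (95% CI: {lower:.1%} to {upper:.1%})")
```

With these counts the sketch gives a PPV of 85.2% with an interval that rounds to 83%-87%, which agrees with the overall estimate reported in the abstract. For subgroups with few cases or a PPV close to 100%, an exact or Wilson interval would usually be preferable to the Wald approximation.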

Results
Overall, relevant medical records were retrieved for 1,069 cases from 153 hospitals for adjudication, which covered a 9-year period from 2004 to 2013. Table 1 shows a comparison of the baseline characteristics of 1,069 adjudicated cases with the total of 11,799 COPD cases. Overall, the adjudicated cases had mean age, education, household income, and smoking prevalence similar to the overall COPD cases. Conversely, adjudicated cases were more likely to be urban dwellers and have lower lung function and more severe COPD, as assessed by GOLD. With the exception of ischemic heart disease prevalence, which was higher in the adjudicated cases than in all reported COPD cases, there was little difference in the reported prevalence of hypertension, stroke, and diabetes between adjudicated COPD cases and all reported cases.
Among the 1,069 cases, 71 (6.6%) had no mention of any respiratory disease in their medical records (Figure 1). In the remaining 998 cases, COPD was confirmed in 911 (85.2% of 1,069) following adjudication (Figure 1) and misdiagnosed in 87 (8.1%): 85 cases had other respiratory diseases, mainly pneumonia (58 cases) and/or asthma (26 cases), and 2 cases had pulmonary heart disease. Of the 911 confirmed COPD cases, 520 had chronic bronchitis, 27 had emphysema, and the remaining 364 had both chronic bronchitis and emphysema.
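The arithmetic behind these proportions can be checked with a brief Python sketch. All counts are taken from the paragraph above; the dictionary labels and variable names are illustrative only.

```python
# Counts reported in the text; labels are illustrative.
adjudicated_total = 1069
outcomes = {
    "confirmed COPD": 911,
    "misdiagnosed (other condition)": 87,
    "no respiratory disease recorded": 71,
}
subtypes = {
    "chronic bronchitis": 520,
    "emphysema": 27,
    "both": 364,
}

# The three outcome groups should account for every adjudicated case,
# and the subtypes for every confirmed case.
outcomes_add_up = sum(outcomes.values()) == adjudicated_total
subtypes_add_up = sum(subtypes.values()) == outcomes["confirmed COPD"]

percentages = {
    label: round(100 * n / adjudicated_total, 1)
    for label, n in outcomes.items()
}
print(outcomes_add_up, subtypes_add_up, percentages)
```

The three outcome groups sum to 1,069 and the three subtypes to 911, and the percentages reproduce the reported 85.2%, 8.1%, and 6.6%.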

Discussion
This outcome validation study of more than 1,000 COPD cases covered 153 hospitals across ten regions, and it showed that COPD diagnoses reported through routine health record systems are of good quality in the People's Republic of China, with an overall PPV of 85%. Invalid diagnoses arose from either misdiagnoses (~8%) of other respiratory diseases or reporting errors (~7%). The high validity of COPD diagnoses in CKB should facilitate reliable assessment of the determinants of COPD in the population. From an international perspective, the 85% true positive estimate for COPD diagnoses in the present study is higher than that reported previously in several studies 12-14 on Western populations. For example, in a Dutch study including 257 cases of chronic lung diseases from general practices in 1988, the PPV was 62.5%, 12   involved 951 cases, 13 or the 80% among selected 313 cases from the CPCSSN study, 14 which was initiated in 2004. The Dutch study was conducted before the launch of the GOLD initiative, and the adjudication used a combination of pulmonary function testing and X-rays to ascertain the presence of the disease. 12 The CPRD-GOLD study 13 used different algorithms including, as in the present study, COPD-related clinical codes, respiratory symptoms, spirometry results, and medication use. Likewise, the Canadian study conducted in the province of Saskatchewan 15 developed case-finding diagnostic algorithms to identify cases with COPD using ICD-9 codes (490-496) from billing data, laboratory test results, and medications. The PPV in the present study is lower than the 91.2% reported in the Swedish Inpatient Registry, 16 probably because of the higher and more systematic use of pre- and postbronchodilator spirometry in that setting. Interestingly, in the Canadian survey, the validity of the diagnoses varied between 64.0% and 87.7% depending on the subtype of COPD based on ICD-9 codes. 15 These findings are consistent with the present study, in which ICD-10 code J44 (including various forms of chronic and obstructive bronchitis) yielded the highest percentage of true positives, followed by J42 (unspecified chronic bronchitis) and J43 (emphysema).
The difference in the reported validity of COPD diagnoses between different studies may reflect the calendar period when the disease was diagnosed and continuous improvement in diagnosis of COPD over the last few decades. The GOLD Initiative, launched in 1997, represented an important strategy to address the worldwide burden of COPD. Even before the GOLD initiative was proposed, a study 17 of secular trends of COPD admissions in four hospitals in Barcelona over two different periods reported that the kappa values for validity of diagnosis of COPD increased from 0.20 to 0.65 between 1985-1987 and 1989. In the People's Republic of China, GOLD guidelines were only endorsed in 2013. Therefore, the present study, which covers the period prior to 2013, is not able to address whether endorsement of the GOLD guidelines has had any measurable effect on how COPD patients are managed. In the present study, 8% of the reported COPD cases were actually due to misdiagnoses of other respiratory diseases, which reinforces that COPD is a challenging diagnosis, particularly in the early stages of the disease and when the alternative diagnosis is asthma. This is particularly true when spirometry is not widely used in many low- and middle-income countries such as the People's Republic of China.
The validity of COPD diagnoses in the present study was 84% for cases that were solely reported through the electronic HI system, but increased to 97% when a combination of death registry and electronic HI data were used, even though the latter accounted for only a small proportion of the reported cases. Although the HI system followed common frameworks and procedures, it was developed mainly to facilitate reimbursement of hospital care, with the data collected by different HI agencies in each region lacking a uniform reporting system. This may explain some reporting errors that could have occurred either during the recording of the cases in the different regional systems or during the coding processes themselves. The HI agencies are currently endeavoring to develop a uniform and standardized reporting system, and some of the agencies have merged; therefore, administrative errors should decrease in the future.
The validity of the COPD diagnoses was slightly higher in rural regions than in the urban ones. This observation is surprising since rural health care facilities in the People's Republic of China are less well equipped (including poor access to spirometry testing). 6 It is possible that in rural areas COPD cases may present in more advanced stages of disease 2 and, hence, are more easily diagnosed. In addition, our results could have been biased toward the rural regions as only 15% of the total COPD cases were from urban regions, whereas ~37% of the adjudicated cases were from the urban areas.
There are some limitations to this study. First, the sample of adjudicated cases may not be representative of all the COPD cases in CKB or, indeed, in the People's Republic of China. Indeed, some baseline characteristics differed between adjudicated and nonadjudicated cases. For example, lung function was lower in the sample analyzed, which could have yielded more severe cases of COPD that could have been more easily diagnosed and have had more comorbidity. This situation could reflect the fact that the medical records of participants with more hospital admissions, and consequently with more recent ones, may have been more likely to be retrieved. Second, the vast majority of COPD cases hospitalized were not assessed with spirometry, reflecting a well-recognized phenomenon of COPD management in the People's Republic of China.

Conclusion
In conclusion, COPD diagnoses reported through electronic HI systems in the People's Republic of China are generally of high quality, facilitating the conduct of large-scale epidemiological investigations of determinants of COPD, and do not warrant systematic adjudication of all reported COPD cases.

Members of the China Kadoorie Biobank collaborative group
Kadoorie Charitable Foundation, Hong Kong; long-term continuation: UK Wellcome Trust (088158/Z/09/Z, 104085/Z/14/Z); Chinese National Natural Science Foundation (81390541, 81390544); The British Heart Foundation; UK Medical Research Council and Cancer Research UK provided core funding to the Oxford CTSU; support for the present respiratory study was partly provided by GlaxoSmithKline (WEUKBRE5848). JV was personally supported by the Swiss National Science Foundation (P2LAP3_155086), Lausanne University Hospital, and Société Industrielle et Commerciale de Produits Alimentaires Foundation.

Disclosure
KJD is an employee of GlaxoSmithKline. The authors report no other conflicts of interest in this work.

Figure S1 The location of ten survey sites in China Kadoorie Biobank (CKB).