International Journal of Chronic Obstructive Pulmonary Disease, Volume 15

The Construction of Primary Screening Model and Discriminant Model for Chronic Obstructive Pulmonary Disease in Northeast China

Authors Li X, Guo Y, Li W, Wang W, Zhang F, Li S

Received 17 February 2020

Accepted for publication 12 June 2020

Published 31 July 2020 Volume 2020:15 Pages 1849–1861



Review by Single anonymous peer review


Editor who approved publication: Dr Richard Russell

Xiaomeng Li,1 Yuhao Guo,2 Wenyang Li,1 Wei Wang,1 Fang Zhang,1 Shanqun Li3

1Department of Respiratory and Critical Care Medicine, The First Hospital of China Medical University, Shenyang 110000, People’s Republic of China; 2Department of Mathematics and Statistics, Xi’an JiaoTong University, Xi’an 710049, People’s Republic of China; 3Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai 200020, People’s Republic of China

Correspondence: Wei Wang Email [email protected]

Objective: The diagnosis of chronic obstructive pulmonary disease (COPD) is challenging, especially in primary care institutions that lack spirometers. To reduce the rate of missed COPD diagnoses in Northeast China, which has a higher prevalence of COPD, this study aimed to establish efficient primary screening and discriminant models of COPD for this region.
Patients and Methods: Subjects from Northeast China were enrolled from December 2017 to April 2019 at The First Hospital of China Medical University. All participants underwent pulmonary function tests and completed a questionnaire. Using illness or no illness as the target for screening models and disease severity as the target for discriminant models, multivariate linear regression, logistic regression, linear discriminant analysis, K-nearest neighbor, decision tree and support vector machine models were constructed in R and Python. After comparing their effectiveness, the optimal primary screening and discriminant models were established.
Results: A total of 232 COPD patients (124 GOLD I–II and 108 GOLD III–IV) and 218 normal controls were enrolled. Eight primary screening models were established. The optimal model was Y = −1.2562 − 0.3891X4 (education level) + 1.7996X5 (dyspnea) + 0.5102X6 (cooking fuel grade) + 1.498X7 (smoking index) + 0.8077X9 (family history) − 0.5552X11 (BMI) + 0.538X13 (cough with sputum) + 2.0328X14 (wheezing) + 1.3378X16 (farmers) + 0.8187X17 (mother’s smoking exposure history during pregnancy) − 0.389X18 (kitchen ventilation) + 0.6888X19 (childhood heating). Six discriminant models were established. The optimal model was a decision tree (the optimal variables: dyspnea (x5), cooking fuel grade (x6), second-hand smoking index (x8), BMI (x11), cough (x12), cough with sputum (x13), wheezing (x14), farmer (x16), kitchen ventilation (x18), and childhood heating (x19)). The discriminant model was implemented as computer code so that it could be applied with computer technology.
Conclusion: Many factors were related to COPD in Northeast China. Stepwise logistic regression and decision tree were the optimal screening and discriminant models for COPD in this region.

Keywords: chronic obstructive pulmonary disease, screening, discriminant, severity, model


Chronic obstructive pulmonary disease (COPD) is a common, preventable and treatable disease that is characterized by persistent respiratory symptoms and airflow limitation. The incidence of COPD is particularly high in developing countries. In 2015, the number of COPD patients in China was nearly 100 million.1 In 2017, a large-scale prospective study conducted by Zhou et al2 confirmed that early intervention and treatment for COPD significantly slows the decline of lung function in patients with early COPD (stage I and II), delays the onset of acute exacerbation, reduces the hospitalization rate and improves the quality of life. These studies make early screening and evaluation for COPD a key issue.

The most accurate and specific tool for early screening, diagnosis and evaluation of COPD is the pulmonary function test. However, the high operating cost of spirometry and its technical requirements for operators make it unavailable for wide use in primary care institutions. This can lead to missed diagnosis and delayed treatment of COPD. Research shows that the all-cause mortality in patients with COPD who were misdiagnosed was 3.1 times as high as in people without airflow limitation, and the risk of contracting pneumonia was increased by 2.7 times in people with COPD. Since the Global Strategy for Diagnosis, Management and Prevention of Chronic Obstructive Lung Disease (GOLD 2019) does not recommend spirometry for screening for COPD among asymptomatic individuals, simple, efficient and accurate tools for primary screening of COPD at an early stage are important. Some studies have used questionnaires or special spirometric measures in certain patients,3,4 but the results could not be applied widely because of differences in environment, weather, air pollution and lifestyle among regions. For example, in Southern China, COPD is related to a wet environment and smoking, while in Northern China, cold weather, fuel exposure and closed spaces during winter contribute to the development of COPD. Thus, a screening tool should consider these factors, especially for primary hospitals.

With the development of information technology, the application of screening models combined with computer technology has become a preferred means of screening for diseases,5,6 and has allowed countless patients to benefit from early diagnosis. Because of the high prevalence of COPD in Northeastern China, most patients with COPD must spend winters in Southern China. This study explored screening and discriminant models specialized for patients with COPD in Northeastern China, especially those who live in regions where pulmonary function tests are not available, to help patients be diagnosed and managed as early as possible.

Patients and Methods


The research data were from a database at the outpatient and physical examination center of The First Hospital of China Medical University. From December 2017 to April 2019, 232 patients with COPD were first diagnosed by pulmonologists of The First Hospital of China Medical University and enrolled. GOLD was used to diagnose and categorize COPD severity. Entry criteria were: not diagnosed with COPD before being recruited; age between 40 and 80; airflow limitation ≤70% indicated by forced expiratory volume in one second (FEV1)/forced vital capacity (FVC); and FEV1 reversibility following inhalation of salbutamol <12% of pre-bronchodilator FEV1. Patients with acute exacerbation within the prior 3 months or other respiratory diseases were excluded. Matching by age and gender, we recruited 218 control individuals without respiratory diseases from our hospital. Enrollment criteria for controls were: age between 40 and 80, and no diseases affecting questionnaire filling and lung function tests. All participants were Chinese and we included only permanent residents of Northeast China. All participants were assessed by board-certified pulmonologists. Those with conditions such as mental disease or bronchodilator usage that could influence the results of questionnaires and pulmonary function were excluded. This study was approved by the Ethics Committee of The First Hospital of China Medical University. All participants were informed and agreed to the study.


The questionnaire we designed (Table 1) was based on the burden of obstructive lung disease (BOLD) study epidemiological questionnaire, the IPAG-recommended symptom-based COPD questionnaire,7,8 St. George’s respiratory questionnaire,9 the modified British Medical Research Council questionnaire10 and COPD assessment test.11 The questionnaire was adjusted according to GOLD guidelines especially for risk factors in Northeastern China, including demographic data, smoking status, history of fuel exposure, family history, related respiratory symptoms and understanding of the disease. All participants completed the questionnaire on their own or with the assistance of relatives.

Table 1 The Questionnaire

Pulmonary Function Test

Before the test, the safety and accuracy of implementation were evaluated. Participants were required to meet inclusion criteria and take no bronchodilators within 2 weeks. The contraindications to spirometry testing were as follows: had undergone chest, abdomen or eye surgery in the last 3 months; had a heart attack in the last 3 months (eg, angina, myocardial infarction, malignant arrhythmia); hospitalized for heart disease in the last 1 month; massive hemoptysis in the last 1 month; stroke in the last 1 month; receiving anti-TB drug treatment or having active pulmonary tuberculosis; uncontrolled severe hypertension (diastolic pressure greater than 100 mm Hg and systolic pressure greater than 200 mm Hg); aortic aneurysm; severe hyperthyroidism; medication for seizures; history of retinal detachment; or facial paralysis. Standard pulmonary function instruments (YAEGER, Vmax, Germany) were used. We used a 3-L syringe to calibrate the spirometer daily. Participants were seated, wearing a nose clip, and using a disposable mouthpiece. Participants were required to have an error of ≤0.15 L between the best value and the next best value of FVC and FEV1 in three acceptable tests. If FVC ≤ 1.0 L, the error was ≤0.10 L. We then administered a bronchodilator (salbutamol 400 µg) via inhalation through a 500-mL spacer and repeated spirometry after 20 min using the same criteria.12 Criteria for airflow limitation and grading were according to GOLD 2019. Airflow limitation was defined as the fixed ratio of FEV1/FVC <0.70 (post bronchodilator). Severity of airflow limitation was defined as GOLD I (mild, FEV1 ≥80% predicted), GOLD II (moderate, 50% ≤FEV1 <80% predicted), GOLD III (severe, 30% ≤FEV1 <50% predicted) and GOLD IV (very severe, FEV1 <30% predicted).
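The grading rule above can be sketched as a small function. This is only an illustration of the stated thresholds; the function and parameter names are assumptions introduced here, not part of the study's code.

```python
from typing import Optional


def gold_grade(fev1_fvc_ratio: float, fev1_pct_predicted: float) -> Optional[str]:
    """Return the GOLD airflow-limitation grade, or None without airflow limitation.

    Both inputs are post-bronchodilator values (hypothetical parameter names):
    fev1_fvc_ratio is a fraction such as 0.65, fev1_pct_predicted is in percent.
    """
    if fev1_fvc_ratio >= 0.70:
        return None  # fixed-ratio criterion FEV1/FVC < 0.70 is not met
    if fev1_pct_predicted >= 80:
        return "GOLD I"    # mild
    if fev1_pct_predicted >= 50:
        return "GOLD II"   # moderate
    if fev1_pct_predicted >= 30:
        return "GOLD III"  # severe
    return "GOLD IV"       # very severe
```

For example, `gold_grade(0.62, 45)` falls in the 30% to 50% band and is graded GOLD III.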


Screening and discriminant variables were determined according to questionnaires (Table 1). They were resident type (x1), gender (x2), age (x3), education level (x4), dyspnea (x5), cooking fuel grade (x6), smoking index (x7), second-hand smoking index (x8), family history (x9), infectious history at child age (x10), body mass index (BMI) (x11), cough (x12), cough with sputum (x13), wheezing (x14), birth quarter (x15), farmers (x16), maternal pregnancy exposure (x17), kitchen ventilation (x18), childhood heating (x19), and current heating (x20). Discriminant factors were also quantitatively assigned. For further standardization, all variables except age, smoking index, second-hand smoking index and BMI were assigned values starting from 0, and the order was based on their influence on the occurrence and development of COPD according to GOLD guidelines (Table 2).

Table 2 The Variable Assignment

Establishment and Verification of Optimal Primary Screening and Discriminant Models

Before building the model, we completed missing data from the questionnaire collection process. Missing data completion methods were: variables 3, 7, 8, and 11 were completed using the mean method; variables 1, 2, and 3 were completed using the mode method; and the remaining variables were completed using the median method. Second, we performed Z-score standardization on the data set, X′ = (X − mean)/standard deviation, converting the corresponding variables to a distribution with mean 0 and variance 1 to eliminate the influence of dimension. Regularization was applied to the different models to limit the effect of overfitting on the prediction results. COPD primary screening and discriminant models were constructed using general linear regression (multivariate linear regression), generalized linear regression (logistic regression), linear discriminant analysis, K-nearest neighbor, decision tree, conditional decision tree and support vector machine methods. The 232 COPD patients and 218 controls were randomly allocated: the data set was split 4:1 by stratified random sampling, with four-fifths used as a training group to establish models (360, training set) and one-fifth used as a test group to test models (90, test set). Because resampling methods such as bootstrapping or cross-validation are more powerful than splitting the sample for internal validation,13 we also applied cross-validation on the basis of random stratification. By comparing F1 value, accuracy, recall rate, area under curve (AUC) value and precision, the optimal primary screening model was chosen. Multicollinearity was calculated to assess the feasibility of the optimal model. Receiver operating characteristic (ROC) curves and confusion matrices were constructed to describe the screening effectiveness of the optimal model. By analyzing total accuracy, the optimal discriminant model was chosen. The establishment and verification of the optimal models were as follows:
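The preprocessing steps described above (imputation, Z-score standardization and a stratified 4:1 split) can be sketched with the Python standard library. This is a minimal illustration, not the authors' code: the function names, the use of `None` for missing answers, the population standard deviation and the random seed are all assumptions made here.

```python
import random
import statistics
from collections import defaultdict


def impute(values, method):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill_functions = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    fill = fill_functions[method](observed)
    return [fill if v is None else v for v in values]


def z_score(values):
    """Standardize to mean 0 and variance 1 (population SD is an assumption)."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions kept in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# 232 COPD patients (label 1) and 218 controls (label 0), as in the study.
labels = [1] * 232 + [0] * 218
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With these group sizes the split reproduces the 360/90 partition reported in the text.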

(1) Primary screening model

We used l2 regularization to constrain the objective function in the optimization process of logistic regression to eliminate the influence of overfitting the model on the prediction results and improve the generalization ability of the model. (Regularization parameter C = 1)
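As a hedged illustration of this step, an L2-penalized logistic regression objective can be written as follows. The notation ($\beta$ for the coefficient vector, $P_i$ for the fitted probability of COPD for the i-th patient) is assumed here, and C is taken to be an inverse regularization strength, which is a common convention in Python libraries; the text does not state which convention was applied.

$$\min_{\beta_0,\,\beta}\; -\sum_{i=1}^{n}\Big[y_{2i}\ln P_i + (1-y_{2i})\ln(1-P_i)\Big] + \frac{1}{2C}\lVert\beta\rVert_2^2, \qquad P_i = \frac{1}{1+e^{-(\beta_0+\beta^{\top}x_i)}}$$

Under this convention a smaller C shrinks the coefficients more strongly toward zero.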

For a given set of patient samples $T = \{(x_{1i}, x_{2i}, \dots, y_{2i})\}_{i=1}^{n}$, $x_{1i}, x_{2i}, \dots$ was a series of characteristic attributes of the i-th patient and $y_{2i} \in \{0, 1\}$ was a two-category attribute variable ($y_{2i} = 0$ indicated that the i-th patient did not have COPD, $y_{2i} = 1$ indicated that the i-th patient had COPD). The optimal COPD primary screening model was established by stepwise logistic regression.

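According to the logistic function (1)

$$f(Z) = \frac{1}{1 + e^{-Z}} \qquad (1)$$

with the linear predictor $Z_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots$, the original form became:

$$P(y_{2i} = 1 \mid Z_i) = \frac{1}{1 + e^{-Z_i}}$$

Here, $P(y_{2i} = 1 \mid Z_i)$ indicated the probability of a patient having COPD under condition $Z_i$. Hence, $1 - P(y_{2i} = 1 \mid Z_i)$ represented the probability that the patient did not have COPD under condition $Z_i$.

To calculate the partial coefficients, we did the regression, which yielded:

$$\frac{P(y_{2i} = 1 \mid Z_i)}{1 - P(y_{2i} = 1 \mid Z_i)} = e^{Z_i}$$

This ratio was the odds of a patient having COPD, that is, the probability of having COPD divided by the probability of not having COPD. Taking the logarithm gave the equation:

$$\ln\frac{P(y_{2i} = 1 \mid Z_i)}{1 - P(y_{2i} = 1 \mid Z_i)} = Z_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots$$

This was the logistic regression model reflecting the probability of having COPD in terms of multiple related factors.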

Based on the above model, stepwise logistic regression was carried out according to the principle of the lowest Akaike information criterion (AIC) value. The AIC information criterion is a standard to measure the goodness of fit of a statistical model. AIC encouraged the fitting goodness of data and tried to avoid overfitting, so the preferred model had the smallest AIC value.
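The AIC-based selection described above can be sketched as follows. AIC is computed as 2k − 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The backward-elimination loop is a generic illustration: `fit_aic` is a hypothetical user-supplied callable that fits a model on a list of features and returns its AIC, and nothing here is taken from the authors' R or Python code.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of binary labels given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_lik


def backward_stepwise(features, fit_aic):
    """Drop one feature at a time for as long as doing so lowers the AIC.

    fit_aic(feature_list) -> AIC of the model fitted on those features
    (hypothetical callable supplied by the caller).
    """
    current = list(features)
    best = fit_aic(current)
    while len(current) > 1:
        # AIC of every model that omits exactly one of the current features.
        candidates = [
            (fit_aic([f for f in current if f != drop]), drop) for drop in current
        ]
        candidate_aic, drop = min(candidates)
        if candidate_aic >= best:
            break  # no single removal improves the AIC
        best = candidate_aic
        current.remove(drop)
    return current, best
```

The loop stops as soon as no single removal lowers the AIC, which matches the principle of preferring the model with the smallest AIC value.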

(2) Discriminant model

For a given set of patient samples $T = \{(x_{1i}, x_{2i}, \dots, y_{3i})\}_{i=1}^{n}$, $x_{1i}, x_{2i}, \dots$ was a series of characteristic attributes of the i-th patient and $y_{3i} \in \{1, 2, 3\}$ was a three-category attribute variable ($y_{3i} = 1$ meant that the i-th patient did not have COPD, $y_{3i} = 2$ meant that the i-th patient had mild/moderate COPD, and $y_{3i} = 3$ meant that the i-th patient had severe/very severe COPD). The optimal discriminant model was established by decision tree.


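We calculated the empirical entropy H(T) of the data set T, which indicated the uncertainty of the data set T:

$$H(T) = -\sum_{k=1}^{K} \frac{|C_k|}{|T|} \log_2 \frac{|C_k|}{|T|}$$

where $C_k$ was the subset of T belonging to class k and K = 3 was the number of classes.

We calculated the empirical conditional entropy $H(T \mid x_j)$ of the data set T given the feature $x_j$, which indicated the uncertainty of classifying the data set T once the feature $x_j$ was known:

$$H(T \mid x_j) = \sum_{v} \frac{|T_v|}{|T|} H(T_v)$$

where $T_v$ was the subset of T in which $x_j$ took the value v.

We calculated the information gain, which was the degree to which the feature $x_j$ reduced the uncertainty in the classification of the data set:

$$g(T, x_j) = H(T) - H(T \mid x_j)$$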


Therefore, when we chose features for the model, the lower the uncertainty degree, the more the information gain. For the data set T, different features tended to have different information gains, and features with more information gains had stronger classification capabilities. After selecting the optimal feature recursively and dividing the training data according to the feature, the best classification for each subdata set under the current conditions was made. To eliminate the influence of overfitting, we pruned the original decision tree and set the maximum depth to 4 to generate the final decision tree.
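The entropy and information-gain calculations used for feature selection can be sketched as below. This is a minimal illustration for discrete feature values, using base-2 logarithms as an assumption; the function names are introduced here and do not come from the study's code.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Empirical entropy H(T) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(feature_values, labels):
    """Empirical conditional entropy H(T | x): weighted entropy within each feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    return sum(len(group) / n * entropy(group) for group in groups.values())


def information_gain(feature_values, labels):
    """Reduction in class uncertainty obtained by knowing the feature."""
    return entropy(labels) - conditional_entropy(feature_values, labels)
```

A feature that separates the classes perfectly has a gain equal to H(T), while a feature unrelated to the class has a gain of 0, so the tree prefers the former when choosing a split.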


Enrolled were 232 COPD patients aged from 40 to 80 years, 128 males and 104 females. Among them, 124 patients had mild and moderate COPD, and 108 had severe and very severe COPD. In addition, 218 normal subjects aged from 40 to 80 years were enrolled as the control group, 114 males and 104 females. Information about the 232 patients with COPD and 218 control participants is listed in Table 3. Compared to the control group, COPD patients showed some risk factors such as low weight, living in a bungalow, low level of education, and exposure to coal and firewood instead of electricity. After adjusting for gender and age, parameters of birth season, education level, BMI, dyspnea, family history, cough and sputum, wheeze, farmers, resident type, smoking/passive smoking, mother’s smoking history during pregnancy, fuel exposure level, childhood heating and current heating had statistical differences between COPD patients and normal controls. Education level, dyspnea, BMI, cooking fuel exposure, cough and sputum, wheeze, farmers, mother’s smoking history during pregnancy, kitchen ventilation and current heating were related to the severity of COPD.

Table 3 The Comparison of Basic Information Among Three Groups

Establishment and Verification of the Primary Screening Model for COPD

Recorded information was used to construct primary screening models. The effectiveness of each model was evaluated to find the optimal screening model for patients with COPD in Northeast China (Tables 4 and 5). Because performance on the test set had more practical significance, the stepwise logistic regression prediction model was determined to be the optimal primary screening model.

Table 4 The Summary of Primary Screening Models for COPD (Training Set with Cross-Validation)

Table 5 The Summary of Primary Screening Model for COPD (Test Set)

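Logistic regression was performed on the training set with 20 influencing factors (x1–x20) used as independent variables. After standardization, the corresponding means and standard deviations were determined and are listed in Table 6. “Illness or not (y)” was used as the dependent variable for logistic regression, and backward stepwise logistic regression based on AIC values was used to filter the variables. This yielded the equation:

$$Y = -1.2562 - 0.3891X_4 + 1.7996X_5 + 0.5102X_6 + 1.498X_7 + 0.8077X_9 - 0.5552X_{11} + 0.538X_{13} + 2.0328X_{14} + 1.3378X_{16} + 0.8187X_{17} - 0.389X_{18} + 0.6888X_{19}$$

where X4 was education level, X5 dyspnea, X6 cooking fuel grade, X7 smoking index, X9 family history, X11 BMI, X13 cough with sputum, X14 wheezing, X16 farmers, X17 mother’s smoking exposure history during pregnancy, X18 kitchen ventilation and X19 childhood heating.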
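Applying the screening equation to one person can be sketched as follows, using the coefficients reported for the optimal model. Several points are assumptions made for illustration: the inputs are taken to be already standardized with the Table 6 means and standard deviations (which are not reproduced here), the critical point 0 is taken to apply to the linear score Y, and a score strictly above 0 is treated as screen-positive. The function names are hypothetical.

```python
import math

INTERCEPT = -1.2562
# Coefficients of the reported optimal screening model, keyed by variable.
COEFFICIENTS = {
    "x4": -0.3891,   # education level
    "x5": 1.7996,    # dyspnea
    "x6": 0.5102,    # cooking fuel grade
    "x7": 1.498,     # smoking index
    "x9": 0.8077,    # family history
    "x11": -0.5552,  # BMI
    "x13": 0.538,    # cough with sputum
    "x14": 2.0328,   # wheezing
    "x16": 1.3378,   # farmers
    "x17": 0.8187,   # mother's smoking exposure during pregnancy
    "x18": -0.389,   # kitchen ventilation
    "x19": 0.6888,   # childhood heating
}


def screening_score(z):
    """Linear score Y for a dict of standardized values keyed 'x4', 'x5', ..."""
    return INTERCEPT + sum(b * z[name] for name, b in COEFFICIENTS.items())


def copd_probability(z):
    """Logistic transform of the score, giving a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-screening_score(z)))


def screen_positive(z, cutoff=0.0):
    """Assumed decision rule: positive when the score exceeds the cutoff."""
    return screening_score(z) > cutoff
```

Because the logistic function maps a score of 0 to a probability of 0.5, a cutoff of 0 on Y is equivalent to a cutoff of 0.5 on the predicted probability.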

Table 6 Corresponding Mean and Standard Deviation of Primary Screening Model Variables After Standardization

Independent variables were tested and no multicollinearity (√VIF < 2) was found (Table 7). We calculated the variable parameters and significance of the model. The null hypothesis for the regression equation significance test was rejected because the P value of some selected variables was less than 0.05 (Table 8), so the relationship between the selected variables and the dependent variable was statistically significant. Finally, the model was tested with the test and training sets. The ROC curves of the primary screening model are shown in Figures 1 and 2. The model had excellent predictability and 0 was the optimal critical point. According to the confusion matrices (Tables 9 and 10), in the training set, sensitivity was 0.9569, specificity was 0.948, positive predictive value was 0.951, and negative predictive value was 0.953. In the test set, sensitivity was 0.956, specificity was 0.977, positive predictive value was 0.978 and negative predictive value was 0.956. In the test set, accuracy was 0.9667, F1 value was 0.9670 and AUC was 0.967.
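The way these measures follow from a 2 × 2 confusion matrix can be sketched as below. The counts in the example are hypothetical: they were chosen here so that the derived values agree with the reported test-set figures up to rounding, and they are not copied from Table 10.

```python
def binary_metrics(tp, fn, fp, tn):
    """Screening measures from the four cells of a 2 x 2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),           # recall among true COPD cases
        "specificity": tn / (tn + fp),           # recall among true controls
        "ppv": tp / (tp + fp),                   # positive predictive value (precision)
        "npv": tn / (tn + fn),                   # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),       # harmonic mean of precision and recall
    }


# Hypothetical counts for a 90-person test set (46 COPD, 44 controls).
metrics = binary_metrics(tp=44, fn=2, fp=1, tn=43)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F1 value combines positive predictive value and sensitivity, which is why it sits close to both when the two are similar.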

Table 7 Multicollinearity Analysis of Independent Variables

Table 8 The Variable Parameters and Significance of Primary Screening Model for COPD

Table 9 The Confusion Matrix of Primary Screening Model (Training Set with Cross-Validation)

Table 10 The Confusion Matrix of Primary Screening Model (Test Set)

Figure 1 The ROC curve of the primary screening model (training set with cross-validation).

Figure 2 The ROC curve of the primary screening model (test set).

Establishment and Verification of the COPD Discriminant Model

According to Table 11, the decision tree model (test set accuracy 0.8333, training set with cross-validation accuracy 0.8361) was the optimal discriminant model because the results of the test set had more clinical significance and research value. Thus, we used Python software to establish a decision tree model for the training set (Figure 3) and tested our discriminant model. Since the branches of the tree were not complicated, we decided not to prune the model in order to preserve the number of variables (Figure 3). The information value of the COPD discriminant model is in Table 9. We chose 10 for Nsplit because the error no longer changed at this level. The parameter trend graph of the COPD discriminant model is in Figure 3. The confusion matrix of the model is in Table 12 (training set with cross-validation) and Table 13 (test set). In the training set, sensitivity was 0.737 for GOLD I–II and 0.666 for GOLD III–IV, specificity was 0.977, positive predictive value was 0.73 for GOLD I–II and 0.7 for GOLD III–IV, negative predictive value was 0.918, and accuracy was 0.8361. In the test set, sensitivity was 0.8 for GOLD I–II and 0.619 for GOLD III–IV, specificity was 0.95, positive predictive value was 0.666 for GOLD I–II and 0.866 for GOLD III–IV, negative predictive value was 0.933, and accuracy was 0.8333. With computer technology, we implemented the optimal discriminant model as a program so that it could be applied conveniently (Appendix 1).
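For a three-class model, the per-class measures above can be read from a 3 × 3 confusion matrix, as in the sketch below. The matrix is hypothetical: its counts were chosen here to be consistent with the reported test-set figures up to rounding and are not taken from Table 13. Rows are true classes and columns are predicted classes, in the order control, GOLD I–II, GOLD III–IV; this layout is an assumption of the example.

```python
def per_class_metrics(matrix, k):
    """Return (recall, precision) for class k.

    matrix[i][j] is the number of people of true class i predicted as class j.
    """
    true_positive = matrix[k][k]
    actual_total = sum(matrix[k])                        # row sum: all true class k
    predicted_total = sum(row[k] for row in matrix)      # column sum: all predicted k
    return true_positive / actual_total, true_positive / predicted_total


def overall_accuracy(matrix):
    """Share of all people placed on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical 90-person test set: 44 controls, 25 GOLD I-II, 21 GOLD III-IV.
example = [
    [42, 2, 0],
    [3, 20, 2],
    [0, 8, 13],
]
for k, name in enumerate(["control", "GOLD I-II", "GOLD III-IV"]):
    recall, precision = per_class_metrics(example, k)
    print(f"{name}: recall={recall:.3f} precision={precision:.3f}")
print(f"accuracy={overall_accuracy(example):.4f}")
```

In this layout, recall for the control class corresponds to the reported specificity and precision for the control class to the negative predictive value, while recall and precision for the two GOLD groups correspond to their sensitivity and positive predictive value.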

Table 11 The Effectiveness of Different Discriminant Models

Table 12 The Confusion Matrix of Decision Tree (Training Set with Cross-Validation)

Table 13 The Confusion Matrix of Decision Tree (Test Set)

Figure 3 The discriminant model for COPD.


This study explored high-risk factors for COPD in Northeast China. Several factors were found to be related to development of the disease, such as the season of birth, BMI, family history, living environment, mother’s smoking history during pregnancy, biofuel exposure and current heating style. Because of the severely cold winters in Northeast China, people, especially rural residents who live in bungalows, usually have a longer wood- or coal-burning heating period and a higher chance of exposure to biofuel. This study found that the above factors were closely related to the occurrence and severity of COPD and were different from factors in the southern area of China. In Northeast China, among the primary screening and discriminant models constructed with data on high-risk factors of COPD patients, a logistic regression model was the most effective for primary screening and a decision tree model was the best for discrimination. Combined with computer technology, both models could be applied conveniently and accurately for COPD assessment by inputting the related factors. This study investigated the influence of regional characteristics of Northeast China on patients with COPD. An optimal primary screening model and a discrimination model, particularly suitable for primary hospitals, were established by statistically comparing different models.

Besides age, gender, education and smoking history, which were known risk factors, some special factors were related to the occurrence of COPD in Northeast China, such as family history, mother’s smoking history during pregnancy, birth season, resident type, BMI, fuel exposure, kitchen ventilation and heating style. The last four factors also contributed to COPD severity. First, our research found that family history was an important factor for predicting the occurrence of COPD, consistent with a study by McCloskey et al.14 Although no studies documented hereditary deficiency of alpha-1 antitrypsin (AATD), we hypothesize that Asians may be affected by certain genes since a change in the gene encoding matrix metalloproteinase 12 (MMP12) is reported to be related to COPD in Asians. Chinese scholars15 reported that the glutathione S-transferase M1 (GSTM1) null, glutathione S-transferase theta 1 (GSTT1) null, and combined GSTM1/GSTT1 null genotypes might be risk factors for development of COPD. The GSTT1 null polymorphism showed an association only in Asian COPD patients. Thus, genetics together with environmental factors may influence the susceptibility to disease among specific populations.14 The exact factors that led to “familial aggregation” in Asia, such as similar living environments and lifestyle or some potential genes, deserve further investigation. Second, Tager et al16 found that smoking during pregnancy imposed risks on the fetus and affected the development of the lungs and immune system during the first 18 months of life. Some studies further observed that exposure to smoking during childhood and adolescence affects lung growth.17,18 In our research, exposure to maternal smoking during pregnancy was related to the occurrence of COPD and also contributed to its severity. This is important information for COPD prevention.
Third, the relationship between BMI and the incidence and severity of COPD is still under discussion, with no conclusion drawn for now.19,20 However, in Northeast China, BMI was an important factor in both the screening and discriminant models of COPD. This importance may be related to the fact that people in the north are usually stronger than those in the south. Low BMI and particularly low fat-free mass are associated with worse outcomes,21 which might explain different prognoses between the north and the south for COPD patients. Fourth, fuel exposure, heating style and resident type were considered to be important screening variables in our study, but were removed in the COPD model established for the southern part of China. This indicates that regional differences in COPD pathogenesis should be considered in COPD studies.

The establishment of COPD-related models has been reported previously. Acute exacerbation is known to have a detrimental impact on COPD prognosis. Garcia-Aymerich et al22 studied 340 patients with COPD and acute exacerbation at four tertiary hospitals in the Barcelona area of Spain. The study established a Cox proportional hazards model to obtain independent relative risks of readmission for patients with COPD. Furthermore, since the main characteristic of COPD is irreversible flow limitation (decreased FEV1), Zafari et al23 acquired data about 5594 patients and developed an individualized prediction model for FEV1 in smokers with mild-to-moderate COPD. Su et al24 implemented a prediction model for COPD among people more than 40 years old with respiratory symptoms and smoking history (≥20 pack-years). In contrast to these studies, which were mainly aimed at smokers, Chen et al25 used the data of 4167 participants from the Framingham Offspring Cohort to develop an accurate tool to predict long-term lung function trajectories and the risk of airflow limitation in a general population using 20 common predictors. Further, Cui et al26 established a discriminant-function model based on Bayes’ Rule by stepwise discriminant analysis of the data from 243 patients with COPD and 112 non-COPD individuals in urban and rural communities and local primary care settings in Guangdong Province, China. However, these studies established different models to assess FEV1 or COPD by different methods. The optimal model is not known without statistical comparison. Melanie et al27 conducted a detailed study of 30 articles among 4481 COPD model records and found that only 4 studies were of good quality and could be included for review. In analyzing these four studies, they found significant differences in the predictive indicators included and the statistical methods selected.
Guerra et al28 analyzed 25 studies with 27 prediction models and found that only 3 models used high-quality statistical approaches. Therefore, our study established different models and did statistical comparisons to determine the optimal primary screening model and the best discrimination model to evaluate COPD in Northeast China. The verification process also showed the high effectiveness of these two models.

A limitation for this study was that we established COPD models for Northeast China instead of all of China. In view of the large regional differences between the north and the south such as the environment and weather, which is crucial in the development of COPD, we decided it was necessary to analyze risk factors and set models separately. It will be helpful to understand the different phenotypes of COPD.

In brief, COPD in Northeast China had special regional risk factors such as mother’s smoking history during pregnancy, BMI, resident type, fuel exposure and current heating style. Among the primary screening and discriminant models constructed with these high-risk factors, the optimal models were a logistic regression model for primary screening and a decision tree model for discrimination. Using these models, doctors can easily perform primary screening for COPD and assess its severity, especially during COPD surveys in Northeast China.

Ethics Statement

Procedures and experiment protocols were performed in accordance with the National Institute of Health Guide for Care and were approved by the Ethics Committee of China Medical University in accordance with the Declaration of Helsinki. All participants provided written informed consent.

Author Contributions

All authors made substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data; took part in drafting the article or revising it critically for important intellectual content; gave final approval of the version to be published; and agree to be accountable for all aspects of the work.


The authors declared that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


1. Wang C, Xu J, Yang L, et al. Prevalence and risk factors of chronic obstructive pulmonary disease in China (the China pulmonary health [CPH] study): a national cross-sectional study. Lancet. 2018;391(10313):1706–1717. doi:10.1016/S0140-6736(18)30841-9

2. Zhou Y, Zhong NS, Li X, et al. Tiotropium in early-stage chronic obstructive pulmonary disease. N Engl J Med. 2017;377(10):923–935. doi:10.1056/NEJMoa1700228

3. Dilektasli AG, Porszasz J, Casaburi R, et al. A novel spirometric measure identifies mild COPD unidentified by standard criteria. Chest. 2016;150(5):1080–1090. doi:10.1016/j.chest.2016.06.047

4. Mirsadraee M, Boskabady MH, Attaran D. Diagnosis of chronic obstructive pulmonary disease earlier than current Global initiative for obstructive lung disease guidelines using a feasible spirometry parameter (maximal-mid expiratory flow/forced vital capacity). Chron Respir Dis. 2013;10(4):191–196. doi:10.1177/1479972313507461

5. Caubet Fernandez M, Drouin S, Samoilenko M, et al. A Bayesian multivariate latent t-regression model for assessing the association between corticosteroid and cranial radiation exposures and cardiometabolic complications in survivors of childhood acute lymphoblastic leukemia: a PETALE study. BMC Med Res Methodol. 2019;19(1):100. doi:10.1186/s12874-019-0725-9

6. Vavougios GD, Doskas T, Konstantopoulos K. An electroglottographical analysis-based discriminant function model differentiating multiple sclerosis patients from healthy controls. Neurol Sci. 2018;39(5):847–850. doi:10.1007/s10072-018-3267-8

7. International Primary Care Airway Group. IPAG diagnosis management handbook–chronic airways disease. A guide for primary care physician [M/OL]. (2005-1)[2009-3-15]. Available from:

8. Levy ML, Fletcher M, Price DB, et al. International Primary Care Respiratory Group (IPCRG) guidelines: diagnosis of respiratory diseases in primary care. Prim Care Respir J. 2006;15:20–34.

9. Jones PW, Quirk FH, Baveystock CM, Littlejohns P. A self-complete measure of health status for chronic airflow limitation. The St. George’s respiratory questionnaire. Am Rev Respir Dis. 1992;145(6):1321–1327. doi:10.1164/ajrccm/145.6.1321

10. Fletcher CM. Standardised questionnaire on respiratory symptoms: a statement prepared and approved by the MRC Committee on the Aetiology of Chronic Bronchitis (MRC breathlessness score). BMJ. 1960;2:1662.

11. Jones PW, Harding G, Berry P, et al. Development and first validation of the COPD assessment test. Eur Respir J. 2009;34(3):648–654. doi:10.1183/09031936.00102509

12. Miller MR, Hankinson J, Brusasco V, et al. Standardisation of spirometry. Eur Respir J. 2005;26(2):319–338. doi:10.1183/09031936.05.00034805

13. Wolff RF, Moons KG, Riley RD, et al; for the PROBAST Group. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–58. doi:10.7326/M18-1376

14. McCloskey SC, Patel BD, Hinchliffe SJ, Reid ED, Wareham NJ, Lomas DA. Siblings of patients with severe chronic obstructive pulmonary disease have a significant risk of airflow obstruction. Am J Respir Crit Care Med. 2001;164(8):1419–1424. doi:10.1164/ajrccm.164.8.2105002

15. Ding Z, Wang K, Li J, Tan Q, Tan W, Guo G. Association between glutathione S-transferase gene M1 and T1 polymorphisms and chronic obstructive pulmonary disease risk: a meta-analysis. Clin Genet. 2018;95(1):53–62. doi:10.1111/cge.13373

16. Tager IB, Ngo L, Hanrahan JP. Maternal smoking during pregnancy. Effects on lung function during the first 18 months of life. Am J Respir Crit Care Med. 1995;152(3):977–983. doi:10.1164/ajrccm.152.3.7663813

17. Barker DJ, Godfrey KM, Fall C, Osmond C, Winter PD, Shaheen SO. Relation of birth weight and childhood respiratory infection to adult lung function and death from chronic obstructive airways disease. BMJ. 1991;303(6804):671–675. doi:10.1136/bmj.303.6804.671

18. Todisco T, de Benedictis FM, Iannacci L, et al. Mild prematurity and respiratory functions. Eur J Pediatr. 1993;152(1):55–58. doi:10.1007/BF02072517

19. Harik-Khan RI, Fleg JL, Wise RA. Body mass index and the risk of COPD. Chest. 2002;121(2):370–376. doi:10.1378/chest.121.2.370

20. Liu Y, Pleasants RA, Croft JB, et al. Body mass index, respiratory conditions, asthma, and chronic obstructive pulmonary disease. Respir Med. 2015;109(7):851–859. doi:10.1016/j.rmed.2015.05.006

21. Guo Y, Zhang T, Wang Z, et al. Body mass index and mortality in chronic obstructive pulmonary disease: a dose-response meta-analysis. Medicine (Baltimore). 2016;95(28):e4225. doi:10.1097/MD.0000000000004225

22. Garcia-Aymerich J, Farrero E, Félez MA, Izquierdo J, Marrades RM, Antó JM. Risk factors of readmission to hospital for a COPD exacerbation: a prospective study. Thorax. 2003;58(2):100–105. doi:10.1136/thorax.58.2.100

23. Zafari Z, Sin DD, Postma DS, et al. Individualized prediction of lung-function decline in chronic obstructive pulmonary disease. Can Med Assoc J. 2016;188(14):1004–1011. doi:10.1503/cmaj.151483

24. Su KC, Ko HK, Chou KT, et al. An accurate prediction model to identify undiagnosed at-risk patients with COPD: a cross-sectional case-finding study. NPJ Prim Care Respir Med. 2019;29(1):22. doi:10.1038/s41533-019-0135-9

25. Chen W, Sin DD, FitzGerald JM, Safari A, Adibi A, Sadatsafavi M. An individualized prediction model for long-term lung function trajectory and risk of COPD in the general population. Chest. 2019;157(3):547–557.

26. Cui J, Zhou Y, Tian J, et al. A discriminant function model as an alternative method to spirometry for COPD screening in primary care settings in China. J Thorac Dis. 2012;4(6):594–600. doi:10.3978/j.issn.2072-1439.2012.11.06

27. Matheson MC, Bowatte G, Perret JL, et al. Prediction models for the development of COPD: a systematic review. Int J Chron Obstruct Pulmon Dis. 2018;13:1927–1935. doi:10.2147/COPD.S155675

28. Guerra B, Gaveikaite V, Bianchi C, Puhan MA. Prediction models for exacerbations in patients with COPD. Eur Respir Rev. 2017;26(143):160061. doi:10.1183/16000617.0061-2016

This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.