Clinical Epidemiology, Volume 11

Identifying pediatric diabetes cases from health administrative data: a population-based validation study in Quebec, Canada

Authors Nakhla M, Simard M, Dube M, Larocque I, Plante C, Legault L, Huot C, Gagné N, Gagné J, Wafa S, Benchimol EI, Rahme E

Received 1 June 2019

Accepted for publication 13 August 2019

Published 11 September 2019 Volume 2019:11 Pages 833–843



Review by Single anonymous peer review


Editor who approved publication: Professor Henrik Sørensen

Meranda Nakhla,1,2 Marc Simard,3 Marjolaine Dube,3 Isabelle Larocque,3 Céline Plante,3 Laurent Legault,1,2 Celine Huot,4 Nancy Gagné,5 Julie Gagné,6 Sarah Wafa,2 Eric I Benchimol,7–9 Elham Rahme10

1Department of Pediatrics, Division of Endocrinology, Montreal Children’s Hospital, Montreal, QC, Canada; 2Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada; 3Institut National de Santé Publique du Québec, Québec, QC, Canada; 4Department of Pediatrics, Division of Endocrinology, Centre Hospitalier Universitaire Sainte-Justine, Montreal, QC, Canada; 5Department of Pediatrics, Division of Endocrinology, Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, QC, Canada; 6Department of Pediatrics, Division of Endocrinology, Centre Hospitalier de l’Université Laval, Quebec City, QC, Canada; 7Children’s Hospital of Eastern Ontario IBD Centre, Division of Gastroenterology, Hepatology and Nutrition, Children’s Hospital of Eastern Ontario, Ottawa, Canada; 8Children’s Hospital of Eastern Ontario Research Institute, Ottawa, Canada; 9Faculty of Medicine, School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada; 10Department of Medicine, Division of Clinical Epidemiology, McGill University, Montreal, QC, Canada

Correspondence: Meranda Nakhla
Center of Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 de Maisonneuve Blvd, W, 3rd floor, office E3.08, Montreal H4A 3S5, QC, Canada

Background: Type 1 diabetes is one of the most common chronic diseases in childhood with a worldwide incidence that is increasing by 3–5% per year. The incidence of type 2 diabetes, traditionally viewed as an adult disease, is increasing at alarming rates in children, paralleling the rise in childhood obesity. As the rates of diabetes increase in children, accurate population-based assessment of disease burden is important for those implementing strategies for health services delivery. Health administrative data are a powerful tool that can be used to track disease burden, health services use, and health outcomes. Case validation is essential in ensuring accurate disease identification using administrative databases.
Aim: The aim of our study was to define and validate a pediatric diabetes case ascertainment algorithm (including any form of childhood-onset diabetes) using health administrative data.
Research design and methods: We conducted a two-stage method using linked health administrative data and data extracted from charts. In stage 1, we linked chart data from a large urban region to health administrative data and compared the diagnostic accuracy of various algorithms. We selected those that performed the best to be validated in stage 2. In stage 2, the most accurate algorithms were validated with chart data within two other geographic areas in the province of Quebec.
Results: Accurate identification of diabetes in children (ages ≤15 years) required four physician claims or one hospitalization (with International Classification of Diseases codes for diabetes) within 1 year (sensitivity 91.2%, 95% confidence interval [CI] 89.2–92.9; positive predictive value [PPV] 93.5%, 95% CI 91.7–95.0), or four physician claims alone within 2 years (sensitivity 90.4%, 95% CI 88.3–92.2; PPV 93.2%, 95% CI 91.7–95.0). Separating the physician claims by 30 days increased the PPV of all algorithms tested.
Conclusion: Patients with childhood-onset diabetes can be accurately identified within health administrative databases, which therefore provide a valid source of information for health care resource planning and evaluation.

Keywords: pediatric, validation, health administrative data, diabetes

Corrigendum for this paper has been published



Diabetes mellitus is one of the most common chronic diseases in childhood and carries significant morbidity and mortality.1,2 Type 1 diabetes (T1D) accounts for approximately 95% of childhood diabetes, and its incidence (32/100,000) and prevalence (250/100,000) in Canada are among the highest in the world.3,4 Worldwide, the incidence of T1D is increasing by 3% per year in children and by 5% per year in preschoolers.5 Furthermore, the incidence of type 2 diabetes (T2D), traditionally viewed as an adult disease, is increasing at alarming rates in children.6 Thus, the health care burden of childhood diabetes is high and is increasing.

As the incidence and prevalence of diabetes continue to rise in children, accurate population-based assessment of the ongoing burden of diabetes is essential for policy-makers and for those implementing and evaluating strategies for health services delivery. As an alternative to diabetes-specific registries, which are costly and time-consuming, health administrative data provide an efficient tool for population-based surveillance and health services research. However, validating the combination of health administrative codes (algorithm) that most accurately identifies a disease is an essential step before using administrative data within a defined population.7,8

An algorithm for identifying diabetes in the adult population (≥20 years of age) has been validated within health administrative data and is currently used by the Canadian Chronic Disease Surveillance System (CCDSS).9 However, algorithms validated in adults have been previously shown to be less accurate in children.3,10,11 Therefore, to assess disease burden, health care utilization and outcomes in children with diabetes, it is essential to develop and validate a pediatric-specific algorithm with excellent validity parameters. The few studies that have systematically validated an algorithm for pediatric diabetes have been limited by lack of identification of reliable reference cohorts against which to validate an algorithm as well as deficits in reporting markers of diagnostic accuracy, such as positive predictive values (PPVs) and negative predictive values (NPVs). Further, algorithms validated in one jurisdiction may not necessarily apply to other jurisdictions.12 Such limitations can lead to a risk of misclassification and incorrect estimations of prevalence and health services utilization.3,13,14 Our goal was to systematically develop and validate a pediatric diabetes algorithm for any form of childhood diabetes within health administrative data from Québec, Canada using a reliable reference cohort and appropriate validity parameters.


Ethics issues

Our study was approved by the research ethics boards of the Montreal Children’s Hospital (MCH), Centre hospitalier universitaire Sainte-Justine (HSJ), Maisonneuve–Rosemont Hospital (HMR), Centre hospitalier universitaire de Sherbrooke (CHUS), and Centre hospitalier de l’Université Laval (CHUL). The health administrative data are housed at the Institut national de santé publique du Québec (INSPQ), which has a comprehensive collection of longitudinal population-based administrative databases via a data-sharing agreement with the Quebec Ministry of Health and Social Services [Ministère de la Santé et des Services Sociaux du Quebec] and the Quebec Health Insurance Board [Régie de l’assurance maladie du Québec (RAMQ)]. Linkage of clinical data to the health administrative data at the INSPQ was approved by the Quebec Commission d’accès à l’information.

Administrative data sources

The health administrative databases included the RAMQ health insurance registry (demographic information including postal codes); the RAMQ Physician Claims Database, which contains all physician billings for remunerated services provided in outpatient clinics, emergency departments, or hospitals; and the MED-ECHO Database (Québec Discharge Abstract Database), which contains data mandatorily collected from all Québec hospitals. The Physician Claims Database contains the diagnosis code (9th revision of the International Classification of Diseases [ICD-9]) and date of service provision, while the MED-ECHO database includes primary and secondary diagnosis codes (ICD-9 until 2006, ICD-10 thereafter) and dates of hospital admission and discharge. All three databases were linked deterministically using a unique confidential patient identifier. Because we did not have pharmaceutical or laboratory administrative data, we could not include measures such as insulin prescriptions or blood glucose values in our algorithms.

Study design

We conducted a two-stage study using linked administrative and clinical data. In stage 1, we determined the diagnostic accuracy of a variety of algorithms and selected those that performed best within the population of Montreal and Laval; in stage 2, we validated the selected algorithms within two other geographic areas in the province of Quebec (Sherbrooke and Quebec City). This method allowed us to maximize the internal validity of the case definition in a region where the gold-standard population is reliable, and to ensure the external validity of the algorithm by applying it to other regions of Quebec. This two-stage methodology has been previously used to validate algorithms for adult and pediatric inflammatory bowel disease within health administrative data.10,12

The first step of stage 1 was to create the health administrative database comprising the total population of children and youth over the study period, using the Health Insurance Registry data. The second step was to identify the true cases of diabetes from the electronic diabetes databases and medical charts in the three pediatric diabetes centers in Montreal, which are the only referral centers for children and adolescents with diabetes residing in Montreal and Laval. MCH and HSJ are two of four pediatric tertiary care hospitals in Quebec with a specialized pediatric diabetes clinic. HMR is a secondary care hospital that also follows children with diabetes living in Montreal and Laval. True cases of diabetes included cases of T1D, T2D, cystic fibrosis-related diabetes, and monogenic forms of diabetes. The third step was to link the true cases of diabetes to the administrative data obtained in step 1. We used this design because pediatric diabetes services are centralized, and children and youth with diabetes living in the Montreal and Laval regions are followed at one of the three centers described above. Non-cases of diabetes were identified by subtracting the true cases of diabetes from the total population of children and youth living in Montreal and Laval during our study period. The last step of stage 1 was to identify the most accurate algorithms. The algorithms that performed best were then validated in stage 2 among patients from pediatric care hospitals with pediatric diabetes clinics in Sherbrooke (CHUS) and Quebec City (CHUL).

Stage 1: case-definition algorithm development

Identification of the reference standard population

We used the MCH, the HMR, and the HSJ diabetes electronic databases and medical charts of patients with diabetes (incident and prevalent cases), ages 1–15 years, followed at these diabetes clinics between April 1st, 2002 and March 31st, 2011 to identify true cases of diabetes. Diabetes status, as well as type and date of diagnosis, was confirmed through chart abstraction using standard diagnostic criteria as per the Canadian Diabetes Association’s 2013 Clinical Practice Guidelines.15 The charts were reviewed by a trained research coordinator (SW). We limited the analyses in stage 1 to patients diagnosed before their 16th birthday as some patients between the ages 16–17 years may be followed by adult endocrinologists and therefore may not appear in pediatric diabetes clinic databases.

The overall population of individuals ages 1–15 years who resided in Montreal or Laval and had a valid RAMQ health card number from 2002 to 2011 was identified using the RAMQ Health Insurance Registry. From this population, all individuals present on the list of true cases of diabetes were considered cases and used as the positive reference standard. All other individuals, absent from the list of true cases of diabetes, were considered non-cases and used as the negative reference standard. Individuals with prediabetes (impaired glucose tolerance, impaired fasting glucose) or transient diabetes (medication-induced diabetes, neonatal or gestational diabetes) were classified as non-cases. As seen in Figure 1, to be included in the reference standard, individuals required available health administrative data for at least 3 consecutive years between April 1st, 2002 (or 1st birthday, or 1st year with a postal code in Montreal or Laval) and March 31st, 2013 (or up to their 18th birthday).
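The construction of the reference standard described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's SAS code: the function names, the per-person set of covered years, and the string identifiers are all assumptions introduced here, and the eligibility rule is simplified to "at least 3 consecutive years of available data".

```python
def has_consecutive_years(covered_years, required=3):
    """Return True if covered_years contains a run of `required` consecutive years."""
    run = 0
    previous = None
    for year in sorted(set(covered_years)):
        # Extend the current run if this year directly follows the previous one.
        run = run + 1 if previous is not None and year == previous + 1 else 1
        if run >= required:
            return True
        previous = year
    return False


def build_reference_standard(coverage_by_id, true_case_ids, required=3):
    """Split eligible residents into cases (on the chart list) and non-cases.

    coverage_by_id: hypothetical mapping of person id -> years with available data.
    true_case_ids: hypothetical ids of chart-confirmed diabetes cases.
    """
    eligible = {
        person_id
        for person_id, years in coverage_by_id.items()
        if has_consecutive_years(years, required)
    }
    cases = eligible & set(true_case_ids)
    non_cases = eligible - cases  # everyone eligible who is absent from the case list
    return cases, non_cases
```

Because non-cases are defined by subtraction, anyone not on the confirmed-case list (including individuals with prediabetes or transient diabetes) falls into the negative reference standard, as in the text.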

Figure 1 Identification of the reference standard population for algorithm development. Notes: *Prediabetes, transient diabetes, drug-induced diabetes. **For a given year, a resident of Montréal/Laval is a child who lived in Montreal or Laval during the 365 days of the year and had a valid health insurance card at least half of the year.

Diagnostic accuracy

True cases of diabetes (positive reference standard) and non-cases (negative reference standard) were used to assess case ascertainment. We determined the diagnostic accuracy (sensitivity, specificity, PPV, NPV) of a variety of algorithms, using combinations of physician billings and hospital admissions over 1 or 2 years bearing a diagnosis code of diabetes mellitus (ICD-9 250.X, 251.X; ICD-10 E10.X–E14.X). The algorithm used in the adult population (one hospital admission or two physician claims over a 2-year period) was included in this list.5 The date of diagnosis was either the first hospitalization or first physician claim coded for diabetes. This date was required to occur before the 16th birthday (ages 1–15 years) or before April 1st, 2011.
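To make the structure of these case definitions concrete, the following Python sketch checks whether one person meets a definition of the form "at least N diabetes-coded physician claims within a window, or at least one diabetes-coded hospitalization". It is a sketch under stated assumptions rather than the study's implementation: the function names, the inclusive window comparison, and the code-prefix matching are choices made here for illustration.

```python
from datetime import date


def is_diabetes_code(code, icd_version):
    """True for ICD-9 250.x/251.x or ICD-10 E10-E14 (prefix match is an assumption)."""
    code = code.strip().upper()
    if icd_version == 9:
        return code.startswith(("250", "251"))
    if icd_version == 10:
        return code.startswith(("E10", "E11", "E12", "E13", "E14"))
    return False


def meets_case_definition(claim_dates, admission_dates, min_claims=4, window_days=365):
    """Apply 'min_claims physician claims within window_days OR one hospitalization'.

    Both date lists are assumed to contain only diabetes-coded records.
    """
    if admission_dates:
        return True
    claims = sorted(claim_dates)
    # Slide over every run of min_claims consecutive claims and test its span.
    for start in range(len(claims) - min_claims + 1):
        span = (claims[start + min_claims - 1] - claims[start]).days
        if span <= window_days:
            return True
    return False
```

Under this sketch, the adult definition mentioned above corresponds to `min_claims=2, window_days=730`. A claims-only definition can be evaluated by passing an empty list of admissions.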

The most accurate algorithms were selected and agreed upon by a committee with expertise in the field of pediatric endocrinology, health services research, epidemiology, and health administrative data. The selection was based on having the highest PPV while maximizing specificity and maintaining sensitivity above 90%. For disease surveillance purposes, accurate sensitivity and specificity are needed to estimate an accurate prevalence (prevalence = (p − (1 − c))/(s + c − 1), where s = sensitivity, c = specificity, and p = proportion with a positive test).16–21 PPV, in turn, is important in minimizing false positives in health outcomes research, so as to accurately examine health care utilization or complications within the diabetes population. Using the selected algorithms, we also determined the accuracy according to diabetes subtype and age-groups (ages 1–4, 5–10, and 11–15 years).
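The prevalence correction quoted above (often called the Rogan-Gladen estimator) is simple to compute. The sketch below is illustrative; the function name and the decision to clip the result to the range 0 to 1 are assumptions made here, not part of the study.

```python
def corrected_prevalence(p, sensitivity, specificity):
    """Estimate true prevalence from the proportion testing positive.

    Implements prevalence = (p - (1 - c)) / (s + c - 1), with s = sensitivity,
    c = specificity, and p = proportion flagged positive by the algorithm.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (p - (1.0 - specificity)) / denominator
    # Sampling noise can push the raw estimate slightly outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))
```

For example, if the true prevalence were 10% and an algorithm had 90% sensitivity and 95% specificity, it would flag 0.10 × 0.90 + 0.90 × 0.05 = 13.5% of the population, and the formula recovers (0.135 − 0.05)/0.85 = 10%.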

Stage 2: case-definition validation in two other geographic areas in the province of Quebec (Quebec City and Sherbrooke)

We next validated the algorithms selected in stage 1 among patients from the pediatric care hospitals with pediatric diabetes clinics in two other geographic areas in the province. Individuals with diabetes, ages 1–17 years between April 1st, 2002 and March 31st, 2011, were identified from electronic health records and clinic lists of CHUS and CHUL. Charts of these patients were reviewed by the same reviewer as in stage 1 as well as a hospital archivist. Patients confirmed as having diabetes, and residing in the Sherbrooke or Quebec City area, served as the positive reference sample. For every chart of a true case of diabetes included, one chart from the general pediatric clinics was randomly selected and reviewed to confirm non-diabetes status during the same time frame, to act as the negative reference standard. All chart abstractions were combined to generate the reference standard, which was then linked by health card number to health administrative data. Using chart information as the reference standard, we calculated the diagnostic accuracy of the algorithms previously selected from stage 1.

Statistical analysis

We constructed 2×2 tables to calculate the diagnostic accuracy of various algorithms. In the stage 1 cohort, we calculated sensitivity, specificity, NPV, and PPV, as well as diabetes prevalence of differing algorithms using various combinations of physician billings and hospital records. For each, we calculated 95% confidence intervals using the Wilson score method. We also tested sensitivity by diabetes type (i.e., T1D vs T2D). For a subpopulation of the stage 1 cohort (children who were diagnosed between ages 1–15 years or before April 1st, 2011), we also tested the diagnostic accuracy of various algorithms across age-groups (1–4, 5–10, and 11–15 years) as of March 31st, 2011. For the stage 2 validation cohort, we calculated the sensitivity, specificity, and positive and negative likelihood ratios (LR+ and LR−) of the algorithms for both the 1–15-year and the 1–17-year age-groups to ensure accuracy of the algorithms in the older age-groups. Predictive values were not calculated because the prevalence of disease in this cohort did not approximate the prevalence of the general population.7,22 The algorithms selected by the committee were then applied to the entire provincial administrative data to calculate prevalence estimates, which were compared to the diabetes prevalence in the reference standard cohort from stage 1 to ensure that there was minimal bias in our PPV and NPV estimations. All analyses were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA).
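As an illustration of the calculations named above, the following Python sketch derives the four accuracy measures from a 2×2 table and computes a Wilson score interval for a proportion. The analyses in the study were run in SAS; this code, its function names, and the use of z = 1.96 are assumptions made for illustration.

```python
import math


def accuracy_from_table(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denominator = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical count: 811 detected among 889 true cases is an assumed figure
# chosen to be consistent with a sensitivity of about 91.2%; the paper does
# not report the underlying cell counts in the text.
lower, upper = wilson_interval(811, 889)
```

With these assumed counts, the interval comes out at roughly 89.2% to 92.9%, in line with the interval reported for the sensitivity of the four-claims-or-one-hospitalization algorithm.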


Stage 1 algorithm development

Within the MCH, HSJ, and HMR hospital databases and chart reviews, 889 children and adolescents (ages <16 years) were identified as true cases of diabetes in Montreal or Laval and acted as the positive reference standard. Among them, 48.7% were male, 93.7% had T1D, 4.4% T2D with the remainder comprising other forms of diabetes including cystic-fibrosis-related diabetes and monogenic forms of diabetes. A total of 429,366 children served as the negative reference standard. Details of included and excluded patients are presented in Figure 1.

Relevant algorithms are presented in Table 1. We found that algorithms with physician claims separated by 30 days had a higher PPV than those with claims separated by 1 day. For this reason, only the algorithms with claims separated by 30 days are presented, except for the currently used CCDSS definition. The algorithm that maximized both the sensitivity and PPV was having at least four physician claims or one hospitalization over a 1-year period (sensitivity 91.2%, 95% CI 89.2–92.9; PPV 93.5%, 95% CI 91.7–95.0). Using only physician claims (four physician claims in 2 years) decreased the sensitivity to 90.4% while maintaining the PPV at 93.2%. The CCDSS definition (one hospitalization or two physician visit claims in 2 years) was found to have a high sensitivity (97.8%) but a lower PPV (79.4%). Separating the visit claims by 30 days improved the PPV of the CCDSS definition by 3.8 percentage points, to 83.2%. The algorithm with the highest PPV (97.4%) was five physician visits in 1 year; however, the sensitivity decreased to 71.8%. The algorithm with the highest sensitivity (98.1%) was one physician claim visit or one hospitalization, but this resulted in a low PPV (44.2%). The algorithms of four physician claims or one hospitalization within a 1-year period, four physician claims within a 2-year period, and the CCDSS definition were selected for validation in stage 2.
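One way to read "claims separated by 30 days" is that only claims falling at least 30 days after the previously counted claim contribute to the total. The Python sketch below counts claims under that reading. The paper does not spell out the exact rule, so the greedy earliest-first counting, the function name, and the inclusive 30-day threshold are assumptions.

```python
from datetime import date


def count_separated_claims(claim_dates, min_gap_days=30):
    """Count claims such that each counted claim is >= min_gap_days after the last one counted.

    Greedy rule (an assumption): keep the earliest claim, then keep each later
    claim only if enough days have passed since the last claim that was kept.
    """
    kept = 0
    last_kept = None
    for claim_date in sorted(claim_dates):
        if last_kept is None or (claim_date - last_kept).days >= min_gap_days:
            kept += 1
            last_kept = claim_date
    return kept
```

Under this rule, several claims generated by a single episode of care (for example, visits on consecutive days during a diagnostic work-up) count once, which is one plausible reason why the spacing requirement reduces false positives.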

Table 1 Accuracy of different diabetes algorithms in a cohort of children and adolescents ages 1–15 years living in Montreal or Laval, 2002–2011 (stage 1)

Stage 2 algorithm validation

With chart review, 345 patients (ages 1–17 years) from the Sherbrooke and Quebec City hospitals were confirmed to have diabetes and 366 were confirmed not to have diabetes. Among true cases of diabetes, 52.8% were male, while 57.1% in the non-diabetes group were male. Amongst those with diabetes, 94.7% had T1D and 4.6% had T2D, with the remainder comprising other forms of diabetes including cystic-fibrosis-related diabetes and monogenic forms of diabetes. The validity parameters of the two algorithms with the best performance in stage 1 as well as that of the CCDSS definition are shown in Table 2. We tested the selected algorithms in those ages 1–17 years and in those ages 1–15 years. Among those ages 1–17 years, the algorithm with four physician visit claims or one hospitalization in 1 year achieved a sensitivity of 93.0%, a specificity of 100%, and a negative likelihood ratio of 0.07. The CCDSS definition achieved a sensitivity of 98.8%, a specificity of 100%, and a negative likelihood ratio of 0.01. We could not calculate the positive likelihood ratio (sensitivity/(1 − specificity)) because the specificity of all algorithms was equal to 1, which makes the denominator zero. Results were similar in those ages 1–15 years (Table 3).
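The likelihood ratios reported above follow directly from sensitivity and specificity, as the short Python sketch below shows. The function name and the choice to return `None` for an undefined ratio are assumptions made for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-); an entry is None when its denominator is zero.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else None
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else None
    return lr_positive, lr_negative
```

With a sensitivity of 0.930 and a specificity of 1.0, LR− is (1 − 0.930)/1.0 = 0.07 and LR+ is undefined; with a sensitivity of 0.988, LR− is about 0.01. Both match the values reported in the text.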

Table 2 Validation of selected algorithms in a cohort of children and adolescents ages 1–17 years living in Quebec City or Sherbrooke, 2002–2011 (stage 2)

Table 3 Validation of selected case definitions in a cohort of children and adolescents ages 1–15 years living in Quebec City or Sherbrooke, 2002–2011 (stage 2)

Diabetes prevalence

The prevalence of true cases of diabetes in the reference standard population in children and adolescents (ages ≤15 years) was 0.17%. Diabetes prevalence using four physician claims or one hospitalization over a 1-year period in the overall Quebec population (ages ≤15 years) using health administrative data was 0.16% (95% CI 0.15–0.17%). The prevalence using four physician claims over a 2-year period was similar at 0.15% (95% CI 0.15–0.16%). The prevalence using the CCDSS definition was highest at 0.21% (95% CI 0.20–0.22%).

Performance of case-definition by diabetes type

The selected algorithms performed better at identifying T1D than T2D (Table 4). The algorithm of four physician claims or one hospitalization over a 1-year period yielded a high sensitivity for identifying T1D (93.0%, 95% CI 91.1–94.6%) but a lower sensitivity for identifying T2D (64.1%, 95% CI 48.4–77.3%). The algorithm of four physician claims over 2 years yielded a sensitivity of 92.4% (95% CI 90.4–94.0%) for T1D and 59.0% (95% CI 43.4–72.9%) for T2D.

Table 4 Diagnostic accuracy of selected algorithms by diabetes type in a cohort of children and adolescents ages 1–15 years living in Montreal or Laval, 2002–2011 (Types 1 and 2)

Performance of case-definition by age group

Diagnostic accuracy of the selected algorithms did not vary significantly by age group (Table 5). The algorithm of four physician claims or one hospitalization over a 1-year period yielded a high PPV while maintaining a good sensitivity for all age groups. Using this algorithm, the prevalence of diabetes amongst the 1–4, 5–10, and 11–15-year age-groups was 0.05%, 0.13%, and 0.27%, respectively. This was similar to the prevalence found in the reference standard cohort, which was 0.05%, 0.14%, and 0.27% in the 1–4, 5–10, and 11–15-year age-groups, respectively. The CCDSS definition lowered the PPV to 72.3%, 82.5%, and 80.8% for the 1–4, 5–10, and 11–15-year age-groups, respectively, while increasing the sensitivity. The estimated prevalence was highest with the CCDSS definition across all ages.

Table 5 Diagnostic accuracy of selected algorithms by age-group in a cohort of children and adolescents ages 1–15 years living in Montreal or Laval in 2011


Our study supports the use of health administrative data as a powerful tool for population-based surveillance and evaluation of chronic disease. Consistent with current guidelines for conducting and reporting health administrative data validation studies, we systematically developed and validated an algorithm in two separate populations to identify children and adolescents with diabetes.7 We found two optimal algorithms that each achieved an excellent PPV with optimal sensitivity and specificity: four physician claims or one hospitalization within a 1-year period, and four physician claims within a 2-year period. In both algorithms, physician claims were separated by 30 days. Diagnostic accuracy of these algorithms was consistent across all age-groups but varied by diabetes type. Further, diabetes prevalence determined by using these algorithms was similar to that of the reference standard cohort.

Previous validation studies in children identified the CCDSS definition of one hospitalization or two physician claims in a 2-year period as having the best performance characteristics.13,14 In Manitoba, the CCDSS definition provided a specificity of 99.9%, sensitivity of 94.2%, and PPV of 81.6%.14 While the specificity was high, maximizing PPV is important in reducing misclassification bias, which could threaten study validity.8 We found that the CCDSS definition did not perform as well as other algorithms, with an insufficient PPV of 79.4% despite a high sensitivity of 97.8% and specificity of 99.9%. Separating the physician claims by 30 days increased the PPV to 83.2% while minimally decreasing the sensitivity to 96.7%. However, because of its lower PPV, the CCDSS definition overestimated diabetes prevalence in our study, which would result in an overestimation of diabetes burden in the general population of children and adolescents, as false-positive cases may accumulate over time.7

A systematic review of the quality of validation studies identified significant deficits in the validation and reporting of algorithms used to identify patients within health administrative data, particularly around the reporting of accurate PPV and NPV.7 High PPV is important in capturing true cases in epidemiologic research and in limiting false-positives. However, these predictive values are inaccurate when the prevalence in the validation cohort is not the true prevalence of disease in the population. Our study is the first pediatric diabetes validation study to compare the prevalence of disease in the reference standard cohort with that of the general population. The prevalence of diabetes in the Quebec population was similar to that of our validation cohort ensuring the accuracy of our predictive values. Further, previous validation studies did not clarify the optimal length of time between physician claims to accurately capture diabetes cases. We demonstrated that separating physician claims by 30 days significantly decreased the number of false-positives for all algorithms tested resulting in a maximal PPV while minimally decreasing the sensitivity. For instance, the Ontario definition of four claims within 2 years performed well in our study cohort; however, by separating the claims by 30 days the number of false-positives decreased, improving the PPV from 89.3% (95% CI 87.1–91.1%) to 93.2% (95% CI 91.3–94.7%).
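The dependence of predictive values on prevalence, which motivates the comparison described above, can be illustrated with Bayes' rule. The Python sketch below uses hypothetical accuracy values that are not taken from the study's tables; the function name is likewise an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical test with 90% sensitivity and 90% specificity, evaluated in a
# balanced case-control sample versus a low-prevalence population.
ppv_balanced, _ = predictive_values(0.9, 0.9, 0.5)
ppv_population, _ = predictive_values(0.9, 0.9, 0.01)
```

In this hypothetical example the PPV is 90% when half the sample has the disease but only about 8% when prevalence is 1%. This is why predictive values from a cohort with roughly equal numbers of cases and non-cases, such as a matched chart sample, cannot be carried over to the general population.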

The diagnostic accuracy of our optimal case-definitions did not vary by age group among children ages 1–15 years. The sensitivity of the optimal algorithms varied by diabetes type. Neither of the selected algorithms performed well in capturing T2D (sensitivity 59.0–64.1%) while both algorithms yielded a high sensitivity for T1D (92.4–93.0%). However, health administrative data of physician claims and hospitalizations are not sufficient to distinguish between type 1 and 2 diabetes. A previous Canadian study using administrative drug data linked with hospital and physician claims data attempted to distinguish between type 1 and 2 diabetes; however, the best performing case-definition for identifying T2D had a low PPV of 73.7% (95% CI 64.5–81.5%) with a sensitivity of 83.2% (95% CI 74.1–90.0%).13 Further, due to variations of clinical practice in pediatric T2D management wherein some individuals are managed with lifestyle alone, lifestyle with insulin or insulin with oral hypoglycemic agents, drug data may not be a reliable measure to distinguish one sub-type from another.

Strengths of our study include the use of a large population-based sample, validation of our algorithm in a separate cohort, and confirmation that the prevalence in our reference standard cohort was similar to that of the general population. Despite these strengths, our study has limitations. First, we were unable to differentiate between T1D and T2D in our algorithms. Neither physician claims nor hospital discharge data are primarily collected for surveillance, and ICD-9 diagnosis codes in physician claims do not distinguish between diabetes types. Therefore, misclassification of T2D may increase as the prevalence of T2D increases in the pediatric population. However, the vast majority of the pediatric population with diabetes continues to have T1D, and as such misclassifying those with T2D will not affect the diagnostic accuracy of our algorithms. Having a diabetes registry would be potentially useful to distinguish between diabetes types; however, registries based on prospectively collected clinical data or retrospective chart review covering all residents of a province are costly and time-consuming to establish and maintain. In contrast, health administrative data are less costly and are accessible on a continuous basis, and are thus more feasible to study. Second, the gold standard that we used for validation may not be perfect and may miss those who have not yet come into contact with the health care system, such as adolescents with T2D. However, the percentage with pediatric T2D is small, and those who have not yet been diagnosed with T2D would be very few, such that the decrease in sensitivity would be minimal with no effect on the specificity. Third, our algorithm may not be as robust in older adolescents; however, this was not observed in the validation portion of the study (stage 2). Finally, the diagnostic accuracy of our algorithms might be improved if we had clinical and pharmacologic data such as insulin utilization, insulin pump use, or hemoglobin A1c results. Future algorithms could incorporate these data, resulting in better classification of diabetes type.

In summary, we have developed a pediatric diabetes algorithm specific for children and adolescents using a robust methodology that will allow for better case ascertainment in other regions with comparable physician claims and hospital data. For disease surveillance purposes, accurate sensitivity and specificity are needed to estimate an accurate prevalence; researchers can therefore use the diagnostic parameters of our algorithms to determine accurate diabetes prevalence in their populations. Our validated algorithms achieved high diagnostic accuracy, and their use within health administrative data will provide an efficient way of assessing the epidemiology, health services use, and outcomes of pediatric diabetes.


Meranda Nakhla was funded by a Chercheur-boursier clinician Junior 2 award from the Fonds de Recherche du Québec – Santé and the Ministère de la Santé et des Services Sociaux du Québec. The funders played no role in the conduct of the study, collection of data, management of the study, analysis of data, interpretation of data, or preparation of the manuscript.


Laurent Legault has served on advisory boards for Medtronic and Lilly; has received grants for unrelated research from Merck, Sanofi and AstraZeneca; and holds a share of intellectual property not related to this work. No competing interests were declared by any other authors in this work.



© 2019 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.