Analysis of binary responses with outcome-specific misclassification probability in genome-wide association studies

Romdhane Rekaya; Shannon Smith; El Hamidi Hay; Nourhene Farhat; Samuel E Aggrey

doi:10.2147/TACG.S122250

The Application of Clinical Genetics, Volume 9

Original Research



Received 13 September 2016

Accepted for publication 28 October 2016

Published 30 November 2016, Volume 2016:9, Pages 169–177



Review type: Single anonymous peer review


Editor who approved publication: Prof. Dr. Martin Maurer


Romdhane Rekaya,¹⁻³ Shannon Smith,⁴ El Hamidi Hay,⁵ Nourhene Farhat,⁶ Samuel E Aggrey³,⁷

¹Department of Animal and Dairy Science, College of Agricultural and Environmental Sciences, ²Department of Statistics, Franklin College of Arts and Sciences, ³Institute of Bioinformatics, The University of Georgia, Athens, GA, ⁴Zoetis, Kalamazoo, MI, ⁵United States Department of Agriculture, Agricultural Research Service, Beltsville, MD, ⁶Carolinas HealthCare System Blue Ridge, Morganton, NC, ⁷Department of Poultry Science, College of Agricultural and Environmental Sciences, University of Georgia, Athens, GA, USA

Abstract: Errors in the recorded binary status of response traits are frequent in human, animal, and plant applications. These error rates tend to differ between cases and controls because diagnostic and screening tests differ in sensitivity and specificity. This makes it harder to classify individuals into their correct groups and gives rise to both false-positive and false-negative cases. Analyzing binary responses made noisy by misclassification will reduce the statistical power of genome-wide association studies (GWAS). A threshold model that accommodates different diagnostic error rates for cases and controls was investigated. In a simulation study, several binary (case–control) data sets were generated with varying effects for the most influential single nucleotide polymorphisms (SNPs) and different diagnostic error rates for cases and controls. Each simulated data set consisted of 2000 individuals. Ignoring misclassification resulted in biased estimates of the effects of truly influential SNPs and inflated estimates for truly noninfluential markers. When misclassification was accounted for, bias was substantially reduced and accuracy increased by 12% to 32%. In fact, most of the influential SNPs that were missed in the analysis of the noisy data were captured by the proposed method. Additionally, the proposed method identified truly misclassified binary records with high probability. The superiority of the proposed method held across different simulation parameters (misclassification rates and odds ratios), attesting to its robustness.
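The abstract describes outcome-specific misclassification only in words. The Python sketch below illustrates the idea under stated assumptions; it is not the authors' implementation. It assumes a probit (liability threshold) model with the threshold fixed at 0. It also assumes two fixed error rates: `alpha_case` (a true case recorded as a control, i.e. 1 − sensitivity) and `alpha_control` (a true control recorded as a case, i.e. 1 − specificity). Under these assumptions, the probability of a recorded case is (1 − alpha_case)·Φ(μ) + alpha_control·(1 − Φ(μ)), and Bayes' rule gives the probability that a given record is mislabeled. All function names and the simulation settings (allele frequency 0.5, centered genotypes, arbitrary effect sizes) are hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

# Assumption: probit liability threshold model, threshold at 0, so a true case
# occurs when mu + e > 0 with e ~ N(0, 1).
# Assumption: alpha_case = 1 - sensitivity, alpha_control = 1 - specificity.


def norm_cdf(x: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def observed_case_prob(mu: float, alpha_case: float, alpha_control: float) -> float:
    """P(recorded as case) = (1 - alpha_case) * Phi(mu) + alpha_control * (1 - Phi(mu))."""
    p_true = norm_cdf(mu)
    return (1.0 - alpha_case) * p_true + alpha_control * (1.0 - p_true)


def log_likelihood(y_obs: Sequence[int], mu: Sequence[float],
                   alpha_case: float, alpha_control: float) -> float:
    """Bernoulli log-likelihood of the recorded labels, allowing for misclassification."""
    total = 0.0
    for y, m in zip(y_obs, mu):
        p = observed_case_prob(m, alpha_case, alpha_control)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def prob_misclassified(y_obs: int, mu: float,
                       alpha_case: float, alpha_control: float) -> float:
    """Bayes' rule: probability that a recorded label disagrees with the true status."""
    p_true = norm_cdf(mu)
    if y_obs == 1:
        wrong = alpha_control * (1.0 - p_true)          # true control recorded as case
        right = (1.0 - alpha_case) * p_true             # true case recorded as case
    else:
        wrong = alpha_case * p_true                     # true case recorded as control
        right = (1.0 - alpha_control) * (1.0 - p_true)  # true control recorded as control
    return wrong / (wrong + right)


def simulate(n: int, snp_effects: Sequence[float], alpha_case: float,
             alpha_control: float, seed: int = 1) -> List[Tuple[List[int], float, int, int]]:
    """Hypothetical simulator returning (genotypes, mu, y_true, y_obs) per individual.

    Genotypes are 0/1/2 with allele frequency 0.5 (assumed) and are centered, so mu
    averages 0. True status comes from the liability threshold; each label is then
    flipped with the error rate of its true class.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        g = [rng.randint(0, 1) + rng.randint(0, 1) for _ in snp_effects]
        mu = sum(b * (x - 1) for b, x in zip(snp_effects, g))
        y_true = 1 if mu + rng.gauss(0.0, 1.0) > 0.0 else 0
        flip_rate = alpha_case if y_true == 1 else alpha_control
        y_obs = 1 - y_true if rng.random() < flip_rate else y_true
        records.append((g, mu, y_true, y_obs))
    return records


if __name__ == "__main__":
    data = simulate(2000, [0.3, -0.2, 0.0], alpha_case=0.10, alpha_control=0.05)
    n_flipped = sum(1 for r in data if r[2] != r[3])
    print(f"{n_flipped} of {len(data)} labels misclassified")
    ll = log_likelihood([r[3] for r in data], [r[1] for r in data], 0.10, 0.05)
    print(f"log-likelihood at the simulating error rates: {ll:.1f}")
```

The script draws 2000 records, matching the data set size stated in the abstract, with arbitrary effects and error rates. `log_likelihood` shows how separate rates for cases and controls enter the model. `prob_misclassified` is one simple way to score individual records as likely mislabeled. A full analysis would estimate the SNP effects and error rates rather than fix them, for example inside a Bayesian sampler.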

Keywords: binary responses, misclassification, specificity, sensitivity

© 2016 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution-NonCommercial (unported, 3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
