Evaluation of novel candidate variations and their interactions related to bipolar disorders: Analysis of GWAS data
Authors Acikel C, Aydin Son Y, Celik C, Gul H
Received 11 May 2016
Accepted for publication 2 August 2016
Published 24 November 2016 Volume 2016:12 Pages 2997–3004
Checked for plagiarism Yes
Review by Single-blind
Peer reviewers approved by Prof. Dr. Roumen Kirov
Peer reviewer comments 2
Editor who approved publication: Dr Roger Pinder
Cengizhan Acikel,1 Yesim Aydin Son,2 Cemil Celik,3 Husamettin Gul4
1Department of Biostatistics, Gulhane Military Medical Academy, 2Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 3Department of Medical Psychiatry, 4Department of Medical Informatics, Gulhane Military Medical Academy, Ankara, Turkey
Background: Multifactor dimensionality reduction (MDR) is a nonparametric approach that can be used to detect relevant interactions between single-nucleotide polymorphisms (SNPs). The aim of this study was to build the best genomic model based on SNP associations and to identify candidate polymorphisms that underlie the molecular basis of bipolar disorders.
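To make the MDR idea above concrete, the following is a minimal Python sketch of its core pooling step only: each multilocus genotype combination is labeled high risk or low risk by comparing its case:control ratio with the overall ratio. It is not the pipeline used in this study. The function names, the 0/1/2 genotype coding, the toy data, and the rule that ties count as low risk are all assumptions made for illustration. A complete MDR analysis would also search across SNP combinations and use cross-validation.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, labels, snp_indices):
    """Return the set of genotype combinations labeled high risk.

    genotypes   : list of tuples, one per subject, genotype codes 0/1/2 (assumed coding)
    labels      : list of 1 (case) or 0 (control), same order as genotypes
    snp_indices : positions of the SNPs whose interaction is examined
    """
    cases, controls = Counter(), Counter()
    for row, label in zip(genotypes, labels):
        cell = tuple(row[i] for i in snp_indices)
        if label == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    n_cases = sum(cases.values())
    n_controls = sum(controls.values())

    high_risk = set()
    for cell in set(cases) | set(controls):
        # High risk if cases/controls in this cell exceeds the overall
        # cases/controls ratio; cross-multiplied to avoid division by zero.
        # Ties are treated as low risk (an assumption of this sketch).
        if cases[cell] * n_controls > controls[cell] * n_cases:
            high_risk.add(cell)
    return high_risk


def mdr_predict(row, snp_indices, high_risk):
    """Classify one subject as case (1) if its genotype cell is high risk."""
    cell = tuple(row[i] for i in snp_indices)
    return 1 if cell in high_risk else 0


# Hypothetical toy data: two SNPs, eight subjects (3 cases, 5 controls).
toy_genotypes = [(0, 1), (0, 1), (0, 1), (2, 2), (2, 2), (1, 0), (1, 0), (1, 0)]
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0]
toy_high_risk = mdr_high_risk_cells(toy_genotypes, toy_labels, (0, 1))
```

Collapsing the genotype table into a single high-risk/low-risk attribute is what reduces the dimensionality: a two-SNP interaction with up to nine genotype cells becomes one binary predictor.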
Methods: This study used data from the Whole-Genome Association Study of Bipolar Disorder (database of Genotypes and Phenotypes [dbGaP] study accession number: phs000017.v3.p1). After preprocessing of the genotyping data, three classification-based data mining methods (ie, random forest, naïve Bayes, and k-nearest neighbor) were applied. Additionally, as a nonparametric, model-free approach, the MDR method was used to evaluate the SNP profiles. The validity of these methods was evaluated using true classification rate, recall (sensitivity), precision (positive predictive value), and F-measure.
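The four evaluation measures named above can be computed from a binary confusion matrix. The Python sketch below is illustrative only and rests on two assumptions: that "true classification rate" means overall accuracy, and that the F-measure is the balanced F1 score. The function name and the toy labels are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, recall, precision, and F1 for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0   # assumed "true classification rate"
    recall = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # positive predictive value
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # assumed balanced F-measure

    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Hypothetical toy labels: 4 cases and 4 controls.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Recall and precision are reported separately because a model can reach high recall simply by labeling most subjects as cases; the F-measure penalizes that imbalance.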
Results: Random forests, naïve Bayes, and k-nearest neighbors identified 16, 13, and 10 candidate SNPs, respectively. Surprisingly, the top six SNPs were reported by all three methods. Random forests and k-nearest neighbors were more successful than naïve Bayes, with recall values >0.95. On the other hand, MDR generated a model with comparable predictive performance based on five SNPs. Although MDR identified different SNP profiles from the classification-based models, all models mapped SNPs to the DOCK10 gene.
Conclusion: The three classification-based data mining approaches (random forests, naïve Bayes, and k-nearest neighbors) prioritized similar SNP profiles as predictors of bipolar disorders, in contrast to MDR, which found different SNPs through analysis of two-way and three-way interactions. The reduced number of associated SNPs discovered by MDR, without loss in classification performance, would facilitate validation studies and decision support models, and would reduce the cost of developing predictive and diagnostic tests. Nevertheless, we need to emphasize that translation of genomic models to the clinical setting requires models with higher classification performance.
Keywords: Bipolar disorders, GWAS, MDR, Data Mining, SNP, Decision Support
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.