Back to Journals » International Journal of Nanomedicine » Volume 8 » Supplement 1 Nanoinformatics

Feature selection and survival modeling in The Cancer Genome Atlas

Authors Kim H, Bredel M

Received 26 November 2012

Accepted for publication 11 January 2013

Published 16 September 2013 Volume 2013:8(Supplement 1 Nanoinformatics) Pages 57—62

DOI https://doi.org/10.2147/IJN.S40733

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2



Hyunsoo Kim,1 Markus Bredel2

1Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA; 2Department of Radiation Oncology, and Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, AL, USA

Purpose: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optimal number of genes as an input of a feature selection algorithm for survival modeling.
Patients and methods: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four different survival prediction methods: (1) 1-nearest neighbor (1-NN) survival prediction method; (2) random patient selection method and a Cox-based regression method with nested cross-validation; (3) least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; or (4) gene expression profiles of cancer pathway genes.
Results: The 1-NN method performed better than the random patient selection method in terms of survival predictions, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone.
Conclusion: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology.

Keywords: brain, feature selection, glioblastoma, personalized medicine, survival modeling, TCGA

Creative Commons License © 2013 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.