Statistical analysis of exon lengths in various eukaryotes
Alexander Kaplunovsky¹, Anatoliy Ivashchenko², Alexander Bolshoy¹
¹Department of Evolutionary and Environmental Biology, Genome Diversity Center, Institute of Evolution, University of Haifa, Israel; ²Department of Biotechnology, Biochemistry, Plant Physiology, Al-Farabi Kazakh National University, Kazakhstan
Purpose: The principal goals of this research were to investigate correlations between certain properties of exons in a gene (i.e., between exon density and the corresponding protein length) and to compare genomic trees obtained with different clustering approaches based on exonic parameters. The aim was to better understand exon–intron structures and their origin and development. The exon–intron structures of eukaryotic genes vary widely, and the evolution of these structures raises many open questions. As a preliminary step toward addressing some of them, we performed a statistical analysis of gene exon–intron structures.
Methods: For each whole eukaryotic genome, we examined all protein-coding genes chromosome by chromosome and calculated the proportion of intron-containing genes together with the per-gene averages of three quantities: the net length of all exons in a gene, the number of exons, and the mean exon length. By comparing these chromosomal and genome-wide averages, we developed a clustering technique based on characteristics of the exon–intron structure. This technique separates species and groups them in agreement with eukaryotic taxonomy.
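The per-chromosome statistics described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input format (a list of `(chromosome, exon_lengths)` pairs), the gene data, and the function name `chromosome_exon_stats` are hypothetical. It assumes a gene is intron-containing when it has more than one exon, and it takes "mean exon length" as the mean within each gene averaged over genes (pooling all exons on the chromosome is an equally plausible reading).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: each protein-coding gene as (chromosome, [exon lengths in bp]).
genes = [
    ("chr1", [120, 300, 85]),
    ("chr1", [950]),
    ("chr2", [200, 150]),
    ("chr2", [60, 90, 110, 240]),
    ("chr2", [1500]),
]


def chromosome_exon_stats(genes):
    """Summarize exon-intron structure per chromosome (illustrative sketch).

    Assumption: a gene with more than one exon is counted as intron-containing.
    """
    by_chrom = defaultdict(list)
    for chrom, exons in genes:
        by_chrom[chrom].append(exons)

    stats = {}
    for chrom, gene_exons in by_chrom.items():
        stats[chrom] = {
            # proportion of genes with at least one intron
            "frac_intron_containing": sum(len(e) > 1 for e in gene_exons) / len(gene_exons),
            # net exon length per gene, averaged over genes
            "mean_net_exon_length": mean(sum(e) for e in gene_exons),
            # number of exons per gene, averaged over genes
            "mean_exon_count": mean(len(e) for e in gene_exons),
            # mean exon length within each gene, then averaged over genes
            "mean_exon_length": mean(sum(e) / len(e) for e in gene_exons),
        }
    return stats


for chrom, s in chromosome_exon_stats(genes).items():
    print(chrom, s)
```

Genome-wide averages can be obtained the same way by relabeling every gene with a single pseudo-chromosome name before calling the function.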
Conclusion: We conclude that the best approach is to compute distances between species in the space of four principal components obtained by factor analysis and then to apply clustering algorithms such as neighbor-joining, k-means, and partitioning around medoids.
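The distance-then-cluster step can be sketched under stated assumptions. Here each species is assumed to already have scores on four principal components (the species names and score values are hypothetical, not results from the study). The sketch computes Euclidean distances between species in that 4-dimensional space and clusters them with a simplified partitioning around medoids: it keeps only the PAM swap phase and starts from an arbitrary set of medoids instead of the full BUILD initialization. Neighbor-joining or k-means would consume the same distances or coordinates but are not shown.

```python
import math

# Hypothetical scores of five species on four principal components.
pc_scores = {
    "species_A": [2.1, 0.3, -0.5, 0.1],
    "species_B": [1.9, 0.4, -0.4, 0.0],
    "species_C": [-1.5, 1.2, 0.3, -0.2],
    "species_D": [-1.7, 1.0, 0.5, -0.1],
    "species_E": [-1.6, 1.1, 0.2, 0.0],
}


def distance_matrix(scores):
    """Euclidean distance between every pair of species in PC space."""
    names = list(scores)
    d = {(a, b): math.dist(scores[a], scores[b]) for a in names for b in names}
    return names, d


def k_medoids(names, d, k):
    """Simplified PAM: swap medoids with non-medoids while total cost drops."""

    def cost(medoids):
        # sum of distances from each species to its nearest medoid
        return sum(min(d[(x, m)] for m in medoids) for x in names)

    medoids = names[:k]  # arbitrary start (full PAM uses a greedy BUILD phase)
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in names:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                c = cost(trial)
                if c < best - 1e-12:
                    medoids, best, improved = trial, c, True

    # assign every species to its nearest medoid
    clusters = {m: [] for m in medoids}
    for x in names:
        clusters[min(medoids, key=lambda m: d[(x, m)])].append(x)
    return clusters


names, d = distance_matrix(pc_scores)
print(k_medoids(names, d, 2))
```

With these made-up scores, species A and B fall into one cluster and C, D, and E into the other; which member serves as each medoid depends on the starting point.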
Keywords: comparative genomics, exon–intron structure, eukaryotic clustering, principal component analysis
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.