Recent advances in computational epigenetics

Advances in Genomics and Genetics 2018:8 1–12 (Dovepress). The license terms incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).


Ruskin and Barat
Correspondence: Heather J Ruskin, Advanced Research Computing Centre for Complex Systems Modelling, School of Computing, Dublin City University (DCU), Dublin 9, Ireland. Tel +353 1 700 5513. Email heather.ruskin@dcu.ie

Introduction
Following the breakthrough in the sequencing of the human genome, 1,2 the need to understand how chemical modifications can alter gene involvement and function has prompted the study of the control system for gene switching. Developmental traits and differences, as well as disease initiation and progression, are all intrinsically linked to phenotypic plasticity (the degree to which non-genotypic factors determine phenotype form). 3,4 Genetic and genomic attributes have been studied in detail, and current knowledge is captured in databases, which include ENCODE, Ensembl, Gene, GEO and GWAS for humans, as well as other organism-specific examples. [5][6][7][8][9] The collated data are extensive and draw heavily both on state-of-the-art methods of gene sequencing and on extensive gene expression measurements, providing a basis for investigation of molecular evolution, disease-specific mutations and other phenomena. Despite this wealth of data, however, it is now clear that gene factors alone are insufficient to explain the complex mechanisms producing diversity and heritable changes in phenotype. In consequence, the last few decades have seen increased focus on the way in which reprogramming of the transcriptional regulatory network can occur. [10][11][12]
Cells control gene expression by wrapping DNA double strands around clusters of core histone proteins in order to form nucleosomes, the building blocks of chromatin. Chromatin structure is known to be affected in the neighborhood of expressed genes, particularly in the case of promoter and enhancer genomic regions. 13 Hence, alterations in chromatin structure (caused by chemical modifications of both DNA and histone proteins) influence gene activity, speeding up, slowing down or even suppressing transcriptional initiation. These heritable alterations in the chromatin structure (which regulates transcription through gene expression or activation of protein- and RNA-encoding genes) leave the genetic code unaffected. Epigenesis is thus a second-order effect, which goes beyond the content of the genome to the way in which its message is compiled and implemented during development, cell proliferation and division. [14][15][16] Various enzymes are involved in the chemical modification process and are associated with the epigenetic mechanisms, "signatures" or markers of change, which "punctuate" the genetic code. 4,17 These fall broadly into chemical and protein groupings, with the former including DNA methylation and the latter various covalent posttranslational histone modifications such as methylation, acetylation, phosphorylation, ubiquitylation and sumoylation, with the first two being the more intensively studied to date. [18][19][20] DNA methylation involves the addition of a methyl group to a DNA strand and commonly acts as a mechanism to switch the gene "off" permanently, while histone modifications directly impact chromatin structure and affect gene expression values. These changes are molecule- as well as modification-specific; for example, histone H3 acetylation and deacetylation promote increase and decrease in gene expression, respectively; histone H3 trimethylated at lysine 36 (H3K36me3) or at lysine 4 (H3K4me3) as well as histone H3 dimethylated at lysine 4 (H3K4me2) are marks associated with enhanced gene expression, while H3K27me3 is associated with its repression. An extra level of complexity is added by different histone variants (coded by separate genes) being differentially represented in "open" versus "closed" or "compact" chromatin domains. 21,22 Epigenetic marks also reflect imprinting of genes by environmental factors such as diet and lifestyle, with such information also passed on to subsequent generations. [23][24][25][26]
In recent years, heterogeneous micromolecular abnormalities have become increasingly associated with risk, onset and progression of a range of conditions and diseases, such as obesity, 11,27 mood disorders and other psychopathologies, [28][29][30] autoimmune and cardiovascular diseases, 31,32 as well as cancers 33 and ageing. [34][35][36] Moreover, the reversible nature and faster dynamics of epigenetic changes are of major interest in the targeting of intervention, providing key motivation for pharmaceutical development over the last decade. 37,38 The need, therefore, to understand epigenetic changes and their influence on disease has stimulated development of numerous computational approaches and tools, for application to data generation, mapping and management, as well as analysis and therapy. While the Human Genome Project (1990–2003) focused on sequencing all genes in human DNA (~20,000, with some three billion base-pairs), a similar large-scale project, the Epigenomics Road Map (2008-date), is exploring specific patterns of epigenetic modifications, with the principal aim of creating a map of the epigenome for multiple tissue types and cancers. 30,39 In particular, the data richness of many biological and medical fields, fueled by new technology and improvements in computing power, has meant that analysis of the patterns of changes, which disrupt normal gene regulation, is now feasible.
The challenges presented by the "extra layer" of control have also served to emphasize the importance of interdisciplinary approaches, combining related fields of genomics, such as biochemistry and proteomics, with the hybrid discipline of bioinformatics as well as with traditional aspects of computer science, mathematics and the physical sciences. The formulation of models to explain the biological processes, together with interrogation of diverse data sources and in-depth integrated analysis, is an essential feature of the new paradigm. [40][41][42][43][44] In the following sections, modeling epigenetic dynamics from DNA methylation to histone modifications is considered, with achievements and challenges being discussed. The computational resources employed to manage various available data-types are considered, and technology-dependent methods and tools to produce quantitative epigenetic data are discussed next. Attempting illustration of the range of computational epigenetic methods is inevitably selective, due to restrictions of space here as well as to the exponentially growing literature base in different fields. In consequence, this review gives detail on a cancer example, while new directions and developments in other areas are briefly summarized.
... on different signature mechanisms and the multiple interactions between instances of these, which affect gene expression and functionality. Epigenetic regulation corresponds, in consequence, to emergent system behavior, with complex overall dynamics. [45][46][47] Some histone modifications take place much faster than others, and the time scale for histone acetylation is much shorter than that for DNA methylation, for which the system remains relatively stable, 48,49 but rates of change do not apply universally. Epigenetic dynamics have seen heightened interest from the computational modeling community, often in the context of disease initiation and progression. While early work focused on single changes, more recent efforts aim to explore interdependencies and the way in which system evolution occurs.
In an early application for gastric cancers, 50 computational modeling played a complementary role for in vivo/in vitro experiments in hypothesis testing, providing insight into overall methylation dynamics. Moreover, the role of aberrant promoter methylation in transcription pattern modification, subsequently explored, 51 demonstrated the long-term objective for a system model, namely multiple scaling of effects from aberrant cell changes to initiation and progression of disease. Phenomenological models (widely used in the physical and complexity sciences, including systems biology) 52 were developed for epigenetic mechanisms, in order to support formulation of hypotheses based on limited data, which could be later refined. Computational micromodels, such as that described, 45,53 utilized the Markov Chain Monte Carlo class of algorithms to mimic interdependence of epigenetic events through random sampling of states. Transcription information (as a function of histone modification levels and DNA methylation) permitted movement to new histone states, corresponding to associated transition probabilities, based on empirical data.
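The idea of moving between histone states according to transition probabilities can be sketched in a few lines of Python. This is a plain forward simulation of a Markov chain, not the implementation of the cited micromodels: the three state names, the function `transition_probs` and every numerical value are hypothetical and chosen only so that a higher DNA methylation level makes the acetylated (active) state less likely.

```python
import random

# Hypothetical histone states for a single nucleosome (illustrative only).
STATES = ("unmodified", "acetylated", "methylated")


def transition_probs(state, dna_methylation):
    """Return assumed transition probabilities out of `state`.

    `dna_methylation` is a level in [0, 1]. The numbers are invented for
    illustration; in a real model they would come from empirical data.
    """
    base = {
        "acetylated": 0.4 * (1.0 - dna_methylation),
        "methylated": 0.1 + 0.4 * dna_methylation,
    }
    base["unmodified"] = 1.0 - base["acetylated"] - base["methylated"]
    # Mix with a 50% chance of staying in the current state.
    return {s: 0.5 * base[s] + (0.5 if s == state else 0.0) for s in STATES}


def simulate(n_steps, dna_methylation, seed=0):
    """Sample a chain of histone states and return state frequencies."""
    rng = random.Random(seed)
    state = "unmodified"
    counts = dict.fromkeys(STATES, 0)
    for _ in range(n_steps):
        probs = transition_probs(state, dna_methylation)
        state = rng.choices(STATES, weights=[probs[s] for s in STATES])[0]
        counts[state] += 1
    return {s: counts[s] / n_steps for s in STATES}


if __name__ == "__main__":
    for level in (0.0, 0.5, 0.9):
        print(level, simulate(5000, level))
```

Under these assumed probabilities, the fraction of time spent in the acetylated state falls as the methylation level rises, which is the kind of interdependence between marks that such sampling schemes are meant to mimic.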
The importance of DNA methylation data in genomic stability and cellular plasticity, as well as in genomic imprinting and other normal cell processes, has motivated considerable efforts in modeling, prediction and analysis. Thus, in a study 54 on epigenetic leukemia therapy, a dynamic multicompartmental model of DNA methylation levels based on the activity of the Dnmt methyltransferase and other proteins is described. The numerical solution of the first-order partial differential equation model highlighted the mechanism for CpG island hypo-methylation via local modulation of such proteins. Further, in a discussion of asthma etiology, 55 the authors considered epigenome-wide effects and distinguished between the accepted form of "integrator model" (for epigenetic changes leading to disease) and a two-stage model process. In the former, all factors including genetic variants and stochastic events have equal weight in influencing the production of intermediate phenotypes, while, in the latter, initial exposure of methylation quantitative loci leads to genetic variants, which are then modified further by DNA methylation and through additional exposure. (Genetic variants, such as SNP haplotypes, are notably a focus of GWAS. 56 ) More recently still, 43 the modeling of DNA methylation dynamics using phylogenetic approaches was proposed, with specific focus on changes in CpG dinucleotides, vital to cell differentiation, as well as the structure of precursor and dependent cell types. A continuous-time Markov chain was used to draw inferences on CpG methylation dynamics.
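A continuous-time Markov chain for a single CpG site can be illustrated with a two-state sketch (unmethylated and methylated). This is not the phylogenetic model of the cited study; the rate names `gain` and `loss` and their values are assumptions made for the example. For two states the probability of being methylated at time t has a closed form, which the code compares against a direct simulation with exponentially distributed waiting times.

```python
import math
import random


def prob_methylated(t, gain, loss, start_methylated):
    """Closed-form P(methylated at time t) for a two-state chain.

    `gain` is the assumed rate unmethylated -> methylated and `loss`
    the assumed rate methylated -> unmethylated (both hypothetical).
    """
    stationary = gain / (gain + loss)
    p0 = 1.0 if start_methylated else 0.0
    return stationary + (p0 - stationary) * math.exp(-(gain + loss) * t)


def simulate_site(t_end, gain, loss, start_methylated, rng):
    """Simulate one site up to `t_end`; return True if it ends methylated."""
    state = start_methylated
    t = 0.0
    while True:
        rate = loss if state else gain
        t += rng.expovariate(rate)  # waiting time until the next flip
        if t > t_end:
            return state
        state = not state


def monte_carlo_estimate(n_sites, t_end, gain, loss, start_methylated, seed=1):
    """Fraction of simulated sites that are methylated at `t_end`."""
    rng = random.Random(seed)
    hits = sum(
        simulate_site(t_end, gain, loss, start_methylated, rng)
        for _ in range(n_sites)
    )
    return hits / n_sites


if __name__ == "__main__":
    exact = prob_methylated(3.0, 0.2, 0.1, False)
    approx = monte_carlo_estimate(20000, 3.0, 0.2, 0.1, False)
    print(f"closed form {exact:.3f}, simulation {approx:.3f}")
```

For long times the probability approaches gain / (gain + loss) regardless of the starting state, which is the stationary methylation level of this simplified chain.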
Identification of intraindividual epigenetic variation with a view to understanding the molecular basis for disease risk has motivated epigenome-wide association studies (EWAS), 57,58 and the comparative basis is echoed in development of models such as AgentCrypt. 59 Here, the agent-based approach is used to explore intra- and interdependencies in human intestinal crypt structure and dynamics, together with the effect of potential inhibitors on methylation modification of intestinal tissue in disease onset.
Relatively recently, investigation of the interrelationship between histone modifications and DNA methylation has indicated that specific epigenetic combinations determine whether chromatin structure is open or compact. 21,22 Specific models for histone modification patterns, the histone code and contributions to the epigenome dynamics have also attracted increased attention in recent years. Thus, in a study, 46 the authors explored modifications to the histone code and specialized enzyme recruitment leading to alternative and stable heritable states, which "mark" the DNA sequence and control functions, such as gene expression. The dichotomy of such "marks" whether active or repressive was also considered by Ku et al 60 who developed a mathematical model of histone modification dynamics, where bivalent domains are thought to play an important role in stem cell differentiation and are related to known features of chromatin states. Further, a stochastic mathematical model proposed 42 describes molecular mechanisms involved in establishing histone modification patterns for a single gene, with non-phenomenological physical parameters.
In efforts to relate histone modifications, DNA methylation and higher-order chromatin structure, a model of transcriptional regulation of epigenetic processes was proposed with a view to reconciling earlier models with experimental data. 61 Performance was assessed in terms of stability properties and memory effects, with emphasis placed on experimental validation of theoretical predictions and the need for extension to multi-scale models to explore self-organization of chromatin. Cross talk between DNA methylation pathways and histone modifications has also been considered, 20 as have multi-scale effects in a generalized nucleation and looping model for epigenetic memory of histone modifications. 44 Cell mechanisms, producing and sustaining these patterns, were investigated as a prerequisite for predicting efficacy of epigenetic drug therapy.
It has been suggested that inheritance of the "epigenetic code" can be cumulatively summarized in terms of the "Epigenetic Code REplication Machinery" (ECREM), a macromolecular complex, consisting of enzymes such as the DNA methyltransferases, of chromatin organization and noncoding regulatory RNA. 32,62 The four mechanisms identified are (a) DNA methylation, (b) histone modification, (c) chromatin remodeling and (d) involvement of small (21-26 nt) and noncoding RNAs, whose role in cellular development and protection has been shown to be vital to the epigenetic regulatory network. 63 It seems clear that epigenetic events thus closely control gene expression and genomic regulation through multiple generations, with deregulation resulting in phenotype variability and increased susceptibility to disease. 64,65 An ideal, comprehensive model, compliant with ECREM principles, would encompass information on all mechanisms involved and dynamically monitor cumulative changes. To date, this goal has not been realized.

Data management - new developments

Data resources and types
While PubMed records over 50,000 papers on epigenetics to date, with more than a third of these appearing since 2013, data generation discussions have been dominated by the intra-generational rather than the inter-generational processes (ie, inheritance of modified phenotype). 26 High-level descriptions of biological processes and their concomitant entities are available from the literature and experimental studies, but quantitative data, mainly captured through molecular and epigenetic databases, are increasingly abundant. Such resources range from the small and specialized to extensions of well-established and wide-ranging repositories of nucleotide sequences, transcriptional regulatory sites and transcription factors for human genes and diseases, as well as microarray and gene expression data. Current lists and descriptions are available from Internet hub sites, such as EMBL-EBI (https://www.ebi.ac.uk/training/online/course/bioinformatics-terrified/what-database/relational-databases/primary-and-secondary-databases), NAR (http://www.oxfordjournals.org/nar/database/c/), NGS (https://www.nextgenerationsequencing.info/bioinformatics/genetic-databases/general-genetic-databases), HSLS (https://www.hsls.pitt.edu/obrc/index.php?page=human_genome), TCGA (https://cancergenome.nih.gov) and NCBI (https://www.ncbi.nlm.nih.gov/genbank/).
Nucleotide data, 66 generated through next-generation sequencing methods, including RNA, whole genome and exome, as well as targeted technologies, 67,68 predominate. Substantial data on gene expression through microarrays and RNA sequencing (including nanopore variants) are also available. 69 Specific efforts for epigenetic measures have focused on DNA methylation content and patterns, as well as chromatin-associated proteins and methylated genes in various cancer types and other diseases. The value of the data types has been discussed in articles, including one by Bock and Lengauer, 70 which describes inferences on epigenetic states from DNA sequences, and one by Lim et al, 71 which also reviews contemporary databases and tools. More recently still, it has been discussed through shared internet resources, such as Epigenie, 72 which incorporates information on current large-scale projects, databases by data type, and statistical data analysis and visualization tools.

Databases: graph databases - a new approach
Epigenetic and epigenomic databases have expanded enormously over the last two decades. 38 79 MethDB provides information on DNA methylation content and patterns across a number of species, tissues and phenotypes. 80 Other methylation information is contained in MethBank, 81 which focuses on integrated next-generation methylation programming data, MethPrimerDB, 82 which captures primer sets for human and murine DNA methylation analyses, and REBASE, 83 with data from GenBank on thousands of DNA methyltransferase genes. Histone sequences (H1, H2A, H2B, H3 and H4) are available in total for nearly 900 species from the Histone Database, 84 while HIstome 85 contains data on human histone proteins and modifying enzymes. Chromatin-associated protein information and chromatin-remodeling factor sequences in eukaryotes are available from ChromDB 86 and CREMOFAC, 87 respectively, while CR Cistrome 88 contains ChIP-seq data for human and murine histone modification linkages and chromatin regulators.
Representation and querying of these complex systems requires relational statements, linking the multiple interdependencies between genetic and epigenetic modifications. Very recently, however, it has been recognized that structured data management is not the only requirement; the complementary need is for 1) linkage and integration of multiple data types, designed to comply with different data schemas, and 2) investigation of advanced hypotheses requiring complex and time-consuming query forms. In consequence, a novel graph database approach, which supports both integration and query speed-up, but also has wide-ranging context, has been proposed. 38 Nodes and edges in the graph database represent concepts and associations, respectively, with the framework readily adaptable to highly interconnected data. Advantages include the use of graphical search algorithms and next-neighbor node-linked traversal searches, which give additional flexibility compared to the more conventional relational databases. Various graph database frameworks exist, including FlockDB, AllegroGraph and Neo4j amongst others, [89][90][91] with the last permitting both multi-relational graphs and directional relationships as well as supporting a flexible declarative query language (Cypher). In particular, the Neo4j framework has found considerable application in the biomedical sciences and can be used to complement data integration as well as exploratory analysis and visualization. Examples of interactive query tools for integration and management of different medical and biological data types are given, together with Neo4j linkages for data management and analysis (Figure 1). 38 Neo4j-based frameworks have also been used to assess performance of in silico models of biological systems, notably computational and mathematical models of cancer 92,93 (and the BioModels database). 94 Moreover, FlockDB supports application to set operations rather than traversal searches (based on adjacency graph storage), providing a platform, similar to that of, for example, GraphLab, MapReduce and Scope (amongst others), for scalable execution. A comparative review of these machine learning paradigms for the Cloud is provided in the work by Low et al. 95 Corbellini et al 96 provide an overview of graph databases for graph processing frameworks and for large-scale (predominantly social) networks.

Figure 1 Notes: Important to understanding human diseases, multi-scale computational models link the micromolecular layer and genetic-epigenetic alteration to organism development. Extraction and collation of data describing biomedical systems draws heavily on key publicly available databases (such as UniProt KB, Human Protein Atlas, Reactome, IntAct) as well as project-specific experimental datasets. Graph databases facilitate integration and querying of these heterogeneous datasets. The Neo4j graph database manages and presents incorporated data for analysis (using primarily R, Java and Python), in order to explore and visualize the interconnectivity of the integrated concepts. The Neo4j output graph (available in the JavaScript Object Notation format) can be processed further and linked to network sharing frameworks.
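The node, edge and next-neighbor traversal ideas described above can be illustrated with a small in-memory sketch in Python. It is not Neo4j and does not use the Neo4j driver; the class `PropertyGraph`, the node identifiers (`cpg_island_1`, `geneA`, `diseaseX`) and the relationship names are all hypothetical. In a real graph database the same question would typically be posed as a declarative pattern query (for example a Cypher `MATCH` clause) rather than hand-written traversal code.

```python
from collections import deque


class PropertyGraph:
    """Minimal directed property graph: nodes hold properties, edges hold a label."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.edges = {}  # node id -> list of (relation, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbours(self, node_id, relation=None):
        """Targets of outgoing edges, optionally filtered by relation label."""
        return [t for r, t in self.edges[node_id] if relation is None or r == relation]

    def reachable(self, start, max_hops):
        """Breadth-first traversal; returns {node id: hop distance from start}."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for _, target in self.edges[node]:
                if target not in seen:
                    seen[target] = seen[node] + 1
                    queue.append(target)
        return seen


def build_example():
    """Build a toy graph linking a CpG island, a gene and a disease (all invented)."""
    g = PropertyGraph()
    g.add_node("cpg_island_1", kind="CpG island", methylation=0.8)
    g.add_node("geneA", kind="gene")
    g.add_node("diseaseX", kind="disease")
    g.add_edge("cpg_island_1", "IN_PROMOTER_OF", "geneA")
    g.add_edge("geneA", "ASSOCIATED_WITH", "diseaseX")
    return g


if __name__ == "__main__":
    graph = build_example()
    print(graph.reachable("cpg_island_1", max_hops=2))
```

The traversal follows edges directly from each node, so a multi-hop question such as "which diseases lie within two steps of this CpG island" needs no table joins, which is the flexibility the graph approach is credited with here.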

Targeted analysis, methods and tools
The increasing richness of resources for different data types has required corresponding elaboration of algorithms and tools. 72,73 Recommendations for the design and analysis of EWAS as well as the interpretation of the complex data generated have been put forward by Michels et al. 97 In epigenome mapping initiatives, computational modeling has motivated use of a range of new technologies in an attempt to correlate characteristic behavior and explore joint methylation profiles for multiple targets. Thus, DNA methylation arrays (ChIP-Chip), ChIP-seq, methylation-targeted sequencing (eg, methylated DNA immunoprecipitation sequencing), bisulfite sequencing and others all feature widely, while various tools have been developed. These include ACME for identification of ChIP enrichment sites 99 and aids for mapping both short and long bisulfite sequences to the reference genome, 100,101 as well as tools for quantitative measurement of cytosine methylation levels, with examples including Bismark 102 and MOABS. 103 The latter is based on a beta-binomial hierarchical model for differential methylation, while a similar regression basis is used to model whole-genome bisulfite data in detection of differentially methylated sites in RADMeth. 104 Bisulfite sequence-mapped data of count form have a complicated variance-covariance structure, but recently MACAU, 105 a tool to identify differential DNA methylation, based on a binomial mixed model which takes account of both over-dispersion and genetic relatedness, has been described. Bayesian-based model tools include Bis-SNP, 41 which identifies allele-specific epigenetic events, as well as a faster version, BS-SNPer. 106 Novel high-throughput nanopore sequencing variants, as well as diversity in technological platforms and in required sequencing depth, have also stimulated contributions to the recent literature. 107,108
Increasingly sophisticated bioinformatic methods are required for downstream analyses as researchers attempt to interpret multi-locus methylation information from multiple samples, for example, methods such as model-based clustering described by Houseman et al, 109 tailored for data obtained with methylation-specific microarrays. Multivariate statistical methods, particularly for both supervised and unsupervised clustering, principal component analysis, regression and visualization tools, such as heatmaps, have proved vital to interpretation of outcomes for these complex data, 110 which are generated by combinations of epigenetic changes and molecular events. 111,112 A major challenge faced by EWAS is intra-sample cell-type heterogeneity (different fractions of component cell types in the sample), and a number of statistical algorithms have been developed to address this issue. These algorithms can be classified as reference-based (with a DNA methylation profile for the tissue of interest defined a priori) and reference-free (with a tissue-specific DNA methylation profile unavailable). [113][114][115] Text- and data-mining examples for extraction of epigenetic information from the literature, together with appropriate computational, mathematical and statistical methods, are widely reported. [116][117][118] Pooling summary-level genome-wide and epigenome-wide studies may provide powerful new insights, 119 with combined query criteria highlighting multiple levels of control, which can apply to even a single change, as noted. 38 Furthermore, the previous focus of the epi-informatic approach, on DNA methylation and histone modification and the patterns that apply in various disease manifestations, has now extended to the integration of data within a scaffold network such as that for protein interaction, which specifies correlation between methylation and gene expression.
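The reference-based idea for cell-type heterogeneity can be illustrated for the simplest possible case of two cell types. The sketch below is an ordinary least-squares projection, not any of the cited algorithms; the function name `estimate_fraction` and the methylation values are hypothetical. It assumes that the measured methylation at each CpG is a weighted average of two known reference profiles.

```python
def estimate_fraction(mixture, ref_a, ref_b):
    """Least-squares estimate of the fraction of cell type A in a sample.

    Assumes mixture[i] ~ f * ref_a[i] + (1 - f) * ref_b[i] at every CpG i,
    where ref_a and ref_b are reference methylation profiles (values in
    [0, 1]). The estimate is clipped to the valid range [0, 1].
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    # Invented reference profiles over five CpG sites.
    ref_a = [0.9, 0.1, 0.8, 0.2, 0.5]
    ref_b = [0.1, 0.7, 0.3, 0.6, 0.5]
    true_f = 0.3
    sample = [true_f * a + (1 - true_f) * b for a, b in zip(ref_a, ref_b)]
    print(estimate_fraction(sample, ref_a, ref_b))
```

Sites where the two reference profiles agree (the last CpG in the example) contribute nothing to the estimate, which is why reference-based methods rely on CpGs that discriminate between cell types.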
Abnormal epigenetic marks may appear in cells of different types, with increased phenotypic plasticity associated with these anomalies and crucially linked to network properties, which can offer insight for diagnostics and therapy. 120,121 In this context also, genetic tools, such as genome-scale libraries, are attracting epi-informatic efforts, with other posttranscriptional modifications also used to investigate modulators of protein stability and mediate loss-of-function screening, for example for cancers.
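The beta-binomial model class mentioned earlier in this section for bisulfite read counts can be made concrete with a short Python sketch. It is not the MOABS or RADMeth implementation; the function names and parameter values are assumptions for illustration. Here k is the number of methylated reads out of n reads covering a cytosine, and the methylation level itself is treated as beta-distributed, which allows more read-to-read variability (over-dispersion) than a plain binomial with the same mean.

```python
import math


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, alpha, beta):
    """log P(k methylated reads out of n) when the level ~ Beta(alpha, beta)."""
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def binomial_logpmf(k, n, p):
    """log P(k methylated reads out of n) for a fixed methylation level p."""
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1.0 - p)


if __name__ == "__main__":
    n = 20
    # Same mean level (0.5) under both models; alpha = beta = 2 is an assumed choice.
    for k in (0, 5, 10):
        bb = math.exp(beta_binomial_logpmf(k, n, 2.0, 2.0))
        bi = math.exp(binomial_logpmf(k, n, 0.5))
        print(k, round(bb, 5), round(bi, 5))
```

With the same mean, the beta-binomial places noticeably more probability on extreme counts such as k = 0 than the binomial does, which is the over-dispersion that count-based methylation tools need to accommodate.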

The case for cancer and data integration
Many epigenetic and epigenomic studies have focused on cancer, but distinction between cancerous and healthy states is not straightforward as cancer is neither a single disease nor uniform in progression or markers. 64,122 Epigenetic variability is intrinsic in normal tissue, so that achieving reliable targets for diagnosis and treatment of malignancies is heavily dependent on this and on the molecular properties that distinguish cell classes. 123,124 Major disruption in cell-cycle mechanisms of molecular adhesion and regulation results in abnormal gene expression and mutation of tumor-suppressor genes in tumors and neighboring tissue. 40,65 Transcriptional states and gene mutations are some of the many properties, operating at the genome level, that characterize cancer phenotypes, but refinement of these classifications requires additional epigenetic information. Core relationships between DNA and histone proteins contribute to de-regulation of nuclear events in cells, including DNA damage repair as well as replication and transcription, and are prominent in disease initiation and progression. 19,125 In a recent review of an earlier model, its developers suggest that some genes are epigenetically disrupted even before occurrence of mutations leading to malignancies, causing altered differentiation throughout tumor evolution. 33 Many correlations between DNA-dependent events and histones also occur at the level of histone posttranslational modifications, leading to recruitment of non-histone proteins via specialized binding domains, rather than to alterations in nucleosome structure (the Histone Code hypothesis). Changes to chromatin conformation, effected through these histone modifications and binding of methyl residues on DNA cytosines from CpG dinucleotides, lead to "closure" and impedance of transcription.
Considerable emphasis is also given in the literature to abnormal DNA methylation and hypermethylation of CpG islands situated close to promoter regions, as well as concomitant methylation of multiple loci, with strong indications that downregulation of expression of core genes results, together with distinctive phenotypes. 12,16,126,127 In the particular case of cancer, molecular subtypes (or stratification) reflect both disease etiology (with different molecular mechanisms disrupted in distinct subtypes) and different cell compositions. The former is evidenced, for example, by levels of differential gene expression and different sets of somatic mutations, and the latter by cell fractions; for example, in colon cancer, the mesenchymal 128 subtype contains a larger fraction of stromal cells than other subtypes. The characterization of subtypes is important as these can respond differentially to various treatments, but the importance of methylation data in terms of cancer molecular subtyping has been recognized only recently. Current computational methods for determining neoplastic disease subtypes are based on identifying groups of differentially expressed genes (ie, biomarkers) that can best discriminate between these. However, these methods can be unreliable since they yield different biomarker sets when applied to data from different studies. 129 Thus, in addition to using network approaches [128][129][130] to refine and better characterize existing subtype signatures, integrating -omics data of different types can enhance molecular subtyping of malignant neoplastic disease. 
In an analysis of colon cancer data, for example, consideration of genome-wide methylation in the context of expression-based subtype data derived from different datasets revealed that two molecular subtypes, little differentiated by expression, were distinguishable with respect to locus-specific methylation 131 (illustrated in Figure 2 for subtypes: Goblet-like/C2 and Inflammatory/C3 cells), and this was confirmed for a larger set of samples. 128 It is clearly important to ensure quality of methylation data in integrative analyses. A large proportion of cancer archival data (with extensive histological and clinical-pathological records and other -omics data linkage) is available as formalin-fixed paraffin-embedded (FFPE) samples, and concerns have been raised as to the impact of this preservation method on quality of sequencing. 135 Nevertheless, recent research 135,136 has shown that targeted sequencing can be successfully used to assess genome-wide methylation from FFPE samples, with an investigation of methylation calling in matched fresh-frozen and FFPE samples. Genome-scale DNA methylation assessment has led to mixed results, however, in terms of establishment of methylation prognostic profiles, for example, on oral carcinoma, 137 where the authors also discuss problems associated with pre-processing, filtering and data normalization for downstream analysis. No consensus pre-processing guidelines currently exist for some quantitative platforms (such as Illumina HM450K in this example), with prediction heavily reliant on machine-learning methods, and only partial information typically available for screening or to signpost clinical outcomes. 116,127,130,133,137 Successful application of machine learning and data-mining methods for complex genomic data inevitably relies on exploiting information on the inherent data structure and different data types, as well as attention to practical implementation and interpretation.

Widening the scope: new directions
Ageing, as a major risk factor in many diseases, has stimulated considerable epigenetic research efforts over recent years. The role of stochastic epigenetic variation as a driving force in evolving health and development of disease has been considered, 25 while quantitative aspects of human ageing rates have been investigated through genome-wide methylation profiles. 34 In an epigenome-wide study, the authors discuss both cross-sectional and longitudinal DNA methylation changes and have identified more than 60 novel age-associated CpG sites, lending support to increased susceptibility to disease. 138 The dynamics of DNA methylation in ageing have also been explored through integrative data analyses, 139 while an investigation of epigenetic regulation of ageing has looked at the relationship between environmental inputs and genomic stability. 36

Figure 2 … 133 and consensus 134 (involving 5, 6 and 3 subtypes, respectively).
Notes: The clustering analysis in the upper part of the figure shows that the first two subtyping schemes classify most samples from the two highly methylated clusters, HM1 and HM2, into two different expression-based subtypes: CRCA stratifies the samples into the Infl and Goblet-like subtypes, and CCMS into C2 and C3, respectively. For the third signature (consensus), 134 these samples are classified into a single subtype (Goblet/Infl). The lower part of the figure illustrates FCA for the methylation clusters and these three expression-based subtyping signatures (panels A-C), with spatial proximity between two labels on the factorial plane indicating closeness/correspondence of the labeled modalities. FCA shows that subtypes Infl and C2 are very close to HM1 but clearly distinct from the Goblet-like and C3 subtypes, which are in turn very close to HM2 (panels A and B), demonstrating how these subtypes can be distinguished by their respective methylation profiles. Note that for the third 'consensus' signature (panel C) the HM1 and HM2 labels no longer appear separated, but are brought close together by their correspondence to the fused subtype 'Goblet-Infl'.
Abbreviation: SSM, stem/serrated/mesenchymal, 134 a subtype belonging to the consensus subtyping scheme. 134
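The correspondence between methylation clusters and expression-based subtypes described in the figure notes can be sketched, under stated assumptions, as a small contingency-table calculation. Correspondence analysis decomposes the matrix of standardized residuals, (observed - expected) / sqrt(expected), of a cross-tabulation; labels whose cells have large positive residuals are the ones that end up close together on the factorial plane. The Python fragment below computes only these residuals (the decomposition step that yields the plane itself is omitted). The labels HM1, HM2, Infl and Goblet-like are taken from the text, but the counts are invented for illustration and are not data from the cited studies.

```python
from math import sqrt


def standardized_residuals(table):
    """Return (observed - expected) / sqrt(expected) for each cell.

    Expected counts assume independence of rows and columns:
    expected[i][j] = row_total[i] * col_total[j] / grand_total.
    """
    grand_total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            res_row.append((observed - expected) / sqrt(expected))
        residuals.append(res_row)
    return residuals


# Hypothetical toy counts (NOT from the cited studies):
# rows are methylation clusters, columns are expression-based subtypes.
row_labels = ["HM1", "HM2"]
col_labels = ["Infl", "Goblet-like"]
counts = [
    [18, 2],   # HM1
    [3, 17],   # HM2
]

residuals = standardized_residuals(counts)

# The sum of squared residuals is the Pearson chi-square statistic,
# i.e. the total association that correspondence analysis decomposes.
chi_square = sum(r * r for row in residuals for r in row)

for label, res_row in zip(row_labels, residuals):
    print(label, dict(zip(col_labels, (round(r, 2) for r in res_row))))
```

With these invented counts, HM1 has a positive residual with Infl and HM2 with Goblet-like, which is the pattern of correspondence that the notes describe for panels A and B.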

Epigenetic imprinting, regulation, modulation and inheritance questions have also been investigated for metabolic diseases, such as diabetes and obesity, 11,27 heart disease, 31,32 respiratory impairment 140 and others. Important evidence in recent years has also linked neuropsychiatric disorders with epigenetic marks as biomarkers of disease mechanisms and progression and of lifestyle exposure. In a recent paper, for example, the authors consider reconciliation of diverse data and discuss the efficacy of cross-tissue analysis, particularly when combined with blood-based studies, for assessing the effectiveness of longitudinal courses of treatment. 30 Epigenetic investigation in brain function and behavioral studies is at a relatively early stage, but interest is growing rapidly, for example in pediatric psychology 29,141 and more generally. 39,142 The impact of early life experience on the epigenetics of neural development can have persistent effects into adulthood 143 and is being examined in the context of childcare, family structure and parenting practices. Computational models are also being developed for behavior and neural activity associated with anxiety traits and mental illness, and a recent review discusses the interplay of environmental factors with epigenetic regulation and plasticity in order to explore the development of psychiatric disorders. 144

New model paradigms are still being explored to describe the fundamental epigenetic mechanisms and processes involved in phenotypic plasticity. One such proposal 145 advocates the use of insect-based models to represent environmental or lifestyle insults affecting epigenetic regulation, since insects have the ability to produce distinct phenotypic variants from the same genotype through transcriptional reprogramming. The authors argue that this approach not only implies relative cost-effectiveness in realizing experimental results, but also enables epigenetic trans-generational effects of environmental factors to be investigated in relation to cancers, neurodegeneration, ageing and infectious diseases.
Epigenetic inheritance and its role in evolutionary biology continue to pose many unanswered questions. It has been suggested, for example, that epigenetic drift has distinct evolutionary advantages, 35 while investigation of epigenetic modulators and their implications for gene expression and therapeutics, in particular, is a major target for future research. 33,37

Conclusion
Many discussions on computational epigenetics have focused on generation of data and the ever-increasing wealth and diversity of large-scale databases and tools to mine them. However, this is still only part of the story. Managing and mapping for "-omics" studies are important steps, but the real challenge now is interpretation of these data in order to quantify risk and drive therapeutic development and disease management. System complexity means that the questions posed are already challenging basic analysis, and it is clear that more sophisticated model frameworks and novel bioinformatic approaches will be demanded in order to draw meaningful statistical inferences for disease groups and individual profiles. There are indications in recent work that this overarching challenge is now being targeted. Newly emerging theories and model paradigms, efforts at integrative analyses involving multiple data types and the emergence of epigenetic biomarkers offer potential to address disease in novel ways, developing new directions for therapeutic strategies and preventive medicine. The role of computational epigenetics in developing the theories, models and methods required to make sense of complex biological and medical data cannot be overestimated.