Methylation as an epigenetic source of random genetic effects in the classical twin design
Received 12 May 2015
Accepted for publication 1 August 2015
Published 18 September 2015 Volume 2015:5 Pages 305—315
Checked for plagiarism Yes
Review by Single anonymous peer review
Peer reviewer comments 6
Editor who approved publication: Dr John Martignetti
Conor V Dolan,1,3 Michel G Nivard,1,3 Jenny van Dongen,1,3 Sophie van der Sluis,2 Dorret I Boomsma,1,3,4
1Department of Biological Psychology, Netherlands Twin Register, VU University Amsterdam, 2Section Complex Trait Genetics, Department of Clinical Genetics, VU Medical Center, 3EMGO+ Institute for Health and Care Research, VU University Medical Center, 4Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
Abstract: The epigenetic effects of cytosine methylation on gene expression are an acknowledged source of phenotypic variance. The discordant monozygotic (MZ) twin design has been used to demonstrate the role of methylation in disease. Application of the classical twin design, featuring both monozygotic and dizygotic twins, has demonstrated that individual differences in methylation levels are attributable to genetic and environmental (including stochastic) factors, with the latter explaining most of the variance. What implications epigenetic sources of variance have for the twin modeling of (non-epigenetic) phenotypes such as height and IQ is an open question. One possibility is that epigenetic effects are absorbed by the variance component attributable to unshared environmental. Another possibility is that such effects form an independent source of variance distinguishable in principle from standard genetic and environmental sources. In the present paper, we conceptualized epigenetic processes as giving rise to randomness in the effects of polygenetic influences. This means that the regression coefficient in the regression of the phenotype on the polygenic factor, as specified in the twin model, varies over individuals. We investigate the consequences of ignoring this randomness in the standard twin model.
Keywords: classical twin design, epigenetics, methylation, parameter randomness, heritability
The classical twin design is an important tool in the study of the genetics of complex (non-Mendelian, polygenic) traits.1,2 It involves the inference of genetic and environmental effects from the comparison of the phenotypic correlation among monozygotic (MZ) and dizygotic (DZ) twins. A derived design is the discordant MZ (DMZ) twin design, which involves the study of MZ twins, who are phenotypically discordant. The discordancy may pertain to a binary disease status, ie, affected versus non-affected status, or to a metric phenotype, such as birth weight or intelligence. The DMZ design is a case-control design, with closer than usual matching between cases and controls, as the MZ twins are clones, and share many environmental influences, including prenatal environment. In contrast to the classical twin design, which is used to decompose phenotypic variance in genetic and environmental components, the DMZ design is used to identity unshared effects that may shed light on the causes or consequences of the MZ discordancy with respect to the phenotype of disease of interest.3
One might think that the advent of (increasingly cheaper) high-throughput genotyping technologies would have reduced the role of the twin design in the study of complex traits. In one sense this is indeed the case: the availability of large volumes of common genetic variants (GVs; notably, single nucleotide polymorphisms; SNPs) has enabled researches to go from estimating total genetic variance components using twins to estimate the variance explained by the measured SNPs,4,5 and to identify the GVs contributing to the genetic variance in genome-wide association studies (GWAS).6 However, the twin design has remained relevant given the shift toward the study of genome-wide gene expression and its epigenetic regulation. The twin design has been used to determine the contributions of genetic and environmental influences to individual differences in gene expression7 and epigenetic marks, such as cytosine methylation, which are relevant to expression.8 The DMZ twin design, if anything, has gained relevance given the availability of measured (genome-wide) gene expression and epigenetic data.9,10 Such data can be mined for causes or consequences of discordancy.11,12 This extends the scope of unshared effects, the focus of the DMZ twin design, from the environmental to the genetic realm, including both structural differences affecting the DNA sequence (point mutations, de novo copy number variation (CNV) differences, and aneuploidy13–15) and functional differences (gene expression and epigenetic marks). The presence of such differences emphasizes the fact that twins may be MZ, but are not necessarily genetically identical.16 Studies of whole genome sequencing in MZ pairs suggested that sequence-level differences are rare.17,18 A recent review on de novo mutations concluded that post-zygotic de novo mutations are rare.19
Studies showing that complex-trait-associated variants identified through GWAS are largely enriched in regulatory regions of the genome suggest that variation in transcriptional control in relevant tissues plays a key role in individual differences in complex traits. The role of epigenetics and gene expression in complex phenotypes raises the question of how such effects should be represented in the twin model. The genetic effects in the twin model are based on the biometric model, which relates genotypes to a fixed genotypic effect and derives a polygenic effect from the summation of multiple fixed genetic effects.20 Environmental effects are defined as effects not genetic, where shared effects account for any phenotypic resemblance exceeding that attributable to allele sharing, and unshared effects form a residual term of all non-genetic effects that cause differences between twins. As such, the twin model comprises little more than a fixed effects regression model in which a given phenotype is regressed on latent genetic and environmental variables. This model does not include an explicit account of how environmental effects impinge on the phenotype. Epigenetic processes are of interest in this regard, because internal and external environmental causes of methylation, for instance, provide a basis for an explicit account of how environmental factors may exert their influences.21 For instance, Castillo-Fernandez et al10 note that
use of epigenetic markers of environmental risk would greatly improve our understanding of the molecular basis of disease, as many complex traits have an environmental risk component that is often difficult to define and assess. Therefore, using epigenetic markers of environmental disease risk would help to identify environmentally driven disease mechanisms, including gene–environment interactions.
Given that epigenetic control of gene expression is a function of stochastic events, environmental effects (shared and unshared), and genetic effects,8,22–24 what bearing does this have on the interpretation of the results of standard twin studies?
The aim of the present paper is to consider the implications of the results of recent genetic and epigenetic studies based on the DMZ design and the twin design. Specifically, we ask how we should represent epigenetic effects in the classical twin model, and what the implications are for the results obtained with the classical twin design. One interpretation of epigenetic effects on gene expression is that they form a third source of phenotypic variance, which in humans cannot be distinguished from unshared environmental effects.25,26 An alternative interpretation, which takes into account the possibility that epigenetic effects may in part be under genetic control, is that they form a distinct source of variance besides the traditional genetic effects attributable to variation in the DNA sequence.23,27 We present a conceptualization of epigenetic effects based on moderation of genetic effects. Specifically, we identify epigenetic effects as sources of interaction between the effect of genotype (DNA sequence variation) and any other effect (environmental, stochastic, and genetic), which gives rise to changes in the effects of the genotype on the phenotype by affecting gene expression.
The outline of this paper is as follows. We present the classical twin design briefly, outlining the fact that the twin model can be viewed as a fixed effects model. Subsequently, we review twin studies of methylation levels and discordant twin studies of the role of methylation in disease. As methylation is viewed as a source of variation in gene expression, we propose that variation in methylation levels may give rise to randomness in the genetic parameters in the twin model. We consider the consequences of ignoring this randomness in the twin design. Specifically, how does ignoring this randomness affect the results of a twin study?
The classical twin design
The classical twin design allows one to regress phenotypic scores (y) on a set of latent (unobserved) variables, so as to determine the proportion of phenotypic variance that is explained by the latent variables. In many twin studies, the latent variables comprise the additive polygenic variable (A), shared environmental (C), and unshared environmental variables (E), and the regression model is
where y represents the phenotype of interest, subscript i denotes twin pair, subscript j denotes twin member within a pair, b0 is the intercept, and ay, cy, and ey are regression coefficients. As the variables Ay, Cy, and Ey are unobserved, we cannot observe their means and variances. This poses no problem, as we can impose a scale by standardization (zero mean and unit variance). Sometimes, a non-additive genetic factor, referred to genetic dominance (Dy), is modeled instead of Cy. This choice (Dy or Cy) is usually based on the inspection of the twin correlations. Whereas Cy increases the resemblance of both MZ and DZ pairs, Dy increases the resemblance of MZ pairs more than that of DZ pairs. The classical twin design is not possible to include both Cy and Dy as this model (including Ay and Ey) is not identified.1,20,28 To ease presentation, we assume that Dy is absent. Assuming Ay, Cy, and Ey are uncorrelated, we have the following decomposition of the variance of the phenotype y:
where the phenotypic variance, σ2(y), attributable to unshared environmental effects (Ey) may include (measurement) error variance. The heritability, denoted hy2, is the standardized genetic variance component, ie, hy2 = ay2/(ay2 + cy2 + ey2). This decomposition is identified in the twin design given the assumptions that the correlation between Ay of the first (Ay1) and second twin (Ay2) members of a twin pair, cor(Ay1,Ay2), equals one in MZ twins and one half in DZ twins. The latter is the expected proportion of alleles shared identically by descent in full sibs under the assumption of random mating.29 The former follows from the working assumption that the MZ twins are genetically identical (but see Czyz et al14,15). Furthermore, we have cor(Cy1,Cy2) = 1 and cor(Ey1,Ey2) = 0 by definition of shared and unshared environmental effects. That is, the shared environmental influences C contribute to similarity between members of a twin pair, while the unshared influences contribute to differences. We thus can express the twin covariances cov(yi1yi2) as 0.5*ay2+cy2 in DZ twins and ay2+cy2 in MZ twins, which along with Equation 2, allows us to estimate the parameters ay, cy, and ey. This is typically done by means of genetic covariance structure modeling using maximum likelihood estimation.30
In Equation 1, the phenotype y is assumed to be continuous. However, a binary phenotype, such as disease status, can be modeled in essentially the same way using the liability threshold model.31 In this model, we assume that the binary disease status is the manifestation of an underlying complex liability dimension (aka vulnerability). The liability is a continuous latent phenotype which is related to Ay, Cy, and Ey, as in Equation 1. Disease status depends on the liability in that a score on the liability beyond a given threshold value is associated with a diagnosis of being affected (eg, suffering major depression). The threshold value itself is a function of diagnostic criteria, and therefore possibly arbitrary. In the twin model of a binary variable, Equations 1 and 2 are applied to the bivariate liability, with the identifying constraint that the variance of the liability equals one (ie, ay2 + cy2 + ey2 = 1). This model is consistent with the view of disease etiology as a function of polygenic and environmental effects. It should be noted that relatively high liability heritability does not necessarily imply high concordance.32 For instance, given a liability h2 of 0.80 and a disease prevalence of 1% (approximately the situation concerning schizophrenia), we expect to observe ~1.25% of the MZ twin pairs to be discordant, and the MZ proband-wise concordance rate to equal ~37.6 (DZ twin values are ~1.83% and ~8.66, respectively). The relatively low proband-wise concordance rate in combination with high heritability is consistent with the additive model given in Equations 1 and 2. Although low concordance is often interpreted as indicative of possible gene–environment interplay, a seemingly low proband concordance rate neither implies, nor rules out, such an interplay.
The DMZ design directly follows from the twin design. Any difference between MZ twins must be attributable to unshared effects (Ey), as genetic effects and shared environmental effects are a source of resemblance, not of difference. However, as noted above, unshared effects relevant to discordance may include genetic features.14,15 In the case of a metric (continuous) phenotype, MZ phenotypic differences can be tested for association with differences in other variables by means of a regression model.33 As explained in Gurrin et al34 this model need not actually be limited to MZ twins, as the model can be extended to include DZ twins. In the case of a binary phenotype (eg, disease status), a logistic regression model or MacNemar’s test can be used.
The fixed effects in the twin model
We view the twin model (Equation 1) as a fixed effects model, in the sense that the regression parameters ay, cy, and ey are assumed to be invariant over individuals. This assumption pertains to the ideal situation, in which the model (Equations 1 and 2) holds with respect to all individuals in the population of interest (in statistical terms, this implies that the data are identically and independently distributed35). This fixed effects aspect of the model extends to the biometric model underlying the polygenic variable A, as this is the summation of individual effects of many GVs. In this model, each genotype at a given locus is assigned a genotypic effect. For example, in the case of a diallelic locus k (alleles Bk, bk), the effects are μk + βk (BkBk), μk + δk (Bkbk) and μk − βk (bkbk), where μk is the midparent value, βk is the homozygote effect, and δk is the dominance deviation, which we assume to be zero here. Here, the effect βk, which equals the regression coefficient in the regression of yk on locus k (where BkBk, Bkbk, and bkbk are coded −1, 0, and 1, respectively), is invariant over individuals, but βk (k=1 … K) is variable over the K loci relevant to the phenotype.
The possibility that the regression parameters in the twin model may differ as a function of a covariate is generally recognized. For instance, the twin model can be extended to include opposite-sex DZ twin pairs, alongside male and female MZ and DZ twins.1 This extended model can be used to test the (arguably epigenetic) hypothesis that ay varies over sex, ie, the effect of genes on the phenotype of interest varies with sex, giving rise to a sex by genotype interaction. The presence of DZ opposite-sex twins allows for the additional test of the hypothesis that sex differences in the parameter ay originate in the effects of different genes in males and females, or in the sex moderation of the effects of a single set of genes.1 Purcell36 and Zheng and Rathouz37 presented a general moderation model, in which the genetic and environmental effects on a given measured phenotype (say, depression) may be moderated by an environmental phenotype (say, marital status), while taking into account the possibility that the latter may itself be subject to genetic effects. Although it is standard fare to test the hypothesis that the parameters in the twin model vary with respect to a well defined and measured covariate, we conceptualize epigenetic effects on the parameter ay due to a latent (individual level) index of epigenetic influences.
Epigenetics: cytosine methylation
The modern definition of epigenetics emphasizes the regulation of gene expression that can be transmitted mitotically independent of DNA sequence. For instance, Tan33 states: “In a broad sense, the epigenetic control over gene activity involves multiple molecular mechanisms (…), all of which act as ‘volume controls’ that up- or down regulate a gene’s expression without changing its DNA sequence”. These molecular mechanisms, which include histone modifications, DNA methylation, and non-coding RNAs,38 are key mechanisms in establishing tissue identity.39 Most studies of epigenetics in humans have focused on cytosine methylation, which occurs mainly at cytosines in cytosine–phosphate–guanine (CpG) dinucleotides. The effect of methylation on expression is position dependent: gene body CpG methylation is associated with transcriptional activity, while CpG methylation at promoter regions generally represses this activity.40 The expression level of genes may be particularly related to the methylation level of their enhancers.41 Methylation results in changes in gene expression, which are mitotically heritable, but potentially reversible. While most CpGs are methylated,42 unmethylated CpGs may occur in clusters called CpG islands, which are present in the 5-prime regulatory regions of approximately 70% of human genes. Finally, methylation plays an important role in the silencing of repetitive elements such as transposons.43 Individual differences in cytosine methylation are well established in clinical studies and twin studies. One source of individual differences in cytosine methylation is defects in imprinting, where normally the allele originating from one parent is silenced by means of methylation. A second source is epimutations, giving rise to variation in cytosine methylation, which may be due to stochastic errors (during mitosis), environmental, and genetic factors. Genetic factors associated with individual epigenetic differences that have been identified thus far include SNPs (ie, methylation quantitative trait locus,44,45 Bell et al46) and structural variation.47,48
The classical twin design and the DMZ design are recognized as a useful tools in epigenetic studies.3,11,23,27,49,50 The DMZ design allows for the assessment of the disease association with differences in methylation, which is not confounded by genomic sequence variation. We note that post-zygotic mutations may undermine this design, as these introduce sequence variation between MZ twins at the loci of the mutations. Dal et al51 estimated that 5%–25% of the de novo mutations were post-zygotic in a healthy MZ twin pair, where the overall de novo mutation rate is 0.82–1.70×10−8 per base pair per generation (in the twins, this was 1.31×10−8 and 1.01×10−8). Acuna-Hildago et al52 estimated that each individual carries 2–7 post-zygotic de novo mutations. Zwijnenburg et al,12 Bell and Saffery,23 Czyz et al,14 Castillo-Fernandez et al,10 Tan,33 and Tan et al49 presented reviews of the results of genome wide and candidate gene (or region) epigenetic studies of DMZ twins. These studies concern a wide variety of medical disorders (eg, multiple sclerosis, asthma, Alzheimer’s disease, breast cancer, systematic lupus erythematosus, type I diabetes, and psoriasis), psychiatric disorders (schizophrenia, major depressive disorder, bipolar disorder, and autism spectrum disorders), psychological traits (ADHD [attention deficit hyperactivity disorder], risk-taking, and antisocial behavior), and physical traits (birth weight and Body Mass Index). Although these studies vary greatly in the scope and method of assay, tissue sample, statistical methods, and sample size, virtually all demonstrated (and often replicated; Castillo-Fernandez et al10) epigenetic differences relating to methylation.
Czyz et al,14 Bell and Saffery,23 and Bell and Spector27 reviewed the twin studies of methylation, based on either the classical twin design, or on MZ twins only. The aim of these studies is to establish whether genetic factors (DNA sequence variation) contribute to individual differences in methylation levels. A well-established finding is the increase with age of MZ epigenetic discordance, which may reflect the accumulation of epimutations due to stochastic errors and environmental influences.53 This has been demonstrated cross-sectionally and longitudinally.54 Chorionicity is implicated as mono-chorionic MZ twins are more discordant than dichorionic MZ twins55 (Saffery et al),56 which suggest that late twining is a factor. Heritability of methylation level is both tissue dependent and genomic region dependent.55,57
van Dongen et al22 conducted a genome wide study of methylation based on buccal cells in ten 8–19-year-old MZ twin pairs. The overall average MZ correlation was 0.54 at CpG displaying high variability. The correlation varied as a function of region with higher correlations at CpG located in CpG islands, and lower correlations (ie, a greater role of unshared effects, including stochastic errors) in CpG poor regions. Bell et al23 reported a genome-wide average h2 of 0.18, based on an adult twin (30−80 years) study of DNA methylation in blood in 137 females twin pairs. A drawback of the twin design in this context is that the MZ correlation may vary due to intrauterine factors24 and epigenetic starting point in the zygotic stage.21 McRae et al8 studied genome-wide methylation measures in peripheral blood lymphocytes, in an extended twin design, including adolescent twins, their siblings, and parents (614 individuals in 117 families). This design is appreciably less dependent on MZ data as it involves many additional relationships. McRae et al reported an average h2 of 0.20, with ~65% of the 417,069 probes displaying h2 greater than zero.8 Shared environmental influences explained very little variation in methylation levels.
In summary, epigenetic individual differences, pertaining to cytosine methylation, are largely attributable to unshared factors (unshared environmental effects and stochastic errors), and to a lesser extent to genetic factors. The relative contributions of genetic and environmental factors to variance vary with age, tissue and (genomic) region. The study of MZ twins discordant for disease has demonstrated the role of epigenetics in disease etiology.
Epigenetics as a source of random effects
As mentioned earlier, the twin design essentially allows us to carry out a regression analysis with the phenotype of interest (say, intelligence) as the dependent variable and the unmeasured genetic and environmental variables as independent variables. Supposing that we knew that individual differences in methylation were relevant to this phenotype, how should we represent these effects in the twin model, and, if ignored, what effect would they have on the decomposition of variance? The literature includes several different representations. Epigenetics has been viewed as a third source of variance, distinct from genetic and environmental sources. Given that this third source of variance is indistinguishable from unshared effects, it is supposed to increase the unshared environmental variance in the twin model.25,26 This third source view was inspired by Gartner’s famous experiments, which demonstrated that isogenic organisms raised in homogeneous environmental circumstances display appreciable phenotype variance, presumably due to epigenetic effects stemming from stochastic errors.58 Czyz et al14 questioned this interpretation for the following reason:
(…) the concept of stochastic epimutations as the third source of variation in opposition to genetic and environmental effects has important limitations because it is not evident that the random faults in methylation maintenance are not themselves genetically determined (…), or of environmental origin.
This does not detract necessarily from the third source interpretation in isogenic organisms in homogenous or enriched environments.59 However, it seems unlikely that the representation in terms of a third source of variance mimicking unshared environmental effects is accurate in outbred human populations. In outbred populations, we have to incorporate epigenetic effects within the effects of DNA sequence differences and environmental differences.
Bell and Spector27 and Bell and Saffery23 represent DNA methylation effects explicitly as an independent source in the twin model, ie, a latent variable denoted M along side the latent variables Ay, Cy, and Ey. The twin correlation with respect to M may vary depending on age, tissue, etc. However, as the correlation is not generally zero, the M cannot be accommodated as a component of Ey. This representation acknowledges the fact that shared environmental factors (eg, chorionicity and age) and shared genetic factors may contribute to epigenetic individual differences. However, in our view the representation of epigenetic effects as a distinct source of variance does not sit well with the role of epigenetics as pertaining to gene expression regulation. That is, this representation assigns epigenetics the role of a main effect on the phenotype, alongside Ay, Cy, and Ey. But cytosine methylation is an effect on the expression of DNA sequence, and as such a source of moderation of genetic effects. We consider such moderation to be interpretable as an interaction between the genetic effects (originating in DNA sequence variation) and the causes (genetic, stochastic, and environmental) of epigenetic effects.
As explained earlier, in the standard twin model, we consider the parameter ay as a fixed parameter, ie, not varying between members of the population of interest.35 Now, given that methylation influences gene expression,60,61 and expression influences genetic effect, we propose that the locus of the epigenetic effects is the parameter ay, ie, the effects of DNA sequence variation at loci relevant to the phenotype y. In terms of the regression model, this implies that the parameter ay is not fixed, but varies over individuals as a function of epigenetic individual differences at the loci relevant to the phenotype y. Simplifying the model by discarding shared environmental effects, we have
where ayij is a random parameter, as indicated by the subject subscripts (as above: i for twin pair and j for member of a twin pair). This parameter is now effectively a (latent) phenotype subject to its own decomposition:
where ayij is moderated by the genetic variable Aa (ie, genes influences methylation levels) and (external and internal) environmental variable Ea, which includes stochastic errors. The model is shown in Figure 1. Note that we allow Ay and Aa to be correlated (parameter rA), and we allow Ey and Ea to be correlated (rE). However, the genetic variables (Aa and Ay) are uncorrelated with the environmental variables (Ea and Ey), excluding any kind of genotype–environment covariance. The effect of Aa and Ea on the parameter ay may be mediated by epigenetic marks such as methylation, which in turn influences expression levels.61 The exact causal chain from Aa and Ea to ay is important, but beyond our primary question. That is, assuming the parameter ay is actually random due to individual differences in epigenetic and environmental processes and outcomes, how will such randomness affect the results of a standard twin model, where ay is treated as fixed? We address this issue by means of a small simulation study.
The consequences of ignoring random effects in the twin model
The model in Figure 1 reduces to the standard twin model if ay is a fixed value, ie, if the parameters aa and ea are zero. In that case, the additive genetic, unshared environmental effects (AE) model will fit the twin data well, and the h2, equaling ay2/(ey2 + ay2), will accurately reflect the proportion of phenotypic variance attributable to additive genetic effects. By introducing the random effects by specifying non-zero aa and ea, we ask: how do such effects, if ignored in the twin model, affect the 1) phenotypic variance, 2) the twin correlations, 3) the estimate of h2, and 4) the overall goodness of fit of the AE model. To this end, we carried out a small simulation study in which we simulated data according to Figure 1, with the parameter ay random. We then fitted the standard twin model (ay a fixed parameter; ie, ignoring the randomness) to determine the effects of the randomness on the results.
We set mean (ay)=√0.5 and ey=√0.5, and we set σ2(ay) =0.025, 0.05, 0.075, and 0.1. Note that randomness in ay implies randomness in the h2 of the phenotype y (Figure 1). Given these settings, the mean h2 is 0.5 and the standard deviation in h2 is approximately 0.11, 0.15, 0.18, and 0.20 (corresponding to σ2(ay) =0.025, 0.05, 0.075, and 0.1, respectively). We define h2 as the proportion of phenotypic variance due to genetic factors in a population of individual who share the approximately same methylation levels. The h2 of ay was chosen to equal 0.2. This relatively low value is inspired by the fact that the genetic contributions to phenotypic variance in methylation levels are small.8,23 We set the values of rA and rE to equal −0.3, 0, or 0.3. We suppose that these correlations could assume a non-zero value if the environmental (Ey) or genetic effects (Ay) on phenotype y also have a direct or an indirect effect on methylation. In addition, the genetic correlation may be due to linkage disequilibrium. The simulation results were obtained in analyses of 25,000 MZ and 25,000 DZ pairs. This large sample size ensures precise parameter estimates, and very great power to detect model misfit in terms of the likelihood ratio test (χ2(4)) of the standard AE twin model. This is the test of the AE model versus the saturated model, in which the MZ and DZ covariance matrices are estimated freely (ie, two-parameter model vs six-parameter model; hence, a 4 df test). We simulated the data in R,62 and fitted the AE model by means of maximum likelihood estimation using the OpenMx library in R.63 The R script used in this simulation is available on request. This design gives rise to 36 parameter configurations (3 values of rE, 3 values of rA, 4 values of σ2(ay)). We limit our discussion to the 18 parameter configurations shown in Table 1 (other results are available on request) with σ2(ay) equaling either 0.025 or 0.10. We include in the table the values of the correlation rA and rE (columns 1 and 2), the variance σ2(ay) (column 3), the phenotypic variance (σ2mz) and correlation of the MZs (rmz) (columns 4 and 5) and DZs (σ2mz and rdz; columns 6 and 7), the estimate of the additive genetic and environmental variance components (A and E; columns 8 and 9), the estimate of h2 based on A and E (column 10), the χ2 goodness-of-fit index of the AE model (column 11), and the expected variance in h2 (column 12), given the variance in ay (shown in column 3).
The third row in Table 1 shows the expected values of the parameters given σ2(ay) =0. The values in the subsequent row show the observed values given σ2(ay) =0.025 or σ2(ay) =0.10. Note that in generating the data, σ2(ay) is 0.025 or 0.10, but the results in columns 8–11 were obtained by fitting the standard AE twin model, in which ay is not a random parameter. The χ2 goodness of fit of the model increases with increasing σ2(ay), ie, meaning that the model fit is poorer as σ2(ay) increases. However, given σ2(ay)=0.025 the effect on the χ2 is negligible (Table 1, column 11). Given σ2(ay) =0.1, the effect is evident, given the expected value of the likelihood ratio of 4. However, as the sample size is large (50,000), we consider the increase in the χ2 to be minor (given α=0.01, the critical value is 13.28). The phenotypic variance increases from 1 (σ2(ay) =0.0) to approximately 1.1 (σ2(ay) =0.1). The estimates of the twin correlation are slightly lower than expected (ie, 0.5 and 0.25): the MZ correlations vary between 0.468 and 0.494, and the DZ correlation varies between 0.226 and 0.253. As a consequence, the estimate of the heritability is slightly lower than 0.5 (0.470–0.491). The values of rA and rE appear to have little effect on the results. We note that the effects of randomness in ay are visible given σ2(ay) =0.1, but overall fairly minor. The results in Table 1 were obtained with h2 of ay set to equal 0.2. We also considered a value of 0.5, but the results were comparable to those shown in the table (additional results available on request). We return to the implications of these results in the discussion.
The aim of this paper was to consider the classical twin model in the light of epigenetic studies of methylation. Given that methylation is a source of variation in gene expression, we represented this in the twin model as randomness in the genetic regression parameter (parameter ay in Equation 1; see also Figure 1). The model is simplistic as it represents only a snapshot of a process that may involve feedback from phenotype to the environment,64 which in turn may influence methylation and gene expression. Yet, this representation is consistent with the biological view of methylation as a source of interaction, and with the definition of interaction in biometrical genetic modeling (ie, as a source of heteroskedasticity).28,65 That is, methylation is represented as a cause of variation (ie, randomness) in the effects that genes have on the phenotype of interest. This representation has the advantage that it informs a twin model (Figure 1), which we used in a simulation study to assess the effects of ignoring such randomness. We assumed randomness in the genetic expression at many loci, gives rise to randomness in the sum of the effects at these loci, ie, the polygenic variable Ay. However, we did not assume that each locus contributing to Ay is necessarily subject to the effects of methylation.
On the basis of our small simulation study, we conclude that randomness in ay has discernible, but small, effects on the results of the classical twin model (in which we ignore this randomness). Specifically, the results reflect quite accurately the role of Ey in terms of the parameter ey, and the role of Ay in terms of the mean value of ay in a well-fitting model. So assuming randomness in the parameter ay, the twin model will reflect quite accurately the average genetic effect in terms of the h2. The absence of appreciable misfit in terms of the likelihood ratio test implies that this test is “blind” to the misspecification of treating the random parameter ay as fixed. Thus, the twin model produces a good estimate of the average h2, but sheds no light on the possible standard deviation of h2 arising from epigenetic effects. The variation in the h2 was considerable: the values of given σ2(ay)=0.1 implies that standard deviation of the h2 was 0.20. We conclude that sensible results obtained in a well-fitting twin model cannot be taken to mean that the assumption that the genetic parameter is fixed (as mentioned earlier) is correct. Related to this is the fact that the twin model cannot detect genetic heterogeneity, ie, the possibility that a disease have several distinct genetic causes,66 which are hard to distinguish phenotypically. For instance, the high heritability of schizophrenia (~0.80) neither implies nor rules out genetic heterogeneity.
Our results were based on quite arbitrary parameter settings and limited to an AE model. We tried other settings, which we considered reasonable, but obtain largely the same results. We included C effects on the phenotype, but these did not add anything to the results as we obtained the same results of good estimates of the effect of E and C, and good estimates of the average effects of A. We believe that the representation of epigenetic processes as a source parameter randomness (in ay) in the twin model is plausible, and we are confident concerning the effects of such randomness (if ignored) on the results of standard twin modeling. However, we make no predictions concerning the ultimate importance of epigenetic mechanisms in twin studies in general – the importance is likely to depend in part on the phenotype. A version of the twin model that allows for the estimation of the variance in ay would be useful to obtain an indication of the magnitude this effect. Such a model would definitely require additional information, such as repeated phenotypic measures, or (more demandingly) measures of methylation which are relevant to the phenotype of interest.
The classical twin design has been instrumental in demonstrating the role of genetic and environmental influences on a wide range of phenotypes. The accuracy of variance components obtained in twin studies depends on the validity of the many assumptions associated with this design.1 While the twin model can be extended in various ways2,67 to obtain more accurate estimates, the fact remains that these estimates answer the first in a sequence of questions. Immediate follow-up questions are which GVs contribute to polygenic components, and what is the nature of gene–environment interplay. These follow-up questions are currently being addressed thanks to the present means to measure genetic and epigenetic variables in large volumes. It is clear from many recent articles that the classical twin design and the DMZ design remains important tools in these studies.3,23,27,33,49 In this connection, it is interesting to note that the twin design, which has been criticized for providing no information on gene–environment interplay,68 now is recognized to be an important tool in studying this interplay, at least insofar as it concerns epigenetics.
In conclusion, we have represented cytosine methylation as an epigenetic source of randomness in the genetic regression coefficient in the twin model. While we have emphasized methylation, we note that other epigenetic processes may moderate effects of DNA sequence variation (eg, DNA hydromethylation69 and gene expression quantitative trait locus7). The actual source of moderation, or parameter randomness, is immaterial to the conclusions of our simulation. However, as we currently lack a model to evaluate this randomness in the twin model, we cannot say how important it is in terms of effect size. Possibly, the incorporation of methylation data in twin modeling along the lines of Purcell36,37 may provide the means to quantify this randomness. It is interesting to note that such randomness, if appreciable, has implications for phenotypic modeling general. For instance, if two phenotypes are correlated due to pleiotropic genetic effects, and the effects are random (as defined in this paper), the phenotypic correlation will presumably also be random. The implications of epigenetic processes may therefore be of interest to other fields that focus on individual differences, such as psychology.
We gratefully acknowledge funding from the European Research Council (ERC-230374) Genetics of Mental Illness. JvD is supported by ACTION under the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement no 602768. MN is supported by the Royal Netherlands Academy of Science (KNAW Academy Professor Prize PAH/6635). SvdS is financially supported by the Netherlands Scientific Organization (NWO/MaGW: VIDI-452-12-014).
The authors report no conflicts of interest in this work.
Eaves LJ. Inferring the causes of human variation. J R Statist Soc Ser A. 1977;140(3):324–355.
Eaves LJ, Last KA, Young PA, Martin NG. Model-fitting approaches to the analysis of human behaviour. Heredity. 1978;41(3):249–320.
van Dongen J, Slagboom PE, Draisma HHM, Martin NG, Boomsma DI. The continuing value of twin studies in the omics era. Nat Rev Genet. 2012;3(9):640–653. doi:10.1038/nrg3243.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.
So HC, Li M, Sham PC. Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study. Genet Epidemiol. 2011;35:447–456. doi:10.1002/gepi.20593.
Neale BM, Ferreira MAR, Medland SE, Posthuma D. Statistical Genetics: Gene Mapping through Linkage and Association. London: Taylor and Francis; 2007.
Wright FA, Sullivan PF, Brooks AI, et al. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 2014;46(5):430–437.
McRae AF, Powell JE, Henders AK, et al (2014). Contribution of genetic variation to transgenerational inheritance of DNA methylation. Genome Biol. 2014;15:R73.
Hogenson TL. Epigenetic as the underlying mechanism for monozygotic twin discordance. Med Epigenet. 2013;1:1–13.
Castillo-Fernandez JE, Spector TD, Bell JT. Epigenetics of discordant monozygotic twins: implications for disease. Genome Med. 2014; 60(6):1–16.
Boomsma D, Busjahn A, Peltonen L. Classical twin studies and beyond. Nat Rev Genet. 2002;3:872–882.
Zwijnenburg PJG, Meijers-Heijboer HEJ, Boomsma DI. Identical but not the same: the value of discordant monozygotic twins in genetic research. Am J Med Genet Part B. 2010;153B:1134–1149.
Gringras P, Chen W. Mechanisms for differences in monozygous twins. Early Hum Dev. 2001;64:105–117.
Czyz W, Morahan JM, Ebers GC, Ramagopalan SV. Genetic, environmental and stochastic factors in monozygotic twin discordance with a focus on epigenetic differences. BMC Med. 2012;10:93. doi:10.1186/1741-7015-10-93.
Czyz W, Ramagopalan SV. Molecular Genetic Causes of Discordance in Monozygotic Twins. In: eLS. Chichester: John Wiley & Sons. doi:10.1002/9780470015902.a0025033.
Haque FN, Gottesman II, Wong AHC. Not really identical: epigenetic differences in monozygotic twins and implications for twin studies in psychiatry. Am J Med Genet Part C. 2009;151C:136–141.
Ye K, Beekman M, Lameijer EW, et al. Aging as accelerated accumulation of somatic variants: whole-genome sequencing of centenarian and middle-aged monozygotic twin pairs. Twin Res Hum Genet. 2013;16:1026–1032.
Weber-Lehmann J, Schilling E, Gradl G, Richter DC, Wiehler J, Rolf B. Finding the needle in the haystack: Differentiating “identical” twins in paternity testing and forensics by ultra-deep next generation sequencing. Genetics. 2014;9:42–46.
Samuels ME, Friedman JM. Genetic mosaics and the germs line lineage. Genes. 2015;6:216–237. doi:10.3390/genes6020216.
Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 4th ed. Harlow: Pearson Prentice Hall; 1996.
Petronis A. Epigenetics as a unifying principle in the aetiology of complex traits and diseases. Nature. 2010;465(10):721–727. doi:10.1038/nature09230.
van Dongen J, Ehli EA, Slieker RC, et al. Epigenetic variation in monozygotic twins: a genome-wide analysis of DNA methylation in buccal cells. Genes. 2014;5:347–365. doi:10.3390/genes5020347.
Bell JT, Saffery R. The value of twins in epigenetic epidemiology. Int J Epidemiol. 2012;41:140–150. doi:10.1093/ije/dyr179.
Gordon L, Joo JE, Powell JE, et al. Neonatal DNA methylation profile in human twins is specified by a complex interplay between intrauterine environmental and genetic factors, subject to tissue-specific influence. Genome Res. 2012;22(8):1395–1406. doi:10.1101/gr.136598.111.
Molenaar PCM, Boomsma DI, Dolan CV. A third source of developmental differences. Behav Genet. 1993;23:519–524.
Plomin R. Commentary: Why are children in the same family so different? Non-shared environment three decades later. Int J Epidemiol. 2011;40:582–592. doi:10.1093/ije/dyq144.
Bell JT, Spector TD. A twin approach to unraveling epigenetics. Trends Genetics. 2011;27(3):116–125. doi:10.1016/j.tig.2010.12.005.
Jinks JL, Fulker DW. Comparison of the biometrical genetical, MAVA, and classical approaches to the analysis of the human behavior. Psychol Bull. 1970;73(5):311–349.
Visscher PM, Medland SE, Ferreira MAR, et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genetics. 2006;2(3):e41. doi:10.1371/journal.pgen.0020041.
Martin NG, Eaves LJ. The genetical analysis of covariance structure. Heredity. 1977;38(1):79–95.
Falconer DS. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann Hum Genet. 1965;29:51–76.
Smith C. Heritability of liability and concordance in monozygous twins. Ann Hum Genet. 1970;34:85–91.
Tan Q. Epigenetic epidemiology of complex diseases using twins. Med Epigenet. 2013;1:46–51. doi:10.1159/000354285.
Gurrin LC, Carlin JB, Sterne JAC, Dite GS, Hopper JL. Using bivariate models to understand between- and within-cluster regression coefficients, with application to twin data. Biometrics. 2006;62:745–751. doi:10.1111/j.1541-0420.2006.00561.x.
Molenaar PCM. On the limits of standard quantitative genetic modeling of inter-individual variation: extensions, ergodic conditions and a new genetic factor model of intra-individual variation. In: Hood KE, Halpern CT, Greenberg G, Lerner RM, editors. Handbook of Developmental Science, Behavior, and Genetics. Malden, MA: Blackwell; 2010:626–648.
Purcell S. Variance components models for gene–environment interaction in twin analysis. Twin Res. 2002;5(06):554–571.
Zheng H, Rathouz PJ. Fitting procedures for novel gene-by-measured environment interaction models in behavior genetic designs. Behav Genet. 2015;45:467–479.
Goldberg AD, Allis CD, Bernstein E. Epigenetics: a landscape takes shape. Cell. 2007;128:635–638.
Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330.
Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012;13:484–492.
Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol. 2013;14:R21.
Bird A. DNA methylation patterns and epigenetic memory. Genes Dev. 2002;16:6–21.
Leonova KI, Brodsky L, Lipchick B, et al. P53 cooperates with DNA methylation and a suicidal interferon response to maintain epigenetic silencing of repeats and noncoding RNAs. Proc Natl Acad Sci. 2013;110:E89–E98.
Banovich NE, Lan X, McVicker G, et al. Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels. PLoS Genet. 2014; 10(9):e1004663. doi:10.1371/journal.pgen.1004663.
Gibbs JR, van der Brug MP, Hernandez DG, et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 2010;6:e1000952. doi:10.1371/journal.pgen.1000952.
Bell JT, Pai AA, Pickrell JK, et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biology. 2011;12:R10.
Colak D, Zaninovic N, Cohen MS, et al. Promoter-bound trinucleotide repeat mRNA drives epigenetic silencing in fragile X syndrome. Science. 2014;343(6174):1002–1005. doi:10.1126/science.1245831.
Cabianca DS, Casa V, Bodega B, et al. A long ncRNA links copy number variation to a polycomb/trithorax epigenetic switch in FSHD muscular dystrophy. Cell. 2012;149(4):819–831. doi:10.1016/j.cell. 2012.03.035.
Tan Q, Christiansen L, von Bornemann Hjelmborg J, Christensen K. Twin methodology in epigenetic studies. J Exp Biol. 2015;218:134–139. doi:10.1242/jeb.107151.
Petronis A. Epigenetics and twins: three variations on the theme. Trends Genet. 2006;22(7):347–350.
Dal GM, Ergüner B, Sagiroglu MS, et al. Early postzygotic mutations contribute to de novo variation in a healthy monozygotic twin pair. J Med Genetics. 2014;51:455–459.
Acuna-Hidalgo R, Bo T, Kwint MP, et al. Post-zygotic point mutations are an underrecognized source of de novo genomic variation. Am J Hum Genetics. 2015;97:67–74.
Fraga MF, Ballestar E, Paz MF, et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A. 2005;102:10604–10609.
Talens RP, Christensen K, Putter H, et al. Epigenetic variation during the adult lifespan: cross-sectional and longitudinal data on monozygotic twin pairs. Aging Cell. 2012;11:694–703.
Kaminsky ZA, Tang T, Wang AC, et al. DNA methylation profiles in monozygotic and dizygotic twins. Nat. Genet. 2009;41(2):240–245. doi:10.1038/ng.286.
Saffery R, Morley R, Carlin JB, et al. Cohort profile: The peri/post-natal epigenetic twins study. Int J Epidemiol. 2012;41(1):55–61.
Gervin K, Hammero M, Akselsen HE, et al. Extensive variation and low heritability of DNA methylation identified in a twin study. Genome Res. 2011;21:1813–1821.
Gartner K. A third component causing random variability beside environment and genotype. A reason for the limited success of a 30 year long effort to standardize laboratory animals? Lab Anim. 1990;24:71–77.
Freund J, Brandmaier AM, Lewejohann L, et al. Emergence of individuality in genetically identical mice. Science. 2013;340(6133):756–759. doi:10.1126/science.1235294.
Schadt EE, Lamb J, Yang X, et al. An Integrative Genomics Approach to Infer Causal Associations between Gene Expression and Disease. Nat Genet. 2005;37(7):710–717. doi:10.1038/ng1589.
van Eijk KR, de Jong S, Boks MPM, et al. Genetic analysis of DNA methylation and gene expression levels in whole blood of healthy human subjects. BMC Genomics. 2012;13:636. doi:10.1186/1471-2164-13-636.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available from: http://www.R-project.org/. Accessed August 28, 2015.
Boker S, Neale MC, Maes HH, et al. OpenMx: an open source extended structural equation modeling framework. Psychometrika. 2011;76(2):306–317.
Dolan CV, de Kort JM, van Beijsterveldt CEM, Bartels M, Boomsma DI. GE covariance through phenotype to environment transmission: an assessment in longitudinal twin data and application to childhood anxiety. Behav Genetics. 2014;44(3):240–253. doi:10.1007/s10519-014-9659-5.
Eaves LJ, Last K, Martin NG, Jinks JL. A progressive approach to non-additivity and genotype-environmental covariance in the analysis of human differences. Br J Math Statist Psychol. 1977;30:1–42.
Wray NR, Maier R. Genetic basis of complex genetic disease: the contribution of disease heterogeneity to missing heritability. Curr Epidemiol Rep. 2014;2014(1):220–227.
Keller MC, Medland SE, Duncan LE, et al. Modeling extended twin family data I: description of the cascade model. Twin Res Hum Genet. 2009;29:8–18.
Dolan CV, Molenaar PCM. A note on the scope of developmental behaviour genetics. Int J Behav Dev. 1995;18:749–760.
Wen L, Tang F. Genomic distribution and possible function of DNA hydroxymethylation in the brain. Genomics. 2014;104:341–346.
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.