Background
The genetic architecture of autism spectrum disorder (ASD) remains uncertain, although there are putative models [1]. One model posits that ASD is heterogeneous (heterogeneity model), that is, that severe mutations in any of a large set of genes are sufficient to cause the disorder. Another emphasizes the major role played by common variation, shared by all of us to a greater or lesser extent (infinitesimal model), in the documented high heritability of ASD [2-6]. A hybrid model asserts that common and rare variations combine in some way, perhaps additively [7], to confer liability [1, 7, 8]. At the level of a population, common variation probably plays the dominant role in liability, whereas a rare mutation can make the largest contribution to liability for an individual who carries it [9]. Still, our current understanding of the genetic architecture of ASD is unsatisfactory, especially regarding how common and rare variations jointly confer risk. This architecture is important because it has clinical consequences; for example, it could require more nuanced evaluation of recurrence risk. Establishing the exact nature of the interplay between common and rare risk variations will be challenging, however, because of the multiplicity of plausible models that could fit the current data.
The infinitesimal and additive models differ largely in the magnitude of variants’ impact on liability. Furthermore, because a strict infinitesimal model does not fit the empirical ASD data well, due to documented large effects from some rare variants, we consider only the additive model here. Notably, while heterogeneity and additive models are fundamentally different, they can share key elements, as a recent study by Oetjens and colleagues [10] describes. For subjects carrying mutations for one of 11 rare genetic disorders, Oetjens and colleagues show that quantitative traits associated with these disorders vary substantially as a function of the common genetic variation they carry, and they speculate that these rare and common variants could act additively to affect the traits. Yet, they note that it would be hard to distinguish additive from non-additive models even for these quantitative traits without large data sets (see also [11, 12]). In the context of ASD and its binary diagnosis, distinguishing heterogeneity and additive models will be even more challenging.
To address this problem empirically, a sample of ASD subjects who have been characterized for rare and common variations is essential. Because rare, de novo potentially damaging mutations, henceforth PDVs, carry the most readily detectable signal for ASD association [13-16], the ideal sample would be characterized for such PDVs. Following the tradition in human genetics, we call ASD subjects carrying such PDVs “PDV carriers” and all other ASD subjects “non-carriers.” Such well-characterized samples of the population are not common and none as yet are especially large, a limiting factor for any study. For the sample to be analyzed here, we combine data from three sources: subjects diagnosed with ASD from the Simons Simplex Collection or SSC [17]; ASD and unaffected subjects from the Population-Based Autism Genetics and Environment Study or PAGES [9]; and subjects from the Electronic MEdical Records and Genomics Network or eMERGE [18], whom we assume have not been diagnosed with ASD and are non-carriers.
These data could be analyzed in at least two ways. One approach would be to use polygenic risk scores (PRS), which are based on common variants putatively affecting liability [19]. Typically, these variants are identified from genome-wide association studies (GWAS) and the PRS for each subject is computed as a weighted sum of the count of risk alleles they carry. Then, values of the PRS in ASD PDV carriers (ASD-PDV) and ASD non-carriers (ASD-NO-PDV), as well as unaffected subjects, can be contrasted to assess how common and rare variations jointly confer risk. An elegant version of this approach is the pTDT or polygenic Transmission Disequilibrium Test [7], which requires parental genotypes. Because only a portion of our data have parental genotypes, here we concentrate on the PRS.
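The weighted-sum definition of the PRS given above can be illustrated with a minimal sketch. The function name, the toy genotype matrix, and the per-SNP weights are all hypothetical; in practice the weights would come from GWAS summary statistics, and this is not the authors' pipeline.

```python
import numpy as np

def polygenic_risk_score(genotypes, weights):
    """Return one score per subject as a weighted sum of risk-allele counts.

    genotypes: (n_subjects, n_snps) array of risk-allele counts (0, 1 or 2).
    weights:   (n_snps,) array of per-SNP weights (hypothetical values here;
               typically GWAS effect sizes such as log odds ratios).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if genotypes.ndim != 2 or genotypes.shape[1] != weights.shape[0]:
        raise ValueError("genotypes must be (n_subjects, n_snps) matching weights")
    return genotypes @ weights

# Toy data: three subjects, three SNPs (illustrative values only).
toy_genotypes = np.array([[0, 1, 2],
                          [2, 0, 1],
                          [1, 1, 1]])
toy_weights = np.array([0.10, -0.05, 0.20])
toy_scores = polygenic_risk_score(toy_genotypes, toy_weights)
```

Scores computed this way can then be compared across groups (for example ASD-PDV, ASD-NO-PDV and unaffected subjects) as the text describes.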
The PRS is only as effective as the information arising from GWAS, which for ASD is still relatively limited compared to other phenotypes (see Grove and colleagues [6] versus the Psychiatric Genomics Consortium for schizophrenia and bipolar disorder [20, 21]). For this reason, we emphasize here another approach to developing a score, which will be based on the theory of Genomic-Best Linear Unbiased Prediction (G-BLUP) [22-24]. The ideas behind G-BLUP are similar to those behind the PRS. Rather than using GWAS results, G-BLUP develops a predictive model to distinguish case versus control subjects, genetically, using genetic variation across the genome (see Additional File 1 for more details on G-BLUP). Here, we call this genomic prediction “GP.” If effective, the PRS and GP will not be strictly independent. Yet, if they were not strongly dependent, they could be combined to produce an even more effective predictor. To allow for this possibility, we tune GP to the population samples used in this study, whereas we use PRS based on different samples. Moreover, because we use a pruning and thresholding approach to the PRS, we chose a threshold that forced the number of SNPs included in the score to be relatively sparse, yet informative, thereby limiting its correlation with GP. Note that our purpose here is not to compare the predictions of GP and PRS, the former tuned to the population sample and the latter not, but rather to develop predictors useful for examining how common and rare variations jointly confer risk of ASD.
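For readers unfamiliar with G-BLUP, the following is a generic textbook-style sketch, not the procedure described in Additional File 1. It assumes a VanRaden-type genomic relationship matrix and a fixed, hypothetical heritability `h2` that sets the ridge penalty; all function names and the simulated data are illustrative.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """VanRaden-style relationship matrix from (n_subjects, n_snps) allele counts."""
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0            # estimated allele frequencies
    Z = M - 2.0 * p                     # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, h2=0.5):
    """Predict test-set genetic values from training phenotypes.

    h2 is an assumed heritability (hypothetical value); the ridge
    penalty is lam = (1 - h2) / h2.
    """
    y = np.asarray(y_train, dtype=float)
    mu = y.mean()
    lam = (1.0 - h2) / h2
    G_tt = G[np.ix_(train_idx, train_idx)]   # training-by-training block
    G_st = G[np.ix_(test_idx, train_idx)]    # test-by-training block
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y - mu)
    return mu + G_st @ alpha

# Simulated toy data: 20 subjects, 50 SNPs, binary case/control labels.
rng = np.random.default_rng(0)
toy_geno = rng.binomial(2, 0.3, size=(20, 50))
toy_status = rng.integers(0, 2, size=15)     # labels for the 15 training subjects
train_idx, test_idx = np.arange(15), np.arange(15, 20)
G = genomic_relationship_matrix(toy_geno)
gp_scores = gblup_predict(G, toy_status, train_idx, test_idx)
```

Unlike the PRS, no external GWAS weights enter this calculation: the prediction for each held-out subject depends only on genome-wide similarity to the training subjects and on their case or control status.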
We use these approaches to document (1) that the burden of risk variants carried by ASD subjects is stochastically greater than that carried by control subjects; (2) that carriers of rare PDVs bear a burden intermediate between non-carrier and control subjects; (3) that both PDV carriers and non-carriers have a stochastically greater burden of common risk variation than control subjects; and (4) that the effects of common and rare variants on liability for ASD likely combine additively. Regarding (3), it appears that ASD subjects carry a substantial burden of common risk variation, even if they also carry a rare PDV affecting risk. For (4), although common and rare risk variations likely act additively, the resolution imposed by the current data is coarse.
Discussion
Here, we asked how rare and common risk variation jointly affect liability for ASD. We analyzed two samples characterized for both types of variation. Based on genotypes of common variation, we computed a small set of risk scores, each of which is likely to describe a portion of the genetic risk of ASD attributable to common variation. We also computed a weighted average of these scores, WGRS, which tended to perform better than any single score at differentiating ASD and unaffected subjects (Fig. 3) and at differentiating ASD subjects who carried rare PDVs likely to affect risk (PDV carriers; Fig. 5) from ASD subjects who were not known to carry such variants (non-carriers). By contrasting patterns of the expected and observed burden of common risk variation in PDV carriers and non-carriers (Fig. 4), we conclude that the preponderance of evidence suggests that rare and common risk variation combine additively in their effects on ASD liability. This agrees with conclusions from other researchers [7, 10, 42].
It is worthwhile emphasizing, however, that the evidence presented here is far from conclusive and it differs from that of other studies. Consider the study by Weiner and colleagues [7], which includes many of the authors of this current manuscript as collaborators. It introduces the pTDT, which uses three key pieces of information to evaluate association: a previously established PRS function; the average of the PRS of mother and father, the mid-parent average; and the deviation of the offspring from the mid-parent average. Using this information, they show that the pTDT is an effective tool for genetically discriminating ASD probands from their unaffected siblings. Moreover, they establish that both carriers and non-carriers, as groups, carry a stochastically greater burden of common risk variants relative to unaffected siblings. Yet, carriers have a pTDT score of 0.17, on average, somewhat but not significantly greater than the average score for non-carriers, 0.12 (their Additional file 1: Table S13). Analyzing a broader set of developmental disabilities, Niemi and colleagues [42] report similar findings to those of Weiner and colleagues [7]: specifically, carriers have genetic scores indistinguishable from non-carriers, and both carry a greater burden of risk variation. Both studies use the pTDT approach, and both conclude that rare and common variations combine additively to affect risk.
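The three ingredients of the pTDT listed above (a PRS, the mid-parent average, and the offspring's deviation from it) can be sketched as follows. Scaling the deviation by the standard deviation of the mid-parent PRS and testing the mean deviation with a one-sample t statistic are assumptions about the published method, and the toy trio scores are hypothetical.

```python
import numpy as np

def ptdt_deviation(prs_child, prs_mother, prs_father):
    """Offspring PRS minus mid-parent PRS, in units of the mid-parent SD.

    The standardization by the mid-parent SD is an assumption of this sketch.
    """
    child = np.asarray(prs_child, dtype=float)
    mid_parent = (np.asarray(prs_mother, dtype=float)
                  + np.asarray(prs_father, dtype=float)) / 2.0
    return (child - mid_parent) / mid_parent.std(ddof=1)

def one_sample_t(deviation):
    """t statistic for the null hypothesis that the mean deviation is zero."""
    deviation = np.asarray(deviation, dtype=float)
    return deviation.mean() / (deviation.std(ddof=1) / np.sqrt(deviation.size))

# Hypothetical PRS values for four trios (child, mother, father).
toy_child = [1.2, 0.8, 1.5, 0.9]
toy_mother = [1.0, 0.5, 1.0, 1.0]
toy_father = [1.0, 0.7, 1.4, 0.6]
toy_dev = ptdt_deviation(toy_child, toy_mother, toy_father)
toy_t = one_sample_t(toy_dev)
```

A mean deviation above zero indicates that affected offspring inherited more common-variant risk than expected from their parents; the values 0.17 and 0.12 quoted above are group averages of such deviations as reported in [7].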
In contrast to those results [7, 42], in our study the average burden for carriers falls between that for unaffected and non-carrier affected individuals, and this we view as evidence for additive effects. The conclusions of our studies are not completely at odds, although they do not fit perfectly together either. That the burden of common risk variants is greater in carriers and non-carriers in all three studies, relative to expectation, is consistent with common variation contributing to liability. What remains unresolved is how it combines with rare variants if they also have a large impact on liability. If PDVs found in individuals with ASD or severe developmental disability were close to completely penetrant, thus having a large impact on liability, then little or no contribution from common variation would be necessary for a diagnosis. Under this model, effects of PDVs are sufficient to cause developmental disability [42] or ASD [7]; common variation would induce variation about the mean liability for affected individuals and perhaps alter presentation of the phenotype. This is observed for quantitative phenotypes of other rare genetic disorders [10]. However, in this scenario the expected liability arising from common variation in carriers should be near zero, as opposed to the significant positive estimates found by Weiner and colleagues [7] and Niemi and colleagues [42]. What explains the excess of common variation found in the carriers from their studies? One reasonable possibility is that stochastic variation plays a complicating role. If some PDVs were of sufficient impact on liability to cause the ASD or other developmental phenotype, whereas others were not, and if the impact of this latter group on liability combined additively with common variation, then the fraction of each type of PDV would determine where the mean liability of carrier subjects fell on the continuum between unaffected and non-carrier subjects. Such a model would induce greater variability in the average score for carriers, perhaps sufficiently to make the average burden estimated from carriers and non-carriers indistinguishable.
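The expectation under an additive model, that affected PDV carriers should carry a common-variant burden between that of unaffected subjects and affected non-carriers, can be illustrated with a toy liability-threshold simulation. This is not the authors' analysis: the threshold, heritability, carrier frequency and PDV effect size are all hypothetical values chosen only for illustration.

```python
import numpy as np

def simulate_burden(n=200_000, h2=0.5, carrier_freq=0.05,
                    pdv_effect=1.5, threshold=2.05, seed=0):
    """Mean common-variant liability by group under an additive model.

    Liability = common-variant component + PDV effect (carriers only) + noise;
    a subject is affected when liability exceeds the threshold.
    All parameter values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    common = rng.normal(0.0, np.sqrt(h2), size=n)        # polygenic component
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)   # everything else
    carrier = rng.random(n) < carrier_freq               # rare PDV carriers
    liability = common + pdv_effect * carrier + noise
    affected = liability > threshold
    return {
        "unaffected": common[~affected].mean(),
        "affected_carrier": common[affected & carrier].mean(),
        "affected_non_carrier": common[affected & ~carrier].mean(),
    }

burden_means = simulate_burden()
```

Raising `pdv_effect` in this sketch pushes the carrier mean toward the unaffected mean, which corresponds to the near-complete-penetrance scenario discussed above; lowering it pushes the carrier mean toward that of non-carriers.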
With larger samples than presented here, more compelling evidence could be drawn from an evaluation of carriers of PDVs in genes with very different recurrence rates in ASD individuals. For example, certain genes, such as CHD8 [38-40], are often found to carry PDVs in ASD individuals. Other genes show significant association, yet far less recurrence. In future studies and with a much larger sample, we should be able to order ASD risk genes accurately in terms of the relative risk of ASD generated by PDVs in these genes and evaluate how common variant risk changes along this ranking. If the two sources of risk work additively, they should show a strong negative relationship.
Why do we need to evaluate the nature of the relationship between common and rare variations so thoroughly? Suppose, for example, that some of the ASC’s 102 ASD genes do not truly affect risk and half of the assumed PDV carriers have PDVs in these genes. Under this unlikely but not impossible scenario, these subjects would, in expectation, carry the mean WGRS observed in non-carriers (Fig. 4). To achieve the mean WGRS observed for the entire population of assumed PDV carriers, which consists of an equal mixture of true carriers of risk PDVs and non-carriers, the mean for the subpopulation of true carriers would, in expectation, fall at the mean for unaffected individuals (Fig. 4). Under this scenario, joint effects of rare and common risk variants are irrelevant: a rare PDV would always be sufficient to cause ASD. Such scenarios can only be completely ruled out by using alternative ways of evaluating whether rare and common risk variation combine additively in their effects on ASD liability.
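The arithmetic behind this thought experiment can be written out explicitly. The symbols are introduced here for illustration only: let $\mu_{A}$ be the mean WGRS observed in assumed carriers, $\mu_{NC}$ the mean in non-carriers, $\mu_{U}$ the mean in unaffected individuals, $\mu_{T}$ the unknown mean in true carriers, and $\pi$ the fraction of assumed carriers whose PDV does not affect risk. The final step additionally assumes that the observed carrier mean lies roughly midway between $\mu_{U}$ and $\mu_{NC}$.

```latex
\begin{align}
\mu_{A} &= \pi\,\mu_{NC} + (1-\pi)\,\mu_{T}
  && \text{(mixture of the two subgroups)} \\
\mu_{T} &= \frac{\mu_{A} - \pi\,\mu_{NC}}{1-\pi}
  && \text{(solve for the true-carrier mean)} \\
\mu_{T} &= 2\,\mu_{A} - \mu_{NC}
  && \text{(equal mixture, } \pi = \tfrac{1}{2}\text{)} \\
\mu_{T} &= \mu_{U}
  && \text{(if } \mu_{A} = \tfrac{1}{2}(\mu_{U} + \mu_{NC})\text{)}
\end{align}
```

For smaller values of $\pi$, the implied true-carrier mean moves back toward the observed carrier mean $\mu_{A}$, so the conclusion is sensitive to how many of the assumed risk genes are genuine.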
If the common variant risk burden of PDV carriers is substantial, as the results here suggest, this has implications for genetic counseling regarding recurrence risk of ASD. Currently, genetic counseling for recurrence risk is binary, depending on whether or not a rare PDV in an ASD gene is found in the proband’s genome. If such a PDV is found, then the PDV is typically assumed “causal” for the proband’s ASD and the recurrence probability for ASD is assumed to equal its prevalence. When this assumption is a good approximation, and it will be for many PDV carriers [9], counseling is also a good approximation. In some families, however, the PDV carrier has ASD in large part because of the polygenic burden carried by the parents, and in this instance the current advice for recurrence risk is inaccurate. To give a concrete example, when we examined loss-of-function carriers in the SSC, we estimated that over 40% of these individuals would still have ASD even without the loss-of-function PDV [9], and this would be predicted to be even more of an issue with less penetrant variation (e.g., missense variation). Because the present state of knowledge does not allow us to know, a priori, which scenario is relevant, it is important for genetic counselors to consider this uncertainty and whether it should be built into their advice for parents regarding recurrence risk.
Acknowledgements
We thank everyone who contributed to this study, both the research subjects and the investigators of the following studies:
Simons Simplex Collection: We would like to thank the SSC principal investigators (A. L. Beaudet, R. Bernier, J. Constantino, E. H. Cook, Jr, E. Fombonne, D. Geschwind, D. E. Grice, A. Klin, D. H. Ledbetter, C. Lord, C. L. Martin, D. M. Martin, R. Maxim, J. Miles, O. Ousley, B. Peterson, J. Piggot, C. Saulnier, M. W. State, W. Stone, J. S. Sutcliffe, C. A. Walsh and E. Wijsman) and the coordinators and staff at the SSC clinical sites; the SFARI staff, in particular N. Volfovsky; D. B. Goldstein for contributing to the experimental design; and the Rutgers University Cell and DNA repository for accessing biomaterials.
Electronic MEdical Records and Genomics Network: Group Health Cooperative/University of Washington—Funding support for Alzheimer's Disease Patient Registry (ADPR) and Adult Changes in Thought (ACT) study was provided by a U01 from the National Institute on Aging (Eric B. Larson, PI, U01AG006781). A gift from the 3M Corporation was used to expand the ACT cohort. DNA aliquots sufficient for GWAS from ADPR Probable AD cases, who had been enrolled in Genetic Differences in Alzheimer's Cases and Controls (Walter Kukull, PI, R01 AG007584) and obtained under that grant, were made available to eMERGE without charge. Funding support for genotyping, which was performed at Johns Hopkins University, was provided by the NIH (U01HG004438). Genome-wide association analyses were supported through a Cooperative Agreement from the National Human Genome Research Institute, U01HG004610 (Eric B. Larson, PI). Mayo Clinic—Samples and associated genotype and phenotype data used in this study were provided by the Mayo Clinic. Funding support for the Mayo Clinic was provided through a cooperative agreement with the National Human Genome Research Institute (NHGRI), Grant #: UOIHG004599; and by grant HL75794 from the National Heart Lung and Blood Institute (NHLBI). Funding support for genotyping, which was performed at The Broad Institute, was provided by the NIH (U01HG004424). Marshfield Clinic Research Foundation—Funding support for the Personalized Medicine Research Project (PMRP) was provided through a cooperative agreement (U01HG004608) with the National Human Genome Research Institute (NHGRI), with additional funding from the National Institute for General Medical Sciences (NIGMS). The samples used for PMRP analyses were obtained with funding from Marshfield Clinic, Health Resources Service Administration Office of Rural Health Policy grant number D1A RH00025, and Wisconsin Department of Commerce Technology Development Fund contract number TDF FYO10718.
Funding support for genotyping, which was performed at Johns Hopkins University, was provided by the NIH (U01HG004438). Northwestern University—Samples and data used in this study were provided by the NUgene Project (www.nugene.org). Funding support for the NUgene Project was provided by the Northwestern University’s Center for Genetic Medicine, Northwestern University, and Northwestern Memorial Hospital. Assistance with phenotype harmonization was provided by the eMERGE Coordinating Center (Grant number U01HG04603). This study was funded through the NIH, NHGRI eMERGE Network (U01HG004609). Funding support for genotyping, which was performed at The Broad Institute, was provided by the NIH (U01HG004424). Vanderbilt University—Funding support for the Vanderbilt Genome-Electronic Records (VGER) project was provided through a cooperative agreement (U01HG004603) with the National Human Genome Research Institute (NHGRI) with additional funding from the National Institute of General Medical Sciences (NIGMS). The dataset and samples used for the VGER analyses were obtained from Vanderbilt University Medical Center's BioVU, which is supported by institutional funding and by the Vanderbilt CTSA grant UL1RR024975 from NCRR/NIH. Funding support for genotyping, which was performed at The Broad Institute, was provided by the NIH (U01HG004424). Assistance with phenotype harmonization and genotype data cleaning was provided by the eMERGE Administrative Coordinating Center (U01HG004603) and the National Center for Biotechnology Information (NCBI). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000360.v3.p1.