- Research
- Open access
- Published:
DNA methylation models of protein abundance across the lifecourse
Clinical Epigenetics volume 16, Article number: 189 (2024)
Abstract
Background
Multiple studies have shown that DNA methylation (DNAm) models of protein abundance can be informative about exposure, phenotype and disease risk. Here we investigate and provide descriptive details of the capacity of DNAm to capture non-genetic variation in protein abundance across the lifecourse.
Methods
We evaluated the performance of 14 previously published DNAm models of protein abundance (episcores) in peripheral blood from a large adult population using the Avon Longitudinal Study of Parents and Children (ALSPAC) at ages 7–24 and their mothers antenatally and in middle age (N range = 145–1464). New age-specific episcores were trained in ALSPAC and evaluated at different ages. In all instances, episcore–protein associations were evaluated with and without adjustment for genetics. The association between longitudinal protein stability and longitudinal episcore projection was also evaluated, as was sex-specificity of episcores derived solely in female participants.
Findings
Of the 14 Gadd episcores, 10 generated estimates associated with abundance in middle age, 9 at age 24, and none at age 9. Eight of these episcores explained variation beyond genotype in adulthood (6 at age 24; 7 at midlife). At age 9, the abundances of 22 proteins could be modelled by DNAm, 7 beyond genotype of which one trained model generated informative estimates at ages 24 and in middle age. At age 24, 31 proteins could be modelled by DNAm, 19 beyond genotype, of which 5 trained models generated informative estimates at age 9 and 8 in middle age. In middle age, 23 proteins could be modelled, 13 beyond genotype, of which 3 were informative at age 9 and 7 at age 24.
Interpretation
We observed that episcores performed better at older ages than in children with several episcores capturing non-genetic variation at all ages.
Introduction
In humans, there are over 20,000 protein coding genes, many of which have well-characterised patterns of tissue-specific expression and function [1]. Aberrant protein expression is known to play important roles in the development of many diseases. Changes in protein abundance allow for the diagnosis and prognosis of many diseases [2] and may be influenced by a variety of biological and environmental causes including genotype [3] and gene regulators such as DNA methylation (DNAm) [4].
DNAm is a reversible epigenetic modification which is linked to both genetic and environmental factors and is known to regulate gene expression [5]. In mammals, this is primarily carried out by adding a methyl group to a cytosine residue, which is followed by a guanine residue (termed CpG site). Recently, models have been developed using DNAm levels measured at several CpG sites to predict protein abundance. For example, the Grimage model [6] includes DNAm models of a number of plasma proteins which together are capable of accurately predicting an individual’s time to death. It is notable that this model, which uses DNAm to proxy proteins, performs better as an indicator of time to death than models derived directly from DNAm. This has been hypothesised to be due to the protein models capturing functionally relevant DNAm variation, which may be missed when directly regressing 100,000 s of individual CpG sites onto time to death directly.
Motivated by the success of the Grimage model, Gadd et al [7] carried out a large-scale analysis of the viability of DNAm to index protein abundance and found this to be the case for 109 of 953 proteins. These models, called episcores, were projected into an external sample and associations were found with a number of morbidities. Both the model training and evaluation of their associations with morbidity were carried out in older adult populations, as such it is not clear whether the performance of these models transfers to younger populations, especially children.
An analogous investigation of genetic indices of protein abundance has also been carried out by Xu et al [8], which generated models called polygenic scores (PGS) for 825 of 2692 proteins. Given that genotype is established at fertilisation and is largely stable throughout life whereas disease processes and exposures tend to differ by life stage, DNAm models of protein abundance will likely capture additional variation associated with disease progression and environmental exposures beyond that of genetic models alone. However, since DNAm variation is at least partially (~ 20%) determined by genotype [9], we expect that DNAm models of protein abundance may capture some of the variation captured by genetic models.
Here, we ask how episcores trained in later adulthood perform across the lifecourse and to what extent DNAm may capture age-specific variation in protein abundance. We first test whether Gadd episcores trained in older adult populations transfer to younger populations in ALSPAC (ages 9 and 24) and whether they replicate in similarly aged populations in ALSPAC (mothers of study participants at middle age). Next, we consider if episcores projected into DNAm at one time point are capable of predicting protein expression at a future time point. Following this, we determine to what extent episcores capture variance in protein beyond genotype. Finally, we train episcores for 88 proteins quantified in ALSPAC children at ages 9 and 24 and in their mothers at middle age (referred to as ALSPAC episcores), evaluate their transferability between time points, determine whether episcores created using the mothers data transfer as well in males as in females, and determine to what extent these episcores capture variance in protein levels beyond the genotype (see Fig. 1 for general overview of analyses). Broadly, this will provide a greater understanding of the flexibility of episcores between different populations and also present further age-specific episcores for use by the scientific community.
General overview of methods and analyses. A Process of training episcores. B Projecting episcores into ALSPAC. C Cross-sectional and prospective prediction of protein levels using episcores. D Incorporating genetic models to determine non-genetic information capture of episcores. Created with BioRender.com
Methods
Study population
ALSPAC is a prospective birth cohort study in south-west England. Pregnant women resident in one of the three Bristol-based health districts with an expected delivery date between 1 April 1991 and 31 December 1992 were invited to participate. The study has been described elsewhere in detail, and ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees [10,11,12]. ALSPAC initially enrolled a cohort of 14,541 pregnancies, from which 14,062 live births occurred, with 13,998 alive at 1 year. Follow-up has included parent and child completed questionnaires, links to routine data and clinic attendance.
Research clinics were held when these offspring participants were approximately 9, 10, 11, 13, 15, 18, and 25 years old. Data for 25 years of age were collected and managed using REDCap electronic data capture tools hosted at the University of Bristol. REDCap (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies [13]. The study website contains details of all the data that is available through a fully searchable data dictionary. http://www.bristol.ac.uk/alspac/researchers/our-data/.
DNA methylation measurements
As part of the ARIES project (http://www.ariesepigenomics.org.uk), a subsample of ALSPAC mother–child pairs had DNA methylation measured using either the Illumina Infinium HumanMethylation450 Beadchip (450 K array) or the Illumina Infinium MethylationEPIC BeadChip (EPIC array). In blood samples collected from ALSPAC children, DNAm was measured at ages 7, 9, 15–17 and 24 and from the mothers during the study child pregnancy (‘antenatal’) and approximately 18 years later (‘midlife’). All DNA methylation wet-lab and preprocessing analyses were performed at the University of Bristol as part of the ARIES project and has been described in detail previously [14]. Methylation data normalisation was carried out using the TOST algorithm to minimise between probe differences [15]. The subsamples used in our study are described in Table 1.
Proteomic measurements
Protein measurements were previously taken for a total of 9000 blood plasma samples in ALSPAC using the Olink Target 96 inflammatory panel, split between non-fasted children at age 9 (3005), fasted children at age 24 individuals (3027) and 2968 fasted mothers at midlife. These data have been previously described in full by Goulding et al [16]. In short, Proteomics data were processed and quality-controlled using the R package metaboprep [17], which provides a standardised workflow for filtering samples and proteins based on missingness, total sum abundance (TSA) and principal component analysis (PCA) outliers. This process ensured the removal of extreme outliers and the identification of representative proteins for PCA. Efforts were made to maximise the number of children with protein measurements at both 9 and 24 and the number of mother–child pairs with protein measurements.
Previously published DNAm models of protein abundance (Gadd episcores)
ALSPAC children (Illumina HumanHap550 quad chip) and mothers (Illumina human660W-quad array) were genotyped using Illumina platforms, and the resulting data underwent rigorous quality control procedures, which included checks for gender mismatches, heterozygosity, missingness, relatedness, and population stratification. SNPs were filtered based on minor allele frequency, call rate, and Hardy–Weinberg equilibrium, with population stratification assessed via multidimensional scaling using HapMap reference populations. These procedures, described in detail elsewhere [18, 19], ensured a high-quality dataset for downstream analyses.
Recently published [7] DNAm models for abundance of 109 proteins (Gadd episcores) were trained using DNAm and protein abundance measured in blood samples from the KORA study [20]. Of these 109 proteins, the abundances of 12 were measured in ALSPAC using the Olink platform. These corresponded to 14 Gadd episcores (2 proteins had both proteins measured by both Somascan and Olink platforms), which were then projected into ALSPAC DNAm. These episcores were projected into ALSPAC at ages 7, 9, 15 and 24 in the ALSPAC children and at antenatal and middle age timepoints in the mothers (see Table 1).
Genetic models of protein abundance
Genetic models of protein abundance for 308 Olink- and 2348 Somascan-measured proteins were published by Xu et al. [8]. Proteins with measurements on both platforms have two PGS, one for each platform. For these, only the PGS with the strongest correlation with measured protein abundance was used for further analysis, so as to evaluate episcores against the strongest performing genetic model available.
Novel DNAm models of protein abundance in ALSPAC (ALSPAC episcores)
We trained de novo linear DNAm models of protein abundance in ALSPAC participants at three life stages (age 9, age 24, in middle age). Elastic Net as implemented by the glmnet R package [21] was used to train models for all 88 measured Olink proteins with all autosomal CpGs sites measured on the EPIC array as possible features. Protein values were residualised by sex prior to training. Model hyperparameters, alpha and lambda, were jointly optimised by fivefold cross-validation. Coefficients for ALSPAC-trained episcores for all three timepoints are available in Supplementary File 1.
Proteins with mean cross-validated R > 0.10 in ALSPAC at a specific time point were modelled using data from all participants with data at that time point and then projected into DNAm measured at other time points with matched protein abundance measurements. For example, episcores trained in middle age maternal participants were also projected into DNAm measured at ages 9 and 24. DNAm in ALSPAC was measured using a mix of 450 k and EPIC arrays. In cases where models would be trained and projected into DNAm data generated using a mix of arrays, models trained on the subset of CpG sites common to both arrays.
Evaluating episcore performance across the life course
Gadd episcores were assessed for replication by projecting them into DNAm measured in ALSPAC middle age mothers, which is demographically similar to the datasets used by Gadd to build these scores. Replication was deemed successful if model estimated and measured protein abundances were correlated at Pearson correlation coefficient threshold R > 0.10 and P < 0.05. We also assessed whether Gadd episcore performance was transferable to less similar data by projecting them into DNAm measured in ALSPAC offspring at age 9 and 24, again using a R > 0.10 and, P < 0.05 threshold.
The predictive capacity of Gadd episcores was evaluated by projecting the episcores into ALSPAC at age 7 and age 15 and evaluating their correlation with protein measurements at ages 9 and 24, respectively.
The capacity of DNAm to model protein abundance at a given time point in ALSPAC was assessed within a cross-validation framework (5 folds) which optimised for mean cross-validated error (error metric: mean squared error), as such each datapoint was used to train the ALSPAC episcores, and evaluate the performance of the episcore. DNAm was considered capable of modelling a protein if the mean correlation between estimates and measurements across folds was at least 0.1. Models were then trained for such proteins using all available data at that time point. Transferability of these models to other time points was assessed by projecting these models into DNAm time points and calculating correlations between model estimates and protein measurements. A model was considered transferable for correlation R > 0.10 and P < 0.05. Finally, sex-specific performance was evaluated by projecting models created in the mothers group into sex-specific groups in the age 9 and 24 data. T-tests were carried out to determine whether there were notable differences between the performance in males and females.
Comparison of episcore and genetic measure of protein abundance
Capture of non-genetic protein variation was evaluated through the use of F-tests to compare two linear models, the first including just a published PGS to predict protein abundance and a second including both an episcore and the PGS to predict protein abundance (threshold for reporting: F-test P < 0.05). This was carried out at each cross-sectional time point (age 9, age 24, middle age). For the ALSPAC episcores, F-tests were performed within a cross-validation framework (5 folds) so that episcores could be trained and evaluated within the same dataset.
Effect of protein stability and variance on model performance
Correlations of protein levels between ages 9 and 24 were calculated to determine the longitudinal stability of protein abundance. The relationship between protein stability and episcore transferability was evaluated by calculating the correlations between the protein stability correlations and the ALSPAC model transferability correlations from age 9 to 24, and vice versa.
Results
Replication and transferability of Gadd episcores in ALSPAC
Gadd episcores were projected into DNAm measured in ALSPAC at middle age to assess replication. Concurrent Olink measurements for 11 proteins matched 14 Gadd episcores since there are 3 episcores (trained from Somascan and Olink measurements) for CXCL11, CXCL10 and MMP1. Of these, 10 episcores replicated (P < 0.05 and Pearson’s R > 0.1, Fig. 2 and Table 2).
We also projected Gadd episcores into DNAm measurements from ALSPAC offspring at ages 9 and 24 with concurrently available Olink protein measurements to evaluate whether they were transferable to younger populations. Here, only four episcores transferred to age 9 (Pearson’s R > 0.1 but none at P < 0.05) and nine to age 24 (P < 0.05 and Pearson’s R > 0.1, Fig. 2 and Table 2).
Gadd episcores have predictive capacity for protein levels
To determine the predictive capacity of Gadd episcores, we assessed whether Gadd episcores projected into DNAm measured at one time point could predict protein abundance measured at a later time point (Fig. 3). No episcores projected into age 7 DNAm were predictive of protein abundance at age 9 (Supplementary Table 1). Two episcores at age 15 were predictive of protein abundance at age 24 (Supplementary Table 2). Seven episcores projected into DNAm measured during pregnancy predicted protein abundance at middle age, approximately 18 years later (Supplementary Table 3).
Gadd episcores explain protein variance beyond genotype
Next we asked whether Gadd episcores explained variance in protein abundance beyond polygenic scores (PGS). PGS, Gadd episcores, and protein abundance data were available for eight proteins in ALSPAC at ages 9, 24 and middle age. We evaluated the additive explanatory value of Gadd episcores by performing F-tests to compare a baseline model with just a PGS predicting protein abundance to a model additionally including the corresponding Gadd episcore (Methods).
At age 9, no Gadd episcores showed evidence (F-test P < 0.05) of explanatory beyond the PGS alone (Supplementary Table 4). However, at age 24, Gadd episcores for five proteins explained protein variance beyond the PGS: CD6, CXCL10, CXCL11, OSM, VEGFA (Fig. 4A, Supplementary Table 5). For two proteins, CXCL11 and OSM, Gadd episcores explained more than 5% of the variance in protein abundance beyond polygenic scores.
Similarly, in middle-aged mothers, Gadd episcores captured protein variance beyond the genotype for seven proteins: CCL11, CD6, CXCL10, CXCL11, MMP1, OSM, VEGFA (Fig. 4B, Supplementary Table 6). Here again, the Gadd episcores for CXCL11 and OSM explained more than 5% of variance in protein abundance beyond polygenic scores.
ALSPAC-trained episcores transfer across ages
To evaluate the transferability of episcores trained within ALSPAC, we trained episcores using the matched DNAm and protein abundance data available in ALSPAC at ages 9 and 24 and in middle age. For each time point, we used cross-validation (CV) to evaluate model performance and evaluated transferability of models for proteins with sufficiently strong CV performance (mean CV R > 0.1). We observed strong performance for 22, 31 and 23 of the 88 proteins at ages 9, 24 and middle age, respectively.
We then trained episcores for these proteins using all available data at each given time point and asked whether they transferred to the other time points. Of the 22 episcores trained at age 9, only IL18R1 transferred to older ages, in this case to both age 24 and middle age (Fig. 5A). For the 31 episcores trained at age 24, five transferred to age 9, and eight transferred to middle age (Fig. 5B). Finally, of the 23 episcores trained in middle age, three transferred to age 9 and seven transferred to age 24 (Fig. 5C; Supplementary Tables 7-Supplementary Table 9).
Transferability of episcores trained in ALSPAC. A Episcore trained at age 9 and transferred to age 24 (N = 319–763) and middle age (N = 274–622). B Episcores trained at age 24, transferred to ages 9 (N range = 145–336) or to middle age (N range = 274–622). C Episcores trained at middle age and transferred to ages 9 (N = 145–336) or 24 (N = 319–763). Error bars represent +/ −1 standard deviation (SD) for CV performance within training time point
Given that protein abundance may differ by sex, we asked whether transferability may differ by sex. All participants at middle age were women, so sex-specificity was evaluated between ages 9 and 24. Sex-specificity was evaluated by calculating sex-specific transfer correlations and then asking if they differed between the sexes (t-test). We observed no sex differences for the eight episcores that transferred into age 9 (t = 0.74, P = 0.52) nor for the seven that transferred into age 24 (t = −0.01, P = 0.99).
Episcores trained in childhood and in adulthood explain non-genetic variation
For our ALSPAC-trained episcores, we again asked whether episcores explain variation in protein abundance beyond genotype. This was evaluated through the use of cross-validation to maximise power (Methods).
At age 9, seven of 12 episcores with a corresponding PGS explained protein variance beyond the PGS (F-test P < 0.05; Fig. 6A, Supplementary Table 10). One episcore (TGFalpha) explained more than 5% of the variance in protein abundance beyond the PGS. At age 24, nineteen of the 21 episcores with a corresponding PGS explained protein variance beyond PGS (Fig. 6B, Supplementary Table 11). Episcores for four proteins (CXCL11, IL6, OSM, TNFSF14) explained more than 5% of the variance in protein abundance beyond the PGS. In middle-aged mothers, thirteen of 13 episcores with a corresponding PGS explained protein variance beyond the PGS (Fig. 6C, Supplementary Table 12). Episcores for four proteins (CD6, CXCL11, Flt3L, OSM) explained more than 5% of variance in protein abundance beyond the PGS.
Variance in protein abundance explained by ALSPAC episcores exclusively, polygenic scores exclusively and combined at A age 9 (N range = 197–309), B 24 years (N range = 663–704), and C middle age (N range = 547–553). Combined models which explain more variance than the polygenic score alone are marked with an asterisk (*) (F-test P < 0.05)
Longitudinal protein stability doesn’t determine episcore transferability
To determine whether the stability of protein levels affects transferability, we compare the longitudinal stability of proteins from age 9 to age 24 against the transferability DNAm models from age 9 to 24, and from age 24 to age 9. We observed weak positive correlations both from age 9 to 24 (R = 0.31, P = 0.15) (Fig. 7A) and from age 24 to 9 (R = 0.23, P = 0.21) (Fig. 7B). Furthermore, it is clear that protein stability is much stronger from age 9 to age 24, than episcore stability between the same age groups.
Discussion
Although Gadd et al. [7] showed that DNAm can be used to estimate protein abundance, their episcores have never been tested in younger populations. Here, we have evaluated these episcores in ALSPAC participants at three different ages. We find that most of the Gadd episcores capture variance in protein abundance in young (age 24) and middle-aged adults but very little in children (age 9).
We also examined the extent to which episcores capture non-genetic variation. We did indeed observe evidence for this for several proteins in young and middle age adults, whether the episcore was trained within or outside ALSPAC. We observed some but weaker evidence for this at age 9, but only for episcores trained at age 9. This is consistent with our understanding that protein abundance like DNA methylation is determined by a combination of both genetic and environmental factors and suggests that molecular prediction of protein abundance should, where possible, integrate both genetic and epigenetic sources of information. We note here a potential bias of this analysis, as the Gadd episcores were only released should they pass a certain threshold in test data, whilst the Xu polygenic models were released for every protein tested. As such it is unlikely that the proportion of episcores shown to be successful over genetic models is truly representative. Training set sizes also differed quite significantly for episcore (n ~ 500–1000 for Gadd episcores and n ~ 250–750 for ALSPAC episcores) and polygenic models (n ~ 3000–5000). Larger sizes for episcore training would almost certainly have increased the number of proteins with trained episcores and the proportion of protein variance captured.
To further examine the capacity of DNAm to capture protein abundance across the lifecourse, we trained new episcores in ALSPAC at ages 9 and 24, and in middle age, which is important given the significant variation across the lifecourse in measures such as protein abundance. Evaluating the transferability of episcores between different ages, and providing new age-specific episcores may be important for model selection in future research. We find that the capacity of DNAm to capture protein abundance at age 9 is low compared to its capacity in adulthood at age 24 and in middle age. Transferability to and from age 9 was similarly poor and evidence for capture of protein variance beyond genotype weaker than for older ages. By contrast, we found the capacity of DNAm to capture protein variation at age 24 to be quite similar to middle age, including capacity to capture variation beyond genotype. In fact, this capacity was comparable to that of the Gadd episcores, also trained in mainly middle-age adults. This is somewhat unexpected, given that Gadd et al. regressed out genetic variation from protein abundances prior to training episcores [7], whilst we do not. Future work should determine whether this step produces better models both for capturing protein variation and for health-related phenotypic variation ore generally.
Throughout this study, we have provided evidence that when using episcores, it is important to consider age of the study population, given much poorer performance of episcores created in middle age individuals when deployed in younger children. Furthermore, we also demonstrate that episcores capture non-genetic variation, and such should be considered as an additional resource to genetic scores. Finally, we provide a number of age-specific episcores, which may provide additional value to the scientific community.
The limitations of our study include small and highly variable sample sizes (N range = 145–1464), which introduces concerns around statistical power and comparability of results. The use of longitudinal data from the same familial cohort for use in model testing may have inflated model metrics (due to relatedness between mothers and children and due to autocorrelation between time points for children with measurements in more than one time point). The Xu polygenic models were also trained in adults, so they may perform less well in children and young adults. Our conclusions are limited to the relatively small set of inflammatory proteins we examined in ALSPAC (88 proteins from the Olink inflammatory panel). We note that the genotyping and DNA methylation platforms used in the ALSPAC cohort differ from those in the external cohorts used to develop the Gadd episcores and Xu PGS scores in our analyses. Consequently, some discrepancies may exist that are beyond our control. Finally, we note that all episcore and PGS models discussed were trained using data obtained from white European populations, so our findings may not apply outside of this demographic.
Availability of data and materials
The ALSPAC data access plan has been described in detail elsewhere: http://www.bristol.ac.uk/alspac/researchers/data-access/documents/alspac-data-management-plan.pdf, and research proposal can be submitted here: https://proposals.epi.bristol.ac.uk/.
References
Kim M-S, et al. A draft map of the human proteome. Nature. 2014;509:575–81.
Aronson JK, Ferner RE. Biomarkers—a general review. Curr Protoc Pharmacol. 2017;76:9–23.
Melzer D, et al. A genome-wide association study identifies protein quantitative trait loci (pQTLs). PLoS Genet. 2008;4:e1000072.
Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012;13:484–92.
Attwood JT, Yung RL, Richardson BC. DNA methylation and the regulation of gene transcription. Cell Mol Life Sci. 2002;59:241–57.
Lu AT, et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging. 2019;11:303–27.
Gadd DA, et al. Epigenetic scores for the circulating proteome as tools for disease prediction. Elife. 2022;11:e71802.
Xu Y, Ritchie SC, Liang Y, Timmers P, Pietzner M. An atlas of genetic scores to predict multi-omic traits. bioRxiv. 2022
Min JL, et al. Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet. 2021;53:1311–21.
Fraser A, et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42:97–110.
Boyd A, et al. Cohort profile: the ‘children of the 90s’—the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol. 2013;42:111–27.
Northstone K, et al. The Avon Longitudinal Study of Parents and Children (ALSPAC): an update on the enrolled sample of index children in 2019. Wellcome Open Res. 2019;4:51.
Harris PA, et al. Research electronic data capture (REDCap)–a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42:377–81.
Relton CL, et al. Data resource profile: Accessible Resource for Integrated Epigenomic Studies (ARIES). Int J Epidemiol. 2015;44:1181–90.
Touleimat N, Tost J. Complete pipeline for Infinium(®) Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics. 2012;4:325–41.
Goulding N, et al. Inflammation proteomics datasets in the ALSPAC cohort. Wellcome Open Res. 2022;7:277.
Hughes DA, et al. metaboprep: an R package for preanalysis data description and processing. Bioinformatics. 2022;38:1980–7.
Horikoshi M, et al. New loci associated with birth weight identify genetic links between intrauterine growth and adult height and metabolism. Nat Genet. 2013;45:76–82.
Rietveld CA, et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science. 2013;340:1467–71.
Holle R, Happich M, Löwel H, Wichmann HE, MONICA/KORA Study Group. KORA—A research platform for population based health research. Gesundheitswesen. 2005;67 Suppl 1:S19–25.
Hastie, Qian, Tay. An Introduction to glmnet. CRAN R Repositary.
Acknowledgements
We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses.
Funding
S.W. is supported by Cancer Research UK (C18281/A30905). P.Y., M.S., and S.W. are supported via the following: Medical Research Council Integrative Epidemiology Unit at the University of Bristol (MC_UU_00032/3, MC_UU_00032/5) and the Cancer Research UK Integrative cancer epidemiology programme (C18281/A29019). P.Y. and M.S. are also supported by the National Institute for Health and Care Research Bristol Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The UK Medical Research Council and Wellcome (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors, and SW, PY and MS will serve as guarantors for the contents of this paper.
Author information
Authors and Affiliations
Contributions
SW planned and carried out all analyses and wrote the first draft. MS and PY supervised the project and critically reviewed the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Paul Yousefi and Matt Suderman: Co senior authors.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Waterfield, S., Yousefi, P. & Suderman, M. DNA methylation models of protein abundance across the lifecourse. Clin Epigenet 16, 189 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13148-024-01802-y
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13148-024-01802-y