
Blood-based epigenome-wide association study and prediction of alcohol consumption

Abstract

Alcohol consumption is an important risk factor for multiple diseases. It is typically assessed via self-report, which is open to measurement error through recall bias. Instead, molecular data such as blood-based DNA methylation (DNAm) could be used to derive a more objective measure of alcohol consumption by incorporating information from cytosine-phosphate-guanine (CpG) sites known to be linked to the trait. Here, we explore the epigenetic architecture of self-reported weekly units of alcohol consumption in the Generation Scotland study. We first create a blood-based epigenetic score (EpiScore) of alcohol consumption using elastic net penalized linear regression. We explore the effect of pre-filtering for CpG features ahead of elastic net, as well as differential patterns by sex and by units consumed in the last week relative to an average week. The final EpiScore was trained on 16,717 individuals and tested in four external cohorts: the Lothian Birth Cohorts (LBC) of 1921 and 1936, the Sister Study, and the Avon Longitudinal Study of Parents and Children (total N across studies > 10,000). The maximum Pearson correlation between the EpiScore and self-reported alcohol consumption within cohort ranged from 0.41 to 0.53. In LBC1936, higher EpiScore levels had significant associations with poorer global brain imaging metrics, whereas self-reported alcohol consumption did not. Finally, we identified two novel CpG loci via a Bayesian penalized regression epigenome-wide association study of alcohol consumption. Together, these findings show how DNAm can objectively characterize patterns of alcohol consumption that associate with brain health, unlike self-reported estimates.

Introduction

Alcohol consumption, particularly heavy use, has been associated with increased morbidity and mortality, cognitive impairment, and progressive white matter degeneration in the brain, and it is a major risk factor for various forms of cancer [1,2,3,4,5]. Although regular alcohol use is an important risk factor for a plethora of diseases, self-reported consumption is an imperfect phenotype that can be prone to recall bias [6].

Like other environmental and lifestyle factors [7, 8], alcohol consumption is linked to the epigenome, specifically to blood-based DNA methylation (DNAm) patterns [9,10,11]. DNAm is an epigenetic mark typically characterized by the addition of a methyl group to the fifth carbon (C5) of a cytosine base, most often at cytosine-phosphate-guanine (CpG) dinucleotides, also referred to as CpG sites [12]. Because DNAm can influence gene expression and cellular function, methylomic modifications could mediate links between alcohol consumption and disease risk and development [12, 13], and could also be involved in alcohol addiction [14, 15]. As such, identifying alcohol-associated CpG sites could provide biological insights into the pathophysiology of alcohol-related diseases [11, 16].

DNAm-based predictors of complex traits have gained prominence in recent years, notably for phenotypes such as age and smoking [8, 11, 17, 18]. The largest previous epigenome-wide association study (EWAS) meta-analysis of alcohol consumption (self-reported units consumed per day in the past year) included over 13,000 individuals from 13 cohorts. Using a 144-CpG signature, the authors explained up to 13.8% of the variance in the phenotype (incremental R2 over linear regression models including age and sex) in four independent test sets. There are two reasons why a DNAm-based predictor might provide an improved index of alcohol consumption. First, much as hair cortisol and glycated haemoglobin track long-term stress and glucose regulation, the methylome, although dynamic, is relatively stable and can therefore integrate exposure over time. For example, many CpG sites associated with smoking revert to levels similar to those of non-smokers around 5 years after quitting [7], although some remain differentially methylated up to 30 years after cessation [19]. Second, provided that enough individuals respond accurately in a self-report questionnaire, a predictor built by averaging over multiple CpG sites should yield more precise estimates for those whose self-report data are inaccurate.

In this study, we create an epigenetic predictor of alcohol consumption using a large single-cohort DNAm study, Generation Scotland. We assess the performance of this predictor in nine independent external subsets from four different studies: older adults across the 8th and 9th decades of life, from the Lothian Birth Cohorts (LBC) of 1921 and 1936 [20, 21]; adolescents and adults from the Avon Longitudinal Study of Parents and Children (ALSPAC) [22, 23]; and multi-ancestry adult women from the Sister Study [24]. We also explore differential patterns by sex and by units consumed in the last week relative to an average week. Finally, to gain biological insight into potential alcohol-mediated pathways underlying disease, we perform the largest epigenome-wide association study (EWAS) of alcohol consumption to date (N = 16,717).

Results

A total of 16,717 Generation Scotland participants (mean age 47.5 years, SD 14.9; 9758 females and 6959 males) had blood-based DNAm (see “Methods”) and self-reported alcohol consumption data available (Table 1, Supplementary Table 1, Supplementary Fig. 1). The mean number of alcohol units (as per the UK National Health Service definition, one unit is 10 ml, or 8 g, of pure alcohol) consumed in the week prior to completing the questionnaire and blood draw was 10.9 (SD 12.7; Supplementary Figs. 2 and 3). A total of 10,506 (62.8%) participants reported that this number reflected their usual drinking pattern (“normal week” drinkers), with 1622 and 3756 noting that it was less or more, respectively, than they typically drink in a week (response unknown for N = 833).
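As a worked illustration of the unit definition above (not part of the study's pipeline), the number of UK units in a drink follows from its volume and alcohol by volume (ABV): units = volume (ml) × ABV (%) / 100 / 10. The sketch assumes the 10 ml-per-unit definition quoted in the text; the drinks in the example week are hypothetical.

```python
# Assumption for illustration: 1 UK unit = 10 ml of pure ethanol (about 8 g),
# matching the NHS definition quoted in the text.
ML_ETHANOL_PER_UNIT = 10.0


def uk_units(volume_ml, abv_percent):
    """UK alcohol units in one drink of the given volume (ml) and strength (% ABV)."""
    ethanol_ml = volume_ml * abv_percent / 100.0
    return ethanol_ml / ML_ETHANOL_PER_UNIT


def weekly_units(drinks):
    """Sum units over a week's drinks, each given as (volume_ml, abv_percent)."""
    return sum(uk_units(volume, abv) for volume, abv in drinks)


# Hypothetical week: two 175 ml glasses of 12% wine and one pint (568 ml) of 4% beer.
week = [(175, 12.0), (175, 12.0), (568, 4.0)]
print(round(weekly_units(week), 2))  # 6.47
```

Each wine glass contributes 2.1 units and the pint 2.272 units, so this hypothetical week would be recorded as about 6.5 units.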

Table 1 Generation Scotland cohort classification by units drunk per week (unit definition as per UK National Health Service definition at 8 g/10 ml of pure alcohol), for both people reporting usual drinking (“normal week” drinkers) and everyone in the cohort

Alcohol consumption EpiScore

An epigenetic score (EpiScore) was trained on self-reported alcohol units (log(x + 1) transformed) consumed in the week prior to the blood draw for DNAm measurement. Generation Scotland was split into a training set (N = 8684; DNAm Sets 1 and 2) and a test set (N = 8033; DNAm Set 3).
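The EpiScore is trained with elastic net penalized linear regression on log(x + 1)-transformed weekly units, optionally after restricting the CpGs to a pre-specified set. The following is a minimal, self-contained sketch of that idea using cyclic coordinate descent in the glmnet-style parameterization. It is not the study's code: the penalty values (`lam`, `alpha`), the toy data, and the CpG names are assumptions chosen for illustration, and the cross-validated tuning of the penalty that would normally accompany such a fit is omitted.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) from the L1 (lasso) part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def prefilter_columns(X, cpg_names, keep):
    """Keep only CpG columns whose names are in `keep` (e.g. sites from prior EWASs)."""
    idx = [j for j, name in enumerate(cpg_names) if name in keep]
    return [[row[j] for j in idx] for row in X], [cpg_names[j] for j in idx]


def elastic_net(X, y, lam, alpha, n_iter=100):
    """Cyclic coordinate descent minimising
    (1/2n)*||y - b0 - X b||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2).
    X is a list of samples, each a list of CpG values; y is the (log-transformed) outcome."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centred columns
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals while every b_j is 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant CpG carries no information
            # Inner product of CpG j with the partial residual that leaves CpG j out.
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                b[j] = new_bj
    b0 = y_mean - sum(bj * mj for bj, mj in zip(b, means))  # intercept on raw scale
    return b0, b


# Toy data (hypothetical CpG names): only cgA carries signal.
random.seed(1)
names = ["cgA", "cgB", "cgC", "cgD"]
X = [[random.gauss(0.0, 1.0) for _ in names] for _ in range(200)]
y = [2.0 + 1.5 * row[0] + random.gauss(0.0, 0.1) for row in X]  # plays log(units + 1)
X_kept, kept = prefilter_columns(X, names, {"cgA", "cgB", "cgC"})
b0, b = elastic_net(X_kept, y, lam=0.01, alpha=0.5)
```

Pure Python keeps the sketch dependency-free; an analysis over hundreds of thousands of CpGs would rely on an optimized implementation instead.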

We evaluated whether pre-selecting CpGs ahead of training could improve prediction performance. We trained predictors on either the full methylome (386,399 CpGs, after restricting measured features to those also present on the Illumina 450 K array for wider applicability) or 3999 CpGs with evidence of an association with alcohol consumption in three recent EWASs that excluded Generation Scotland [9, 16, 25] (see Methods). Additionally, we evaluated whether training only on the subset of 5618 individuals (64.7% of the training set) who reported that their consumption in the previous week reflected a “normal week” influenced predictor performance.

EpiScore prediction performance in the test set was assessed by the Pearson correlation (r) between self-reported weekly alcohol units and the EpiScore, and by the incremental R2 obtained when adding the EpiScore to a linear regression model adjusting for age and sex. Predictors trained on CpGs pre-filtered ahead of elastic net outperformed those trained on all CpGs. Training only on individuals whose consumption in the previous week reflected a normal week gave no improvement (Table 2, Supplementary Fig. 4), possibly owing to the reduced sample size (64.7% of the training set).
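The two performance metrics can be written as r = cor(EpiScore, log(units + 1)) and incremental R2 = R2(age + sex + EpiScore) − R2(age + sex), both computed in the test set. A minimal sketch of these computations with ordinary least squares follows; the simulated data are illustrative only, and sex is assumed to be coded 0/1.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(A, v):
    """Solve A w = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    w = solve(XtX, Xty)  # normal equations
    y_mean = sum(y) / n
    ss_res = sum((yi - sum(wk * xk for wk, xk in zip(w, row))) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, age, sex, score):
    """Gain in R^2 from adding the EpiScore to a model with age and sex."""
    return r_squared([age, sex, score], y) - r_squared([age, sex], y)


# Simulated test set (illustrative only).
random.seed(2)
n = 300
age = [random.gauss(50, 10) for _ in range(n)]
sex = [float(random.randint(0, 1)) for _ in range(n)]
score = [random.gauss(0, 1) for _ in range(n)]
y = [0.02 * a + 0.3 * s + 0.5 * e + random.gauss(0, 1) for a, s, e in zip(age, sex, score)]
print(round(pearson_r(score, y), 2), round(incremental_r2(y, age, sex, score), 3))
```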

Table 2 Predictive performance and number of features of four EpiScores generated using elastic net regression

If the methylome reflects habitual drinking rather than only recent exposure to alcohol, EpiScore performance should differ between people whose consumption in a given week matched their usual intake and those who drank more or less than normal. We therefore compared EpiScore performance between test-set participants who reported that their alcohol consumption over the past week was similar to a normal week and those who reported having consumed more or less than normal (N = 7642/8033, 95.1% of the test set; status not recorded for 391 individuals). This comprised 4888 individuals whose consumption in the past week reflected their normal drinking behaviour, 1920 who drank more than usual that week, and 834 who drank less than usual. The EpiScore performed best in the subset of the test set whose consumption in the last week reflected a normal week (Table 2, Supplementary Fig. 5).
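Stratifying the test set by self-reported drinking pattern amounts to computing the same correlation within each group and skipping participants whose status was not recorded. A small sketch with made-up values; the group labels are hypothetical.

```python
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_by_group(groups, episcores, reported):
    """Pearson r between EpiScore and reported units within each drinking-pattern group.
    Records with an unknown group (None) are skipped."""
    pairs = defaultdict(lambda: ([], []))
    for g, s, r in zip(groups, episcores, reported):
        if g is None:
            continue
        pairs[g][0].append(s)
        pairs[g][1].append(r)
    return {g: pearson_r(s, r) for g, (s, r) in pairs.items()}


groups = ["normal", "normal", "normal", "more", "more", "more", None]
episcores = [0.1, 0.5, 0.9, 0.2, 0.4, 0.6, 0.3]
reported = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 9.0]
print(r_by_group(groups, episcores, reported))
```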

Alcohol EpiScore tested in external cohorts

The previous analyses indicated that model training was optimized by (1) including all participants rather than only “normal week” drinkers and (2) pre-filtering to CpGs previously associated with alcohol. We therefore trained a final model in this manner using the full Generation Scotland cohort (N = 16,717), which returned an EpiScore consisting of 659 features (Supplementary Table 2). Predictive performance was evaluated in four external cohorts: the Lothian Birth Cohorts (LBC) of 1921 and 1936 (N = 436 and 895, respectively); ALSPAC (five cohort subsets, total N = 4083, ranging from 476 to 1482 per subset); and the Sister Study (two cohort subsets, total N = 5119, with N = 2770 and 2349 per subset; see Methods, Table 3). The LBC and ALSPAC cohorts reported alcohol consumption as average units consumed per week in the year prior to sampling (using the NHS unit definition of 8 g/10 ml of pure alcohol), whereas the Sister Study cohorts reported a derived variable representing the average number of drinks per week over the last year. For simplicity, we use the term “units per week” for all cohorts. All alcohol measurements were log(x + 1) transformed.
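Applying a trained EpiScore to an external cohort is a weighted sum of DNAm values at the model's CpGs. The sketch below assumes that a score is the model intercept plus the sum of weight × beta value, and that any CpG missing from a cohort's data is skipped and reported; the text does not state how missing features were handled, and the CpG IDs and weights are hypothetical.

```python
import math


def apply_episcore(sample, weights, intercept=0.0):
    """Linear EpiScore: intercept + sum of weight * DNAm beta over the model's CpGs.
    `sample` maps CpG ID -> beta value; CpGs absent from the sample are skipped
    and returned so their number can be reported (an assumed policy)."""
    missing = [cpg for cpg in weights if cpg not in sample]
    score = intercept + sum(w * sample[cpg] for cpg, w in weights.items() if cpg in sample)
    return score, missing


def log_units(units_per_week):
    """log(x + 1) transform applied to reported weekly units before comparison."""
    return math.log1p(units_per_week)


weights = {"cg_hypothetical_1": 0.5, "cg_hypothetical_2": -1.0}
sample = {"cg_hypothetical_1": 0.8}
score, missing = apply_episcore(sample, weights)
print(score, missing)  # 0.4 ['cg_hypothetical_2']
```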

Table 3 EpiScore performance metrics in the Lothian Birth Cohorts, ALSPAC, and Sister Study cohorts

In the two Scottish LBC studies (Supplementary Table 3), the EpiScore was moderately correlated with self-reported alcohol consumption (r = 0.41 in LBC1921 and r = 0.42 in LBC1936). Its incremental R2 over a linear regression model adjusting for age and sex was 17.9% in LBC1921 and 16.6% in LBC1936 (Table 3, Supplementary Fig. 6). This outperforms a previously published alcohol consumption EpiScore trained on a subset of N = 2819 Generation Scotland participants [8]. When that score was retested here with the same LBC data pre-processing (the original paper tested against alcohol units rather than log(units + 1)), its incremental R2 over a model adjusting for age and sex was 6.3% in LBC1921 and 10.6% in LBC1936.

Across the five ALSPAC cohort subsets (15up 450 and 15up EPIC: 15–17-year-olds measured on the Illumina 450 K or EPIC array, respectively; F24: 24-year-olds; FOM: mothers in midlife; and FOF: fathers in midlife), the correlation between the EpiScore and self-reported alcohol consumption ranged from r = 0.11 to 0.45, and the incremental R2 over a linear regression model adjusting for age and sex ranged from 1.2% to 20%. Notably, the worst-performing subsets comprised younger individuals (mean age < 18 years). The two Sister Study cohort subsets (one measured on the 450 K array and the other on the EPIC array) showed correlations of r = 0.53 and 0.51 and incremental R2 values of 28.1% and 26.9%, respectively (Table 3, Supplementary Figs. 7 and 8). The mean age in the Sister Study cohorts was approximately 56 years. In a subset of 796 Sister Study participants who self-identified as Black, the correlation and incremental R2 were 0.37 and 14.4%, respectively.

Measured alcohol consumption and EpiScore associations

We next tested for associations between self-reported alcohol consumption or our alcohol EpiScore and a number of lifestyle/health/socioeconomic factors, self-reported disease history, brain MRI-derived variables, and all-cause mortality in the LBC sets using a series of linear and logistic regression models, adjusting for age and sex (see Methods, Supplementary Table 4).
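Several effects below are reported per standard deviation (SD) of the EpiScore. A minimal sketch, assuming the EpiScore is z-scored before entering the model: a logistic-regression coefficient on the z-scored score is then a log odds ratio per SD, and a Cox coefficient is a log hazard ratio per SD, so exponentiating gives the reported OR or HR. The Wald interval with z = 1.96 is a standard approximation, not necessarily the one used in the study.

```python
import math
import statistics


def zscore(values):
    """Standardise to mean 0, SD 1 so that a regression coefficient is 'per SD'."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def ratio_per_sd(beta, se=None, z=1.96):
    """Exponentiate a log-odds (logistic) or log-hazard (Cox) coefficient estimated
    per SD of the predictor; optionally add a Wald confidence interval."""
    ratio = math.exp(beta)
    if se is None:
        return ratio
    return ratio, math.exp(beta - z * se), math.exp(beta + z * se)


print([round(v, 3) for v in zscore([1, 2, 3, 4, 5])])  # [-1.265, -0.632, 0.0, 0.632, 1.265]
print(round(math.log(1.22), 3))  # 0.199: the log-odds per SD behind an OR of 1.22
```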

A small number of the explored associations were statistically significant (PFDR < 0.05). Among lifestyle and cognitive traits, the EpiScore was associated with smoking status in LBC1936 (standardized β = 0.117, PFDR = 0.009) and with occupational social class in LBC1921 (β = −0.105, PFDR = 0.024), whereas higher self-reported alcohol consumption was associated with smoking status in LBC1921 (β = 0.17, PFDR = 0.011). Among disease histories, the EpiScore was positively associated with high blood pressure in LBC1936 (OR per SD of the EpiScore = 1.22, PFDR = 0.012); self-reported alcohol consumption was not significantly associated with any of the disease histories considered here. Further, we found a significant association between the EpiScore and time to all-cause mortality in LBC1936 (HR per SD of the EpiScore = 1.16 [95% CI 1.05, 1.28]).
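The text reports FDR-adjusted P values (PFDR) without naming the procedure. Assuming the common Benjamini–Hochberg method, the adjustment can be sketched as follows; the input P values are invented for illustration.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ranks 1..m by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(p)])  # [0.04, 0.0533, 0.0533, 0.2]
```

The running minimum enforces monotonicity, so a smaller raw P value never receives a larger adjusted value than a bigger one.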

All brain MRI variables were significantly associated with the EpiScore in LBC1936, but none was significantly associated with self-reported alcohol consumption (Fig. 1). These included negative associations with total brain volume (β = −0.044, PFDR = 0.012), grey matter volume (β = −0.074, PFDR = 0.001), and normal-appearing white matter volume (β = −0.064, PFDR = 0.012), and a positive association with white matter hyperintensity volume (β = 0.120, PFDR = 0.012).

Fig. 1

Self-reported (SR) alcohol consumption (units per week) and alcohol EpiScore associations with global brain imaging measures in LBC1936. Standardized effect sizes from age- and sex-adjusted linear regression models are shown with 95% confidence intervals. Alcohol units defined as per NHS guidelines of 8 g/10 ml of pure ethanol

Sex-specific EpiScore performance

Given the differences in average alcohol consumption between males and females, we explored sex-specific models (see Methods). However, prediction performance, as measured by r and by incremental R2 over a model accounting for age and sex, did not vary greatly across EpiScores in either LBC1921 or LBC1936 (Supplementary Fig. 9, Supplementary Table 5).

Alcohol consumption variance explained by the methylome and EWAS

Next, we estimated the proportion of variance in the alcohol consumption phenotype that can be explained by all CpG sites measured on a DNAm array (specifically, the Illumina EPIC array, with 752,722 CpGs after QC). To do this, we fitted a Bayesian sparse regression model and performed a variance partitioning analysis using BayesR+ (see Methods). BayesR+ has been shown to implicitly control for white blood cell proportions (which are typically estimated from the DNAm data), relatedness among participants, and other unknown confounders [26]. Three mixture distributions were specified, corresponding to small, medium, and large CpG effect sizes (explaining 0.01%, 0.1%, and 1% of the variance, respectively). We fitted models using (1) “normal week” drinkers in Generation Scotland and (2) the full Generation Scotland cohort. All CpGs together explained 45.0% (95% credible interval 39.7%, 50.5%) and 49.3% (95% credible interval 44.3%, 54.4%) of the variance in alcohol consumption (log(x + 1) transformed) in the models with “normal week” drinkers (those whose self-reported consumption was consistent with their normal drinking behaviour) and the full cohort, respectively.
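As a generic illustration of how a variance-explained estimate and its 95% credible interval can be summarized from Bayesian output (this is not BayesR+ code, and its internal calculation may differ), each posterior draw yields a proportion var(Xb) / (var(Xb) + residual variance), and the interval is read off the empirical 2.5% and 97.5% quantiles of those draws. The function names and example draws are assumptions.

```python
import math
import statistics


def variance_explained(fitted_values, residual_variance):
    """Share of outcome variance attributable to the CpGs in one posterior draw:
    var(X b) / (var(X b) + residual variance)."""
    vg = statistics.pvariance(fitted_values)
    return vg / (vg + residual_variance)


def quantile(sorted_draws, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_draws) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (sorted_draws[hi] - sorted_draws[lo]) * (pos - lo)


def posterior_summary(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from MCMC draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return statistics.mean(s), quantile(s, tail), quantile(s, 1.0 - tail)


# Hypothetical posterior draws of the proportion of variance explained.
print(posterior_summary([0.44, 0.46, 0.45, 0.47, 0.43, 0.48, 0.42]))
```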

In addition to the variance components analysis, BayesR+ simultaneously conducts an epigenome-wide association study (EWAS; see Methods), which assesses the association between each CpG and the outcome jointly across, and conditionally on, all other CpGs. We found four and six lead CpGs with a posterior inclusion probability (PIP) greater than 0.95 (Fig. 2, Table 4) in the models considering just “normal week” drinkers and the full cohort, respectively. Two CpGs had a PIP greater than 0.95 in both models, giving a total of eight unique lead CpGs.
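In a mixture (spike-and-slab style) model of this kind, the PIP of a CpG can be estimated as the fraction of retained MCMC draws in which its effect is non-zero. A small sketch with invented draws and hypothetical CpG names; BayesR+ reports PIPs itself, so this only illustrates the definition.

```python
def posterior_inclusion_probabilities(draws):
    """PIP for each CpG: fraction of MCMC draws in which its effect is non-zero
    (i.e. it was assigned to one of the non-zero mixture components)."""
    n_draws = len(draws)
    n_cpgs = len(draws[0])
    return [sum(1 for d in draws if d[j] != 0.0) / n_draws for j in range(n_cpgs)]


def lead_cpgs(names, pips, threshold=0.95):
    """CpGs whose PIP exceeds the threshold used in the text (0.95)."""
    return [name for name, pip in zip(names, pips) if pip > threshold]


names = ["cg_hypothetical_A", "cg_hypothetical_B", "cg_hypothetical_C"]
draws = [[0.10, 0.0, 0.0], [0.20, 0.0, 0.30], [0.10, 0.05, 0.0], [0.15, 0.0, 0.0]]
pips = posterior_inclusion_probabilities(draws)
print(pips, lead_cpgs(names, pips))  # [1.0, 0.25, 0.25] ['cg_hypothetical_A']
```

Taking the union of the lead sets from the two models gives the count of unique lead CpGs: 4 + 6 − 2 = 8.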

Fig. 2

Manhattan plot of the EWAS of alcohol consumption (units per week, log(x + 1) transformed) in models using (1) “normal week” drinkers and (2) the full cohort. The threshold line is set at a posterior inclusion probability (PIP) of 0.95. Alcohol units defined as per NHS guidelines of 8 g/10 ml of pure ethanol

Table 4 BayesR alcohol consumption (units per week, log(x + 1) transformed) EWAS associations with mean beta PIP > 0.8

We queried the EWAS Catalog (accessed 14th May, 2024) for the eight lead CpG sites and found that three had previously been linked to alcohol consumption and one to alcohol withdrawal recovery (Supplementary Table 6). This search is not exhaustive, as not all studies deposit data in this resource; indeed, six of the eight CpG sites were reported in the largest previously published alcohol consumption EWAS using Generation Scotland data [11] (all but cg03741185 and cg06053623). Seven of the eight CpGs were associated with at least one other trait in the EWAS Catalog, including age, prevalent type 2 diabetes, serum high-density lipoprotein cholesterol, gestational age, serum triglycerides, blood pressure, and BMI.

Our lead CpGs mapped to the genes POLR3GL, SLC7A11, SRPK2, PSAT1, IL12RB1, SLC43A1, and LOC100132354 (Table 4). IL12RB1 has not been previously linked to alcohol consumption, based on EWAS Catalog output and the previously largest GS-based alcohol consumption EWAS.

Discussion

Excessive alcohol consumption is one of the most important contributors to the global burden of disease, with important associations with conditions including cardiovascular disease and cancer [3,4,5]. Alcohol has further been associated with DNAm differences via multiple mechanisms [27], and as such, the altered methylome could offer clues to alcohol–disease links.

Here, we report a new alcohol EpiScore that explains up to 28% of the variance in self-reported consumption and performs similarly well across three UK-based cohorts and one US-based cohort. The score performs best in mid-to-older-aged adults and in those who stated that their consumption in the past week reflected a normal week, compared with those who drank less or more than normal. By contrast, the EpiScore showed its poorest performance in the teenage subset of the ALSPAC cohort (correlation of 0.11). As highlighted by others [11, 28], our work suggests that an alcohol EpiScore is well placed to track chronic exposure. These findings also suggest that the relationship between alcohol and the methylome is dynamic and reversible. Indeed, a recent study reported that many alcohol-associated CpG sites were differentially methylated between former and current drinkers, and that alcohol-related hypomethylation is largely reversible upon cessation [16]. Our results suggest that changes to the methylome could be observed over short time frames, but longitudinal data with frequent time points would be needed to confirm this.

Our alcohol EpiScore was also associated with brain MRI-derived measures in LBC1936 (negatively with brain tissue volumes and positively with white matter hyperintensity volume), whereas self-reported alcohol consumption was not. Chronic alcohol use is associated with changes in brain structure and connectivity [29], and previous studies have reported links between higher alcohol consumption and lower white and grey matter volume [30], as well as higher white matter hyperintensity volume [31]. A recent study using UK Biobank brain MRI data (N = 36,585) [2] found that self-reported alcohol consumption was negatively and slightly non-linearly associated with both white and grey matter volumes after accounting for covariates including age, sex, and BMI; in that study, consuming as few as 1–2 alcoholic drinks daily was associated with lower brain volume.

Previous work has found that pre-filtering CpGs ahead of elastic net greatly improves predictor performance when using this training method [32]. Our results echo this: prediction accuracy increased when training on CpGs with a previously established association with alcohol consumption. This could reflect technical limitations of penalized regression when the number of predictors is much larger than the number of observations [33], together with the screening out of CpGs with low variability across samples, whose measurements are dominated by technical variance [34, 35]. We also show the importance of sample size when training EpiScores: compared with a previous EpiScore trained in 2819 unrelated Generation Scotland volunteers with “normal week” drinking patterns [8], our new score explained 1.7- and 2.8-fold more variance in self-reported alcohol consumption in the Lothian Birth Cohorts of 1936 and 1921, respectively.

Alcohol consumption patterns and alcohol-related complications differ between the sexes [30, 36, 37], and sex differences in the methylome have also been described [38]. However, when sample sizes were matched, sex-specific EpiScores yielded results very similar to those of sex-agnostic models. This suggests that, despite differences in consumption patterns, the methylation response to an equivalent number of alcohol units is similar in males and females.

Our alcohol EWAS identified eight sentinel loci, which mapped to seven unique genes. Three of these genes (SLC7A11, PSAT1, and SLC43A1), which have already been associated with alcohol consumption in previous EWAS efforts [9, 11, 16, 39], are involved in amino acid transport or metabolism: SLC7A11 and SLC43A1 encode amino acid transporters, and PSAT1 encodes an aminotransferase. Alcohol is known to disrupt protein metabolism and amino acid transport [40, 41], and the role of SLC7A11 in the liver–brain axis in alcohol-related disease, as well as its potential as a future drug target, has been described [11]. One of our strongest CpG associations (cg26774981) mapped to SRPK2, which encodes a kinase that controls alternative splicing. A recent paper found that regulation of alternative splicing by SRPK2 is implicated in lipogenesis in humans with alcohol-associated liver disease, making it a potential drug target [42]. One of the seven genes to which our CpG loci mapped, IL12RB1, which encodes a type I transmembrane protein of the haemopoietin receptor superfamily, has not previously been linked to alcohol consumption. Future work is needed to replicate these findings and to understand their potential role in alcohol-mediated disease aetiology.

Our study has several limitations. First, most Generation Scotland and Lothian Birth Cohort participants are of White British ancestry, which could introduce biases and make it difficult to translate these results to other populations. However, the EpiScore performed similarly or better in two external cohorts of diverse age ranges (ALSPAC) and ancestries (Sister Study). Second, as discussed previously, this study is based on an imperfect phenotype. Self-report has its limitations, and further details regarding alcohol consumption, such as the type of drink consumed (beer, wine, spirits) and the pattern of consumption (e.g. binge versus light drinking, alongside seasonal variation), could help further untangle its relationship with human health and the methylome. It is also unknown whether EpiScores can capture lifetime drinking patterns, or whether there is a particular period in life when alcohol is especially detrimental to health. These limitations surrounding the phenotypic measurement of alcohol consumption may limit the interpretation of the EpiScore and raise the question of which aspects of drinking behaviour it reflects. Replication of the current findings, along with more extensive comparisons with other health and lifestyle traits, may help to determine which aspects of alcohol consumption the EpiScore captures and where it might (and might not) be valid as a phenotypic proxy. Third, DNAm was measured in whole blood; therefore, these results may not apply to all blood cell types or to other mechanistically relevant tissues such as the brain. Finally, the differences in performance between the EpiScores compared during feature pre-selection and training-set subsetting were relatively subtle. Defining a “best” score is non-trivial, especially given that sample sizes differed between the whole cohort and the subset who reported drinking a similar amount to a normal week. Here, we selected the EpiScore with the highest point estimates for r and R2 in the Generation Scotland test set for application in the external cohorts.

Going forward, phenome-wide association studies of self-reported alcohol consumption and its EpiScore in relation to incident disease outcomes and health-related traits would help to quantify the utility of the EpiScore in risk prediction settings. Running these analyses across diverse cohorts would also help to distinguish consistent trends from those specific to a particular study. Further, other objective biomarkers, such as blood alcohol, would help to separate short-term from long-term drinking behaviour in relation to DNA methylation differences, and to determine the accuracy and degree of bias of self-reported measures. However, such data are rarely collected in epidemiological cohort studies.

Whereas existing biomarkers can accurately measure chronic or heavy alcohol exposure, our EpiScore offers an opportunity to track consumption across all levels of exposure. There is also evidence that it captures associations with brain health that self-report metrics do not. Finally, we show that the EpiScore generalizes well across cohorts of diverse ages and ancestries. Future studies should determine whether our EpiScore can be used to impute an alcohol phenotype where self-report data are not available.

Methods

Generation Scotland (GS)

Overview

Generation Scotland is a Scottish family-based study with over 24,000 participants recruited between 2006 and 2011 [43]. Participants were aged between 18 and 99 years at recruitment, with a mean age of 47.5 years (SD 14.9). After exclusions (Supplementary Fig. 1), a total of 16,717 participants (9758 females and 6959 males) had blood-based DNAm measured (see “DNA methylation” below) and self-reported alcohol consumption data available (Supplementary Table 1 and Table 1). The mean number of units consumed in the week prior to completing the questionnaire and blood draw was 10.9 (SD 12.7; Supplementary Figs. 2 and 3). A total of 10,506 (62.8%) participants reported that this number reflected their usual drinking pattern, with 1622 and 3756 noting that it was less or more, respectively, than they typically drink in a week (response unknown for N = 833).

DNA methylation

DNA methylation in blood at baseline (recruitment) was quantified for 18,413 Generation Scotland participants across three separate sets (Set 1, N = 5087; Set 2, N = 4450; Set 3, N = 8876) using the Illumina MethylationEPIC (850 K) array. Set 1 included a mixture of related and unrelated individuals. Set 2 comprised individuals unrelated to each other and to those in Set 1. Set 3 contained a mix of related individuals (related both to each other and to those in Sets 1 and 2) and included all remaining samples available for analysis. Methylation data were processed across 121 experimental batches (31, 30, and 60 batches for Sets 1, 2, and 3, respectively).

Quality control details have been reported previously [44, 45]. Briefly, probes were removed based on (1) outliers from visual inspection of the log median intensity of the methylated versus unmethylated signal per array, (2) a bead count < 3 in more than 5% of samples, (3) ≥ 5% of samples having a detection p-value > 0.05, (4) if they pertained to the sex chromosomes, (5) if they overlapped with SNPs, and/or (6) if present in potential cross-hybridizing locations [46]. Samples were removed (1) if there was a mismatch between their predicted sex and recorded sex, (2) if ≥ 1% of CpGs had a detection p-value > 0.05, (3) if sample was not blood based, and/or (4) if participant responded “yes” to all self-reported diseases in questionnaires. A total of 752,722 CpGs remained after QC. Missing values were imputed using the mean of each CpG across all samples. Dasen normalization [47] was performed across all individuals.
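The detection p-value criteria above can be sketched as two simple filters plus per-CpG mean imputation. This is a simplified illustration under stated assumptions: only the detection p-value thresholds and mean imputation are shown (bead count, sex-chromosome, SNP, and cross-hybridization filters are omitted), both filters are computed on the unfiltered matrix because the order of steps is not stated in the text, and missing values are represented as None.

```python
def failing_fraction(values, cutoff=0.05):
    """Fraction of detection p-values above the cutoff."""
    return sum(1 for v in values if v > cutoff) / len(values)


def detection_qc(detp, probe_max=0.05, sample_max=0.01):
    """detp[i][j] is the detection p-value of sample i at probe j.
    A probe is dropped if >= 5% of samples fail; a sample is dropped if >= 1% of
    probes fail. Both are computed on the unfiltered matrix (an assumed order)."""
    n_samples, n_probes = len(detp), len(detp[0])
    keep_probes = [j for j in range(n_probes)
                   if failing_fraction([row[j] for row in detp]) < probe_max]
    keep_samples = [i for i in range(n_samples)
                    if failing_fraction(detp[i]) < sample_max]
    return keep_samples, keep_probes


def mean_impute(betas):
    """Replace missing (None) values with that CpG's mean over observed samples.
    Assumes every CpG is observed in at least one sample."""
    n_cpgs = len(betas[0])
    means = []
    for j in range(n_cpgs):
        observed = [row[j] for row in betas if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cpgs)] for row in betas]
```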

Alcohol consumption data

Self-reported alcohol consumption was measured at baseline via questionnaires recording the number of units consumed in the previous week (unit definition as per the UK National Health Service: 8 g/10 ml of pure alcohol). Participants were also asked whether this was their usual drinking amount, or whether they had consumed more or less than normal. A total of 16,717 individuals had non-missing alcohol consumption and methylation data; all other participants were excluded from this study (after exclusion, 4576, 4108, and 8033 individuals remained in Sets 1, 2, and 3, respectively). Of these, 10,506 marked this quantity as representative of their typical weekly consumption, 3756 stated that it was more than normal, and 1622 that it was less than normal (Supplementary Table 1, Supplementary Fig. 1).

Lothian Birth Cohorts of 1921 and 1936 (LBC1921 and LBC1936)

Overview

LBC1921 and LBC1936 are longitudinal studies of ageing in individuals born in 1921 and 1936, respectively [20]. Study participants completed the Scottish Mental Surveys of 1932 and 1947 at approximately age 11 and were living in the Lothian area of Scotland at the time of recruitment in later life.

DNA methylation

Blood samples considered here were collected at around age 79 for LBC1921 and at around age 70 for LBC1936. DNA methylation was quantified using the Illumina HumanMethylation450K array, for a total of 692 (up to 3 repeated measurements from 469 individuals) and 2796 (up to 4 repeated measurements from 1043 individuals) samples from LBC1921 and LBC1936, respectively. Quality control details have been reported previously [48]. Briefly, probes were removed (1) if they presented a low (< 95%) detection rate with p-value < 0.01 and/or (2) if they presented inadequate hybridization, bisulphite conversion, nucleotide extension, or staining signal, as assessed by manual inspection. Samples were removed (1) if they presented a low call rate (< 450,000 probes detected at p-value < 0.01) and/or (2) if predicted sex did not match reported sex.

Self-reported alcohol consumption

Participants were asked about their usual alcohol consumption, including the number of times alcohol was consumed per week, normal alcohol consumption, typical drink of choice, and glasses/pints consumed on average. From this information, alcohol consumption in units per week was derived. A total of 436 and 895 individuals in LBC1921 and LBC1936, respectively, had non-missing baseline alcohol consumption and methylome data and were considered in this study.

ALSPAC

Overview

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a cohort study of pregnant women resident in Avon, UK, with expected dates of delivery between 1 April 1991 and 31 December 1992 [22, 23]. Of these, 20,248 pregnancies were identified as eligible, and the initial number of pregnancies enrolled was 14,541, resulting in 14,062 live births and 13,988 children who were alive at 1 year of age. At the start of the study, mothers invited their partners to complete questionnaires. In total, 12,113 partners have provided data and 3807 are currently formally enrolled. As part of the Accessible Resource for Integrated Epigenomic Studies (ARIES) [49, 50], a subsample of ALSPAC children, mothers, and partners had DNAm assayed using the Illumina Infinium HumanMethylation450 or MethylationEPIC BeadChip array from peripheral blood samples collected at multiple time points from birth to middle age. The present study used DNAm measured from peripheral blood samples collected from ALSPAC children at ages 15–17 (time point “15up”) and 24 (time point “F24”) [51], and from ALSPAC mothers and partners [52] 18 years after the study pregnancy. Study data were collected and managed using REDCap electronic data capture tools hosted at the University of Bristol. REDCap (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies [53]. The study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/).

DNA methylation

Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip arrays were used to assess genome-wide DNAm patterns in peripheral blood. Samples across different time points were distributed in a semi-random manner across slides in order to mitigate batch effects. Data pre-processing and normalization were performed using the R package meffil as previously described [50]. Samples with large numbers of undetected probe signals were removed, along with those that had sex or genotype mismatches. Probes undetected in more than 20% of samples were excluded.

Self-reported alcohol consumption

Alcohol consumption was measured as the estimated average number of units consumed per week in the year before blood sample collection for DNAm analysis. Consumption was estimated by multiplying weekly alcohol intake frequency by intake quantity. Frequency was assessed with the question “How often do you have a drink containing alcohol?”, with possible responses including “Never”, “Monthly or less”, “2–4 times a month”, “2–4 times a week”, and “4 or more times a week”. Respondents answering “Never” were considered non-drinkers and were included in all primary analyses. Quantity was assessed by asking the number of drinks consumed, where “one drink referred to ½ pint of beer/cider, a small (125 ml) glass of wine or a single (25 ml) measure of spirit”, each of which is roughly equivalent to one UK alcohol unit (8 g of ethanol).
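The frequency × quantity derivation can be sketched in Python as follows. The text does not state how each frequency category was converted to drinking occasions per week, so the values in `FREQ_PER_WEEK` are illustrative assumptions, as is the helper name `weekly_units`.

```python
# Hypothetical mapping from frequency categories to drinking occasions per week.
# These values are illustrative assumptions, not those used by ALSPAC.
FREQ_PER_WEEK = {
    "Never": 0.0,
    "Monthly or less": 0.25,        # assumed: about once a month
    "2–4 times a month": 0.75,      # assumed: about 3 times a month
    "2–4 times a week": 3.0,        # assumed: midpoint of 2-4
    "4 or more times a week": 5.0,  # assumed typical value
}


def weekly_units(frequency: str, drinks_per_occasion: float) -> float:
    """Estimate UK units per week as (occasions per week) x (drinks per occasion).

    One 'drink' (half pint of beer/cider, 125 ml wine, 25 ml spirit) is treated
    as one UK unit (8 g ethanol), following the questionnaire wording.
    """
    if frequency not in FREQ_PER_WEEK:
        raise ValueError(f"unknown frequency category: {frequency!r}")
    if drinks_per_occasion < 0:
        raise ValueError("drinks_per_occasion must be non-negative")
    return FREQ_PER_WEEK[frequency] * drinks_per_occasion
```

For example, `weekly_units("2–4 times a week", 2)` gives 6.0 units per week under these assumed midpoints, and any answer of “Never” yields 0.0.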

Sister study

Overview

The Sister Study is a US-nationwide prospective cohort study of 50,884 women enrolled between 2003 and 2009; women were eligible for enrolment if they resided in the USA and were breast cancer-free themselves but had a biological sister who was previously diagnosed. As part of study enrolment when all women were breast cancer-free, women completed self-reported questionnaires and an in-home visit where a whole blood sample was collected. Information about obtaining data from the Sister Study can be found at: https://sisterstudy.niehs.nih.gov/English/coll-data.htm.

DNA methylation

Two case–cohort samples of women were selected for DNAm profiling. In 2014, blood DNA samples from 2878 self-identified non-Hispanic White women were assayed on the Infinium HumanMethylation450 BeadChip [54]. This sample included 1542 women who were diagnosed with breast cancer in the years following enrolment (mean time to diagnosis: 4 years). In 2019, blood DNA samples from 2599 self-identified Black (Hispanic and non-Hispanic) and non-Hispanic White women were assayed on the Infinium MethylationEPIC v1 BeadChip [55]. This sample included 999 women who were diagnosed with breast cancer in the years following enrolment (mean time to diagnosis: 5 years). Self-identified Hispanic and non-Hispanic Black women were over-sampled for DNAm profiling in order to maximize the racial and ethnic diversity of the MethylationEPIC sample.

For both DNAm samples, DNAm data were preprocessed using the ENmix software pipeline, which included background correction, dye bias correction, inter-array normalization, and probe-type bias correction [56,57,58]. Samples were excluded if they did not meet quality control measures, including bisulfite intensity < 4000, had greater than 5% of probes with low-quality methylation values (detection P > 0.000001, < 3 beads, or values outside 3 times the interquartile range), or were outliers for their methylation beta value distributions. In total, 178 participants from the HumanMethylation450 sample and 250 participants from the MethylationEPIC sample were excluded for not meeting quality control measures.
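A minimal sketch of the first two sample-level QC criteria above, assuming NumPy arrays of detection P-values, bead counts, and beta values with probes as rows and samples as columns. The exact ENmix definition of a "3 × IQR" outlier is an assumption here (per probe, beyond the first and third quartiles), and `failing_samples` is a hypothetical helper, not an ENmix function.

```python
import numpy as np


def flag_low_quality(detection_p, bead_count, beta):
    """Boolean (probes x samples) mask of low-quality methylation values.

    Flags detection P > 1e-6, fewer than 3 beads, or values more than
    3 IQRs beyond a probe's quartiles (assumed outlier definition).
    """
    q1, q3 = np.nanpercentile(beta, [25, 75], axis=1, keepdims=True)
    iqr = q3 - q1
    outlier = (beta < q1 - 3 * iqr) | (beta > q3 + 3 * iqr)
    return (detection_p > 1e-6) | (bead_count < 3) | outlier


def failing_samples(detection_p, bead_count, beta, bisulfite_intensity,
                    max_bad_fraction=0.05, min_intensity=4000):
    """True for samples with >5% low-quality values or low bisulfite intensity."""
    bad = flag_low_quality(detection_p, bead_count, beta)
    bad_fraction = bad.mean(axis=0)  # fraction of flagged probes per sample
    return (bad_fraction > max_bad_fraction) | (bisulfite_intensity < min_intensity)
```

The third criterion (outlying beta value distributions) is a separate, sample-level check and is not covered by this sketch.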

Alcohol consumption

Participants’ history of alcohol consumption was obtained within 1 year of blood draw as part of a baseline questionnaire for alcohol use. Women reported information including the age at which they started and stopped drinking alcohol. The frequency of alcohol consumption was reported as days per week, month, or year by decade of life. The alcohol use variable used in this study was a derived variable that represented the average number of drinks per week over the last twelve months.

EpiScore of alcohol consumption: who to train on, and how?

In an effort to assess the optimal cohort sample and feature space to train on, multiple EpiScores were assessed. The Generation Scotland cohort was divided into a training dataset (sets 1 and 2, N = 8684) and a testing dataset (set 3, N = 8033). EpiScores were trained on the full training dataset, as well as on just the “normal week” drinkers (N = 5618). Further, EpiScores were trained either on the full methylome (386,399 CpGs after limiting measured features to those also present on the Illumina 450K array for wider applicability) or on a subset of 3999 epigenome-wide significant CpGs (P < 3.6 × 10⁻⁸) that have been previously linked to alcohol consumption in three separate studies not using Generation Scotland [9, 16, 25].
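The pre-filtering step can be sketched as a set operation. Whether CpGs were pooled as a union across the three prior studies, and the input format (one dict of CpG ID to P-value per study), are assumptions made for illustration; `prefilter_cpgs` is a hypothetical name.

```python
def prefilter_cpgs(study_results, array_450k, p_threshold=3.6e-8):
    """Sorted CpG IDs significant in any prior study and present on the 450K array.

    study_results: iterable of dicts mapping CpG ID -> P-value (hypothetical format).
    array_450k: iterable of CpG IDs measured on the 450K array.
    """
    selected = set()
    for results in study_results:
        # assumed: a CpG is kept if it passes the threshold in at least one study
        selected |= {cpg for cpg, p in results.items() if p < p_threshold}
    return sorted(selected & set(array_450k))
```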

Elastic net penalized regression was used to train our EpiScores on log-transformed alcohol consumption, log(x + 1) (glmnet package in R, v4.1). CpG beta values in the training set were scaled to mean zero and unit variance ahead of elastic net, thus obtaining standardized regression effect sizes. The L1/L2 mixing parameter was set at α = 0.5, and tenfold cross-validation was performed to select the shrinkage parameter (λ) that minimized the mean cross-validated prediction error.
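The authors used glmnet in R; the sketch below shows an analogous fit in Python with scikit-learn, where `l1_ratio=0.5` plays the role of glmnet's α and the cross-validated `alpha_` plays the role of λ. Default penalty paths and convergence settings differ between the two libraries, so coefficients would not match exactly, and `train_episcore` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler


def train_episcore(X_beta, units_per_week, seed=0):
    """Fit an elastic net EpiScore on log(units + 1) from CpG beta values.

    X_beta: (n_samples, n_cpgs) array; units_per_week: (n_samples,) array.
    """
    y = np.log1p(units_per_week)            # log(x + 1) transform
    scaler = StandardScaler().fit(X_beta)   # mean 0, unit variance per CpG
    X = scaler.transform(X_beta)
    folds = KFold(n_splits=10, shuffle=True, random_state=seed)
    # l1_ratio=0.5 mirrors glmnet's alpha = 0.5; ElasticNetCV keeps the penalty
    # strength (its alpha_, glmnet's lambda) with the lowest mean CV error.
    model = ElasticNetCV(l1_ratio=0.5, cv=folds)
    model.fit(X, y)
    return model, scaler
```

The fitted `model.coef_` holds standardized weights; CpGs with nonzero coefficients make up the EpiScore.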

Predictive performance for each EpiScore was assessed by projecting it into the testing dataset: CpG beta values were first scaled to mean zero and unit variance, each CpG was multiplied by its estimated weight, and the products were summed. The Pearson correlation (r) between the EpiScore and measured alcohol consumption (log(x + 1) transformed) was then calculated, as well as the incremental R² upon adding the EpiScore to a linear regression model adjusting for age and sex. EpiScore statistical significance was assessed using the marginal test for the EpiScore coefficient in the linear regression model adjusting for age and sex (i.e. whether the coefficient differs significantly from zero).
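The projection and the two performance metrics can be sketched with NumPy alone, assuming the test-set beta matrix has CpGs in the same column order as the weight vector and that sex is coded numerically. The function names are hypothetical.

```python
import numpy as np


def project_episcore(X_test_beta, weights):
    """Scale each CpG within the test set, multiply by its weight, and sum.

    Assumes no CpG is constant in the test set (that would divide by zero).
    """
    X = (X_test_beta - X_test_beta.mean(axis=0)) / X_test_beta.std(axis=0)
    return X @ weights


def r2_ols(y, covariates):
    """R^2 of an OLS fit of y on an intercept plus the covariate columns."""
    design = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def evaluate(episcore, log_units, age, sex):
    """Pearson r with log(units + 1) and incremental R^2 over age + sex."""
    r = np.corrcoef(episcore, log_units)[0, 1]
    base = r2_ols(log_units, np.column_stack([age, sex]))
    full = r2_ols(log_units, np.column_stack([age, sex, episcore]))
    return r, full - base
```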

Training the EpiScore in Generation Scotland and testing in the Lothian Birth Cohorts, ALSPAC, and the Sister Study

Having established that training on all individuals with self-reported alcohol consumption data (regardless of whether this pattern reflected a typical week or was more or less than normal), and on a pre-filtered set of CpGs, yields the better-performing EpiScore, we next trained on the full Generation Scotland cohort (N = 16,717). As with the creation of previous EpiScores, elastic net penalized regression was used with α = 0.5 and tenfold CV. This EpiScore was then projected and tested on the Lothian Birth Cohorts of 1921 and 1936, ALSPAC, and the Sister Study. Its performance was again assessed via the Pearson correlation with self-reported alcohol consumption (log(x + 1) transformed) and the incremental R² upon the addition of the EpiScore to a linear regression model adjusting for age and sex.

Sex-specific EpiScores

Sex-specific EpiScores were trained after matching sample sizes (thus ensuring that larger sample sizes were not driving better prediction). Given that the smallest sex-stratified sample size was N = 6958 (males), we trained a male-specific EpiScore on the full male sample, a female-specific EpiScore on a random subsample of 6958 female participants, and a sex-agnostic EpiScore on equal numbers of males and females, also with an overall sample size of 6958 (NF = 3479, NM = 3479).
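Sample-size matching can be sketched as random subsampling without replacement. The text does not say whether the sex-agnostic set was drawn independently of the sex-specific sets, so independent draws here are an assumption, and `matched_training_sets` is a hypothetical helper.

```python
import numpy as np


def matched_training_sets(sex, seed=0):
    """Index arrays for male-only, female-only and sex-balanced training sets.

    All three have size n = size of the smaller sex group (males, N = 6958,
    in the text). The balanced set is drawn independently (an assumption).
    """
    rng = np.random.default_rng(seed)
    males = np.flatnonzero(sex == "M")
    females = np.flatnonzero(sex == "F")
    n = min(len(males), len(females))
    male_set = rng.choice(males, n, replace=False)
    female_set = rng.choice(females, n, replace=False)
    half = n // 2  # 3479 of each sex when n = 6958
    balanced = np.concatenate([rng.choice(males, half, replace=False),
                               rng.choice(females, n - half, replace=False)])
    return male_set, female_set, balanced
```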

Performance was assessed using the same metrics and the LBC testing data described previously, testing the resulting EpiScores in three different ways: (1) sex-specific, in which predictions for each testing sample are obtained using the EpiScore trained on that sample's sex; (2) opposite-sex, in which the EpiScore trained on the opposite sex of the testing sample is used to obtain predictions; and (3) sex-agnostic, in which all samples, regardless of sex, are predicted using the EpiScore trained on both males and females.
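A sketch of the three testing schemes, assuming each EpiScore is stored as a full-length weight vector (zero for unselected CpGs) over the same scaled test CpGs. The scheme strings and the function name are labels chosen for this example.

```python
import numpy as np


def predict_by_scheme(X_scaled, sex, w_male, w_female, w_agnostic, scheme):
    """Score each test sample under one of the three testing schemes.

    X_scaled: (n, p) scaled test CpGs; sex: array of "M"/"F";
    w_*: (p,) weight vectors, zero for CpGs not selected by that EpiScore.
    """
    is_male = sex == "M"
    if scheme == "sex-specific":
        w = np.where(is_male[:, None], w_male, w_female)   # (n, p) per-sample weights
    elif scheme == "opposite-sex":
        w = np.where(is_male[:, None], w_female, w_male)
    elif scheme == "sex-agnostic":
        return X_scaled @ w_agnostic
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return np.einsum("ij,ij->i", X_scaled, w)   # row-wise dot product
```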

EpiScore and self-reported alcohol consumption associations in the Lothian Birth Cohorts

Associations between multiple phenotypes and self-reported alcohol consumption, as well as with our generated EpiScore trained on the full Generation Scotland cohort, were evaluated separately in LBC1921 and LBC1936. For each phenotype, linear regression models were run with age and sex as covariates and either self-reported alcohol consumption or the epigenetic predictor as the variable of interest. Phenotypes considered included body mass index (BMI, in kg/m²), hand grip strength (maximum of left and right hand measurements, in kg), self-reported years of education, self-reported smoking status (never smoker, ex-smoker, and current smoker), number of packs smoked per day, measured time taken to walk 6 m (in seconds), occupation-based social class (measured as social grades based on highest reached occupation [59]), and depression and anxiety scores (HADS-D and HADS-A totals from the Hospital Anxiety and Depression Scale questionnaire [60]). Associations with the blood biomarkers cholesterol and triglycerides were also assessed.

Self-reported alcohol and alcohol EpiScore associations with self-reported prevalent disease were evaluated using logistic regression, adjusting for age and sex. These included CVD, stroke, neoplasm, high blood pressure, diabetes, and thyroid dysfunction. Associations with time to all-cause mortality were assessed using a Cox proportional hazards model with age and sex as covariates, using the survival R package (v3.5), with time to all-cause mortality or censoring as the survival outcome.

Finally, associations with multiple brain imaging phenotypes measured in LBC1936 were considered. Briefly, structural and diffusion tensor imaging (DTI) MRI acquisition and processing in LBC1936 were performed at Wave 2 (age 73 years) according to an open-access protocol [61]. Total brain, grey matter and normal-appearing white matter (NAWM) volumes were calculated using a semi-automated multi-spectral fusion method [62]. Intracranial volume was determined semi-automatically using Analyze 11.0™. Total brain, grey matter, and white matter volume measurements were scaled to mean zero and unit variance, and associations with self-reported alcohol consumption and the alcohol EpiScore were assessed via linear regression, adjusting for age, sex, and intracranial volume.

Sample sizes varied for each phenotype considered given missing values arising from incomplete participant questionnaires (Supplementary Table 3). Association P-values were FDR corrected (using the Benjamini–Hochberg procedure) to account for multiple testing within each LBC cohort.
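The Benjamini–Hochberg adjustment can be written in a few lines. This NumPy sketch is intended to mirror R's `p.adjust(p, method = "BH")`; the function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values (FDR q-values) in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)   # p_(i) * m / i
    # enforce monotonicity, taking running minima from the largest P-value down
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` returns `[0.02, 0.04, 0.04, 0.02]` (up to floating-point rounding).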

Variance components analysis and EWAS using BayesR+ 

BayesR+ [26], a software implementation of a Bayesian regression modelling framework that implicitly controls for white cell proportions, related participants, and other unknown confounders, was used to estimate the variance in alcohol consumption accounted for by methylation data, as well as the association of alcohol consumption with each measured CpG (a total of 752,722). To remove the effects of age, sex, and smoking (via a smoking EpiScore [7]), the input for BayesR+ was defined as the residuals of a linear regression model of alcohol consumption (log(x + 1) transformed) on those variables. CpG M-values were pre-corrected in a similar way, regressing out age, sex, smoking EpiScore, and batch, and were subsequently scaled to mean zero and unit variance.
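The pre-correction steps can be sketched as ordinary least-squares residualization followed by scaling. The beta-to-M conversion is the standard log2 ratio; the clipping constant `eps`, the helper names, and coding batch as one-hot dummy columns among the covariates are assumptions.

```python
import numpy as np


def beta_to_m(beta, eps=1e-6):
    """M-value = log2(beta / (1 - beta)); clipping (assumed eps) avoids infinities."""
    b = np.clip(beta, eps, 1 - eps)
    return np.log2(b / (1 - b))


def residualize(Y, covariates):
    """Residuals of Y (a vector or columns of a matrix) after OLS on intercept + covariates."""
    design = np.column_stack([np.ones(covariates.shape[0]), covariates])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ coef


def standardize(Y):
    """Scale each column to mean zero and unit variance."""
    return (Y - Y.mean(axis=0)) / Y.std(axis=0)
```

Under these assumptions, the phenotype input would be `residualize(np.log1p(units), covs)` and the CpG input `standardize(residualize(beta_to_m(beta), covs_with_batch))`, where `covs` and `covs_with_batch` are hypothetical covariate matrices.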

Full details of the BayesR+ modelling framework have been previously described [26]. Briefly, BayesR+ uses Gibbs sampling to generate draws from the posterior distribution conditional on the input data, setting prior mixture variances to 0.0001, 0.001 and 0.01, corresponding to possible small, medium, and large effect sizes of the CpGs considered (explaining 0.01%, 0.1% and 1% of the variance of the phenotype of interest, respectively). A total of 10,000 draws were generated, of which the first 5000 were discarded as burn-in. The remaining draws were thinned by retaining every fifth draw to reduce autocorrelation (i.e. 1000 iterations are used when reporting results for this analysis). Convergence of the hyperparameters was evaluated using the Geweke test [63] and by inspecting parameter values and autocorrelation across iterations. For each probe, the proportion of iterations in which the probe was categorized as having a nonzero effect was calculated, yielding the posterior inclusion probability (PIP). A PIP above 0.95 (95%) was deemed to reflect an epigenome-wide significant CpG locus.
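Given per-draw inclusion indicators from a sampler, burn-in removal, thinning, and PIP computation reduce to array slicing. This sketch assumes 10,000 total draws with the first 5000 discarded, which gives the 1000 retained iterations quoted in the text; the input layout is hypothetical and is not the BayesR+ output format.

```python
import numpy as np


def posterior_inclusion_probability(included, burn_in=5000, thin=5):
    """PIP per CpG from a (n_draws, n_cpgs) boolean array of inclusion indicators.

    included[t, j] is True when CpG j was assigned a nonzero-effect mixture
    component at draw t (hypothetical input layout).
    """
    kept = included[burn_in::thin]   # drop burn-in, keep every `thin`-th draw
    return kept.mean(axis=0), kept.shape[0]
```

CpGs with `pip > 0.95` would then be reported as epigenome-wide significant.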

Variance components were estimated as the mean, across the 1000 retained iterations, of the sum of squared standardized posterior effect sizes. Individual effect sizes were estimated as the average across the 1000 iterations for each CpG. Models were run on the full Generation Scotland cohort, as well as on just the subset of “normal pattern” drinkers.
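Similarly, the variance component and per-CpG effect summaries can be sketched from a matrix of retained standardized effect draws. Reading the sum of squared effects as a proportion of phenotypic variance assumes standardized, roughly uncorrelated CpGs and a unit-variance phenotype; `summarize_effects` is a hypothetical name.

```python
import numpy as np


def summarize_effects(effect_draws):
    """Summaries from retained draws of standardized CpG effects.

    effect_draws: (n_iter, n_cpgs) array, one row per retained iteration.
    Returns the mean over iterations of the sum of squared effects (variance
    component) and the per-CpG posterior mean effect.
    """
    variance_component = np.mean(np.sum(effect_draws ** 2, axis=1))
    mean_effects = effect_draws.mean(axis=0)
    return variance_component, mean_effects
```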

Data availability

According to the terms of consent for Generation Scotland participants, access to data must be reviewed by the Generation Scotland Access Committee. Applications should be made to access@generationscotland.org. Lothian Birth Cohort data are available on request from the Lothian Birth Cohort Study, University of Edinburgh (https://www.ed.ac.uk/lothian-birth-cohorts/data-access-collaboration). Lothian Birth Cohort data are not publicly available due to them containing information that could compromise participant consent and confidentiality. ALSPAC data are available on request from bona fide researchers. The study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/). Data from the Sister Study are available upon request via the Sister Study website (https://sisterstudy.niehs.nih.gov/English/coll-data.htm). All custom R (version 4.3.0), Python (version 3.9.7), and bash code is available with open access at the following GitHub repository: https://github.com/elenabernabeu/methylomics_alcohol. EWAS summary statistics are available via Edinburgh DataShare: https://datashare.ed.ac.uk/handle/10283/8929.

References

  1. Zahr NM, Pfefferbaum A. Alcohol’s effects on the brain: neuroimaging results in humans and animal models. Alcohol Res. 2017;38:183.

  2. Daviet R, et al. Associations between alcohol consumption and gray and white matter volumes in the UK Biobank. Nat Commun. 2022;13:1–11.

  3. NIAAA. Alcohol Facts and Statistics. https://www.niaaa.nih.gov/alcohols-effects-health/alcohol-topics/alcohol-facts-and-statistics.

  4. Rehm J, et al. The relationship between different dimensions of alcohol use and the burden of disease—an update. Addiction. 2017;112:968–1001.

  5. Cancer Research UK. Alcohol and cancer.

  6. Northcote J, Livingston M. Accuracy of self-reported drinking: observational verification of ‘last occasion’ drink estimates of young adults. Alcohol Alcohol. 2011;46:709–13.

  7. McCartney DL, et al. Epigenetic signatures of starting and stopping smoking. EBioMedicine. 2018;37:214–20.

  8. McCartney DL, et al. Epigenetic prediction of complex traits and death. Genome Biol. 2018;19:136.

  9. Liu C, et al. A DNA methylation biomarker of alcohol consumption. Mol Psychiatry. 2018;23:422–33.

  10. Zakhari S. Alcohol metabolism and epigenetics changes. Alcohol Res. 2013;35:6.

  11. Lohoff FW, et al. Epigenome-wide association study of alcohol consumption in N = 8161 individuals and relevance to alcohol use disorder pathophysiology: identification of the cystine/glutamate transporter SLC7A11 as a top target. Mol Psychiatry. 2021;27(3):1754–64.

  12. Robertson KD. DNA methylation and human disease. Nat Rev Genet. 2005;6(8):597–610.

  13. Meissner A, et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008;454(7205):766–70.

  14. Berkel TDM, Pandey SC. Emerging role of epigenetic mechanisms in alcohol addiction. Alcohol Clin Exp Res. 2017;41:666.

  15. Zhang H, Gelernter J. Review: DNA methylation and alcohol use disorders: progress and challenges. Am J Addict. 2017;26:502–15.

  16. Dugué PA, et al. Alcohol consumption is associated with widespread changes in blood DNA methylation: analysis of cross-sectional and longitudinal data. Addict Biol. 2021;26:e12855.

  17. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:1–20.

  18. Yousefi PD, et al. DNA methylation-based predictors of health: applications and statistical considerations. Nat Rev Genet. 2022;23(6):369–83.

  19. Joehanes R, et al. Epigenetic signatures of cigarette smoking. Circ Cardiovasc Genet. 2016;9:436–47.

  20. Deary IJ, Gow AJ, Pattie A, Starr JM. Cohort profile: the Lothian Birth Cohorts of 1921 and 1936. Int J Epidemiol. 2012;41:1576–84.

  21. Taylor AM, Pattie A, Deary IJ. Cohort profile update: the Lothian Birth Cohorts of 1921 and 1936. Int J Epidemiol. 2018;47:1042–60.

  22. Fraser A, et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42:97.

  23. Boyd A, et al. Cohort Profile: the ‘children of the 90s’–the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol. 2013;42:111–27.

  24. Sandler DP, et al. The Sister Study cohort: baseline methods and participant characteristics. Environ Health Perspect. 2017;125:127003.

  25. Carreras-Gallo N, et al. Impact of tobacco, alcohol, and marijuana on genome-wide DNA methylation and its relationship with hypertension. Epigenetics. 2023;18:2214392.

  26. Trejo Banos D, et al. Bayesian reassessment of the epigenetic architecture of complex traits. Nat Commun. 2020;11:2865.

  27. Krishnan HR, Sakharkar AJ, Teppen TL, Berkel TDM, Pandey SC. The epigenetic landscape of alcoholism. Int Rev Neurobiol. 2014;115:75.

  28. Yousefi PD, et al. Validation and characterisation of a DNA methylation alcohol biomarker across the life course. Clin Epigenetics. 2019;11:1–12.

  29. Sullivan EV, Pfefferbaum A. Brain-behavior relations and effects of aging and common comorbidities in alcohol use disorder: a review. Neuropsychology. 2019;33:760–80.

  30. Paul CA, et al. Association of alcohol consumption with brain volume in the Framingham study. Arch Neurol. 2008;65:1363–7.

  31. Fein G, Shimotsu R, Di Sclafani V, Barakos J, Harper C. Increased white matter signal hyperintensities in long-term abstinent alcoholics compared to non-alcoholic controls. Alcohol Clin Exp Res. 2009;33:70.

  32. Bernabeu E, et al. Refining epigenetic prediction of chronological and biological age. Genome Med. 2023;15:1–15.

  33. Fan J, Lv J. Sure independence screening for ultra-high dimensional feature space. J R Stat Soc Series B Stat Methodol. 2008;70:849–911.

  34. Pidsley R, et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 2016;17:1–17.

  35. Sugden K, et al. Patterns of reliability: assessing the reproducibility and integrity of DNA methylation measurement. Patterns. 2020;1:100014.

  36. Kezer CA, Simonetto DA, Shah VH. Sex differences in alcohol consumption and alcohol-associated liver disease. Mayo Clin Proc. 2021;96:1006–16.

  37. Ceylan-Isik AF, McBride SM, Ren J. Sex difference in alcoholism: Who is at a greater risk for development of alcoholic complication? Life Sci. 2010;87:133.

  38. Grant OA, Wang Y, Kumari M, Zabet NR, Schalkwyk L. Characterising sex differences of autosomal DNA methylation in whole blood using the Illumina EPIC array. Clin Epigenetics. 2022;14:1–16.

  39. Wilson LE, et al. Alcohol and DNA methylation: an epigenome-wide association study in blood and normal breast tissue. Am J Epidemiol. 2019;188:1055.

  40. Israel Y, Salazar I, Rosenmann E. Inhibitory effects of alcohol on intestinal amino acid transport in vivo and in vitro. J Nutr. 1968;96:499–504.

  41. Adibi SA, Baraona E, Lieber CS. Effects of ethanol on amino acid and protein metabolism. Med Nutr Complicat Alcohol. 1992. https://doi.org/10.1007/978-1-4615-3320-7_5.

  42. Li G, et al. Targeting hepatic serine-arginine protein kinase 2 ameliorates alcohol-associated liver disease by alternative splicing control of lipogenesis. Hepatology. 2023;78(5):1506–24.

  43. Smith BH, et al. Cohort Profile: Generation Scotland: Scottish Family Health Study (GS: SFHS). The study, its participants and their potential for genetic research on health and illness. Int J Epidemiol. 2013;42:689–700.

  44. McCartney DL, et al. Investigating the relationship between DNA methylation age acceleration and risk factors for Alzheimer’s disease. Alzheimer’s & Dement: Diagn, Assess & Dis Monit. 2018;10:429–37.

  45. McCartney DL, et al. An epigenome-wide association study of sex-specific chronological ageing. Genome Med. 2019;12:1–11.

  46. McCartney DL, et al. Identification of polymorphic and off-target probe binding sites on the Illumina Infinium MethylationEPIC BeadChip. Genom Data. 2016;9:22.

  47. Pidsley R, et al. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013;14:1–10.

  48. Marioni RE, et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol. 2015;16:1–12.

  49. Relton CL, et al. Data resource profile: accessible resource for integrated epigenomic studies (ARIES). Int J Epidemiol. 2015;44:1181–90.

  50. Min JL, Hemani G, Smith GD, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics. 2018;34:3983–9.

  51. Northstone K, et al. The Avon Longitudinal Study of Parents and Children (ALSPAC): an update on the enrolled sample of index children in 2019. Wellcome Open Res. 2019;4:51.

  52. Northstone K, et al. The Avon Longitudinal Study of Parents and Children ALSPAC G0 Partners: a cohort profile. Wellcome Open Res. 2023;8:37.

  53. Harris PA, et al. Research electronic data capture (REDCap)–a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42:377–81.

  54. Kresovich JK, et al. Methylation-based biological age and breast cancer risk. J Natl Cancer Inst. 2019;111:1051–8.

  55. Kresovich JK, Sandler DP, Taylor JA. Methylation-based biological age and hypertension prevalence and incidence. Hypertension. 2023;80:1213–22.

  56. Xu Z, Niu L, Li L, Taylor JA. ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucl Acids Res. 2016;44:e20.

  57. Xu Z, Langie SAS, De Boever P, Taylor JA, Niu L. RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip. BMC Genomics. 2017;18:1–7.

  58. Niu L, Xu Z, Taylor JA. RCP: a novel probe design bias correction method for Illumina Methylation BeadChip. Bioinformatics. 2016;32:2659.

  59. General Register Office. Census 1951: Classification of Occupations. Great Britain. https://books.google.co.uk/books/about/Census_1951_Classification_of_Occupation.html?id=0iUrAQAAIAAJ&redir_esc=y.

  60. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr Scand. 1983;67:361–70.

  61. Wardlaw JM, et al. Brain aging, cognition in youth and old age and vascular disease in the Lothian Birth Cohort 1936: rationale, design and methodology of the imaging protocol. Int J Stroke. 2011;6:547–59. https://doi.org/10.1111/j.1747-4949.2011.00683.x.

  62. Valdés Hernández MDC, Ferguson KJ, Chappell FM, Wardlaw JM. New multispectral MRI data fusion technique for white matter lesion segmentation: Method and comparison with thresholding in FLAIR images. Eur Radiol. 2010;20:1684–91.

  63. Geweke J. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. 1991. https://doi.org/10.21034/SR.148.

Acknowledgements

Generation Scotland: We are grateful to all the families who took part, the general practitioners, and the Scottish School of Primary Care for their help in recruiting them and the whole GS team that includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, healthcare assistants, and nurses. ALSPAC: We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses.

Funding

Generation Scotland: Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6) and the Scottish Funding Council (HR03006). Genotyping and DNA methylation profiling of the Generation Scotland samples were carried out by the Genetics Core Laboratory at the Edinburgh Clinical Research Facility, Edinburgh, Scotland, and were funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award STratifying Resilience and Depression Longitudinally (STRADL; Reference 104036/Z/14/Z) and 220857/Z/20/Z). The DNA methylation data assayed for Generation Scotland were partially funded by a 2018 NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation (Ref: 27404; awardee: Dr David M Howard) and by a JMAS SIM fellowship from the Royal College of Physicians of Edinburgh (Awardee: Dr Heather C Whalley). Lothian Birth Cohorts: We thank the LBC1921 and LBC1936 participants and team members who contributed to these studies. The LBC1921 was supported by the UK’s Biotechnology and Biological Sciences Research Council (BBSRC), The Royal Society, and The Chief Scientist Office of the Scottish Government. The LBC1936 is supported by the BBSRC and the Economic and Social Research Council [BB/W008793/1] (which supports S.E.H.), Age UK (Disconnected Mind project), the Milton Damerel Trust, the Medical Research Council (MR/M01311/1), and the University of Edinburgh. Methylation typing of LBC1936 was supported by the Centre for Cognitive Ageing and Cognitive Epidemiology (Pilot Fund award), Age UK, The Wellcome Trust Institutional Strategic Support Fund, The University of Edinburgh, and The University of Queensland. Genotyping was funded by the BBSRC (BB/F019394/1). S.R.C. is supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (Grant Number 221890/Z/20/Z).
ALSPAC: The UK Medical Research Council and Wellcome (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors and Matthew Suderman will serve as guarantor for the contents of this paper. A comprehensive list of grants funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). Funding for ALSPAC DNAm measurements was supported by the Wellcome (102215/2/13/2); the University of Bristol; the UK Economic and Social Research Council (ES/N000498/1); the UK Medical Research Council (MC_UU_12013/1, MC_UU_12013/2); and the John Templeton Foundation (60828). MS and PY work within the MRC Integrative Epidemiology Unit at the University of Bristol, which is supported by the Medical Research Council (MC_UU_00011/5). Sister Study: This research was supported by the Intramural Research Program of the National Institutes of Health (Z01-ES049033, Z01-ES049032, Z01-ES044005). A.D.C. was supported by a Medical Research Council PhD Studentship in Precision Medicine with funding from the Medical Research Council Doctoral Training Program and the University of Edinburgh College of Medicine and Veterinary Medicine. R.F.H. is supported by an MRC IEU Fellowship. M.R.R. was funded by Swiss National Science Foundation Eccellenza Grant PCEGP3-181181 and by core funding from the Institute of Science and Technology Austria. E.B. and R.E.M. are supported by Alzheimer’s Society major project grant AS-PG-19b-010. This research was funded in whole, or in part, by the Wellcome Trust (104036/Z/14/Z, 220857/Z/20/Z, and 221890/Z/20/Z). For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Author information

Contributions

E.B. analysed the data. A.D.C. developed the external data replication pipeline. J.K.K. and M.S. replicated results in the Sister Study and ALSPAC cohorts, respectively. D.L.M., R.F.H., J.C., MdC.V.H., S.M.M., M.E.B., J.M.W., Y.X., D.P.S., A.C., S.E.H., A.M.M., J.A.T., P.Y., S.R.C., and K.L.E. were involved in the data generation. E.B., R.E.M., and C.A.V. drafted the initial manuscript. E.B., M.R.R., C.A.V., and R.E.M. designed the study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Catalina A. Vallejos or Riccardo E. Marioni.

Ethics declarations

Ethics approval and consent to participate

All components of Generation Scotland received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89). All participants provided broad and enduring written informed consent for biomedical research. Generation Scotland has also been granted Research Tissue Bank status by the East of Scotland Research Ethics Service (REC Reference Number: 15/0040/ES), providing generic ethical approval for a wide range of uses within medical research. This study was performed in accordance with the Helsinki declaration. Ethical approval for the LBC1921 and LBC1936 studies was obtained from the Multi-Centre Research Ethics Committee for Scotland (MREC/01/0/56) and the Lothian Research Ethics committee (LREC/1998/4/183; LREC/2003/2/29). In both studies, all participants provided written informed consent. These studies were performed in accordance with the Helsinki declaration. Ethical approval for the ALSPAC study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Ethical approval for the Sister Study was obtained from the Institutional Review Board of the National Institutes of Health, which continues to oversee the study. Written informed consent was provided by all participants at study enrolment.

Competing interests

R.E.M. is an advisor to the Epigenetic Clock Development Foundation. R.F.H. has received consultant fees from Illumina. R.E.M. and R.F.H. have received consultant fees from Optima Partners. D.L.McC. is currently employed in a part-time capacity by Optima Partners. M.R.R. receives research funding from Boehringer Ingelheim. All other authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bernabeu, E., Chybowska, A.D., Kresovich, J.K. et al. Blood-based epigenome-wide association study and prediction of alcohol consumption. Clin Epigenet 17, 14 (2025). https://doi.org/10.1186/s13148-025-01818-y

  • DOI: https://doi.org/10.1186/s13148-025-01818-y