Skip to main content

Single-cell transcriptomics reveal the prognostic roles of epithelial and T cells and DNA methylation-based prognostic models in pancreatic cancer

Abstract

Background

Pancreatic adenocarcinoma (PDAC) exhibits a complex microenvironment with diverse cell populations influencing patient prognosis. Single-cell RNA sequencing (scRNA-seq) was used to identify prognosis-related cell types, and DNA methylation (DNAm)-based models were developed to predict outcomes based on their cellular characteristics.

Methods

We integrated scRNA-seq, bulk data, and clinical information to identify key cell populations associated with prognosis. The TCGA dataset was used for validation, and cell composition was inferred from DNAm data. Prognostic models were constructed based on cell-type-specific DNAm markers, and genomic features were compared across risk groups. Nomograms were created to assess treatment responses in different risk levels.

Results

Epithelial and T cells were major prognostic factors. Genomic analysis showed that epithelial cells in PDAC followed a malignant trajectory. DNAm data from TCGA confirmed the association of higher epithelial and T cell proportions with worse prognosis. Prognostic models based on DNAm markers of these cells effectively predicted patient survival, especially 5-year overall survival (AUC = 0.834). High-risk group with epithelial cell model showed altered pathways (tight junctions, NOTCH, and P53 signaling), while high-risk group with T cell model had changes in glycolysis, hypoxia, and NOTCH signaling, with more KRAS or TP53 mutations. Low-risk groups in the T cell model displayed stronger antitumor immune responses. Treatment predictions and nomograms were developed for clinical use.

Conclusions

scRNA-seq and DNAm data integration enabled the creation of predictive models based on epithelial and T cell-specific methylation patterns, offering robust prognosis prediction for PDAC patients.

Introduction

Pancreatic cancer stands as a highly aggressive malignancy within the gastrointestinal tract, characterized by a bleak prognosis, and it may become the second leading cause of cancer-related deaths by 2030 [1, 2]. Pancreatic ductal adenocarcinoma (PDAC), constituting the predominant pathological type of pancreatic cancer, accounts for 95% of cases, with a mere 5% 5-year overall survival (OS) [3]. Due to its distinctive anatomical location and non-specific clinical symptoms, a significant proportion of PDAC patients receive a diagnosis at an advanced or metastatic stage, precluding the opportunity for surgical intervention in over 80% of cases [4]. Concurrently, a substantial number of patients undergoing surgery succumb to recurrence and/or metastasis within 2 years [5]. Moreover, the lack of effective treatments has resulted in minimal improvement in the prognosis and survival rates of patients over the years [6,7,8]. A critical imperative exists to elucidate the molecular mechanisms governing PDAC progression and unearth prognostic signatures for patients.

The dynamic interplay between pancreatic epithelial/cancer cells and stromal cells orchestrates the tumor microenvironment (TME), a pivotal factor in PDAC progression [9]. T cells, particularly FOXP3 + Tregs, critically shape the immunosuppressive TME in PDAC [10, 11]. The elimination of stromal cells leads to a more aggressive disease, and depleting myofibroblasts in mice has been linked to reduced overall survival [12]. Epithelial cells, serving as the origin of malignant neoplastic cells, significantly contribute to tumor development [13, 14]. Recently, single-cell RNA sequencing (scRNA-seq) has enabled high-resolution, cell-type-specific transcriptome analysis, offering insights into exact cellular and molecular changes at the single-cell level. This approach facilitates the dissection of hidden heterogeneity within cell populations [15,16,17]. Peng et al. revealed two ductal cell subpopulations in PDAC, characterized by varying malignancy levels and distinct cell markers [18]. Elyada et al. identified three types of cancer-associated fibroblasts (CAFs): myofibroblastic CAFs, inflammatory CAFs, and antigen-presenting CAFs, with their plasticity offering promising therapeutic avenues to improve patient prognosis [19]. However, it is crucial to acknowledge potential limitations stemming from the typically small sample sizes used in scRNA-seq analyses. To augment our understanding of the relationship between cell subpopulations and prognosis, it is imperative to validate scRNA-seq findings by integrating a larger patient cohort with clinical phenotypes (such as survival information). This approach can provide deeper insights into the implications of cellular subtypes for patient prognosis.

Distinctive functions and phenotypes exhibited by various cell types within tissues are, in part, regulated by epigenetic mechanisms, among which DNA methylation (DNAm) plays an indispensable role in maintaining cell identity and functional phenotypes by modulating gene expression [20,21,22,23,24]. Aberrant methylation of CpG islands and promoter regions results in the transcriptional silencing of pancreatic cancer suppressor genes and/or abnormal activation of oncogenes, thereby promoting its initiation and progression [25,26,27,28]. A prognostic model was developed to predict the survival of PDAC patients based on four DNAm-driven genes [29]. Additionally, Xiao et al. established novel prognosis-related molecular subgroups based on the DNAm signature [30]. Nevertheless, further research is imperative to elucidate whether these DNAm differences can accurately represent alterations in specific cell populations and to gain a more comprehensive understanding of the pathological processes underlying these variations. Efforts have been made to infer cell population compositions in DNAm profiles based on paired RNA data and scRNA-seq cell clusters [31,32,33]. In other words, it enables researchers to dissect the alterations in cell population underlying changes in DNAm profiles and explore their roles in prognosis.

In this study, we dissected the intratumoral heterogeneity of PDAC, identifying key prognostic cell subpopulations through scRNA-seq, notably epithelial and T cells with distinct genomic and functional characteristics. By leveraging these insights, we developed prognostic models grounded in DNAm markers specific to these cell types, providing a means to translate single-cell transcriptomic discoveries into detectable signals within bulk tissue. This approach not only deepens our understanding of PDAC’s molecular landscape but also offers a practical strategy to bridge the gap between advanced single-cell technologies and clinically actionable biomarkers, paving the way for more precise prognosis and personalized treatment strategies.

Methods

Patients, clinical information, and public datasets

Tumor and adjacent nonmalignant pancreatic tissue samples were collected from eight PDAC patients who underwent resection at Zhejiang Cancer Hospital. All participants provided informed consent, and the study was approved by the Ethics Committees of Zhejiang Provincial People’s Hospital (ZJPPHEC2023I-089) and Zhejiang Cancer Hospital (IRB-2023-1039). PDAC diagnoses were confirmed by two independent pathologists. Clinical details are provided in Table 1, and these samples were subsequently analyzed using DNA methylation microarray experiments.

Table 1 Clinical characteristics of PDAC patients

PDAC single-cell RNA sequencing datasets (GSE155698, GSE201230, GSE212966, GSE214295, and GSE156405) were retrieved from the Gene Expression Omnibus (GEO) [34]. GSE155698 includes PDAC samples from surgical (n = 6) and fine needle biopsy (n = 10) specimens, GSE201230 comprises fetal pancreas samples at 15–20 weeks gestational age. GSE212966, GSE214295, and GSE156405 include six, three, and five PDAC samples, respectively.

TCGA-PAAD project, containing bulk RNA sequencing (RNA-seq) data, DNAm, and somatic mutation data, was downloaded from University of California Santa Cruz (UCSC) Xena (185 tumor and 10 normal samples) [35, 36]. Latest clinicopathological data were obtained from The Cancer Genome Atlas (TCGA) database as of October 10, 2020. ICGC-PACA-AU project, including bulk RNA-seq, DNAm, and clinical data (250 tumor samples), were sourced from the International Cancer Genome Consortium (ICGC) database [37]. Meanwhile, six bulk RNA-seq datasets comprising 659 samples were selected as external validation cohort. Five datasets were obtained from GEO, including GSE85916 (79 samples), GSE28735 (42 samples), GSE57495 (63 samples), GSE62452 (64 samples), GSE71729 (123 samples), and the ArrayExpress dataset E-MTAB-6134 containing 288 samples, was included. During the processing of expression matrices, platform annotation files were used to map probes to gene symbols, and for genes with multiple probes, the average expression was taken as the gene’s expression level. The “SVA” R package was used to remove batch effects between different datasets [38]. Both transcriptome information and clinical information were downloaded for survival analysis. Pancreatic cancer samples with survival times exceeding 30 days were included in the analysis. As publicly accessible databases, the datasets from GEO, TCGA, ICGC, and ArrayExpress are freely available for download from their respective websites.

ScRNA-seq data processing

Raw single-cell data were downloaded and processed using the “Seurat” R package (v4.3.0.1) for quality control [39]. All functions were run with default parameters unless specified otherwise. Cells were filtered out if they met the following criteria: (1) fewer than 200 or more than 8000 genes; (2) genes expressed in fewer than 3 cells; (3) mitochondrial gene content exceeded 25%.

Data were normalized using “NormalizedData” (scale factor = 10,000), and 2000 highly variable genes were identified with “FindVariableFeatures”. Gene expression levels were scaled via “ScaleData”. Batch effects were mitigated with the “Harmony” package for dataset integration across different specimens, and the “SCTransform” function was used to remove cell cycle influences.

Principal component analysis (PCA) was performed on 2000 genes using “RunPCA”, significant principal components (PCs) were identified by the “ElbowPlot”, and the top 15 PCs were chosen for further dimension-reduction analysis. Cell clustering was conducted using “FindNeighbors” and “FindClusters” functions, with Clustering trees showed that resolution set to 0.4 achieved the clearer clustering results by “clustree”. The cell atlas was visualized using Uniform Manifold Approximation and Projection (UMAP) by “RunUMAP” [40].

Cell cluster annotation

Distinct cell clusters were identified and annotated using established cell markers from the literature [18, 41,42,43,44,45]. The COSG method (v0.9.0), based on cosine similarity, was employed for accurate and efficient marker gene identification [46]. Key markers included REG1A, REG1B, PRSS1, CTRB1, CTRB2 (acinar); CD79A, MS4A1, CD79B, VPREB3 (B cell); CDH5, VWF, PLVAP, CLDN5, RAMP2 (endothelial cell); EPCAM, KRT19, KRT8, KRT18, KRT7 (epithelial cell); LUM, DCN, COL1A1, SFRP2 (fibroblast); NAMPT, FCGR3B (granulocytes); AIF1, CD68, CD14 (macrophage); CLU, CPA3, TPSAB1 (mast cell); NKG7, GZMB (NK cell); IGJ, CD79A (plasma cell); RGS5, NDUFA4L2 (stellate cell); CD3D, CD3E, CD3G, IL7R, CD2 (T cell). Expression levels of well-established marker genes and the top five differentially expressed genes for each cell populations were depicted through violin plots and dot plots, respectively. The “VlnPlot”, “FeaturePlot”, and “DotPlot” functions were utilized to present the expression levels of specific marker genes. Sub-clustering of T cells followed the aforementioned methodology.

Prognostic cell cluster identification

The “SCISSOR” R package (v2.0.0), developed by Sun et al. [47], was employed to discern bulk phenotype-associated cell subpopulations. SCISSOR quantifies the similarity between single-cell and bulk data by gauging Pearson correlations for each cell-bulk pair, followed by optimizing a regression model to associate cells to phenotypes using a sparsity penalty and graph regularization, identifying the most relevant cells associated with the phenotype with high confidence. We applied SCISSOR to integrate PDAC bulk data (ICGC-PACA-AU), single-cell data (GSE155698, GSE212966, GSE214295, and GSE156405), and survival data to identify survival-associated cells, respectively. Adhering to SCISSOR’s criteria, we set the alpha value at 0.5 to restrict cells (20% of total). A 100-fold cross-validation reliability test (P < 0.05) was performed using “reliability.test” function to confirm the robustness of the identified prognostic cells. SCISSOR classified cells into Surv (worse survival), Surv+ (better survival), or background (no significant association).

Differentially expressed genes (DEGs) in prognostic-related (PR) cells

DEGs in PR cells were identified using the “FindMarkers” function from the Seurat package. To reduce the impact of inherent biological functions, differential expression analysis was performed within each cell type individually. Since PR cells were unevenly distributed across most cell types, two comparison strategies were employed to ensure robust statistical analysis. PR cells were compared to background cells when PR cells dominated a cell type, while Surv+ and Surv cells were compared when PR cell proportions were more balanced, such as in epithelial and macrophage cells. Genes with an absolute log2 fold change (log2FC) ≥ 0.5 and a Bonferroni-adjusted P value < 0.05 were considered DEGs.

Single-cell CNV analysis

Single-cell copy number variation (CNV) analysis was conducted using the “inferCNV” R package (v1.15.3) [48]. Chromosomal variations were inferred by comparing gene expression intensities across genomic regions in tumor cells relative to reference immune cells. The analysis utilized a raw counts matrix, an annotation file, and a gene/chromosome position file. For epithelial cells, CNV was calculated with immune cells as the reference.

Key parameters included “denoise” default hidden Markov model (HMM) settings and a cutoff of 0.1. The Bayesian latent mixture model was applied to ascertain posterior probabilities and reduce false-positive CNV calls. A heatmap visualizing expression intensities across each chromosome was generated, with regions of gain in red and loss in blue. CNV scores were computed as the quadratic sum of CNV regions.

Modular score calculation and signal visualization

Gene expression modular scores were calculated and visualized using the Seurat R package (v4.3.0.1). Based on previously reported studies [16, 41, 49], we selected markers for epithelial-mesenchymal transition (EMT), cancer stem cells, and genes involved in malignant progression in PDAC (Table S1). The “AddModuleScore” function was employed to compute cancer stem cell, mesenchymal, and malignant scores for epithelial cells. Expression of relevant marker genes were projected onto dimension-reduction plots using “FeaturePlot” function, and the signal distribution across PR epithelial cells was visualized with “VlnPlot” function.

Cell cluster annotation in fetal pancreata

Cell-type markers and the “SingleR” R package (v2.0.0) were used to identify and annotate cell clusters within the GSE201230 dataset [50]. Based on previous studies [51, 52], we selected markers for specific pancreatic cell types: INS, GCG, PAX6 (endocrine); CPA1, PRSS1, CLPS (acinar); COL3A1, DCN, VIM (mesenchymal); TRAC, LYZ, RAC2, PTPRC (immune); VWF, ADGRL4, ANGPT2, PECAM1 (endothelial); CRYAB, CDH19, SOX10 (Schwann); HBB, HBG2, HBA1 (erythroblasts); and EPCAM, KRT19, PROX1, PDX1 (epithelial).

Pseudo-time trajectory analysis

To fully understand the gene expression patterns and evolution of epithelial PR cell subtypes in different disease states, the “Monocle” R package was used to perform pseudo-time analysis on normal epithelial cells from the GSE201230 dataset and epithelial PR cells from the GSE155698 dataset [53]. A Monocle object was created using the “newCellDataSet” function with the parameter expressionFamily = negbinomial.size. Highly variable genes were selected for pseudo-time analysis via the “dispersionTable” function. Trajectory analysis was conducted using genes with a mean expression ≥ 0.1 and a dispersion_empirical ≥ 1 * dispersion_fit, as determined by Monocle2, to order cells along pseudo-time. Dimensionality reduction was performed using the “reduceDimension” function (reduction_method = “DDRTree”, max_components = 2), and the cell trajectory was visualized using the ‘plot_cell_trajectory’ function, displaying the minimum spanning tree. Trajectory also visualized with tree plot to represent the hierarchical relationships between cells.

Deconvolution bulk DNAm profiles based on the scRNA-seq using scDeconv

Bulk DNAm data from ICGC and TCGA cohorts were derived from tissue-level samples, limiting the ability to explore cell-type heterogeneity. To address this, we applied the scDeconv algorithm, which performs high-resolution cell-type deconvolution on bulk DNAm profiles by integrating paired bulk RNA-DNAm data with scRNA-seq data in a trans-omics approach [33]. The “scDeconv” R package was used to estimate cell-type proportions for each TCGA-PAAD patient based on scRNA-seq data (GSE155698), paired bulk RNA and DNAm data were provided. Additionally, this algorithm trained an ensemble model using DNAm features, enabling the prediction of relative cell types in other bulk DNAm datasets even without paired bulk RNA-seq data, this model further applied to the ICGC-PACA-AU dataset and our 8 PDAC samples. All parameters were set to default.

Deconvolution bulk RNA-seq data based on the scRNA-seq using BayesPrism

Bulk RNA-seq data from pancreatic cancer samples were normalized to transcripts per million (TPM) and log2(TPM + 1) to ensure data comparability. Genes with > 50% missing values were excluded [54]. For both the TCGA and ICGC databases, bulk transcriptomic data were matched with corresponding DNA methylation data for integrated analysis by using unique sample IDs. BayesPrism algorithm (“BayesPrism” R package) was used to perform high-resolution deconvolution of bulk RNA-seq data from integrated GEO datasets (659 samples) based on the single-cell RNA-seq data (GSE155698) [55], and the cell proportion was estimated by their gene expression levels.

DNAm profiling for Zhejiang PDAC samples

Genomic DNA from Zhejiang PDAC samples was extracted using the QIAamp DNA kit (Qiagen) and quantified with a Qubit HS DNA assay (Thermo Fisher Scientific). DNA methylation analysis was performed on ≥ 0.5 µg of bisulfite-converted DNA using the EZ methylation Kit (Zymo Research), followed by amplification, and hybridization to Infinium Human MethylationEPIC BeadChip (850 K, Illumina). Arrays were scanned using the Illumina HiScan SQ scanner.

DNAm data processing

Quality control and analysis of DNAm data were conducted using the “ChAMP” R package [56]. DNAm data from TCGA and ICGC were generated using the Illumina Infinium Human Methylation 450 K BeadChip. Raw fluorescence data were initially stored in IDAT files and then converted to β-values by “champ.load” function. Methylation levels were scored on a scale of 0 to 1 (unmethylated to totally methylated). Subsequent cell proportion analyses based on DNA methylation profiles utilized the intersecting methylation sites shared between the TCGA-PAAD , ICGC-PACA-AU dataset, and the Zhejiang samples. CpG sites with > 70% missing β-values, located on sex chromosomes, or linked to single-nucleotide polymorphisms (SNPs) were excluded. Missing values were imputed using a K-nearest neighbor algorithm, and probe sites from 2 kb upstream to 200 bp downstream of the transcription start site were selected. Mean β-value across multiple CpG sites for each gene was calculated as the average methylation level. Batch effects were corrected via beta mixture quantile expansion (BMIQ) normalization and singular value decomposition (SVD).

Differentially methylated probe (DMP) analysis

DMPs between 185 tumor and 10 normal samples in TCGA-PAAD were identified using the “champ.DMP” function. P values were adjusted using the Benjamini–Hochberg method to control the false discovery rate (FDR), and the delta β-value was calculated as the absolute difference between the group mean β-value for each probe. Probes with FDR < 0.01 and |delta β-value|> 0.2 were defined as DMPs. Volcano plot visualized the DMPs distribution, and gene annotations were obtained using GRCh38 from the GENCODE project.

Consensus clustering

Unsupervised consensus clustering of the TCGA-PAAD cohort was performed using DMPs and the “ConsensusClusterPlus” R package [57]. This methodology grouped cases into distinct clusters based on signatures, iterated 1000 times using bootstraps with K-means and Euclidean distance to ensure cluster consistency. Each bootstrap used 80% of TCGA samples, with cluster numbers (k) were set from 2 to 9. The optimal number of clusters was determined by the cumulative distribution function (CDF) and Delta plot, ensuring stabilization of the CDF plot without significant increments.

Survival analysis

Kaplan–Meier (KM) survival analysis was conducted using the “survival” and “survminer” R packages. Log-rank tests were used to compare overall survival between patient groups to assess the statistical differences. We investigated the associations between estimated cell-type abundance and prognosis. PDAC patients were divided into low and high cell proportion groups using the optimal cutoff value calculated via the “surv_cutpoint” function, which is also used to determine the cutoff expression value for methylation sites-associated genes.

Prognostic model construction and validation

Prognostic risk models were constructed using 178 PDAC samples from the TCGA dataset as the training set, and validated with external set of 250 PDAC samples from the ICGC dataset. Based on a human methylome atlas [58], which offers a repository of tissue-specific biomarkers, we curated 25 pancreatic ductal cell-specific CpG sites (CpGs) and 25 T cell-specific CpGs as potential prognostic markers. Univariate Cox regression analysis (P < 0.05) identified prognostic CpGs using “survival” R package, followed by Least Absolute Shrinkage and Selection Operator (LASSO) regression with the “glmnet” R package to reduce overfitting and remove highly similar sites. Multivariate Cox regression was applied to the selected CpGs, with tenfold cross-validation calculating the confidence interval for each lambda value.

Risk scores for each patient were calculated using the formula based on the β values of CpG sites and the accompanying regression coefficients:

$${\text{Formula}}:{\text{ RiskScore}}\, = \,\left( {\beta_{{{\text{CpG1}}}} \, \times \,{\text{Coef}}_{{{\text{CpG1}}}} } \right)\, + \,\left( {\beta_{{{\text{CpG2}}}} \, \times \,{\text{Coef}}_{{{\text{CpG2}}}} } \right)\, + \cdots \, + \,\left( {\beta_{{{\text{CpGn}}}} \, \times \,{\text{Coef}}_{{{\text{CpGn}}}} } \right).$$

Patients were classified into high- and low-risk groups based on the median risk score, KM analysis and log-rank tests assessed OS between risk groups. Time-dependent receiver operating characteristic (ROC) curves were generated with “timeROC” and “survminer” R packages, and the area under the ROC curve (AUC) values were used to evaluated the prognostic model accuracy. The ICGC cohort (ICGC-PACA-AU) validated the reliability and suitability of the models.

Independent prognostic analysis and nomogram construction

Univariate and multivariate Cox regression analyses were conducted on risk scores and clinicopathological data (age, gender, TNM stage, alcohol/smoking history, histologic grade, tumor stage, and tumor grade) to identify independent prognostic factors. Nomogram models predicting 1-, 2-, and 3-year OS were constructed based on selected independent prognostic factors using the “rms” R package. Calibration curves and decision curve analysis (DCA) were used to compare predicted OS with actual outcomes, assessing predictive accuracy.

Clinical characteristics and relationship with Moffitt subtypes in risk groups

The distribution of clinical characteristics between the risk groups was based on the patient clinical data. We explored the association between risk groups and Moffitt subtypes, with basal-like subtypes linked to worse outcome [59]. Moffitt subtype classification of TCGA samples was performed using the Purity Independent Subtyping of Tumors (PurIST) algorithm, a clinically validated single-sample classifier for pancreatic cancer subtypes [60].

Somatic mutation analysis

Somatic mutations in TCGA PDAC samples were analyzed using mutation annotation format (MAF) data with the “Maftools” R package [61], and visualized through waterfall plots by “oncoplot” function. Tumor mutation burden (TMB) was computed by “TMB” function, and mutation frequencies and TMB were compared between risk groups.

Functional enrichment analysis and pathway analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed using the “clusterProfiler” R package (FDR < 0.05) [62] to identify enriched pathways and biological processes, applying the Hypergeometric Test. Gene Set Variation Analysis (GSVA) evaluated pathway activity variation between risk groups (FDR < 0.05) [63], utilizing the “c2.cp.kegg.v2023.1.Hs.symbols.gmt” or “h.all.v2023.1.Hs.symbols.gmt” from the Molecular Signature Database [64], and results visualized via “pheatmap” and “ggplot2” R packages. Correlations between biological functions and RiskScores (r > 0.3) were calculated using “corrplot” package. The “PROGENy” R package assessed pathway deregulation across 14 typical tumor-related pathways [65], comparing groups using the Wilcoxon test.

Immune infiltration analysis

Immune cell infiltration in the TME was estimated using the RNA-seq expression matrix of TCGA-PAAD with XCELL [66], TIMER [67], and CIBERSORT [32] algorithms, implemented via the IOBR R package [68]. Statistical differences between groups were assessed using the Wilcoxon rank-sum test. Stromal and immune cell levels were quantified using the “ESTIMATE” R package, generating StromalScore, ImmuneScore, and EstimateScore. Single-sample gene set enrichment analysis (ssGSEA) was applied to calculate immune infiltration scores for each sample, with correlations to RiskScores examined using the “corrplot” R package.

Six immune subtypes, defined by variations in macrophage/lymphocyte signatures, Th1:Th2 ratios, intratumoral heterogeneity, aneuploidy, neoantigen load, cell proliferation, and immunomodulatory gene expression, were classified using the “ImmuneSubtypesClassifier” R package [69]. Additionally, we analyzed the correlation between RiskScores and the activities of steps in the cancer-immunity cycle using data from the Tracking Tumor Immunophenotype (TIP) database [70].

Immunotherapy and chemotherapy response prediction

We selected representative immune checkpoints (ICs) from previous studies and compared their expression across groups. The Tumor Immune Dysfunction and Rejection (TIDE) online algorithm was used to assess immune escape potential for TCGA samples [71]. Using tumor transcriptional data and the Genomics of Drug Sensitivity in Cancer (GDSC) [72], we calculated half-maximal inhibitory concentration (IC50) values for common PDAC chemotherapeutic drugs via the “pRRophetic” R package [73]. Sensitivity to chemotherapy and immunotherapy between risk groups was compared using the Wilcoxon test.

Statistical analysis

R (v4.1.3) and associated packages were used for statistical analyses and data visualization. Kaplan–Meier survival analysis with log-rank tests compared OS across patient groups. LASSO and Cox regression were employed for predictive model development, while Pearson’s correlation assessed correlation coefficients. Differences between normally distributed groups were evaluated using the Student’s t test, and the nonparametric Wilcoxon rank-sum test was applied for continuous variables. The chi-square test analyzed somatic mutation frequency and immunotherapy response, while functional enrichment used the Hypergeometric test. Cox regression significance was assessed with the Wald test. P values and FDR q-values < 0.05 were considered statistically significant.

Results

Prognostic cell populations identified via scRNA-seq: epithelial and T cells

The study flowchart is shown in Fig. 1. We analyzed the scRNA-seq dataset for PDAC (GSE155698), consisting of 6 surgical and 10 fine needle biopsy specimens from untreated patients. After quality control and normalization, 43,057 cells were clustered into 25 groups and annotated to 12 cell types based on DEGs and known markers (Fig. 2A, B, D, E, Fig. S1A, B, C, D). Substantial intertumoral heterogeneity was observed across patients (Fig. 2C). To identify survival-related clusters, we integrated 250 ICGC-PACA-AU bulk samples with survival data. A total of 5,767 PR cells were identified and categorized into three types (Fig. 2F). Surv cells (4,172, 9.6%) were linked to worse survival, while Surv+ cells (1,595, 3.7%) were associated with better outcomes. Surv cells mainly originated from epithelial, T cell, and granulocyte populations (Fig. 2G), whereas Surv+ cells were mainly composed of epithelial, fibroblast, stellate, and macrophage populations (Fig. 2H). Consistent trends in the distribution of PR cell populations were observed in the GSE212966, GSE214295, and GSE156405 datasets, further supporting the robustness of the identified results (Fig. S2–S4).

Fig. 1
figure 1

Workflow of the study. Step I: PR cells in PDAC were identified through scRNA-seq, and their biological functions were assessed using GO/KEGG pathway analyses. Step II: Malignancy of PR epithelial cells was assessed using InferCNV, malignant scoring, signal visualization, and pseudo-time trajectory analyses. Step III: Cell proportions for each TCGA-PAAD sample were inferred using scDeconv algorithm. Significant differences in cell proportions between TCGA-PAAD DNAm subgroups with distinct prognoses further support the association between PR cells (epithelial and T cells) and patient prognosis. Step IV: Given the critical role of epithelial and T cells in prognosis, prognostic and nomogram models were constructed and validated leveraging the corresponding cell-type-specific signatures. Step V: Relevance analysis was conducted between risk groups determined by prognostic models, covering clinical characteristics, mutation landscapes, enrichment pathways, immune cell infiltration, ICB treatment response prediction, and drug sensitivity. PR cells, prognostic-related cells; PDAC, pancreatic ductal adenocarcinoma; scRNA-seq, single-cell RNA sequencing; GO, Gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; TCGA, The Cancer Genome Atlas Program; DNAm, DNA methylation; K-M, Kaplan–Meier; ROC, receiver operating characteristic; TME, tumor microenvironment; ICB, immune-checkpoint blockade

Fig. 2
figure 2

PDAC cells and distinct PR cells characterized through scRNA-seq analysis. A UMAP visualization depicting clusters within PDAC patient samples, with each color-coded area representing an individual cluster. B UMAP plot illustrating 43,057 cells and highlighting the predominant cell types in PDAC, with each color-coded region denoting a specific cell type. C Proportional distribution of each cell type across individual samples. D Identification of cell populations using top five DEGs. Dot color indicates average expression level, and dot size reflects percentage expression. E Expression patterns of well-established marker genes across the identified PDAC cell types. Each violin represents the gene expression density for a particular marker gene within the corresponding cell type, and violin color indicates the cell type. F UMAP plot showcasing PR cells identified by the SCISSOR algorithm, with red and blue dots representing Surv (poor survival) and Surv+ (favorable survival) cells. G, H Distribution of Surv cell populations (G) and Surv+ cell populations (H) across different cell types. PDAC, pancreatic ductal adenocarcinoma; PR cells, prognostic-related cells; UMAP, Uniform Manifold Approximation and Projection; DEGs, differentially expressed genes

In PR epithelial cells, Surv cells had upregulated genes related to RNA splicing, wound healing, and unfolded protein response, while downregulated genes were involved in the electron transport chain and aerobic respiration (Fig. 3A, C). KEGG analysis revealed associations with focal adhesion, extracellular matrix (ECM)-receptor interaction, and chemical carcinogenesis (Fig. 3B, D). Surv T cells showed upregulation in pathways linked to erythrocyte differentiation and NOD signaling, with downregulated genes involved in cytoplasmic translation, ribosome biogenesis, and rRNA processing (Fig. 3E, G). KEGG analysis suggested links to infection pathways (Fig. 3F, H). Functional enrichment for other cell types is detailed in Fig. S5 and S6.

Fig. 3
figure 3

Functional Analysis of PR Cells. A, C Representative enriched GO terms (Biological Process) for upregulated (A) and downregulated (C) genes (|log2FC|≥ 0.5 and adjusted P < 0.05) in Surv epithelial cells, respectively. B, D KEGG pathways associated with upregulated (B) and downregulated (D) genes (|log2FC|≥ 0.5 and adjusted P < 0.05) in Surv epithelial cells, respectively. E, G Representative enriched GO terms (Biological Process) for upregulated (E) and downregulated (G) genes (|log2FC|≥ 0.5 and adjusted P < 0.05) in Surv T cells, respectively. F, H KEGG pathways associated with upregulated (F) and downregulated (H) genes (|log2FC|≥ 0.5 and adjusted P < 0.05) in Surv T cells, respectively. IK Normalized expression of marker genes related to myofibroblasts (I), complement-related genes (J), and inflammatory cancer-associated fibroblasts (iCAFs) (K) across PR fibroblast subsets. L UMAP plot of T cells re-clustered from Fig. 1B, with each color-coded region representing a distinct cell type. M, N Expression of effector (M) and exhaustion markers (N) for PR T cells on the UMAP map. Color key from gray to pink indicates relative expression levels from low to high. PR cells, prognostic-related cells; GO, Gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; iCAFs, inflammatory cancer-associated fibroblasts

In PR fibroblasts, myofibroblastic cancer-associated fibroblasts (myCAF)-related markers were significantly elevated in Surv+ cells compared to background fibroblasts (Fig. 3I), indicating a myCAF-like phenotype linked to better survival [12]. Surv+ CAFs exhibited high expression of complement-related gene expression, including C3, C7, CFH, and CFI, potentially influencing immune regulation and responses (Fig. 3J). Conversely, iCAF-related markers expression was low in PR fibroblasts (Fig. 3K), and Surv+ cells showed even lower expression levels than background fibroblasts comparatively. Furthermore, CAF subtypes showed dynamic reversibility between phenotypes [74].

Recent studies reveal heterogeneity and dynamic phenotypic changes of macrophages during tumor progression [18, 75,76,77,78,79,80]. Surv macrophages (307 cells) and Surv+ macrophages (140 cells) represented only a small proportion (4.6%) of total macrophages (9699 cells), with inconsistent and unclear prognostic implications. Limited numbers of 284 Surv acinar, 112 mast, and 26 plasma cells were identified and excluded from further analysis.

T cell exhaustion in the immunosuppressive TME with poor PDAC prognosis

A total of 1,073 Surv T cells were identified, and all T cells clustered into eight subpopulations and annotated into three cell types (Fig. 3L; Fig. S7A–E). The proportion of PR T cells showed no significant variation across subpopulations (Fig. S7F), though Tregs exhibited a slightly higher prevalence of Surv cells, potentially suppressing effector T cells’ antitumor responses [81]. Low expression of effector markers (GZNB, GZLY; Fig. 3M) and universally high expression of exhaustion markers (EOMES, GZMK, TIGIT, KLF2; Fig. 3N) suggested T cell exhaustion and may lose their anticancer capability, linked to poor PDAC prognosis [16, 44, 45, 82].

Malignant nature of epithelial cells with higher malignancy in Surv subtype

Surv and Surv+ subtypes were identified in 1095 and 747 epithelial cells, respectively, but both exhibited malignant characteristics (Fig. 4A). CNV scores were significantly elevated in both compared to reference immune cells, with Surv showing higher CNV scores (P = 5 × 10⁻5; Fig. 4B). Both subtypes demonstrated EMT and tumor stemness, facilitating tumor invasion and metastasis (Table S1; Fig. 4C, D). Surv cells had higher malignancy scores, along with elevated mesenchymal markers (VIM, COL1A1; PVIM = 2 × 10–16; PCOL1A1 = 5.7 × 10–11; Fig. 4E, F, G) and stem cell markers (CD44, NOTCH2, NOTCH1, NES; PCD44 = 2 × 10–16; PNOTCH2 = 0.00096; PNOTCH1 = 3 × 10–4; PNES = 0.01; Fig. 4H, I; Fig. S1E, F). Most PR epithelial cells were classified as type II malignant ductal cells described in the previous study [18] (Fig. 4J, K). Pseudo-time analysis using PR epithelial cells and normal epithelial cells from the fetal pancreas (Fig. 4L) showed cells progressed through three states: the starting branching point (state 3) and two branches (state 1 and state 2). 15- and 18-week fetal pancreas cells were mainly at the start of trajectory, while 19- and 20-week cells were at the end of branch 2. Both Surv and Surv+ epithelial cells along branch 1 (Fig. 4M, N), with Surv cells at the terminus and Surv+ cells moving toward it (Fig. 4O). PDAC markers associated with cell proliferation and metastasis (MUC1 [83], FXYD3 [18], CEACAM6 [84], KRT19[85]) were upregulated along the late stage of branch 1 gradually, indicating tumor progression (Fig. 4P).

Fig. 4
figure 4

Malignant Features of PR Epithelial Cells. A Heatmap illustrating extensive CNVs in PR epithelial cells, with red and blue colors indicating high and low CNV levels, respectively. B Distribution of CNV scores among Surv and Surv+ epithelial cells. CE Malignant score (C), cancer stem cell score (D), and mesenchymal score (E) across Surv and Surv+ epithelial cells. F, G Comparison of expression levels of mesenchymal cell markers between Surv and Surv+ epithelial cells. H, I Comparison of expression levels of tumor stem cell markers between Surv and Surv+ epithelial cells. J, K Expression levels of type I ductal cell markers (J) and type II ductal cell markers (K) plotted on the UMAP map, respectively. Color key from gray to red indicates relative expression levels from low to high. L UMAP plot of 8,836 cells illustrating the main cell types in fetal pancreata. Each color-coded region represents a distinct cell type. M Pseudo-time analysis of PDAC Surv, Surv+ epithelial cells, and normal epithelial cells from fetal pancreata (dark green = early state, yellow = late state). N Left panel shows pseudo-time progression, and right panel shows the differentiation branches. O Density graph of pseudo-time in PDAC Surv and Surv+ epithelial cells. P Expression of representative genes (MUC1, FXYD3, CEACAM6, and KRT19) along the pseudo-time trajectory plot. Color key from purple to red indicates relative expression levels from low to high. PR cells, prognostic-related cells; CNVs, copy number variations; UMAP, Uniform Manifold Approximation and Projection; PDAC, pancreatic ductal adenocarcinoma

Inference of epithelial and T cell proportions from PDAC methylation and bulk RNA-seq data validating their prognostic impact

Overall, we identified 10,406 TCGA PDAC DMPs (log2|FC|> 0.1, adjusted P < 0.05), revealing 4163 hypermethylated and 2413 hypomethylated genes (Fig. S8A). These DMPs impacted pathways related to tumor progression (Fig. S8B–H). Methylation-based analysis divided 178 PDAC samples into two prognostic subgroups (Fig. 5A–D, Fig. S8I). Notably, Cluster 1 patients exhibited worse overall survival (P = 0.002) and higher tumor purity (P = 3.2 × 10⁻⁹), characterized by increased proportions of epithelial and T cells (P < 0.05) (Fig. 5E–G).

Fig. 5
figure 5

Inferred Proportions of Epithelial Cells and T Cells in DNAm Profiles and their association with Patient Prognosis. A Unsupervised clustering (K = 2) of TCGA PDAC samples into two molecular subgroups according to DNAm profiles. B Cumulative Distribution Function (CDF) for 10,406 Differentially Methylated Positions (DMPs). C CDF Delta area curve. D Sample distribution in two subgroups displayed in T-SNE plot. E Kaplan–Meier survival plot for the two subgroups. F Comparison of tumor purity between the two subgroups. G Boxplot illustrating proportions of distinct cell types within the TME for the two subgroups. The y-axis represents the absolute cell-type proportion (0 to 1) estimated by scDeconv, and the x-axis represents different cell types. Significant changes between the subgroups were assessed using the Wilcoxon rank-sum test (*P value < 0.05, **P value < 0.01, ***P value < 0.001). H, I Overall survival comparison between high- and low-proportion groups for inferred epithelial cell (H) and T cells proportion (I). J Proportional distribution of each cell type in 8 PDAC patients we collected. K Construction of the prognostic RiskScoreEpi model using six epithelial cell-type-specific CpG sites and their corresponding coefficients through LASSO-Cox regression. L Determination of the λ value of the LASSO model via tenfold cross-validation with minimal misclassification. M, N Overall survival analysis for high- and low-risk groups with a cutoff value of the median RiskScoreEpi in the TCGA train set (M) and ICGC validation set (N), respectively. DNAm, DNA methylation; TCGA, The Cancer Genome Atlas Program; ICGC, International Cancer Genome Consortium; CDF, Cumulative Distribution Function; DMPs, differentially methylated probes; T-SNE, t-distributed stochastic neighbor embedding; TME, tumor microenvironment; PDAC, pancreatic ductal adenocarcinoma; LASSO, Least Absolute Shrinkage and Selection Operator

Prognostic analysis based solely on scDeconv DNAm-inferred cell proportions indicated that higher epithelial and T cell proportions were linked to poor prognosis (P = 0.0036 and P = 0.034; (Fig. 5H, I). Analysis of tumor samples from our 8 PDAC patients (Zhejiang, Table 1) confirmed these findings, showing that higher epithelial and T cell proportions correlated with shorter survival, whereas higher endothelial and stellate cell proportions were linked to better prognosis (Fig. 5J). Furthermore, consistent conclusions were drawn from cell proportions inferred using the BayesPrism algorithm based on bulk RNA-seq data (659 samples), further supporting the robustness of the results (P = 0.017 and P = 0.0014; Fig. S9A, B).

Identification of epithelial and T cell-specific DNAm sites for RiskScore model construction

To develop a clinically feasible assessment, we screened prognostic methylation sites representing T cell and epithelial cell proportions, 19 specific CpG sites to epithelial cells and 14 for T cells were all significantly associated with PDAC prognosis (P < 0.05). After applying the LASSO-Cox model, six CpG sites were selected for the epithelial cell-specific model (Fig. 5K, L) and four for the T cell-specific model (Fig. S10A, B). Using β values and their corresponding risk coefficients, we calculated RiskScoreEpi and RiskScoreT following the subsequent formulas:

$$\begin{aligned} {\text{RiskScore }}\left( {{\text{Epi}}} \right)\, = & \,\left( {cg07768763\, \times \,1.254} \right)\, + \,\left( {cg11314569\, \times \, 2.050} \right) \, + \,\left( {cg05038288\, \times \, - 1.397} \right)\, \\ & + \,\left( {cg00788354\, \times \, - 2.791} \right)\, + \,\left( {cg27650540\, \times \,0.585} \right) \, + \, \left( {cg02265288\, \times \, - 1.239} \right); \\ \end{aligned}$$
$$\begin{aligned} {\text{RiskScore }}\left( T \right)\, = & \,\left( {cg16239536\, \times \,1.571} \right) \, + \,\left( {cg07545925\, \times \,1.438} \right) \, \\ & + \, \left( {cg25653947\, \times \, - 1.477} \right)\, + \,\left( {cg14150023\, \times \,0.689} \right). \\ \end{aligned}$$

Patients were stratified by the median RiskScoreEpi (− 1.104), with high-risk patients showing significantly lower survival rates (P < 0.001; Fig. 5M, 6A). The model’s AUC values for predicting 1-, 3-, and 5-year OS were 0.695, 0.785, and 0.834, respectively (Fig. 6C). In the validation cohort (ICGC-PACA-AU), survival outcomes differed significantly between high- and low-risk groups (P = 0.022; Fig. 5N, 6B), and our 8 PDAC patients with higher risk scores exhibited shorter survival time (Fig. 6G). Similarly, RiskScoreT showed predictive power (P = 0.029; Fig. S10E, H), with AUC values of 0.628, 0.733, and 0.782 for 1-, 3-, and 5-year survival (Fig. S10C). In the validation cohort, high-risk groups exhibited worse outcome (P = 0.013; Fig. S10F, G, I).

Fig. 6
figure 6

Construction and Evaluation of Prognostic RiskScoreEpi Model and Nomogram Integrating Clinical Information. A, B For TCGA Train Set (A) and ICGC Validation Set (B): Top panel: Distribution plot of RiskScoreEpi in high- and low-risk groups. Middle panel: Distribution plot of survival time and survival status in high- and low-risk groups; X-axis represents patients with increasing RiskScoreEpi, and Y-axis represents survival time. Bottom panel: Beta values of the six CpG sites in high- and low-risk groups. C ROC curves and AUC analyses for 1-, 3-, and 5-year overall survival prediction of the RiskScoreEpi model in TCGA dataset. D Sankey diagram illustrating similarities and differences in sample composition before and after grouping based on two different criteria: TCGA DNAm subgroups (left) and RiskScoreEpi model (right). E, F Univariate E and Multivariate F Cox regression analysis of RiskScoreEpi and clinical information in TCGA dataset. G Heatmap of six CpG sites’ methylation levels, corresponding RiskScoreEpi, survival time, survival status, and risk groups of the 8 PDAC patients we collected. Color bar = β-value. H Nomogram for predicting 1-, 2-, and 3-year overall survival of PDAC patients based on RiskScoreEpi and prognostic factors. I Calibration curves of the nomogram at 1, 3, and 5 years. J DCA curves of the nomogram, RiskScoreEpi, and other prognostic factors. TCGA, The Cancer Genome Atlas Program; ICGC, International Cancer Genome Consortium; ROC, receiver operating characteristic; AUC, area under the curve; PDAC, pancreatic ductal adenocarcinoma; DCA, decision curve analysis

KM curves based on the expression levels of RiskScoreEpi-related genes (CUX2, GCSH) and RiskScoreT-related genes (CD3G, HMHA1) revealed significant prognostic differences, highlighting their relevance to survival outcomes (Fig. S9E–H). Over half of cluster 1 patients were in the high-risk group (Fig. 6D; Fig. S10D), and most Basal-like subtype samples were in the high-risk group, consistent with its association with patient poor prognosis, validating the robustness and predictive effectiveness of our Riskscore models (Fig. S9C, D).

Epithelial and T Cell RiskScores as independent predictors of survival

In univariate Cox analysis, RiskScoreEpi (P < 0.001), RiskScoreT, (P < 0.001), age (P = 0.008), grade (P = 0.0038), T stage (P = 0.034), M stage (P = 0.018), and N stage (P = 0.0032) were significant prognostic factors (Fig. 6E; Fig. S11A). Multivariate Cox analysis identified RiskScoreEpi (HR 5.05, P < 0.001), tumor grade (HR 1.41, P = 0.0491), and N stage (HR 2.31, P = 0.0044) as independent prognostic factors (Fig. 6F). Similarly, RiskScoreT (HR 4.59, P = 0.0042), M stage (HR 0.49, P = 0.0101), and N stage (HR 2.2, P = 0.0075) were also independent predictors (Fig. S11B).

For clinical decision-making, we constructed two prognostic nomograms based on these results. One incorporating RiskScoreEpi, tumor grade, and N stage, and the other included RiskScoreT with M and N stages (Fig. 6H; Fig. S11E). Higher total points in these nomograms were associated with worse survival. Calibration curves showed close alignment between predicted and actual survival at 1, 2, and 3 years, confirmed models’ accuracy (Fig. 6I; Fig. S11C). DCA further demonstrated superior clinical effectiveness compared to other factors (Fig. 6J; Fig. S11D).

RiskScore-based stratification reveals distinct clinical features, genomic landscape, and pathway activation

In the TCGA cohort, the high-risk group identified by RiskScoreEpi exhibited more aggressive tumor characteristics, including a higher proportion of G2/3 histological grade, T2 stage, history of alcohol and smoking, and pancreas-related diseases (Fig. S12A–M). Similar trends were observed in the RiskScoreT model, where the high-risk group comprised older patients (≥ 65 years), G2/3 grade, clinical stage II, T3 stage, and R1/2 residual tumors (Fig. S13A–M).

The high-risk group based on RiskScoreT also had significantly higher mutation frequency and TMB compared to the low-risk group (Fig. S14B; mutation frequency: 95% vs 70%, P < 0.001; Fig. 7B; PTMB < 0.01). KRAS and TP53 mutations were more frequent in the high-risk group (Fig. 7A; KRAS: 77% vs 46%, P < 0.001; TP53: 70% vs 50%, P < 0.05). A similar pattern was seen in RiskScoreEpi (Fig. S14A, C-D).

Fig. 7
figure 7

Mutation landscape, pathways, and TME between RiskScoreT model risk groups in TCGA dataset. A, B Comparative analysis of top 5 genes mutation spectrum (A) and TMB distributions (B) in the RiskScoreT high- and low-risk groups, respectively. C Heatmap showing the significant enrichment hallmark pathways of the RiskScoreT high- and low-risk groups by GSVA (P < 0.01). Color bar = enrichment score. D Boxplot illustrating proportions of distinct cell types within the TME for the RiskScoreT high- and low-risk groups. E Correlation analysis between the RiskScore and 14 tumor-related pathways. F ESTMATE analysis revealed immune and stromal infiltration differences in the RiskScoreT high- and low-risk groups. G, I Distinctions in 28 and 64 immune cell infiltration levels in the RiskScoreT high- and low-risk groups, analyzed using the ssGSEA (G) and xCell (I) algorithm, respectively. H Distribution of immune subtypes across the RiskScoreT high- and low-risk groups. (*P < 0.05, ** P < 0.01, *** P < 0.0001). TCGA, The Cancer Genome Atlas Program; TME, tumor microenvironment; TMB, tumor mutation burden; GSVA, gene set variation analysis; ssGSEA, single-sample gene set enrichment analysis

The high-risk group showed enrichment in tumor-associated pathways. In RiskScoreEpi, this included classical tumor signaling, while the low-risk group was enriched in bile acid biosynthesis (Fig. S15A; P < 0.01). For RiskScoreT, the high-risk group showed changes in tight junctions, NOTCH, and P53 signaling, while the low-risk group was enriched in immunodeficiency and type I diabetes pathways (Fig. S15B; P < 0.01). Using Hallmark gene sets, the RiskScoreEpi high-risk group showed significant enrichment in pathways like KRAS signaling, P53 pathway, EMT, E2F targets, and hypoxia (Fig. S15C; P < 0.01). RiskScoreT high-risk group was enriched in glycolysis, hypoxia, and NOTCH signaling (Fig. 7C; P < 0.01).

Correlation analysis showed positive associations between both RiskScoreEpi and RiskScoreT with pro-tumor signals and negative correlations with normal pancreatic function (correlation > 0.3; Fig. S15E-F). Pathway analysis highlighted activation of hypoxia tolerance, cell migration, and proliferation pathways (such as EGFR, Hypoxia, MAPK, WNT) in high-risk groups (Fig. S15G-H; P < 0.05). Meanwhile, RiskScoreEpi and RiskScoreT have positive correlations with these pathways (correlation > 0.3; Fig. 7E).

RiskScore stratification reveals differences in TME and immune phenotypes between high and low-risk groups

The high-risk group defined by RiskScoreEpi had significantly higher proportions of epithelial cells and T cells, while the low-risk group showed elevated endothelial cell proportions (Fig. 7D; P < 0.05). Similar trends were observed for RiskScoreT (Fig. S15D; P < 0.05). Although the RiskScoreEpi high-risk group showed lower StromalScore, ImmuneScore, and ESTIMATEScore, the differences were not significant (Fig. S16A). In contrast, the RiskScoreT high-risk group had significantly higher ImmuneScore and ESTIMATEScore (Fig. 7F; P < 0.01).

RiskScoreEpi was positively correlated with most tumor-infiltrating immune cells (TIICs) (correlation > 0.3; Fig. S16E), with the high-risk group showing enriched levels of activated CD4 T cells, gamma delta T cells, memory B cells, natural killer T cells, and neutrophils (Fig. S16C-D; P < 0.01). The low-risk group, however, exhibited diminished immune activity, indicative of immune suppression (Fig. S16C-D; P < 0.01). RiskScoreT was negatively correlated with active B cells, activated CD8 T cells, and effector memory CD4 T cells (correlation > 0.3; Fig. S16E), and its high-risk group had lower TIIC abundance (Fig. 7G, I; P < 0.05). This suggests that the low-risk group patients may benefit from antitumor immune responses, enhance immunotherapy efficacy.

Regarding immune subtypes, both RiskScoreEpi and RiskScoreT high-risk groups had higher frequencies of C1 (wound healing) and C2 (IFN-γ dominant) subtypes but a lower prevalence of C3 (inflammatory) subtype (Fig. 7H; Fig. S16B). In the tumor–immunity cycle, RiskScoreT was negatively correlated with steps like priming and activation (step 3) and recognition of cancer cells by T cells (step 6). In contrast, RiskScoreEpi was positively correlated with the release of cancer antigens (step 1) and immune cell recruitment (step 4) (Fig. S16F). These findings suggest that RiskScore models offer insights into immune status and activity of the patient’s immune response.

Therapeutic efficacy prediction for high and low-risk groups based on RiskScore stratification

Differential expression analysis of immune checkpoints revealed higher expression levels in the RiskScoreEpi high-risk group (Fig. S17A; P < 0.05), with a positive correlation between RiskScoreEpi and CD274 expression (Pearson coefficient: r = 0.21, P = 0.0063; Fig. 8C), suggesting potential benefits from IC blockade therapy. However, the RiskScoreEpi high-risk group had higher TIDE and Exclusion scores (Fig. 8D-E; P < 0.01), indicating increased immune evasion and reduced sensitivity to ICB treatment. RiskScoreEpi correlated positively with the TIDE score (r = 0.29, P = 0.00012; Fig. 8F), with fewer patients in the high-risk group predicted to respond to immunotherapy (38% vs 65%, P = 0.00001; Fig. 8G). Patients in low-risk group showed higher IC50 values for Bortezomib (P = 8e−04), Dasatinib (P = 0.00039), Imatinib (P = 0.01), Paclitaxel (P = 8.8e−05), Sorafenib (P = 0.032), Sunitinib (P = 0.036), and Vinorelbine (P = 0.0015) (Fig. 8H-N).

Fig. 8
figure 8

Analysis of immune checkpoint, immunotherapy response, and drug sensitivity between different risk groups in the TCGA dataset. A Comparisons of immune checkpoint gene expression in the RiskScoreT high- and low-risk groups. B Correlation analysis between the RiskScoreT and PDCD1 expression. C Correlation analysis between the RiskScoreEpi and CD274 expression. D, E Comparisons of TIDE score (D) and T cell Exclusion (E) in the RiskScoreEpi high- and low-risk groups. F Correlation analysis between the RiskScoreEpi and TIDE score. G Bar plot showing the difference in ICB response between the RiskScoreEpi high- and low-risk groups. HN Box plots displaying the estimated IC50 of Bortezomib (H), Dasatinib (I), Imatinib (J), Paclitaxel (K), Sorafenib (L), Sunitinib (M), and Vinorelbine (N) in the RiskScoreEpi high- and low-risk groups. (*P < 0.05, ** P < 0.01, *** P < 0.0001). TCGA, The Cancer Genome Atlas Program; TIDE, Tumor Immune Dysfunction and Exclusion; ICB, immune checkpoint blockade; IC50, half-maximal inhibitory concentration

In contrast, most ICs were upregulated in the RiskScoreT low-risk group (Fig. 8A), with PDCD1 expression showing a negative correlation with RiskScoreT (Pearson coefficient: r = − 0.23, P = 0.0025; Fig. 8B), indicating potential benefit from IC blockade therapy. Dysfunction scores were higher in the RiskScoreT low-risk group (Fig. S17C; P < 0.05), but TIDE and Exclusion scores showed no significant differences between high- and low-risk groups (Fig. S17B, D). The RiskScoreT high-risk group demonstrated increased sensitivity to Dasatinib (P = 0.025), Paclitaxel (P = 0.0011), and Erlotinib (P = 0.003) (Fig. S17F-H).

Discussion

In this study, we have delineated the PDAC cell atlas, identified prognostic cell populations, and scrutinized intracellular heterogeneity. Epithelial cells, integral to tumor development, serve as the origin of malignant neoplastic cells [13, 14]. Type I ductal cells are relatively normal and present in both non-cancerous and neoplastic tissues, while type II ductal cells, expanded in PDAC, demonstrate malignancy [18]. Despite observing two populations of epithelial cells with varying prognostic correlations, subsequent analyses affirmed their malignant nature, albeit with differing degrees of malignancy. These findings collectively imply that, in aggregate, epithelial cells are associated with a poor prognosis.

Within the PDAC TME, various subtypes of T cells with diverse functions exist. The presence of Tregs contributes to a poor prognosis by suppressing the functions of effector T cells, impeding the response to immunotherapy in patients [81, 82]. T cell exhaustion plays a pivotal role in immune dysfunction and evasion among PDAC patients. Prolonged exposure of T cells to persistent antigens and inflammation in cancer patients leads to continuous stimulation, ultimately resulting in T cell exhaustion [86]. Tregs and depletion T cells gradually increase during tumor progression [16]. Our study exclusively identifies T cell populations associated with a poor prognosis. Subsequent analyses revealed a significant upregulation of depletion markers in T cells, indicating a progressive depletion of these cells during tumor development. Furthermore, the impaired antitumor function of T cells may be exacerbated by the presence of immunosuppressive cells within the TME, including Treg cells, granulocytes, and macrophages [87, 88].

Cellular deconvolution emerges as a potent tool for analyzing DNAm in bulk tissues. TCGA-PAAD samples, stratified into two clusters based on their DNA methylation profiles, exhibit significantly different prognoses. Consequently, we opted to deconvolute these two clusters and compared the proportions of cell types between patients in these clusters, and we observe that cell populations associated with poorer prognosis are more abundant in tumors from patients with shorter survival times. This trend is further confirmed in the eight samples we collected and monitored for survival time, depicting a gradual decline in patient survival time concomitant with an increase in the proportions of T cells and epithelial cells in tumor tissues.

Prior studies identified several methylation markers for PDAC prognosis in bulk tissue [89,90,91,92]. However, tracing the impact of these markers on the functional aspects of tumor cell populations poses a significant challenge. In this study, we obtained cell-type-specific sites for T cells and epithelial cells. These markers not only denote a significant change in methylation levels but also signify an increase in specific cell types. As anticipated, prognostic models were developed based on these markers, further confirming their role as independent prognostic factors. We also filtered several clinical factors and constructed nomograms in conjunction with our prognostic models to predict survival time for each patient. Given the varied immune levels and therapeutic responses among different risk groups, these predictive models could serve as valuable tools for predicting PDAC patients who would benefit more from drug and immunotherapy. This enables the stratification of individuals and customization of treatment strategies, enhancing treatment efficacy and improving patient outcomes.

Here we attempted to transpose the gene expression signatures of prognosis-associated cell subpopulations into DNA methylation profiles. Our study represents an effort in the clinical application of scRNA-seq as we translated expensive single-cell technologies into more cost-effective DNA methylation assays. This facilitates the clinical translation of single-cell technologies and extends their utility to large clinical cohort populations.

Regrettably, the deconvolution method fell short of achieving the desired precision in prognostic cell subpopulations, impeding in-depth validation to dissect the relationship between subpopulations within a cell type and prognosis. Nonetheless, there are additional limitations in our study. Firstly, this study predominantly relied on retrospective data, necessitating further prospective studies. Additionally, the available sample size was relatively modest, warranting larger-scale studies to validate and refine our conclusions. Secondly, given the inherent nature of bioinformatics analysis, future experimental endeavors are needed to explore the molecular mechanisms and biological functions of these prognostic methylation sites in PDAC progression.

Conclusion

In this study, the integration of scRNA-seq data and DNAm data facilitated the development of predictive models based on epithelial cell and T cell-specific DNA methylation sites, enabling effective prognosis prediction for PDAC patients. Patient stratification using our models revealed distinctive prognoses and immune microenvironments, suggesting potential benefits for immunotherapy and drug therapy tailored to each patient. Notably, the construction of nomograms provides valuable tools for clinical treatment decisions. The implementation of our prognostic models offers clinicians a cost-effective and accurate method for predicting outcomes in PDAC.

Availability of data and materials

All authors read and approved the final manuscript. The datasets supporting the conclusions of this article are available in the Genome Sequence Archive [93]. The raw DNAm data are hyperlinked to the dataset at https://ngdc.cncb.ac.cn/(PRJCA021998). Publicly available datasets analyzed in this study can be found in The Cancer Genome Atlas (https://portal.gdc.cancer.gov/), the International Cancer Genome Consortium (ICGC) database (https://dcc.icgc.org/), the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) (GSE155698, GSE201230, GSE212966, GSE214295, GSE156405, GSE85916, GSE28735, GSE57495, GSE62452, GSE71729), and the ArrayExpress database (https://www.ebi.ac.uk/biostudies/arrayexpress) (E-MTAB-6134). The original contributions presented in the study are included in the article/Supplementary Material; the analysis code is available from the corresponding author upon reasonable request.

Abbreviations

PDAC:

Pancreatic adenocarcinoma

scRNA-seq:

Single-cell transcriptomics

DNAm:

DNA methylation

OS:

Overall survival

TME:

Tumor microenvironment

TCGA:

The Cancer Genome Atlas

ICGC:

The International Cancer Genome Consortium

CNV:

Copy number variation

UMAP:

Uniform Manifold Approximation and Projection

DEGs:

Differentially expressed genes

PR:

Prognostic-related

EMT:

Epithelial–mesenchymal transition

SNPs:

Single-nucleotide polymorphisms

BMIQ:

Beta mixture quantile expansion

SVD:

Singular value decomposition

FDR:

False discovery rate

DMP:

Differentially methylated probe

CDF:

Cumulative distribution function

KM:

Kaplan–Meier

LASSO:

Least Absolute Shrinkage and Selection Operator

ROC:

Receiver operating characteristic

DCA:

Decision curve analysis

TPM:

Transcripts per million

TMB:

Tumor mutation load

GO:

Gene Ontology

KEGG:

Kyoto Encyclopedia of Genes and Genomes

GSVA:

Gene Set Variation Analysis

ssGSEA:

Single-sample gene set enrichment analysis

TIP:

Tracking Tumor Immunophenotype database

ICs:

Immune checkpoints

GDSC:

Genomics of Drug Sensitivity in Cancer

IC50:

Half-maximal inhibitory concentration

TIICs:

Tumor-infiltrating immune cells

References

  1. Kamisawa T, Wood LD, Itoi T, Takaori K. Pancreatic cancer. Lancet. 2016;388(10039):73–85.

    Article  CAS  PubMed  Google Scholar 

  2. Park W, Chawla A, O’Reilly EM. Pancreatic cancer: a review. JAMA. 2021;326(9):851–62.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Mizrahi JD, Surana R, Valle JW, Shroff RT. Pancreatic cancer. Lancet. 2020;395(10242):2008–20.

    Article  CAS  PubMed  Google Scholar 

  4. Ryan DP, Hong TS, Bardeesy N. Pancreatic adenocarcinoma. N Engl J Med. 2014;371(22):2140–1.

    PubMed  Google Scholar 

  5. Garrido-Laguna I, Hidalgo M. Pancreatic cancer: from state-of-the-art treatments to promising novel therapies. Nat Rev Clin Oncol. 2015;12(6):319–34.

    Article  CAS  PubMed  Google Scholar 

  6. Cai J, Chen H, Lu M, Zhang Y, Lu B, You L, Zhang T, Dai M, Zhao Y. Advances in the epidemiology of pancreatic cancer: trends, risk factors, screening, and prognosis. Cancer Lett. 2021;520:1–11.

    Article  CAS  PubMed  Google Scholar 

  7. Rocha FG. Landmark series: immunotherapy and targeted therapy for pancreatic cancer. Ann Surg Oncol. 2021;28(3):1400–6.

    Article  PubMed  Google Scholar 

  8. Fan JQ, Wang MF, Chen HL, Shang D, Das JK, Song J. Current advances and outlooks in immunotherapy for pancreatic ductal adenocarcinoma. Mol Cancer. 2020;19(1):32.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Ren B, Cui M, Yang G, Wang H, Feng M, You L, Zhao Y. Tumor microenvironment participates in metastasis of pancreatic cancer. Mol Cancer. 2018;17(1):108.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Wartenberg M, Zlobec I, Perren A, Koelzer VH, Gloor B, Lugli A, Karamitopoulou E. Accumulation of FOXP3+T-cells in the tumor microenvironment is associated with an epithelial-mesenchymal-transition-type tumor budding phenotype and is an independent prognostic factor in surgically resected pancreatic ductal adenocarcinoma. Oncotarget. 2015;6(6):4190–201.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Wang X, Lang M, Zhao T, Feng X, Zheng C, Huang C, Hao J, Dong J, Luo L, Li X, et al. Cancer-FOXP3 directly activated CCL5 to recruit FOXP3(+)Treg cells in pancreatic ductal adenocarcinoma. Oncogene. 2017;36(21):3048–58.

    Article  CAS  PubMed  Google Scholar 

  12. Özdemir BC, Pentcheva-Hoang T, Carstens JL, Zheng X, Wu CC, Simpson TR, Laklai H, Sugimoto H, Kahlert C, Novitskiy SV, et al. Depletion of carcinoma-associated fibroblasts and fibrosis induces immunosuppression and accelerates pancreas cancer with reduced survival. Cancer Cell. 2014;25(6):719–34.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Pylayeva-Gupta Y, Lee KE, Hajdu CH, Miller G, Bar-Sagi D. Oncogenic Kras-induced GM-CSF production promotes the development of pancreatic neoplasia. Cancer Cell. 2012;21(6):836–47.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. von Figura G, Fukuda A, Roy N, Liku ME, Morris Iv JP, Kim GE, Russ HA, Firpo MA, Mulvihill SJ, Dawson DW, et al. The chromatin regulator Brg1 suppresses formation of intraductal papillary mucinous neoplasm and pancreatic ductal adenocarcinoma. Nat Cell Biol. 2014;16(3):255–67.

    Article  Google Scholar 

  15. Papalexi E, Satija R. Single-cell RNA sequencing to explore immune cell heterogeneity. Nat Rev Immunol. 2018;18(1):35–45.

    Article  CAS  PubMed  Google Scholar 

  16. Chen K, Wang Q, Li M, Guo H, Liu W, Wang F, Tian X, Yang Y. Single-cell RNA-seq reveals dynamic change in tumor microenvironment during pancreatic ductal adenocarcinoma malignant progression. EBioMedicine. 2021;66:103315.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Lei Y, Tang R, Xu J, Wang W, Zhang B, Liu J, Yu X, Shi S. Applications of single-cell sequencing in cancer research: progress and perspectives. J Hematol Oncol. 2021;14(1):91.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Peng J, Sun BF, Chen CY, Zhou JY, Chen YS, Chen H, Liu L, Huang D, Jiang J, Cui GS, et al. Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res. 2019;29(9):725–38.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Elyada E, Bolisetty M, Laise P, Flynn WF, Courtois ET, Burkhart RA, Teinor JA, Belleau P, Biffi G, Lucito MS, et al. Cross-species single-cell analysis of pancreatic ductal adenocarcinoma reveals antigen-presenting cancer-associated fibroblasts. Cancer Discov. 2019;9(8):1102–23.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Hodges E, Molaro A, Dos Santos CO, Thekkat P, Song Q, Uren PJ, Park J, Butler J, Rafii S, McCombie WR, et al. Directional DNA methylation changes and complex intermediate states accompany lineage specificity in the adult hematopoietic compartment. Mol Cell. 2011;44(1):17–28.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Straussman R, Nejman D, Roberts D, Steinfeld I, Blum B, Benvenisty N, Simon I, Yakhini Z, Cedar H. Developmental programming of CpG island methylation profiles in the human genome. Nat Struct Mol Biol. 2009;16(5):564–71.

    Article  CAS  PubMed  Google Scholar 

  22. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473(7345):43–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Moore LD, Le T, Fan G. DNA methylation and its basic function. Neuropsychopharmacology. 2013;38(1):23–38.

    Article  CAS  PubMed  Google Scholar 

  24. Dor Y, Cedar H. Principles of DNA methylation and their implications for biology and medicine. Lancet. 2018;392(10149):777–86.

    Article  CAS  PubMed  Google Scholar 

  25. Zhao Y, Yang M, Wang S, Abbas SJ, Zhang J, Li Y, Shao R, Liu Y. An overview of epigenetic methylation in pancreatic cancer progression. Front Oncol. 2022;12:854773.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Natale F, Vivo M, Falco G, Angrisano T. Deciphering DNA methylation signatures of pancreatic cancer and pancreatitis. Clin Epigenetics. 2019;11(1):132.

    Article  PubMed  PubMed Central  Google Scholar 

  27. Constâncio V, Nunes SP, Henrique R, Jerónimo C. DNA methylation-based testing in liquid biopsies as detection and prognostic biomarkers for the four major cancer types. Cells. 2020;9(3):624.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Gutierrez A, Demond H, Brebi P, Ili CG. Novel Methylation Biomarkers for Colorectal Cancer Prognosis. Biomolecules. 2021;11(11):1722.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Xiao M, Liang X, Yan Z, Chen J, Zhu Y, Xie Y, Li Y, Li X, Gao Q, Feng F, et al. A DNA-methylation-driven genes based prognostic signature reveals immune microenvironment in pancreatic cancer. Front Immunol. 2022;13:803962.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Yin X, Kong L, Liu P. Identification of prognosis-related molecular subgroups based on DNA methylation in pancreatic cancer. Clin Epigenetics. 2021;13(1):109.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Zhu T, Liu J, Beck S, Pan S, Capper D, Lechner M, Thirlwell C, Breeze CE, Teschendorff AE. A pan-tissue DNA methylation atlas enables in silico decomposition of human tissue methylomes at cell-type resolution. Nat Methods. 2022;19(3):296–306.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, Khodadoust MS, Esfahani MS, Luca BA, Steiner D, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019;37(7):773–82.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Liu Y. scDeconv: an R package to deconvolve bulk DNA methylation data with scRNA-seq data and paired bulk RNA-DNA methylation data. Brief Bioinform. 2022. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bib/bbac150.

    Article  PubMed  PubMed Central  Google Scholar 

  34. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013;41(Database issue):D991-995.

    CAS  PubMed  Google Scholar 

  35. Goldman MJ, Craft B, Hastie M, Repečka K, McDade F, Kamath A, Banerjee A, Luo Y, Rogers D, Brooks AN, et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol. 2020;38(6):675–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113–20.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Bailey P, Chang DK, Nones K, Johns AL, Patch AM, Gingras MC, Miller DK, Christ AN, Bruxner TJ, Quinn MC, et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature. 2016;531(7592):47–52.

    Article  CAS  PubMed  Google Scholar 

  38. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Becht E, McInnes L, Healy J, Dutertre C-A, Kwok IW, Ng LG, Ginhoux F. Newell EWJNb: dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotech. 2019;37(1):38–44.

    Article  CAS  Google Scholar 

  41. Fang Y, Pei S, Huang K, Xu F, Xiang G, Lan L, Zheng X. Single-cell transcriptome reveals the metabolic and clinical features of a highly malignant cell subpopulation in pancreatic ductal adenocarcinoma. Front Cell Dev Biol. 2022;10:798165.

    Article  PubMed  PubMed Central  Google Scholar 

  42. Xu Q, Chen S, Hu Y, Huang W. Single-cell RNA transcriptome reveals the intra-tumoral heterogeneity and regulators underlying tumor progression in metastatic pancreatic ductal adenocarcinoma. Cell Death Discov. 2021;7(1):331.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Lee JJ, Bernard V, Semaan A, Monberg ME, Huang J, Stephens BM, Lin D, Rajapakshe KI, Weston BR, Bhutani MS, et al. Elucidation of tumor-stromal heterogeneity and the ligand-receptor interactome by single-cell transcriptomics in real-world pancreatic cancer biopsies. Clin Cancer Res Off J Am Assoc Cancer Res. 2021;27(21):5912–21.

    Article  CAS  Google Scholar 

  44. Werba G, Weissinger D, Kawaler EA, Zhao E, Kalfakakou D, Dhara S, Wang L, Lim HB, Oh G, Jing X, et al. Single-cell RNA sequencing reveals the effects of chemotherapy on human pancreatic adenocarcinoma and its tumor microenvironment. Nat Commun. 2023;14(1):797.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Steele NG, Carpenter ES, Kemp SB, Sirihorachai VR, The S, Delrosario L, Lazarus J, Amir ED, Gunchick V, Espinoza C, et al. Multimodal mapping of the tumor and peripheral blood immune landscape in human pancreatic cancer. Nature Cancer. 2020;1(11):1097–112.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Dai M, Pei X, Wang XJ. Accurate and fast cell marker gene identification with COSG. Brief Bioinform. 2022. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bib/bbab579.

    Article  PubMed  PubMed Central  Google Scholar 

  47. Sun D, Guan X, Moran AE, Wu LY, Qian DZ, Schedin P, Dai MS, Danilov AV, Alumkal JJ, Adey AC, et al. Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data. Nat Biotechnol. 2022;40(4):527–38.

    Article  CAS  PubMed  Google Scholar 

  48. Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, Cahill DP, Nahed BV, Curry WT, Martuza RL, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344(6190):1396–401.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Chen K, Liu X, Liu W, Wang F, Tian X, Yang Y. Development and validation of prognostic and diagnostic model for pancreatic ductal adenocarcinoma based on scRNA-seq and bulk-seq datasets. Hum Mol Genet. 2022;31(10):1705–19.

    Article  CAS  PubMed  Google Scholar 

  50. Aran D, Looney AP, Liu L, Wu E, Fong V, Hsu A, Chak S, Naikawadi RP, Wolters PJ, Abate AR, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol. 2019;20(2):163–72.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Yang X, Raum JC, Kim J, Yu R, Yang J, Rice G, Li C, Won KJ, Stanescu DE, Stoffers DA. A PDX1 cistrome and single-cell transcriptome resource of the developing pancreas. Development. 2022. https://doiorg.publicaciones.saludcastillayleon.es/10.1242/dev.200432.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Olaniru OE, Kadolsky U, Kannambath S, Vaikkinen H, Fung K, Dhami P, Persaud SJ. Single-cell transcriptomic and spatial landscapes of the developing human pancreas. Cell Metab. 2023;35(1):184-199.e185.

    Article  CAS  PubMed  Google Scholar 

  53. Qiu X, Mao Q, Tang Y, Wang L, Chawla R, Pliner HA, Trapnell C. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017;14(10):979–82.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016;17:13.

    Article  PubMed  PubMed Central  Google Scholar 

  55. Chu T, Wang Z, Pe’er D, Danko CG. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nature cancer. 2022;3(4):505–17.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Morris TJ, Butcher LM, Feber A, Teschendorff AE, Chakravarthy AR, Wojdacz TK, Beck S. ChAMP: 450k chip analysis methylation pipeline. Bioinformatics. 2013;30(3):428–30.

    Article  PubMed  PubMed Central  Google Scholar 

  57. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Loyfer N, Magenheim J, Peretz A, Cann G, Bredno J, Klochendler A, Fox-Fisher I, Shabi-Porat S, Hecht M, Pelet T, et al. A DNA methylation atlas of normal human cell types. Nature. 2023;613(7943):355–64.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Moffitt RA, Marayati R, Flate EL, Volmar KE, Loeza SG, Hoadley KA, Rashid NU, Williams LA, Eaton SC, Chung AH, et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nat Genet. 2015;47(10):1168–78.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Rashid NU, Peng XL, Jin C, Moffitt RA, Volmar KE, Belt BA, Panni RZ, Nywening TM, Herrera SG, Moore KJ, et al. Purity independent subtyping of tumors (PurIST), a clinically robust, single-sample classifier for tumor subtyping in pancreatic cancer. Clin Cancer Res Off J Am Assoc Cancer Res. 2020;26(1):82–92.

    Article  Google Scholar 

  61. Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28(11):1747–56.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 2013;14:7.

    Article  Google Scholar 

  64. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat Commun. 2018;9(1):20.

    Article  PubMed  PubMed Central  Google Scholar 

  66. Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18(1):220.

    Article  PubMed  PubMed Central  Google Scholar 

  67. Li T, Fan J, Wang B, Traugh N, Chen Q, Liu JS, Li B, Liu XS. TIMER: a web server for comprehensive analysis of tumor-infiltrating immune cells. Can Res. 2017;77(21):e108–10.

    Article  CAS  Google Scholar 

  68. Zeng D, Ye Z, Shen R, Yu G, Wu J, Xiong Y, Zhou R, Qiu W, Huang N, Sun L, et al. IOBR: multi-omics immuno-oncology biological research to decode tumor microenvironment and signatures. Front Immunol. 2021;12:687975.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, Porta-Pardo E, Gao GF, Plaisier CL, Eddy JA, et al. The immune landscape of cancer. Immunity. 2018;48(4):812-830.e814.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Xu L, Deng C, Pang B, Zhang X, Liu W, Liao G, Yuan H, Cheng P, Li F, Long Z, et al. TIP: a web server for resolving tumor immunophenotype profiling. Cancer Res. 2018;78(23):6575–80.

    Article  CAS  PubMed  Google Scholar 

  71. Jiang P, Gu S, Pan D, Fu J, Sahu A, Hu X, Li Z, Traugh N, Bu X, Li B, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med. 2018;24(10):1550–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, et al. Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013;41(Database issue):D955-961.

    CAS  PubMed  Google Scholar 

  73. Geeleher P, Cox N, Huang RS. pRRophetic: an R package for prediction of clinical chemotherapeutic response from tumor gene expression levels. PLoS One. 2014;9(9):e107468.

    Article  PubMed  PubMed Central  Google Scholar 

  74. Biffi G, Oni TE, Spielman B, Hao Y, Elyada E, Park Y, Preall J, Tuveson DA. IL1-Induced JAK/STAT signaling is antagonized by TGFβ to shape CAF heterogeneity in pancreatic ductal adenocarcinoma. Cancer Discov. 2019;9(2):282–301.

    Article  PubMed  Google Scholar 

  75. Cui R, Yue W, Lattime EC, Stein MN, Xu Q, Tan XL. Targeting tumor-associated macrophages to combat pancreatic cancer. Oncotarget. 2016;7(31):50735–54.

    Article  PubMed  PubMed Central  Google Scholar 

  76. Geeraerts X, Bolli E, Fendt SM, Van Ginderachter JA. Macrophage metabolism as therapeutic target for cancer, atherosclerosis, and obesity. Front Immunol. 2017;8:289.

    Article  PubMed  PubMed Central  Google Scholar 

  77. Zhang J, Zhang Q, Lou Y, Fu Q, Chen Q, Wei T, Yang J, Tang J, Wang J, Chen Y, et al. Hypoxia-inducible factor-1α/interleukin-1β signaling enhances hepatoma epithelial-mesenchymal transition through macrophages in a hypoxic-inflammatory microenvironment. Hepatology. 2018;67(5):1872–89.

    Article  CAS  PubMed  Google Scholar 

  78. Helm O, Held-Feindt J, Grage-Griebenow E, Reiling N, Ungefroren H, Vogel I, Krüger U, Becker T, Ebsen M, Röcken C, et al. Tumor-associated macrophages exhibit pro- and anti-inflammatory properties by which they impact on pancreatic tumorigenesis. Int J Cancer. 2014;135(4):843–61.

    Article  CAS  PubMed  Google Scholar 

  79. Sica A, Mantovani A. Macrophage plasticity and polarization: in vivo veritas. J Clin Investig. 2012;122(3):787–95.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Ruffell B, Affara NI, Coussens LM. Differential macrophage programming in the tumor microenvironment. Trends Immunol. 2012;33(3):119–26.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Chang JH, Jiang Y, Pillarisetty VG. Role of immune cells in pancreatic cancer from bench to clinical application: an updated review. Medicine. 2016;95(49):e5541.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Chen K, Wang Y, Hou Y, Wang Q, Long D, Liu X, Tian X, Yang Y. Single cell RNA-seq reveals the CCL5/SDC1 receptor-ligand interaction between T cells and tumor cells in pancreatic cancer. Cancer Lett. 2022;545:215834.

    Article  CAS  PubMed  Google Scholar 

  83. Shukla SK, Purohit V, Mehla K, Gunda V, Chaika NV, Vernucci E, King RJ, Abrego J, Goode GD, Dasgupta A, et al. MUC1 and HIF-1alpha signaling crosstalk induces anabolic glucose metabolism to impart gemcitabine resistance to pancreatic cancer. Cancer Cell. 2017;32(1):71-87.e77.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Gebauer F, Wicklein D, Horst J, Sundermann P, Maar H, Streichert T, Tachezy M, Izbicki JR, Bockhorn M, Schumacher U. Carcinoembryonic antigen-related cell adhesion molecules (CEACAM) 1, 5 and 6 as biomarkers in pancreatic cancer. PLoS ONE. 2014;9(11):e113023.

    Article  PubMed  PubMed Central  Google Scholar 

  85. Yao H, Yang Z, Liu Z, Miao X, Yang L, Li D, Zou Q, Yuan Y. Glypican-3 and KRT19 are markers associating with metastasis and poor prognosis of pancreatic ductal adenocarcinoma. Cancer Biomark Sect A Dis Markers. 2016;17(4):397–404.

    CAS  Google Scholar 

  86. Wherry EJ, Kurachi M. Molecular and cellular insights into T cell exhaustion. Nat Rev Immunol. 2015;15(8):486–99.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Carstens JL, Correa de Sampaio P, Yang D, Barua S, Wang H, Rao A, Allison JP, LeBleu VS, Kalluri R. Spatial computation of intratumoral T cells correlates with survival of patients with pancreatic cancer. Nat Commun. 2017;8:15095.

    Article  PubMed  PubMed Central  Google Scholar 

  88. Wang WQ, Liu L, Xu HX, Wu CT, Xiang JF, Xu J, Liu C, Long J, Ni QX, Yu XJ. Infiltrating immune cells and gene mutations in pancreatic ductal adenocarcinoma. Br J Surg. 2016;103(9):1189–99.

    Article  CAS  PubMed  Google Scholar 

  89. Zhang Z, Zhu R, Sun W, Wang J, Liu J. Analysis of methylation-driven genes in pancreatic ductal adenocarcinoma for predicting prognosis. J Cancer. 2021;12(21):6507–18.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Chen G, Long J, Zhu R, Yang G, Qiu J, Zhao F, Liu Y, Tao J, Zhang T, Zhao Y. Identification and validation of constructing the prognostic model with four DNA methylation-driven genes in pancreatic cancer. Front Cell Dev Biol. 2021;9:709669.

    Article  PubMed  Google Scholar 

  91. Cao T, Wu H, Ji T. Bioinformatics-based construction of prognosis-related methylation prediction model for pancreatic cancer patients and its application value. Front Pharmacol. 2023;14:1086309.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Deng GC, Sun DC, Zhou Q, Lv Y, Yan H, Han QL, Dai GH. Identification of DNA methylation-driven genes and construction of a nomogram to predict overall survival in pancreatic cancer. BMC Genomics. 2021;22(1):791.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Chen T, Chen X, Zhang S, Zhu J, Tang B, Wang A, Dong L, Zhang Z, Yu C, Sun Y, et al. The genome sequence archive family: toward explosive data growth and diverse data types. Genomics Proteomics Bioinform. 2021;19(4):578–83.

    Article  Google Scholar 

Download references

Acknowledgements

The authors thank Dr. Jianming Zeng of the University of Macau and all the members of his bioinformatics team and bioinformatics for generously sharing their experience and codes. We appreciate the TCGA and GEO cohorts and contributors providing their platforms and their datasets.

Funding

This work was supported by grants from the Higher Education Discipline Innovation Project (111 Project No. B13003) to D.Z, the Natural Science Foundation of Zhejiang Province (Grant No. LTGD24H160010), the Chinese Medical Science and Technology Project of Zhejiang Province (Grant No.2023ZL265, 2024ZL278, 2024ZL283), the Medical Science and Technology Project of Zhejiang Province (Grant No. 2022KY021), and the National Natural Science Foundation of China (Grant No. 82274020; 82003851; 81973396). These funders were not involved in any aspect of the present study.

Author information

Authors and Affiliations

Authors

Contributions

YQZ, JD(Du), and DKZ conducted the data analysis and drafted the manuscript, while JD(Du), JD(Dong), PL, YHZ, and DKZ procured the tissue samples and clinical data, then conducted the corresponding DNA methylation microarray experiments. JD(Du), JD(Dong), YH, and YHZ brought their expertise in basic and clinical oncology. YQZ, HLF, FFZ, LLS, and DKZ interpreted the outcomes and conducted statistical analyses. DKZ and JD(Du) designed the experiments and supervised the study. All authors critically reviewed the draft and provided critical reviews, leading to the final version of the manuscript’s approval.

Corresponding authors

Correspondence to Dake Zhang or Yuhua Zhang.

Ethics declarations

Ethics approval and consent to participate

All enrolled patients provided written informed consent, and the study procedures were approved by the Ethics Committee of Zhejiang Provincial People’s Hospital, ZJPPHEC2023I (089), and Zhejiang Cancer Hospital, IRB-2023–1039 (IIT).

Consent for publication

Not applicable.

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Du, J., Zhao, Y., Dong, J. et al. Single-cell transcriptomics reveal the prognostic roles of epithelial and T cells and DNA methylation-based prognostic models in pancreatic cancer. Clin Epigenet 16, 188 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13148-024-01800-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13148-024-01800-0

Keywords