
DESNT: A Poor Prognosis Category of Human Prostate Cancer

Open Access. Published: March 07, 2017. DOI: https://doi.org/10.1016/j.euf.2017.01.016

      Abstract

      Background

      A critical problem in the clinical management of prostate cancer is that it is highly heterogeneous. Accurate prediction of individual cancer behaviour is therefore not achievable at the time of diagnosis, leading to substantial overtreatment. It remains an enigma that, in contrast to breast cancer, unsupervised analyses of global expression profiles have not so far defined robust categories of prostate cancer with distinct clinical outcomes.

      Objective

      To devise a novel classification framework for human prostate cancer based on unsupervised mathematical approaches.

      Design, setting, and participants

      Our analyses are based on the hypothesis that previous attempts to classify prostate cancer have been unsuccessful because individual samples of prostate cancer frequently have heterogeneous compositions. To address this issue, we applied an unsupervised Bayesian procedure called Latent Process Decomposition to four independent prostate cancer transcriptome datasets obtained using samples from prostatectomy patients and containing between 78 and 182 participants.

      Outcome measurements and statistical analysis

      Biochemical failure was assessed using log-rank analysis and Cox regression analysis.

      Results and limitations

      Application of Latent Process Decomposition identified a common process in all four independent datasets examined. Cancers assigned to this process (designated DESNT cancers) are characterized by low expression of a core set of 45 genes, many encoding proteins involved in the cytoskeleton machinery, ion transport, and cell adhesion. For the three datasets with linked prostate-specific antigen failure data following prostatectomy, patients with DESNT cancer exhibited poor outcome relative to other patients (p = 2.65 × 10^-5, p = 4.28 × 10^-5, and p = 2.98 × 10^-8). When these three datasets were combined, the independent predictive value of DESNT membership was p = 1.61 × 10^-7 compared with p = 1.00 × 10^-5 for Gleason sum. A limitation of the study is that only prediction of prostate-specific antigen failure was examined.

      Conclusions

      Our results demonstrate the existence of a novel poor prognosis category of human prostate cancer and will assist in the targeting of therapy, helping avoid treatment-associated morbidity in men with indolent disease.

      Patient summary

      Prostate cancer, unlike breast cancer, does not have a robust classification framework. We propose that this failure has occurred because prostate cancer samples selected for analysis frequently have heterogeneous compositions (individual samples are made up of many different parts that each have different characteristics). Applying a mathematical approach that can overcome this problem, we identify a novel poor prognosis category of human prostate cancer called DESNT.

      1. Introduction

      Risk categories based on prostate-specific antigen (PSA), Gleason score, and Clinical Stage that predict PSA failure [
      • D’Amico A.V.
      Biochemical outcome after radical prostatectomy, external beam radiation therapy, or interstitial radiation therapy for clinically localized prostate cancer.
      ], underpin the treatment of localized prostate cancer, as illustrated, for example, by the UK National Institute for Health and Care Excellence guidelines [
      • Graham J.
      • Kirkbride P.
      • Cann K.
      • Hasler E.
      • Prettyjohns M.
      Prostate cancer: summary of updated NICE guidance.
      ]. Attempts to improve risk stratification have been made with the development of prognostic tests, such as Prolaris [
      • Cuzick J.
      • Swanson G.P.
      • Fisher G.
      • et al.
      Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study.
      ], Oncotype DX [
      • Klein E.A.
      • Cooperberg M.R.
      • Magi-Galluzzi C.
      • et al.
      A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling.
      ], and Decipher [
      • Erho N.
      • Crisan A.
      • Vergara I.A.
      • et al.
      Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy.
      ]. Most such expression-based prognostic signatures for prostate cancer have in common that they were derived using supervised steps, involving either comparisons of aggressive and nonaggressive disease [
      • Erho N.
      • Crisan A.
      • Vergara I.A.
      • et al.
      Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy.
      ,
      • Glinsky G.V.
      • Glinskii A.B.
      • Stephenson A.J.
      • Hoffman R.M.
      • Gerald W.L.
      Gene expression profiling predicts clinical outcome of prostate cancer.
      ], or the selection of genes representing specific biological functions [
      • Cuzick J.
      • Swanson G.P.
      • Fisher G.
      • et al.
      Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study.
      ,
      • Tomlins S.A.
      • Alshalalfa M.
      • Davicioni E.
      • et al.
      Characterization of 1577 primary prostate cancers reveals novel biological and clinicopathologic insights into molecular subtypes.
      ,
      • You S.
      • Knudsen B.S.
      • Erho N.
      • et al.
      Integrated classification of prostate cancer reveals a novel luminal subtype with poor outcome.
      ]. Alternatively, expression biomarkers may be linked to the presence of somatic copy number variations [
      • Ross-Adams H.
      • Lamb A.D.
      • Dunning M.J.
      • et al.
      Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study.
      ]. In contrast, for breast cancer, unsupervised analysis of transcriptome profiles, using approaches such as hierarchical clustering has identified robust disease categories that have distinct clinical outcomes and that require different treatment strategies [
      • Sorlie T.
      • Tibshirani R.
      • Parker J.
      • et al.
      Repeated observation of breast tumor subtypes in independent gene expression data sets.
      ].
      Our hypothesis is that completely unsupervised classification of prostate cancer based on transcriptome data has not been successful previously [
      • Ross-Adams H.
      • Lamb A.D.
      • Dunning M.J.
      • et al.
      Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study.
      ,
      • Taylor B.S.
      • Schultz N.
      • Hieronymus H.
      • et al.
      Integrative genomic profiling of human prostate cancer.
      ] because individual samples of prostate cancer can contain more than one contributing lineage [
      • Cooper C.S.
      • Eeles R.
      • Wedge D.C.
      • et al.
      Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue.
      ,
      • Cancer Genome Atlas Research Network
      The molecular taxonomy of primary prostate cancer.
      ] and frequently have heterogeneous compositions [
      • Boutros P.C.
      • Fraser M.
      • Harding N.J.
      • et al.
      Spatial genomic heterogeneity within localized, multifocal prostate cancer.
      ,
      • Clark J.
      • Attard G.
      • Jhavar S.
      • et al.
      Complex patterns of ETS gene alteration arise during cancer development in the human prostate.
      ,
      • Tsourlakis M.-C.
      • Stender A.
      • Quaas A.
      • et al.
      Heterogeneity of ERG expression in prostate cancer: a large section mapping study of entire prostatectomy specimens from 125 patients.
      ]. To test this idea, in the current study, we applied Latent Process Decomposition (LPD) [
      • Carrivick L.
      • Rogers S.
      • Clark J.
      • Campbell C.
      • Girolami M.
      • Cooper C.
      Identification of prognostic signatures in breast cancer microarray data using Bayesian techniques.
      ,
      • Rogers S.
      • Girolami M.
      • Campbell C.
      • Breitling R.
      The latent process decomposition of cDNA microarray data sets.
      ]. Based on the latent Dirichlet allocation method [
      • Blei D.M.
      • Ng A.Y.
      • Jordan M.I.
      Latent Dirichlet allocation.
      ], LPD assesses the structure of a dataset in the absence of knowledge of clinical outcome or biological role [
      • Carrivick L.
      • Rogers S.
      • Clark J.
      • Campbell C.
      • Girolami M.
      • Cooper C.
      Identification of prognostic signatures in breast cancer microarray data using Bayesian techniques.
      ]. In contrast to standard unsupervised clustering models (eg, k-means and hierarchical clustering), individual cancers are not assigned to a single cluster: instead gene expression levels in each cancer are modelled via combinations of latent processes. We previously used LPD to confirm the presence of basal and ERBB2 overexpressing categories in breast cancer datasets [
      • Carrivick L.
      • Rogers S.
      • Clark J.
      • Campbell C.
      • Girolami M.
      • Cooper C.
      Identification of prognostic signatures in breast cancer microarray data using Bayesian techniques.
      ], and to show that, based on blood expression profiles, patients with advanced prostate cancer can be stratified into two clinically distinct groups [
      • Olmos D.
      • Brewer D.
      • Clark J.
      • et al.
      Prognostic value of blood mRNA expression signatures in castration-resistant prostate cancer: a prospective, two-stage study.
      ].

      2. Materials and methods

      2.1 The CancerMap dataset

      Fresh prostate cancer specimens were obtained and processed from a systematic series of patients who had undergone a prostatectomy at the Royal Marsden National Health Service Foundation Trust and Addenbrooke’s Hospital, Cambridge as previously described [
      • Ross-Adams H.
      • Lamb A.D.
      • Dunning M.J.
      • et al.
      Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study.
      ,
      • Warren A.Y.
      • Whitaker H.C.
      • Haynes B.
      • et al.
      Method for sampling tissue for research which preserves pathological data in radical prostatectomy.
      ,
      • Jhavar S.
      • Reid A.
      • Clark J.
      • et al.
      Detection of TMPRSS2-ERG translocations in human prostate cancer by expression profiling using GeneChip Human Exon 1.0 ST arrays.
      ]. The relevant local Research Ethics Committee approval was obtained. Expression profiles were determined and data were processed as previously described [
      • Jhavar S.
      • Reid A.
      • Clark J.
      • et al.
      Detection of TMPRSS2-ERG translocations in human prostate cancer by expression profiling using GeneChip Human Exon 1.0 ST arrays.
      ] using GeneChip Human Exon 1.0 ST arrays (Affymetrix, Santa Clara, CA, USA) according to the manufacturer’s instructions. Data are available from the Gene Expression Omnibus: GSE94767. CancerMap patients did not receive neo-adjuvant treatment.

      2.2 Additional transcriptome datasets

      We analysed five prostate cancer microarray datasets that will be referred to as: Memorial Sloan Kettering Cancer Center (MSKCC), CancerMap, CamCap, Stephenson, and Klein. The data used, platforms, and location of clinical data are presented in Fig. 1B. Each dataset was obtained using samples from prostatectomy patients. The CamCap dataset used in our study was produced by combining Illumina HumanHT-12 V4.0 expression beadchip (bead microarray) datasets (GEO: GSE70768 and GSE70769) obtained from two prostatectomy series (Cambridge and Stockholm) and consisted of 147 cancer and 73 normal samples [
      • Ross-Adams H.
      • Lamb A.D.
      • Dunning M.J.
      • et al.
      Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study.
      ]. The CamCap and CancerMap datasets have 40 patients in common and thus are not independent. We also analysed one RNAseq dataset, consisting of 333 prostate cancers from The Cancer Genome Atlas and referred to here as TCGA [
      • Cancer Genome Atlas Research Network
      The molecular taxonomy of primary prostate cancer.
      ]. The counts per gene supplied by TCGA were used.
      Fig. 1. Latent Process Decomposition (LPD), gene correlations, and clinical outcome. (A) LPD analysis of Affymetrix expression data from the Memorial Sloan Kettering Cancer Center (MSKCC) dataset divided the samples into eight processes, each represented here by a bar chart. Samples are represented in all eight processes and the height of each bar corresponds to the proportion (pi) of the signature that can be assigned to each LPD process. Samples are assigned to the LPD group in which they exhibit the highest value of pi. LPD was performed using the 500 gene probes with the greatest variation in expression between samples in the MSKCC dataset. The process containing DESNT cancers is indicated. (B) List of datasets used in LPD analysis. The numbers of unique primary cancer and normal specimens used in LPD are indicated. The CancerMap and CamCap datasets were not independent, having 40 cancers in common. Clinical and molecular details for the CancerMap dataset are given in Supplementary Table 4 and Supplementary data. Clinical details for samples from other datasets used in this study can be found in Supplementary data. (C) Correlations of average levels of gene expression between cancers designated as DESNT. All six comparisons for the MSKCC, CancerMap, Stephenson, and Klein datasets are shown. The expression levels of each gene have been normalised across all samples to mean 0 and standard deviation 1. (D) Kaplan-Meier PSA failure plots for the MSKCC, CancerMap, and Stephenson datasets.
      BCR = biochemical recurrence; corr. = correlation; FF = fresh frozen specimen; FFPE = formalin-fixed paraffin embedded specimen; N/A = not applicable.

      2.3 LPD

      LPD [
      • Carrivick L.
      • Rogers S.
      • Clark J.
      • Campbell C.
      • Girolami M.
      • Cooper C.
      Identification of prognostic signatures in breast cancer microarray data using Bayesian techniques.
      ,
      • Rogers S.
      • Girolami M.
      • Campbell C.
      • Breitling R.
      The latent process decomposition of cDNA microarray data sets.
      ], an unsupervised Bayesian approach, was used to classify samples into subgroups called processes. We selected the 500 probesets with greatest variance across the MSKCC dataset for use in LPD. These probesets map to 492 genes. For each dataset, all probesets that map to these genes were used in LPD analyses (CancerMap: 507 probesets, CamCap: 483, Stephenson: 609).
      LPD can objectively assess the most likely number of processes. We assessed the hold-out validation log-likelihood of the data computed for various numbers of processes and used a combination of both the uniform (equivalent to a maximum likelihood approach) and nonuniform (maximum a posteriori approach) priors to choose the number of processes. For robustness, we restarted LPD 100 times with different seeds for each dataset. From the 100 runs we selected a representative run for subsequent analysis: the run with the survival log-rank p-value closest to the mode. For the Klein dataset, for which we do not have clinical data, we used the hold-out log-likelihood from LPD instead.
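The selection of a representative run can be illustrated with a short sketch. The text does not say how the mode of the log-rank p-values was estimated, so the code below makes an assumption: it bins -log10(p) and takes the centre of the fullest bin. The function name `representative_run`, the `bin_width` parameter, and the example p-values are all hypothetical.

```python
import math
from collections import Counter


def representative_run(p_values, bin_width=0.5):
    """Return the index of the run whose p-value is closest to the mode.

    Assumption: the mode is estimated by binning -log10(p) into bins of
    `bin_width` and taking the centre of the fullest bin.
    """
    scores = [-math.log10(p) for p in p_values]
    bins = Counter(math.floor(s / bin_width) for s in scores)
    # Fullest bin; ties are broken in favour of the lower bin index.
    modal_bin = min(bins, key=lambda b: (-bins[b], b))
    mode = (modal_bin + 0.5) * bin_width
    # Run closest to the estimated mode (first such run on ties).
    return min(range(len(scores)), key=lambda i: abs(scores[i] - mode))


# Illustrative p-values for six restarts (not taken from the study).
runs = [1e-3, 3e-5, 2.5e-5, 4e-5, 1e-7, 2e-5]
best = representative_run(runs)
```

With 100 restarts the same call would be made on the 100 log-rank p-values; a kernel density estimate could replace the histogram if a smoother mode estimate is preferred.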

      2.4 Statistical tests

      All statistical tests were performed in R version 3.2.2 (https://www.r-project.org/). Correlations between the expression profiles of two datasets for a particular gene set and sample subgroup were calculated as follows:
      1. for each gene, one probeset was selected at random;
      2. for each probeset, its distribution across all samples was transformed to a standard normal distribution;
      3. the average expression for each probeset across the samples in the subgroup was determined, to obtain an expression profile for the subgroup;
      4. the Pearson’s correlation between the expression profiles of the subgroups in the two datasets was determined.
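The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code. It assumes that the transformation in step 2 is a z-score (mean 0, standard deviation 1) and, for brevity, that both datasets share one gene-to-probeset mapping. The helper names and the toy expression values are hypothetical.

```python
import math
import random


def zscore(values):
    """Scale one probeset to mean 0 and standard deviation 1 across samples."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def subgroup_profile(expr, probes_by_gene, subgroup, rng):
    """Steps 1-3: one random probeset per gene, standardise, average the subgroup.

    expr maps probeset -> list of expression values (one per sample);
    probes_by_gene maps gene -> list of probesets; subgroup is a list of
    sample indices. Genes are visited in sorted order so that profiles
    from two datasets line up gene by gene.
    """
    profile = []
    for gene in sorted(probes_by_gene):
        probe = rng.choice(probes_by_gene[gene])                      # step 1
        z = zscore(expr[probe])                                       # step 2
        profile.append(sum(z[i] for i in subgroup) / len(subgroup))   # step 3
    return profile


def pearson(x, y):
    """Step 4: Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data: three genes, four samples per dataset, subgroup = samples 2 and 3.
rng = random.Random(0)
genes = {"G1": ["a1"], "G2": ["a2"], "G3": ["a3"]}
data_a = {"a1": [1.0, 2.0, 6.0, 7.0], "a2": [5.0, 6.0, 1.0, 2.0], "a3": [3.0, 3.5, 4.0, 4.5]}
data_b = {"a1": [0.5, 1.0, 4.0, 5.0], "a2": [9.0, 8.0, 2.0, 1.0], "a3": [2.0, 2.2, 2.4, 2.6]}
r = pearson(subgroup_profile(data_a, genes, [2, 3], rng),
            subgroup_profile(data_b, genes, [2, 3], rng))
```

In the toy data the subgroup shows the same direction of change for every gene in both datasets, so the correlation is close to 1.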
      Differentially expressed probesets were identified using a moderated t-test implemented in the limma R package [
      • Ritchie M.E.
      • Phipson B.
      • Wu D.
      • et al.
      limma powers differential expression analyses for RNA-sequencing and microarray studies.
      ]. Genes were considered significantly differentially expressed if the adjusted p-value was below 0.01 (p-values adjusted for the false discovery rate).
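As an illustration of the false discovery rate adjustment, the sketch below implements the Benjamini-Hochberg procedure. The text does not name the adjustment method, so the choice of Benjamini-Hochberg is an assumption. The function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top          # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.0001, 0.004, 0.03, 0.2]          # illustrative values only
adj = bh_adjust(p)
significant = [q < 0.01 for q in adj]   # threshold used in the text
```

Here the adjusted values are 0.0004, 0.008, 0.04, and 0.2, so only the first two tests pass the 0.01 threshold.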
      Survival analyses were performed using Cox proportional hazards models, the log-rank test, and the Kaplan-Meier estimator, with biochemical recurrence after prostatectomy as the end point. When several samples per patient were available, only the sample with the highest proportion of tumour tissue was used. Multivariate survival analyses were performed with the clinical covariates Gleason grade (≤ 7 and > 7), pathological stage (T1/T2 and T3/T4), and PSA level (≤ 10 and > 10). Variables that did not satisfy the proportional hazards assumption (T-stage in MSKCC) were modelled as the product of the variable with the Heaviside function:

      $$g(t) = \begin{cases} 1, & \text{if } t \ge t_0 \\ 0, & \text{otherwise} \end{cases}$$

      where $t_0$ is a time threshold. Multiplying a predictor by the Heaviside function divides the predictor into time intervals for which the extended Cox model computes different hazard ratios. Before carrying out multivariate analyses we assessed collinearity between the DESNT predictor and the other traditional indicators. To do this we calculated the variance inflation factor for each covariate in each model. The variance inflation factor varied between 1.005241 and 1.461661, suggesting a very weak correlation between the predictors.
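The time-splitting effect of the step function can be shown with a small sketch. It assumes that g(t) equals 1 from the threshold t0 onwards and 0 before it, and it uses the common two-term form in which one copy of the covariate is active before t0 and another from t0 onwards. Whether the authors used one or two terms is not stated. The function names, the 24-month threshold, and the covariate values are hypothetical.

```python
def heaviside(t, t0):
    """g(t) = 1 if t >= t0, else 0 (the comparison at t0 is an assumption)."""
    return 1 if t >= t0 else 0


def split_covariate(x, t, t0):
    """Return (early, late) copies of covariate x at follow-up time t.

    early = x * (1 - g(t)) is active before t0 and late = x * g(t) from t0
    on, so an extended Cox model can fit a separate coefficient to each.
    """
    g = heaviside(t, t0)
    return x * (1 - g), x * g


# Hypothetical example: stage indicator x = 1, threshold t0 = 24 months.
early_12, late_12 = split_covariate(1, 12, 24)
early_36, late_36 = split_covariate(1, 36, 24)
```

At 12 months only the early term carries the covariate, and at 36 months only the late term does, which is what allows a different hazard ratio in each interval.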

      2.5 Deriving an optimal predictor of DESNT membership

      To derive an optimal predictor of DESNT membership the datasets were prepared so that they were comparable: probes were only retained if the associated gene was found in every microarray platform, only one randomly chosen probe was retained per gene, and batch effects were adjusted using the ComBat algorithm [
      • Johnson W.E.
      • Li C.
      • Rabinovic A.
      Adjusting batch effects in microarray expression data using empirical Bayes methods.
      ]. The MSKCC dataset was used as the training set and the other datasets as test sets. Gene selection was performed using a regularized generalized linear model approach (LASSO) implemented in the glmnet R package [
      • Friedman J.
      • Hastie T.
      • Tibshirani R.
      Regularization paths for generalized linear models via coordinate descent.
      ], starting with all genes that were significantly up- or down-regulated in DESNT in at least two of the five microarray datasets (1669 genes). LASSO was run 100 times and only genes that were selected in at least 25% of runs were retained. The optimal predictor was then derived using the random forest (RF) model [
      • Breiman L.
      Random Forests.
      ] implemented in the randomForest R package [
      • Liaw A.
      • Wiener M.
      Classification and regression by random Forest.
      ]. Default parameters were used, except that the number of trees was set to 10001 and class size imbalance was adjusted for by down-sampling the majority class to the frequency of the minority class.
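Two steps of this procedure lend themselves to a short sketch: keeping genes selected in a minimum fraction of repeated LASSO runs, and down-sampling the majority class. The code below is an illustration in plain Python, not the glmnet or randomForest implementation. In particular, it down-samples once, whereas a random forest implementation may resample for every tree. The function names and the example gene lists (apart from ACTG2, which appears in the text) are hypothetical.

```python
import random
from collections import Counter


def stable_genes(selections, min_fraction=0.25):
    """Keep genes chosen in at least `min_fraction` of the repeated runs."""
    counts = Counter(g for run in selections for g in set(run))
    return sorted(g for g, c in counts.items()
                  if c / len(selections) >= min_fraction)


def downsample(labels, rng):
    """Return sample indices with each class cut to the minority class size."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


# Hypothetical gene sets from four runs (the study used 100 runs and 25%).
runs = [["ACTG2", "GENE_A"], ["ACTG2"], ["ACTG2", "GENE_B"], ["GENE_A"]]
kept = stable_genes(runs, min_fraction=0.5)

# Hypothetical class labels: 3 DESNT cancers and 7 others.
labels = ["DESNT"] * 3 + ["other"] * 7
balanced = downsample(labels, random.Random(1))
```

Filtering on selection frequency guards against genes that enter the model only because of one particular cross-validation split, and balancing the classes keeps the classifier from favouring the larger non-DESNT group.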

      3. Results

      3.1 Identification of the DESNT cancer category

      Four independent transcriptome datasets (designated MSKCC [
      • Taylor B.S.
      • Schultz N.
      • Hieronymus H.
      • et al.
      Integrative genomic profiling of human prostate cancer.
      ], CancerMap, Klein [
      • Klein E.A.
      • Yousefi K.
      • Haddad Z.
      • et al.
      A genomic classifier improves prediction of metastatic disease within 5 years after surgery in node-negative high-risk prostate cancer patients managed by radical prostatectomy without adjuvant therapy.
      ], and Stephenson [
      • Stephenson A.J.
      • Smith A.
      • Kattan M.W.
      • et al.
      Integration of gene expression profiling and clinical variables to predict prostate carcinoma recurrence after radical prostatectomy.
      ]; Fig. 1B) obtained from prostatectomy specimens were analysed. LPD was performed using between three and eight underlying latent processes contributing to the overall expression profile as indicated from log-likelihood plots (Fig. 1B, Supplementary Fig. 1). Following the independent decomposition of each dataset, cancers were assigned to individual processes based on their highest pi value yielding the results shown in Fig. 1A and Supplementary Fig. 2. The pi is the contribution of each process “i” to the expression profile of an individual cancer: sum of pi over all processes = 1.
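The assignment rule can be written as a one-line arg-max. The sketch below assumes the per-cancer contributions are already available as a list that sums to 1. The function name `assign_process` and the example values are hypothetical.

```python
def assign_process(pi):
    """Return the index of the process with the largest contribution."""
    if abs(sum(pi) - 1.0) > 1e-6:
        raise ValueError("process contributions must sum to 1")
    return max(range(len(pi)), key=lambda i: pi[i])


# Hypothetical decomposition of one cancer over four processes.
sample_pi = [0.10, 0.55, 0.05, 0.30]
process = assign_process(sample_pi)
```

Here the cancer is assigned to the second process (index 1), although 45% of its expression profile is attributed to other processes. This is the sense in which LPD differs from hard clustering.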
      A search for relationships between the decompositions identified a single process that, based on correlations of gene expression levels, appeared to be common across all four datasets (Fig. 1C). To further investigate this association, for each dataset, we identified genes that were expressed at significantly lower or higher levels (p < 0.01 after correction for false discovery rate) in the cancers assigned to this process compared with all other cancers from the same dataset. This revealed a shared set of 45 genes, all with lower expression (Fig. 2A, Supplementary Table 1). Many of the proteins encoded by these 45 core genes are components of the cytoskeleton or regulate its dynamics, while others are involved in cell adhesion and ion transport (Fig. 2B). Eleven of the 45 genes were members of published prognostic signatures for prostate cancer (Fig. 2C, Supplementary data). For example, MYLK, ACTG2, and CNN1 are down-regulated in a signature for cancer metastasis [
      • Ramaswamy S.
      • Ross K.N.
      • Lander E.S.
      • Golub T.R.
      A molecular signature of metastasis in primary solid tumors.
      ], while lower expression of TPM2 is associated with poorer outcomes as part of the Oncotype DX signature [
      • Klein E.A.
      • Cooperberg M.R.
      • Magi-Galluzzi C.
      • et al.
      A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling.
      ]. The cancers assigned to this common process are referred to as “DESNT” (Latin DEScenduNT, they descend).
      Fig. 2. Genes commonly down-regulated in DESNT poor prognosis prostate cancer. (A) Number of genes with significantly altered expression in DESNT cancers compared to non-DESNT cancers (p < 0.01 after correction for false discovery rate). Forty-five genes had lower expression in DESNT cancers in all four expression microarray datasets, based on a stringency requirement of being down-regulated in at least 80 of 100 independent Latent Process Decomposition runs. (B) List of the 45 genes according to biological grouping. Previous published evidence is represented as superscripts and the supporting references are provided in Supplementary Table 1. Encoded protein functions are shown in Supplementary Table 5. Although some of the 45 genes are preferentially expressed in stromal tissue, we found no correlation between stromal content and clinical outcome in either the CancerMap or CamCap patient series, where data on cellular composition were available. When patients were stratified into two groups (above and below median stromal content), Kaplan-Meier plots showed no outcome difference for either the CancerMap (log-rank test, p = 0.159) or CamCap (p = 0.261) patient series. (C) Relationship between the genes in published poor prognosis signatures for prostate cancer and the DESNT classification for human prostate cancer, represented as a circos plot. Links to the 45 commonly down-regulated genes are shown in brown. References quoted in the circos plot are listed in the Supplementary data and detailed gene relationships are shown in Supplementary data.

      3.2 Patients with DESNT cancers exhibit poor prognosis

      Using linked clinical data available for the MSKCC expression dataset, we found that patients with DESNT cancer exhibited poor outcomes when compared with patients assigned to other processes (p = 2.65 × 10^-5, log-rank test; Fig. 1D). Validation was provided in two further datasets where PSA failure data following prostatectomy were available (Fig. 1D): for both the Stephenson and CancerMap datasets, patients with DESNT cancer exhibited a poor outcome (p = 4.28 × 10^-5 and p = 2.98 × 10^-8, respectively). The number of cancers in each group is indicated in the bottom right corner of each Kaplan-Meier plot. The number of patients with PSA failure is indicated in parentheses. In multivariate analysis, including Gleason sum, Stage, and PSA, assignment as a DESNT cancer was an independent predictor of poor outcome in the Stephenson and CancerMap datasets (p = 1.83 × 10^-4 and p = 3.66 × 10^-3, Cox regression model) but not in the MSKCC dataset (p = 0.327; Table 1, Supplementary Fig. 3). When the three datasets were combined, the independent predictive value of DESNT membership was p = 1.61 × 10^-7 (Supplementary Fig. 3), compared with p = 1.00 × 10^-5 for Gleason sum. Including surgical margin status in the multivariate analysis had little influence on these values, giving p = 3.63 × 10^-7 for DESNT compared to p = 1.80 × 10^-5 for Gleason sum. The combined multivariate model is a significant improvement over a baseline Cox proportional hazards model containing Gleason, PSA, and Clinical Stage (p = 9.528 × 10^-7; likelihood ratio test). The poor prognosis DESNT process was also identified in the CamCap dataset [
      • Ross-Adams H.
      • Lamb A.D.
      • Dunning M.J.
      • et al.
      Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study.
      ] (Table 1, Supplementary Figs. 3 and 4), which was excluded from the above analysis because it was not independent: there is a substantial overlap with cancers included in CancerMap (Fig. 1B).
      Table 1. Poor clinical outcome of patients with DESNT cancer.

      Latent Process Decomposition
      Dataset      Univariate p-value   Multivariate p-value
      MSKCC        2.65 × 10^-5         3.27 × 10^-1
      CancerMap    2.98 × 10^-8         3.66 × 10^-3
      Stephenson   4.28 × 10^-5         1.83 × 10^-4
      CamCap       1.22 × 10^-3         2.90 × 10^-2

      Random forest
      Dataset      Univariate p-value   Multivariate p-value
      MSKCC        1.85 × 10^-3         6.05 × 10^-1
      CancerMap    4.80 × 10^-4         1.45 × 10^-2
      Stephenson   1.75 × 10^-4         4.56 × 10^-4
      CamCap       1.61 × 10^-5         1.31 × 10^-4
      TCGA         5.41 × 10^-4         2.59 × 10^-2

      MSKCC = Memorial Sloan Kettering Cancer Center.
      For each dataset, comparisons were made between prostate-specific antigen failures reported for DESNT and non-DESNT cancers. For Latent Process Decomposition (LPD) the log-rank p-values represent the modal LPD run selected from the 100 independent LPD runs as described in the Materials and methods. For multivariate analyses, Gleason sum, prostate-specific antigen at diagnosis, and Pathological Stage are included for all datasets with the exception of The Cancer Genome Atlas (TCGA) dataset, where only Gleason sum and Clinical Stage data were available. The full analyses are presented in Fig. 3 and Supplementary Fig. 3.

      3.3 A RF classifier for identifying DESNT cancer

      We wished to develop a classifier that, unlike LPD, was not computationally intensive and that could be applied both to a wider range of datasets and to individual cancers. The 1669 genes with significantly altered expression between DESNT and non-DESNT cancers in at least two datasets were selected for analysis. A LASSO logistic regression model was used to identify genes that were the best predictors of DESNT membership in the MSKCC dataset, leading to the selection of a set of 20 genes (Supplementary Table 2), which overlapped by one gene (ACTG2) with the 45 genes with significantly lower expression in DESNT cancers. Using random forest (RF) classification, these 20 genes provided high specificity and sensitivity for predicting that individual cancers were DESNT in both the MSKCC training dataset and in three validation datasets (Supplementary Fig. 5). For the two validation datasets (Stephenson and CancerMap) with linked PSA failure data, the predicted cancer subgroup exhibited poorer clinical outcome in both univariate and multivariate analyses, in agreement with the results observed using LPD (Table 1, Fig. 3).
      Fig. 3. Analysis of outcome for DESNT cancers identified by random forest (RF) classification. (A) Kaplan-Meier prostate-specific antigen (PSA) failure plots for the Memorial Sloan Kettering Cancer Center (MSKCC), (B) CancerMap, (C) Stephenson, (D) CamCap, and (E) The Cancer Genome Atlas (TCGA) datasets. For each dataset the cancers assigned to DESNT using the 20 gene RF classifier are compared to the remaining cancers. The number of cancers in each group is indicated in the bottom right corner of each plot. The number of cancers with PSA failure is indicated in parentheses. Multivariate analyses were performed as described in the Materials and methods for the (F) MSKCC, (G) CancerMap, (H) Stephenson, (I) CamCap, and (J) TCGA datasets. Pathological (Path) Stage covariates for MSKCC and Stephenson datasets did not meet the proportional hazards assumptions of the Cox model and have been modelled as time-dependent variables, as described in the Materials and methods.
      BCR = biochemical recurrence.

      3.4 DESNT cancers in TCGA dataset

      When RF classification was applied to RNAseq data from 333 prostate cancers described by TCGA [
      • Cancer Genome Atlas Research Network
      The molecular taxonomy of primary prostate cancer.
      ], a patient subgroup was identified that was confirmed as DESNT based on: (1) correlations of gene expression levels with DESNT cancer groups in other datasets (Supplementary Fig. 6), (2) demonstration of overlaps of differentially expressed genes between DESNT and non-DESNT cancers with the core down-regulated gene set (45/45 genes), and (3) its poorer clinical outcome based on PSA failure (p = 5.4 × 10^-4) compared to non-DESNT patients (Table 1, Fig. 3E).
      For the TCGA dataset, we failed to find correlations between assignment as a DESNT cancer and the presence of any specific genetic alteration (p > 0.05 after correction for false discovery rate, χ2 test; Fig. 4). Of particular note, there was no correlation to ETS-gene status (p = 0.136, χ2 test; Fig. 4). A lack of correlation between DESNT cancers and ERG-gene rearrangement, determined using the fluorescence in situ hybridization break-apart assay [
      • Tomlins S.A.
      • Rhodes D.R.
      • Perner S.
      • et al.
      Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer.
      ], was confirmed using CancerMap samples (LPD-DESNT, p = 0.549; RF-DESNT, p = 0.2623, χ2 test: DESNT cancers identified by LPD and by RF approaches are referred to respectively as LPD-DESNT and RF-DESNT). These observations are consistent with the lack of correlation between ERG status and clinical outcome [
      • Weischenfeldt J.
      • Simon R.
      • Feuerbach L.
      • et al.
      Integrative genomic analyses reveal an androgen-driven somatic alteration landscape in early-onset prostate cancer.
      ], although different views on the relationship between ERG-gene status and clinical outcome have been expressed [
      • Clark J.P.
      • Cooper C.S.
      ETS gene fusions in prostate cancer.
      ]. Since ETS-gene alteration, found in around half of prostate cancers [
      • Cancer Genome Atlas Research Network
      The molecular taxonomy of primary prostate cancer.
      ,
      • Tomlins S.A.
      • Rhodes D.R.
      • Perner S.
      • et al.
      Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer.
      ], is considered to be an early step in prostate cancer development [
      • Clark J.
      • Attard G.
      • Jhavar S.
      • et al.
      Complex patterns of ETS gene alteration arise during cancer development in the human prostate.
      ,
      • Park K.
      • Dalton J.T.
      • Narayanan R.
      • Barbieri C.E.
      • Hancock M.L.
      • Bostwick D.G.
      • et al.
      TMPRSS2:ERG gene fusion predicts subsequent detection of prostate cancer in patients with high-grade prostatic intraepithelial neoplasia.
      ], it is likely that changes involved in the generation of DESNT cancer represent a later event that is common to both ETS-positive and ETS-negative cancers.
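      The association tests reported in this section are χ2 tests on contingency tables of DESNT membership against alteration status. The sketch below shows the calculation for a 2 × 2 table in plain Python. It assumes a Pearson χ2 test without continuity correction, which the text does not specify, and the function name `chi2_2x2` and the counts in the usage lines are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2 x 2 table

                    altered   not altered
        DESNT          a           b
        non-DESNT      c           d

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one case")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (illustration only): alteration present or absent
# in DESNT and non-DESNT cancers.
chi2, p = chi2_2x2(12, 14, 140, 167)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

      A table in which the alteration is equally common in both groups gives a statistic near zero and a p value near 1, which is the pattern reported here for ETS-gene status.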
      Fig. 4. Comparison of random forest (RF)-DESNT and non-RF-DESNT cancers in The Cancer Genome Atlas dataset. A 20-gene RF classifier was used to identify DESNT cancers (designated RF-DESNT cancers). The types of genetic alteration are shown for each gene (mutations, fusions, deletions, and over-expression). Clinical parameters including biochemical recurrence (BCR) are represented at the bottom together with groups for iCluster, methylation, somatic copy number alteration (SCNA), and messenger RNA (mRNA)
      [
      • Cancer Genome Atlas Research Network
      The molecular taxonomy of primary prostate cancer.
      ]
      . When mutations and homozygous deletions for each gene were combined, RF-DESNT cancers contained an excess of genetic alterations in BRCA2 (p = 0.021, χ2 test) and TP53 (p = 0.0038), but after correcting for multiple testing these differences were not significant (p > 0.05).
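      The loss of significance after correction for multiple testing can be illustrated with a false discovery rate adjustment. The sketch below assumes the Benjamini-Hochberg procedure, which is one common choice; the text states only that a false discovery rate correction was used. The function name `benjamini_hochberg` is hypothetical, and in the usage lines only the two raw p values (0.0038 and 0.021) come from the caption: the total of 50 tests and the 48 padding values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Two raw p values from the caption, padded with hypothetical values so that
# 50 tests are corrected together (the padding is NOT study data).
raw = [0.0038, 0.021] + [0.5] * 48
adjusted = benjamini_hochberg(raw)
print(adjusted[0], adjusted[1])
```

      Under these assumed inputs both adjusted values exceed 0.05, showing how a nominally significant result can fail to survive correction when many genes are tested at once.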
      For RF-DESNT cancers in the TCGA series many of the 45 core genes exhibited altered levels of CpG gene methylation compared to non-RF-DESNT cancers (Supplementary Table 3), suggesting a possible role in controlling gene expression. Supporting this idea, for sixteen of the 45 core genes epigenetic down-regulation in human cancer has been previously reported, including six genes in prostate cancer (CLU, DPYSL3, GSTP1, KCNMA1, SNAI2, and SVIL; Fig. 2B, Supplementary Table 1). CpG methylation of five of the genes (FBLN1, GPX3, GSTP1, KCNMA1, TIMP3) has previously been linked to cancer aggression.

      4. Discussion

      Evidence from the European Randomised Study of Screening for Prostate Cancer demonstrates that PSA screening can reduce mortality from prostate cancer by 21% [
      • Schröder F.H.
      • Hugosson J.
      • Roobol M.J.
      • et al.
      Screening and prostate cancer mortality: results of the European Randomised Study of Screening for Prostate Cancer (ERSPC) at 13 years of follow-up.
      ]. However, a critical problem is that the progression of prostate cancer is highly heterogeneous [
      • D'Amico A.V.
      Cancer-specific mortality after surgery or radiation for patients with clinically localized prostate cancer managed during the prostate-specific antigen era.
      ,
      • Buyyounouski M.K.
      • Pickles T.
      • Kestin L.L.
      • Allison R.
      • Williams S.G.
      Validating the interval to biochemical failure for the identification of potentially lethal prostate cancer.
      ] and PSA screening leads to the detection of up to 50% of cancers that are clinically irrelevant [
      • Draisma G.
      • Etzioni R.
      • Tsodikov A.
      • et al.
      Lead time and overdiagnosis in prostate-specific antigen screening: importance of methods and context.
      ,
      • Etzioni R.
      • Gulati R.
      • Mallinger L.
      • Mandelblatt J.
      Influence of study features and methods on overdiagnosis estimates in breast and prostate cancer screening.
      ]; that is, cancers that would never have caused symptoms in a man’s lifetime in the absence of screening. Unsupervised analyses of breast cancer datasets using hierarchical clustering previously revealed the existence of basal, ERBB2-overexpressing, and luminal cancer categories [
      • Sorlie T.
      • Tibshirani R.
      • Parker J.
      • et al.
      Repeated observation of breast tumor subtypes in independent gene expression data sets.
      ]. This mathematical approach has not proven successful when applied to prostate cancer microarray datasets [
      • Ross-Adams H.
      • Lamb A.D.
      • Dunning M.J.
      • et al.
      Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study.
      ,
      • Taylor B.S.
      • Schultz N.
      • Hieronymus H.
      • et al.
      Integrative genomic profiling of human prostate cancer.
      ]. However, in our study the use of LPD, an unsupervised method that takes into account the issue of cancer heterogeneity, has revealed the existence of a novel category of prostate cancer, designated DESNT, common across all datasets. The subsequent linking to clinical data revealed that DESNT cancers exhibit poor prognosis.
      It was notable that membership of the DESNT cancer groups was not an independent predictor of clinical outcome in the MSKCC dataset. It is possible that the difference may simply reflect statistical variation since the size of the DESNT group in several datasets was small (MSKCC, 13%; CancerMap, 8%; Stephenson, 31%; Klein, 23%). Critically, however, when the datasets with linked clinical data were combined DESNT membership remained an independent predictor of clinical outcome. We failed to detect systematic differences between MSKCC and other datasets used in multivariate analyses (Supplementary Fig. 3H).
      We have not, in this study, investigated the biological function and mechanisms of alterations of expression of the 45 core genes. However, gene down-regulation mediated by CpG methylation is well documented in human cancer, as is the association of CpG methylation of single genes with aggressive cancer behaviour (Supplementary Table 1). The results found for DESNT cancers are consistent with these observations, but would suggest that it is the combined under-expression of multiple genes that represents a critical determinant of cancer progression and aggression. Several of the genes found to have lower expression in DESNT cancer (ACTA2, CNN1, LMOD1) encode proteins primarily expressed in smooth muscle cells or myofibroblasts, indicative of an altered tumour-stromal environment. We failed to find a correlation between stromal content and clinical outcome in the CamCap and CancerMap datasets (Fig. 2). However, this does not exclude the possibility that DESNT cancers themselves may have lower stromal content, in part explaining the lower expression of these genes.
      Other under-expressed genes encode components of the actin cytoskeleton or regulate its dynamics (eg, MLCK, MYL9, ACTN1, and TNS1). Increased malignancy may correlate with increased cell migratory behaviour, which in turn can involve deployment of particular types of cell adhesion and cytoskeletal machinery [
      • Friedl P.
      • Locker J.
      • Sahai E.
      • Segall J.E.
      Classifying collective cancer cell invasion.
      ]. A high dependency on actomyosin contractility is recognised as a hallmark of amoeboid movement. Down-regulation of these genes in DESNT cancers would argue against its involvement. The lower expression of focal adhesion components such as integrin α5, vinculin, and integrin-linked kinase, would also argue against involvement of mesenchymal type migration, which is dependent on these classes of genes [
      • Friedl P.
      • Locker J.
      • Sahai E.
      • Segall J.E.
      Classifying collective cancer cell invasion.
      ]. It is thus possible that the observed alterations may support involvement of collective migration or expansive growth phenotypes [
      • Friedl P.
      • Locker J.
      • Sahai E.
      • Segall J.E.
      Classifying collective cancer cell invasion.
      ].
      Notably, we failed to find any relationship between DESNT cancers and either copy number variant signatures (Fig. 2C) or DNA repair gene alterations (Fig. 4). Assignment of cancers within the DESNT classification framework together with the use of standard clinical indicators (Stage, Gleason sum, PSA), copy number variant signatures [
      • Taylor B.S.
      • Schultz N.
      • Hieronymus H.
      • et al.
      Integrative genomic profiling of human prostate cancer.
      ], expression biomarkers such as Prolaris [
      • Cuzick J.
      • Swanson G.P.
      • Fisher G.
      • et al.
      Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study.
      ], Decipher [
      • Erho N.
      • Crisan A.
      • Vergara I.A.
      • et al.
      Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy.
      ], and Oncotype DX [
      • Klein E.A.
      • Cooperberg M.R.
      • Magi-Galluzzi C.
      • et al.
      A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling.
      ] identified in supervised analyses and urine biomarkers [
      • Van Neste L.
      • Hendriks R.J.
      • Dijkstra S.
      • et al.
      Detection of high-grade prostate cancer using a urinary molecular biomarker-based risk score.
      ], should significantly enhance the ability to identify patients whose cancers should be targeted by radical therapies, avoiding the side effects of treatment, including impotence, in men with nonaggressive disease. In future studies, we are focusing on the development of both LPD- and RF-based tests that can be used to detect DESNT cancer in biopsy tissue in a clinical setting.
      Author contributions: Colin S. Cooper had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
      Study concept and design: Moulton, Brewer, Luca, C. Cooper.
      Acquisition of data: C. Cooper, Neal, Eeles, Lynch, Whitaker, Sandra Edwards, Merson, Dennis, Hazel, Warren, Ross-Adams, The CancerMap Group.
      Analysis and interpretation of data: C. Cooper, Brewer, Luca, Moulton, D. Edwards, R. Cooper, Sethia, Ball, Mills, Clark, Curley, Lamb, Ross-Adams, Eeles.
      Drafting of the manuscript: C. Cooper, Luca, Brewer, Moulton.
      Critical revision of the manuscript for important intellectual content: Lynch, Neal, Eeles, R. Cooper, Whitaker, D. Edwards, S. Edwards.
      Statistical analysis: Luca, Brewer.
      Obtaining funding: C. Cooper, Moulton, Brewer, Neal.
      Administrative, technical, or material support: None.
      Supervision: None.
      Other (Latent Process Decomposition analysis of data and statistical analysis): Luca, Brewer.
      Financial disclosures: Colin S. Cooper certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: C. Cooper, D. Brewer, Moulton, and Luca are coinventors on a patent application from the University of East Anglia on the detection of DESNT prostate cancer. Eeles has received educational grants from Illumina and GenProbe (formerly Tepnel), Vista Diagnostics, and Janssen Pharmaceuticals. She has received honoraria from Succint Communications for talks on prostate cancer genetics.
      Funding/Support and role of the sponsor: This work was funded by the Bob Champion Cancer Trust, The Masonic Charitable Foundation successor to The Grand Charity, The King Family, and The University of East Anglia. We acknowledge support from Movember, from Prostate Cancer UK, Callum Barton, and from The Andy Ripley Memorial Fund. The research presented in this paper was carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia. Cancer Research UK Grant 10047 funded the generation of the prostate CancerMap expression microarray dataset. We would like to acknowledge the support of the National Institute for Health Research which funds the Cambridge Bio-medical Research Centre, Cambridge UK. The sponsors did not participate in the design and conduct of the study; data collection, management, analysis, and interpretation; and manuscript preparation, review, and approval.

      Appendix A. Supplementary data

      References

        • D’Amico A.V.
        Biochemical outcome after radical prostatectomy, external beam radiation therapy, or interstitial radiation therapy for clinically localized prostate cancer.
        JAMA. 1998; 280: 969-974
        • Graham J.
        • Kirkbride P.
        • Cann K.
        • Hasler E.
        • Prettyjohns M.
        Prostate cancer: summary of updated NICE guidance.
        BMJ. 2014; 348 (f7524–4)
        • Cuzick J.
        • Swanson G.P.
        • Fisher G.
        • et al.
        Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study.
        Lancet Oncol. 2011; 12: 245-255
        • Klein E.A.
        • Cooperberg M.R.
        • Magi-Galluzzi C.
        • et al.
        A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling.
        Eur Urol. 2014; 66: 550-560
        • Erho N.
        • Crisan A.
        • Vergara I.A.
        • et al.
        Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy.
        PLoS One. 2013; 8: e66855
        • Glinsky G.V.
        • Glinskii A.B.
        • Stephenson A.J.
        • Hoffman R.M.
        • Gerald W.L.
        Gene expression profiling predicts clinical outcome of prostate cancer.
        J Clin Invest. 2004; 113: 913-923
        • Tomlins S.A.
        • Alshalalfa M.
        • Davicioni E.
        • et al.
        Characterization of 1577 primary prostate cancers reveals novel biological and clinicopathologic insights into molecular subtypes.
        Eur Urol. 2015; 68: 555-567
        • You S.
        • Knudsen B.S.
        • Erho N.
        • et al.
        Integrated classification of prostate cancer reveals a novel luminal subtype with poor outcome.
        Cancer Res. 2016; 76: 4948-4958
        • Ross-Adams H.
        • Lamb A.D.
        • Dunning M.J.
        • et al.
        Integration of copy number and transcriptomics provides risk stratification in prostate cancer: A discovery and validation cohort study.
        EBioMedicine. 2015; 2: 1133-1144
        • Sorlie T.
        • Tibshirani R.
        • Parker J.
        • et al.
        Repeated observation of breast tumor subtypes in independent gene expression data sets.
        Proc Natl Acad Sci USA. 2003; 100: 8418-8423
        • Taylor B.S.
        • Schultz N.
        • Hieronymus H.
        • et al.
        Integrative genomic profiling of human prostate cancer.
        Cancer Cell. 2010; 18: 11-22
        • Cooper C.S.
        • Eeles R.
        • Wedge D.C.
        • et al.
        Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue.
        Nat Genet. 2015; 47: 367-372
        • Cancer Genome Atlas Research Network
        The molecular taxonomy of primary prostate cancer.
        Cell. 2015; 163: 1011-1025
        • Boutros P.C.
        • Fraser M.
        • Harding N.J.
        • et al.
        Spatial genomic heterogeneity within localized, multifocal prostate cancer.
        Nat Genet. 2015; 47: 736-745
        • Clark J.
        • Attard G.
        • Jhavar S.
        • et al.
        Complex patterns of ETS gene alteration arise during cancer development in the human prostate.
        Oncogene. 2008; 27: 1993-2003
        • Tsourlakis M.-C.
        • Stender A.
        • Quaas A.
        • et al.
        Heterogeneity of ERG expression in prostate cancer: a large section mapping study of entire prostatectomy specimens from 125 patients.
        BMC Cancer. 2016; 16: 641
        • Carrivick L.
        • Rogers S.
        • Clark J.
        • Campbell C.
        • Girolami M.
        • Cooper C.
        Identification of prognostic signatures in breast cancer microarray data using Bayesian techniques.
        J R Soc Interface. 2006; 3: 367-381
        • Rogers S.
        • Girolami M.
        • Campbell C.
        • Breitling R.
        The latent process decomposition of cDNA microarray data sets.
        IEEE/ACM Trans Comput Biol Bioinform. 2005; 2: 143-156
        • Blei D.M.
        • Ng A.Y.
        • Jordan M.I.
        Latent Dirichlet allocation.
        J Mach Learn Res. 2003; 3: 993-1022
        • Olmos D.
        • Brewer D.
        • Clark J.
        • et al.
        Prognostic value of blood mRNA expression signatures in castration-resistant prostate cancer: a prospective, two-stage study.
        Lancet Oncol. 2012; 13: 1114-1124
        • Warren A.Y.
        • Whitaker H.C.
        • Haynes B.
        • et al.
        Method for sampling tissue for research which preserves pathological data in radical prostatectomy.
        Prostate. 2013; 73: 194-202
        • Jhavar S.
        • Reid A.
        • Clark J.
        • et al.
        Detection of TMPRSS2-ERG translocations in human prostate cancer by expression profiling using GeneChip Human Exon 1.0 ST arrays.
        J Mol Diagn. 2008; 10: 50-57
        • Ritchie M.E.
        • Phipson B.
        • Wu D.
        • et al.
        limma powers differential expression analyses for RNA-sequencing and microarray studies.
        Nucleic Acids Res. 2015; 43: e47-7
        • Johnson W.E.
        • Li C.
        • Rabinovic A.
        Adjusting batch effects in microarray expression data using empirical Bayes methods.
        Biostatistics. 2007; 8: 118-127
        • Friedman J.
        • Hastie T.
        • Tibshirani R.
        Regularization paths for generalized linear models via coordinate descent.
        J Stat Softw. 2010; 33: 1-22
        • Breiman L.
        Random Forests.
        Mach Learn. 2001; 45: 5-32
        • Liaw A.
        • Wiener M.
        Classification and regression by random Forest.
        R News. 2002; 2/3: 18-22
        • Klein E.A.
        • Yousefi K.
        • Haddad Z.
        • et al.
        A genomic classifier improves prediction of metastatic disease within 5 years after surgery in node-negative high-risk prostate cancer patients managed by radical prostatectomy without adjuvant therapy.
        Eur Urol. 2015; 67: 778-786
        • Stephenson A.J.
        • Smith A.
        • Kattan M.W.
        • et al.
        Integration of gene expression profiling and clinical variables to predict prostate carcinoma recurrence after radical prostatectomy.
        Cancer. 2005; 104: 290-298
        • Ramaswamy S.
        • Ross K.N.
        • Lander E.S.
        • Golub T.R.
        A molecular signature of metastasis in primary solid tumors.
        Nat Genet. 2003; 33: 49-54
        • Tomlins S.A.
        • Rhodes D.R.
        • Perner S.
        • et al.
        Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer.
        Science. 2005; 310: 644-648
        • Weischenfeldt J.
        • Simon R.
        • Feuerbach L.
        • et al.
        Integrative genomic analyses reveal an androgen-driven somatic alteration landscape in early-onset prostate cancer.
        Cancer Cell. 2013; 23: 159-170
        • Clark J.P.
        • Cooper C.S.
        ETS gene fusions in prostate cancer.
        Nat Rev Urol. 2009; 6: 429-439
        • Park K.
        • Dalton J.T.
        • Narayanan R.
        • Barbieri C.E.
        • Hancock M.L.
        • Bostwick D.G.
        • et al.
        TMPRSS2:ERG gene fusion predicts subsequent detection of prostate cancer in patients with high-grade prostatic intraepithelial neoplasia.
        J Clin Oncol. 2014; 32: 206-211
        • Schröder F.H.
        • Hugosson J.
        • Roobol M.J.
        • et al.
        Screening and prostate cancer mortality: results of the European Randomised Study of Screening for Prostate Cancer (ERSPC) at 13 years of follow-up.
        Lancet. 2014; 384: 2027-2035
        • D'Amico A.V.
        Cancer-specific mortality after surgery or radiation for patients with clinically localized prostate cancer managed during the prostate-specific antigen era.
        J Clin Oncol. 2003; 21: 2163-2172
        • Buyyounouski M.K.
        • Pickles T.
        • Kestin L.L.
        • Allison R.
        • Williams S.G.
        Validating the interval to biochemical failure for the identification of potentially lethal prostate cancer.
        J Clin Oncol. 2012; 30: 1857-1863
        • Draisma G.
        • Etzioni R.
        • Tsodikov A.
        • et al.
        Lead time and overdiagnosis in prostate-specific antigen screening: importance of methods and context.
        J Natl Cancer Inst. 2009; 101: 374-383
        • Etzioni R.
        • Gulati R.
        • Mallinger L.
        • Mandelblatt J.
        Influence of study features and methods on overdiagnosis estimates in breast and prostate cancer screening.
        Ann Intern Med. 2013; 158: 831-838
        • Friedl P.
        • Locker J.
        • Sahai E.
        • Segall J.E.
        Classifying collective cancer cell invasion.
        Nat Cell Biol. 2012; 14: 777-783
        • Van Neste L.
        • Hendriks R.J.
        • Dijkstra S.
        • et al.
        Detection of high-grade prostate cancer using a urinary molecular biomarker-based risk score.
        Eur Urol. 2016; 70: 740-748