Biologically informed deep neural network for prostate cancer discovery - Nature - Alert Breaking News

P-NET design

We introduce P-NET, an artificial neural network with biologically informed, parsimonious architecture that accurately predicts metastasis in patients with prostate cancer on the basis of their genomic profiles. P-NET is a feedforward neural network with constraints on the nodes and edges. In P-NET, each node encodes some biological entity (for example, genes and pathways) and each edge represents a known relationship between the corresponding entities. The constraints on the nodes allow for better understanding of the state of different biological components. The constraints on the edges allow us to use a large number of nodes without increasing the number of edges, which leads to a smaller number of parameters compared to fully connected networks with the same number of nodes, and thus potentially fewer computations. The architecture was built using the Reactome pathway datasets⁴⁶. The whole Reactome dataset was downloaded and processed to form a layered network of five layers of pathways, one layer of genes, and one layer for features. This sparse model had slightly over 71,000 weights with the number of nodes per layer distributed as shown in Extended Data Fig. 1e. A dense network with the same number of nodes would have more than 270 million weights with the first layer containing more than 94% of the weights. A hybrid model which contains a sparse layer followed by dense layers still contains over 14 million weights. The number of dense weights is calculated as w_l = n_l × (n_l – 1 + 1), where w_l is the number of weights per layer l and n_l is the number of nodes of the same layer. Note that the P-NET model is not bound to a certain architecture, as the model architecture is automatically built by reading model specifications provided by the user via a gene matrix transposed file format (.gmt) file, and custom pathways, gene sets and modules with custom hierarchies can be provided by the user.

The meaning of the nodes, layers and connection of P-NET is encoded through a carefully engineered architecture and a set of restrictions on the connections of the network. The input layer is meant to represent features that can be measured and fed into the network. The second layer represents a set of genes of interest. The higher layers represent a hierarchy of pathways and biological processes that are manually curated. The first layer of P-NET is connected to the next layer via a set of one-to-one connections, and each node in the next layer is connected to exactly three nodes of the input layer representing mutations, copy number amplification and copy number deletions. This scheme results in a much smaller number of weights in the first layer compared with a fully connected network and the special pattern of the connection matrix results in more efficient training. The second layer is restricted to have connections reflecting the gene-pathway relationships as curated by the Reactome pathway dataset. The connections are encoded by a mask matrix M that is multiplied by the weights matrix W to zero-out all the connections that do not exist in the Reactome pathway dataset. For the next layers, a similar scheme is devised to control the connection between consecutive layers to reflect the real parent–child relationships that exist in the Reactome dataset. The output of each layer is calculated as y = f [(M *W)^Tx + b], where f is the activation function, M is the mask matrix, W is the weights matrix, x is the input matrix, b is the bias vector, and * is the Hadamard product (see Extended Data Fig. 1a–c). The activation of each node is kept into the range of [−1,1] by applying the tanh function (f={tanh }={({rm{e}}}^{2x}-1)/{({rm{e}}}^{2x}+1)) to the weighted inputs of the node. The activation of the outcome layers is calculated by the sigmoid function (sigma =1/(1+{{rm{e}}}^{-x})).

To allow each layer to be useful by itself, we added a predictive layer with sigmoid activation after each hidden layer. P-NET has a smaller number of nodes per layer in the later layers compared to the first layers Extended Data Fig. 1e. Since it is more challenging to fit the data using a smaller number of weights in the later layers, we used a higher loss weight for later layer outcomes during the optimization process. The final prediction of the network was calculated by taking the average of all the layer outcomes, Extended Data Fig. 1d. The learning rate was initialized to be 0.001 and actively reduced after every 50 epochs to allow for smooth convergence. Since we have an unbalanced dataset, we weighted the classes differently to reduce the network bias toward one class based on the bias in the training set. The model was trained using Adam optimizer⁴⁷ to reduce the binary cross-entropy loss functions (H=-frac{1}{N}Sigma {y}_{i}.{rm{log }}(p({y}_{i}))+(1-{y}_{i}).{rm{log }}(1-p({y}_{i}))), where ({y}_{i}) is the label for sample i, (p({y}_{i})) is the probability that sample i has a metastatic cancer as calculated using the sigmoid function (sigma ), and N is the total number of samples. Empirically we found that using adaptive learning rate besides Adam led to smoother convergence and improved prediction performance. We checked different gradient-based attribution methods to rank the features in all the layers, and we chose to use the DeepLIFT scheme as implemented in the DeepExplain library¹³.

DeepLIFT is a backpropagation-based attribution approach for assigning a sample-level importance score for each feature. In this work, we are interested in assigning scores for each node in each layer. Given a certain sample, a specific target t, and a set of layer nodes ({x}_{1}^{s},{x}_{2}^{s},{…,{x}}_{i}^{s},…,,{x}_{{n}_{l}}^{s}), where n_l is the number of nodes in a certain layer l, DeepLIFT calculates an importance score ({C}_{i}^{l,s}) for each node on the basis of the difference in the target activation t – t₀ such that the difference equals the aggregation of the calculated scores for all the nodes. That is, the difference in target activation is given by:

$$triangle t=t-{t}_{0}$$

Which equals the sum of all node scores when fed by a certain sample (S). That is,

$$triangle t=mathop{sum }limits_{i=1}^{{n}_{l}}{C}_{i}^{l,s}$$

We used the ‘Rescale rule’ of DeepLIFT as implemented by DeepExplain to calculate the sample-level importance of all nodes in all layers. Further details are available in ref. ¹³. To calculate the total node-level importance ({C}_{i}^{l}) we aggregated the sample-level importance score scores over all the n_s testing-set samples.

$${C}_{i}^{l}={rm{|}}mathop{sum }limits_{i=1}^{{n}_{s}}{C}_{i}^{l,s}{rm{|}}$$

Note that this is an absolute score (always positive) that measures the impact of a certain node on the outcome. The activation of the corresponding node i, however, could be positive or negative.

To reduce the bias introduced by over-annotation of certain nodes (nodes that are member of too many pathways), we adjusted the DeepLIFT scores using a graph informed function f that considers the connectivity of each node. The importance score ({C}_{i}^{l}) is divided by the node degree ({d}_{i}^{l}) if the node degree is larger than the mean of node degrees plus 5σ where σ is the standard deviation of node degrees.

$${d}_{i}^{l}={{fan}{rm{_}}{in}}_{i}^{l}+{{fan}{rm{_}}{out}}_{i}^{l}$$

$${{adjusted}{rm{_}}C}_{i}^{l}=f(x)={begin{array}{c}frac{{C}_{i}^{l}}{{d}_{i}^{l}},,{d}_{i}^{l} > mu +5sigma \ {C}_{i}^{l},,{otherwise}end{array}$$

P-NET training and evaluation

To check the utility of the developed model, we trained P-NET to predict cancer state (primary/metastatic) of patients with prostate cancer on the basis of their genomic profiles. We used tumour or germline-matched whole-exome sequencing of 1,013 patients along with the corresponding somatic mutations and copy number alterations that were prepared using a unified computational pipeline for harmonized somatic alteration derivation⁸ (annotated in this study as the ‘Armenia et al.’ cohort). The mutations were aggregated on the gene level with focus on nonsynonymous mutations to align with prior work on mutational significance in prostate cancer whole-exome datasets, excluding silent, intron, 3′ untranslated region (UTR), 5′ UTR, RNA and long intergenic non-coding RNA (lincRNA) mutations. The copy number alterations for each gene were assigned on the basis of the called segment-level copy number emphasizing high gains and deep deletions and excluding single-copy amplification and deletions, as defined by GISTIC2.0 and generated from the source data type. For secondary analyses involving RNA data (fusions, expression), bulk whole transcriptomes from the subset of the Armenia et al. cohort, where such data were available, were secured from their source studies (n = 455 from TCGA, n = 204 from SU2C-PCF consortia) for uniform alignment and quantification of RNA sequences. Reads were downloaded as FASTQs from TCGA (ISB-CGC; https://isb-cgc.appspot.com/) and as CRAMs from SU2C (from Amazon S3 bucket, dbGaP accession code, phs000915.v2.p2) and then converted to FASTQs using samtools fastq. In cases where an SU2C sample had both transcriptome capture and polyA sequencing, transcriptome capture was used to optimize for fusion detection as the primary use of these data. Adapters were trimmed with cutadapt v2.2 and reads were aligned using STAR aligner v2.7.2b^48,49. STAR-aligned bam files were passed into RSEM to generate gene-level transcript counts and transcript per million (TPM) quantifications using the GENCODE release 30 gene annotation lifted over to GRCh37. STAR chimeric junctions were supplied to STAR-Fusion v1.7.0 in kickstart mode to call fusions⁵⁰. Fusion calls were filtered down to those that included genes classified as oncogene or fusion in the Cancer Gene Census⁵¹. To test model flexibility for RNA-based fusion inputs, as a secondary analysis we also developed P-NET models trained to predict cancer state incorporating fusions or different definitions of copy number states (Extended Data Fig. 3, 4).

The prediction performance was measured using the average AUC, the AUPRC and the F1 score. The corresponding measures were reported for the testing split and also for the cross-validation setup. The input data were divided into a testing set (10%) and a development set (90%). The development set was further divided into a validation set that has the same size as the testing set and the remaining samples are reserved for training. For the cross-validation experiments, the development dataset was divided into five folds stratified by the label classes to account for the bias in the dataset. The external validation results are produced by a model that is trained on the main dataset and tested on two independent external validation datasets. To mitigate the bias issue in the main dataset, we trained two models on two balanced subsamples drawn from the main dataset. The prediction scores of the two models are averaged to produce the final predictions on the two external validation datasets. The implementation of the proposed system along with the reproducible results are available on GitHub (https://github.com/marakeby/pnet_prostate_paper).

Statistical analysis

The change in the area under the ROC curve between P-NET and other models is tested using DeLong test⁵². The P-values are corrected for multiple hypothesis testing using FDR. For other scores including AUPRC, accuracy, F1 and recall, bootstrapping statistical test with 2,000 sampling is used and the difference in score median was tested for significance. The resulting P value was corrected using the false-discovery rate (FDR) method. The AUC of five-fold cross-validation resulting from training and testing P-NET and dense models over multiple sample sizes is compared using a t-test of the means for the null hypothesis that two samples (P-NET scores and dense scores) have identical average (expected) values with the assumption that the populations have identical variances. The same test is applied to other scores including recall, precision, AUPRC, F1 and accuracy. For the survival analysis (Fig. 2d), a nonparametric log-rank test is used to compare estimates of the hazard functions of the two groups at each observed event time. A t-test of means is used to compare the reduction of prostate cancer cells proliferation in comparison to negative control in response to MDM4 depletion. Chi-squared test with Yates correction is used to compare the expected and observed frequencies of MDM4 high amplifications in two groups (patients with primary and metastatic tumours).

Analysis of a genome-scale ORF screen

A genome-scale ORF screen was previously performed in LNCaP cells⁴². In brief, cells were infected with a pooled ORF library, subject to puromycin selection to isolate cells containing the respective ORFs, and then seeded in low androgen medium (CSS) with enzalutamide. The relative effect of each ORF on cell proliferation was determined after 25 days in culture and is represented as Z-scores. Raw results of the ORF screen were obtained from the Hwang et al. source study. We postulated that amplified genes identified by P-NET regulate oncogenic functions in metastatic CRPC. To validate this hypothesis, we analysed this previously published genome-scale ORF screen performed in LNCaP cells, which identified genes that, when overexpressed, promoted resistance to the AR inhibitor, enzalutamide⁴² (Fig. 4b). LNCaP cells are dependent on AR and treatment with enzalutamide attenuates cell proliferation. On the basis of this analysis, MDM4 scored as a robust enzalutamide-resistant gene relative to other hits, including cell cycle regulators (CDK4 and CDK6) or those with roles in FGF signalling (FGFR2, FGFR3 and FGF6); these are two pathways implicated in driving resistance to anti-androgen therapies in clinical prostate cancers^27,53.

Sensitivity to RO-5963

LNCaP, LNCaP Abl, LNCaP 95, DU145, LAPC-4, LNCaP enzalutamide resistant, C4-2 and PC3 cells were seeded in 96-well plates at a density of 3,000 cells per well. After 24 h, cells were treated with increasing concentrations of RO-5963 for 4 days. Cell proliferation was determined using CellTiter-Glo assay. IC₅₀ values were determined using GraphPad Prism. Data are represented as the mean ± s.d. of three replicates. The experiment was repeated three times (raw data and analysis files in Supplementary Data 4). All cell lines tested negative for mycoplasma contamination. Authentication was performed using STR profiles and/or obtained directly from ATCC for all publicly available cell lines.

MDM4 gene-depletion experiments

Blasticidin-resistant Cas9-positive prostate cancer cells were cultured in 150 μg ml^–1 blasticidin (Thermo Fisher Scientific, NC9016621) for 72 h to enrich cells with optimal Cas9 activity. One million cells were seeded in parallel in 12-well plates and infected with lentiviruses expressing puromycin-resistant sgRNAs targeting MDM4 or GFP control. Cells were then subjected to puromycin selection for 3 days and then the cells were counted using a Vi-Cell and seeded for a proliferation assay. 7 days later, cells were counted again with a Vi-Cell to assess viability, representing a total of 12 days. The target sequence against GFP was CACCGGCCACAAGTTCAGCGTGTCG (sgGFP). The target sequences against MDM4 were AGATGTTGAACACTGAGCAG (sgMDM4-1) and CTCTCCTGGACAAATCAATC (sgMDM4-2).

Immunoblotting

Cells were lysed using 2× sample buffer (62.5 mM Tris pH 6.8, 2% SDS, 10% glycerol, Coomassie dye) and freshly added 4% β-mercaptoethanol. Lysed cells were scraped, transferred into a 1.5 ml microcentrifuge tube, sonicated for 15 s and boiled at 95 °C for 10 min.

Proteins were resolved in NuPAGE 4–12% Bis-Tris Protein gels (Thermo Fisher Scientific) and run with NuPAGE MOPS SDS Running Buffer (Thermo Fisher Scientific, NP0001). Proteins were transferred to nitrocellulose membranes using an iBlot apparatus (Thermo Fisher Scientific). Membranes were blocked in Odyssey Blocking Buffer (LI-COR Biosciences, 927-70010) for 1 h at room temperature, and membranes were then cut and incubated in primary antibodies diluted in Odyssey Blocking Buffer at 4 °C overnight. The following morning, membranes were washed with phosphate-buffered saline with 0.1% Tween (PBST) and incubated with fluorescent anti-rabbit or anti-mouse secondary antibodies at a dilution of 1:5,000 (Thermo Fisher Scientific, NC9401842 (rabbit) and NC0046410 (mouse)) for 1 h at room temperature. Membranes were again washed with PBST and then imaged using an Odyssey Imaging System (LI-COR Biosciences). Primary antibodies used include MDM4 (Abcam, ab16058) at a dilution of 1:500 and α-tubulin (Sigma, T9026) at a dilution of 1:1,000.

Gene depletion of MDM4 reduces prostate cancer cell viability

To determine how prostate cancer cells would respond to precision tools that target MDM4 at the gene level, we used CRISPR-Cas9 and two sgRNAs targeting distinct sequences of MDM4 in prostate cancer cell lines. Compared with a negative-control sgRNA (GFP), viability of 4 different prostate cancer cells was reduced by about 50–80% (Fig. 4c) in response to MDM4 depletion (Extended Data Fig. 9) after 12 days in culture. Altogether, we concluded that MDM4 regulates enzalutamide resistance, and that targeting MDM4 through either chemical or genetic approaches significantly attenuated the viability of prostate cancer cell lines. Our observations indicate that antagonizing MDM4 in metastatic CRPCs that harbour wild-type p53 is an attractive precision strategy. MDM4 antibodies (A300-287A) and (ab16058) were used together for immunoblotting experiments done in Extended Data Fig. 9.

Chemical inhibition of MDM4 reduces prostate cancer cell viability

Given the proposed role of MDM4 in driving enzalutamide resistance in prostate cancer cells, we sought to determine the response of prostate cancer cells to chemical inhibition of MDM4. We evaluated RO-5963, a small molecule MDM2/4 dual inhibitor with the greatest selectivity towards MDM4 in its class⁴³. This drug has previously demonstrated robust efficacy against MDM4 dependent cancer cell lines⁵⁴. We evaluated the effects of increasing concentrations of RO-5963 on prostate cancer cell proliferation.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Source link