May 4, 2024
Transient loss of Polycomb components induces an epigenetic cancer fate – Nature

Transient loss of Polycomb components induces an epigenetic cancer fate – Nature

Drosophila strains and genetics

Flies were raised on a standard cornmeal yeast extract medium at 25 °C unless otherwise indicated. Fly lines and crosses performed to deplete PRC1 subunits or to perform control experiments were generated from stocks provided by the Bloomington Drosophila Stock Center (BL) and the Vienna Drosophila Resource Center (VDRC), as indicated below for each experiment. The work with transgenic strains of Drosophila was performed under the ethical approval no. n6906C2 of the Ministère de l’Enseignement Supérieur, de la Recherche et de l’Innovation, issued on 8 April 2020.

For KD experiments of PRC1 subunits and generation of EICs, Gal80ts was used to control the temporal ph or Psc/Su(z)2 down-regulation by switching the temperature from 18 to 29 °C. KDs are generated in the larval EDs using the ey-FLP system. The rationale of the reversible KD system is the following: ph-RNAi, as well as the GFP marker, are under control of UAS sequences. Cells expressing ey-FLP (in pink in Fig. 1a) induce FLP-out of a transcriptional stop (located between two FRT sites and indicated in orange in Fig. 1a) in EDs, leading to expression of act-Gal4 (in light blue in Fig. 1a). tub-Gal80ts (in purple in Fig. 1a) encodes a ubiquitously expressed, temperature-sensitive Gal4 repressor. At restrictive temperature (29 °C), Gal80ts is inactivated. Gal4 activates UAS sequences, expressing ph-RNAi and GFP (as readout of ph-KD).

To perform KDs, flies were reared and crossed at 18 °C to inhibit Gal4 activity. A total of 80 virgin females were crossed with 20 males for each genotype and experiment. In all conditions (no, constant or transient KDs), flies were allowed to lay eggs at 18 °C for 4 h to synchronize embryonic and larval stages. As the timing of Drosophila development is temperature dependent, we adapted the timing for each KD condition to carry out phenotypic and molecular analyses at comparable developmental times. The genotypes of the flies on which we carried out the different KDs are listed below.

For ph-KD: ey-FLP, Act-gal4 (FRT.CD2 STOP) (BL#64095); TubGal80ts (BL#7019); UAS-ph-RNAi (VDRC#50028)/UAS-GFP (BL#64095).

For Psc-Su(z)2-KD: ey-FLP, Act-gal4 (FRT.CD2 STOP) (BL#64095); UAS-Psc-Su(z)2 RNAi (BL#38261, VDRC#100096); TubGal80ts (BL#7018)/UAS-GFP (BL#64095).

For control white-KD: ey-FLP, Act-gal4 (FRT.CD2 STOP) (BL#64095); TubGal80ts (BL#7019); UAS-w-RNAi (BL#33623)/UAS-GFP (BL#64095).

All dissections were performed on female larvae at the L3 stage. For the no ph-KD (no depletion), flies were kept at 18 °C throughout development and dissected 10 days AEL. For the constant ph-KD (constant depletion), flies were kept at 29 °C throughout development and dissected 5 days AEL. For the larval depletion (from L1 to L3) flies were kept at 18 °C for 48 h and shifted at 29 °C until dissection 5 days AEL. For the transient ph-KD at the L1 stage, flies were kept at 18 °C for 48 h, then shifted at 29 °C for 24 h and returned to 18 °C until dissection 9 or 11 days AEL. For the transient ph-KD at the L2 stage, flies were kept at 18 °C for 96 h, shifted at 29 °C for 24 h and returned to 18 °C until dissection 8 days AEL. For the transient ph-KD at the L3 early stage, flies were kept at 18 °C for 120 h, shifted at 29 °C for 24 h and returned to 18 °C until dissection 8 days AEL. For the transient ph-KD at the L3 late stage, flies were kept at 18 °C for 168 h, shifted at 29 °C for 24 h and returned to 18 °C until dissection 8 days AEL. For the transient Psc-Su(z)2-KD at the L1 stage, flies were kept at 18 °C for 48 h, shifted at 29 °C for 48 h and returned to 18 °C until dissection 8 days AEL. For all conditions, a minimum of three biological replicates was performed. For each replicate, 150 discs were scored in PH depletions and more than 30 discs were scored for PSC depletions. Constant and transient depletions of PH (PH-d and PH-p) or PSC-SU(Z)2 generated tumours in 100% of dissected tissues.

To assess viability, we measured adult hatching rate. For this purpose, after 4 h of egg laying, we applied the treatments described above to produce ph-KD at the desired times. The vials were maintained at 18 °C and the number of pupae was counted for each condition. The adult hatching rate was calculated by dividing the number of male and female adults hatched from pupae by the number of pupae.

For the zfh1-RNAi and Stat92E-RNAi rescue experiments under constant ph-KD, ey-FLP, Act5C-gal4 (FRT.CD2 STOP); + ; UAS-GFP (BL#64095) females were crossed with males of various genotypes. For negative control experiments, females were crossed with UAS-gfp-RNAi (BL#9331); UAS-w-RNAi (BL#33623) males. To confirm that the zfh1-RNAi and Stat92E-RNAi do not induce any significant change in the eye development we crossed female to UAS-zfh1-RNAi (VDRC#103205); UAS-w-RNAi (BL#33623) and UAS-Stat92E-RNAi (VDRC#43866); UAS-w-RNAi (BL#33623) males. Positive control experiments were conducted by crossing females with UAS-gfp-RNAi (BL#9331); UAS-ph-RNAi (VDRC#50028) males. For the rescue condition we crossed females to UAS-zfh1-RNAi (VDRC#103205); UAS-ph-RNAi (VDRC#50028) and UAS-Stat92E-RNAi (VDRC#43866); UAS-ph-RNAi (VDRC#50028) males. This systematic breeding strategy facilitated the investigation of the specific roles of zfh1 and Stat92E genes under constant ph-KD conditions.

Flies were reared and crossed at 18 °C and tumours were scored in the progeny reared at 18 °C. Note that in this genetic background there is no Gal80ts and therefore the KDs are obtained independently of the temperature. In the case of the ph-KD positive control, a tumour phenotype with 100% penetrance was observed in the progeny.

For the zfh1-RNAi and Stat92E-RNAi rescue experiments under transient ph-KD, ey-FLP, Act5C-gal4 (FRT.CD2 STOP) (BL#64095); + ; TubGal80ts (BL#7018)/TM6BTb females were crossed with males of various genotypes. For negative control experiments, females were crossed with UAS-gfp-RNAi (BL#9331); UAS-w-RNAi (BL#33623) males. To confirm that the zfh1-RNAi and Stat92E-RNAi do not induce any significant change in the eye development, we crossed female to UAS-zfh1-RNAi (VDRC#103205); UAS-w-RNAi (BL#33623) and UAS-Stat92E-RNAi (VDRC#43866); UAS-w-RNAi (BL#33623) males. Positive control experiments were conducted by crossing females with UAS-gfp-RNAi (BL#9331); UAS-ph-RNAi (VDRC#50028) males. For the rescue condition we crossed females to UAS-zfh1-RNAi (VDRC#103205); UAS-ph-RNAi (VDRC#50028) and UAS-Stat92E-RNAi (VDRC#43866); UAS-ph-RNAi (VDRC#50028) males. This systematic breeding strategy facilitated the investigation of the specific roles of the zfh1 and Stat92E genes under transient ph-KD conditions.

Flies were reared and crossed at 18 °C and flies were allowed to lay eggs overnight at 18 °C. For transient depletion, flies were kept at 18 °C for 48 h, then shifted at 29 °C for 24 h and returned to 18 °C until dissection 10 days AEL.

Allografts were performed according to the protocol described previously62. The following fly line was used: ey-FLP (BL#5580), Ubi-p63E(FRT.STOP)Stinger (BL#32249); Tub-Gal80ts (BL#7019); Act5C-Gal4(FRT.CD2), UAS-RFP (BL#30558)/UAS-ph-RNAi (VDRC#50028). Briefly, GFP-positive EDs from no-ph-KD, constant ph-KD or transient ph-KD L3 female larvae were dissected in PBS, cut into small pieces and injected into the abdomen of adult female hosts (BL#23650). The whole experiment was performed at 18 °C to avoid reactivation of ph-RNAi expression. To score tumour progression in allografts, flies were imaged every 2 days using Leica MZ FLIII to verify GFP as a readout of tumour growth. Tumours were dissected and re-injected when the host abdomen was fully GFP. Injected Drosophila pictures were taken using Ximea USB 3.1 Gen1 camera with a Sony CMOS-xiCAll sensor.

Immunostaining procedures

EDs from L3 female larvae were dissected at room temperature in 1× PBS and fixed in 4% formaldehyde for 20 min. Tissues were permeabilized for 1 h in 1× PBS + 0.5% Triton X-100 on a rotating wheel. Permeabilized tissues were blocked for 1 h in 3% BSA PBTr (1× PBS + 0.1% Triton X-100), and incubated O/N on a rotating wheel at 4 °C with primary antibodies diluted in PBTr + 1% BSA. For double-strand break staining, larvae were dissected at room temperature in 1× PBS, fixed in 4% paraformaldehyde for 30 min and primary antibodies were incubated for 2 h at room temperature. The following primary antibodies were used: goat anti-PH63 (1:500), mouse anti-ELAV (1:1,000, DSHB, catalogue no. 9F8A9), mouse anti-ABD-B (1:1,000, DSHS, catalogue no. 1A2E9), chicken anti-GFP (1:500, Invitrogen, catalogue no. A10262), rabbit anti-ZFH1 (ref. 49) (1:2,000) and rabbit anti-histone H2AvD pS137 (1:500, Rockland, catalogue no. 600-401-914). Then, samples were washed in PBTr three times before adding secondary antibodies in PBTr for 2 h at room temperature on a rotating wheel. The following secondary antibodies were used: donkey anti-goat Alexa Fluor 555 (1:1,000, Invitrogen, catalogue no. A-21432), donkey anti-mouse Alexa Fluor 647 (1:1,000, Invitrogen, catalogue no. A-31571), donkey anti-chicken (1:1,000, Clinisciences, catalogue no. 703-546-155), donkey anti-rabbit Alexa Fluor 555 (1:1,000, Invitrogen, catalogue no. A-31572), donkey anti-rabbit Alexa Fluor 488 (1:1,000, Invitrogen, catalogue no. A-21206). F-actin was stained by adding rhodamine phalloidin Alexa Fluor 555 (1:1,000, Invitrogen, catalogue no. R415) or Alexa Fluor 488 (1:1,000, Invitrogen, catalogue no. A12379). Tissues were washed three times in PBTr. DAPI (4,6-diamidino-2-phenylindole) staining was performed at a final concentration of 1 µg ml−1 for 15 min. Then discs were washed in PBTr and mounted in Vectashield medium (Eurobio Scientific, catalogue no. H-1000-10) or ProLong Gold antifade agent (Life Technologies, P36930). Image acquisition was performed using a Leica SP8-UV confocal microscope. ED areas were measured using Fiji64 by drawing contour lines around the DAPI-labelled tissue and measuring their surface. A minimum of 30 EDs was considered to measure average ED areas in each condition. Images for quantification of double-strand break foci were taken with a DeltaVision deconvolution microscope using a ×60 oil immersion objective and a CoolSNAP HQ2 camera. Images were processed using Deconvolution through SoftWoRx v.6.0. All experiments were performed in biological duplicates.

EdU staining

EdU experiments were performed using Click-iT Plus EdU Alexa fluor 555 Imaging kit (Invitrogen, catalogue no. C10638). The EDs of L3 female larvae were dissected at room temperature in Schneider medium. Then, EdU incorporation was performed for 15 min with 25 µM EdU solution on a rotating wheel at room temperature. After washing with PBS, tissues were fixed in 4% formaldehyde 30 min and washed three times with PBS. The imaginal discs were permeabilized for 1 h in 1× PBS + 0.5% Triton X-100 on a rotating wheel then blocked for 1 h in 1× PBS + 0.1% Triton X-100 + 3% BSA. EdU detection was performed according to the manufacturer’s instructions for 30 min on a rotating wheel at room temperature away from light. Next, 500 µl of Click-iT reaction cocktail were prepared per tube containing 20 EDs. After 1× PBS + 0.1% Triton wash DAPI staining was performed at a final concentration of 1 µg ml−1 for 15 min. Tissues were washed in 1× PBS + 0.1% Triton and discs were mounted in Vectashield medium. Image acquisition was performed using a Leica SP8-UV confocal microscope. Images of EdU stained EDs shown in Supplementary Videos were acquired using a Zeiss LSM980 Airyscan microscope in 4Y modality. Airyscan images of EdU stained EDs were processed with ZEN (v.3.6 Blue Edition, Zeiss) using default settings. Videos were created using Imaris (v.10.1, Oxford Instruments). All experiments were performed in biological duplicates.

Analysis of chromosomal abnormalities

Chromosome preparation and FISH were performed as previously described65,66. EDs from L3 stage larvae were dissected in 0.7% NaCl solution and incubated in Colchicine solution (3 ml of 0.7% NaCl + 100 µl of 10−3 M Colchicine) for 1 h at room temperature away from light. EDs were incubated in 0.5% sodium acetate for 7 min, followed by fixation (freshly prepared 2.5% PFA in 45% acetic acid) for 4 min on coverslip. EDs were pressed onto poly-lysine coated slides using manual force and snap frozen in liquid nitrogen. Slides were washed in 100% ethanol for 5 min, air dried and stained with fluorescence in situ hybridization (FISH) probes for AACAC, AATAT and 359 base pair (bp) repeats as previously described65. Probe sequences are: 5′-6-FAM-(AACAC)7, 5′-Cy3-TTTTCCAAATTTCGGTCATCAAATAATCAT and 5′-Cy5-(AATAT)6. FISH staining was used to help identify chromosomes in rearranged conditions. Microscopy acquisition was performed on a DeltaVision deconvolution microscope using a ×60 oil immersion objective and a CoolSNAP HQ2 camera. Images were processed for Deconvolution using SoftWoRx v.6.0.

Damage induction by X-ray exposure

L3 early-stage female larvae were transferred into a petri dish containing standard food medium, and were exposed to 5 Gy of X-rays using a Precision X-RAD iR160 irradiator. After irradiation, larval heads were dissected at indicated timepoints at room temperature in 1× PBS and fixed in 4% paraformaldehyde for 30 min before immunostaining. Microscopy and image analysis were performed as described above.

RT–qPCR experiments

L3 female larvae were dissected in Schneider medium on ice. Total RNA was extracted from EDs using TRIzol reagent. RNA purification was performed using the RNA Clean & Concentrator kit (Zymo Research, catalogue no. R1015). Reverse transcription was performed using Maxima First Strand complementary DNA synthesis kit (Invitrogen, catalogue no. K1642). Quantitative PCR (qPCR) was performed using LightCycler 480 SYBR Green I Master Mix (Roche, catalogue no. 04707516001). qPCR with reverse transcription (RT–qPCR) experiments were analysed using LightCycler and GraphPad Prism software. All experiments were performed in biological triplicates.

RNA-seq experiments

L3 female larvae were dissected in Schneider medium on ice. Total RNA was extracted from EDs using TRIzol reagent. RNA purification was performed using the RNA Clean & Concentrator kit (Zymo Research, catalogue no. R1015). Finally, poly-A RNA selection, library preparation and Illumina sequencing (20 M paired-end reads, 150 nt) were performed by Novogene (https://en.novogene.com/). All experiments were performed in triplicates.

gDNA sequencing

gDNA was isolated using QIAamp DNA Micro Kit (Qiagen) following the manufacturer’s instructions. For each biological replicate, roughly 70 EDs from wandering female larvae were dissected. In total, we sequenced four biological replicates for control samples (no ph-KD condition, that is, larvae of the crosses used for transient depletion that were reared at constant permissive temperature of 18 °C). Furthermore, 12 tumour samples were sequenced, that is, two biological replicates for six different depletion conditions as follows: (1) constant ph-KD; (2) transient ph-KD d9; (3) transient ph-KD d11; (4) early L3 ph-KD, 24 h recovery; (5) early L3 ph-KD, 96 h recovery and (6) early L3 ph-KD, 144 h recovery. All these conditions result in tumour formation. The gDNAs of all samples were processed for library preparation by Novogene (https://en.novogene.com/). Briefly, gDNA was fragmented to an average size of roughly 350 bp and then processed for DNA library preparation according to the manufacturer’s (Illumina) paired-end protocols. Sequencing was performed using the Illumina Novaseq 6000 platform to generate 150 bp paired-end reads with a coverage of at least ten times for 99% of the genome.

Western blot

Roughly 150 EDs were dissected in Schneider medium on ice per replicate. To collect sufficient material, EDs were dissected in batches, snap frozen in liquid nitrogen and stored at −80 °C. Discs were homogenized with a Tenbroeck directly in radioimmunoprecipitation assay lysis buffer (50 mM Tris pH 7.5, 150 mM NaCl, 1% NP40, 0.5% Na-deoxycholate, 0.1% SDS, 2× protease inhibitor) and incubated on ice for 10 min. If necessary, a second round of mechanical dissociation was performed. Samples were centrifuged for 10 min at 10,000g at 4 °C and the supernatant was transferred to a fresh tube. Proteins were quantified using BCA protein assay and 10 µg were used per gel lane, before 40 min of migration at 200 V in MES 20× migration buffer and 1 h of transfer (1 A). Membranes were blocked for 1 h in PBS + 0.2% Tween + 10% milk powder at room temperature, incubated O/N with primary antibodies in PBS + 0.2% Tween at 4 °C on a shaker and washed in PBS + 0.2% Tween. The following primary antibodies were used: rabbit anti-PH (1:200), rabbit anti-zfh1 (ref. 49) (1:2,000), mouse anti-beta tubulin (1:5,000, DSHB, catalogue no. AA12.1). HRP-conjugated secondary antibodies were incubated with the membrane for 2 h at room temperature. The following secondary antibodies were used: goat antirabbit (1:15,000, Sigma, catalogue no. A0545), rabbit antimouse (1:15,000, Sigma, catalogue no. A9044). Membranes were washed in PBS + 0.2% Tween and revealed using Super Signal West Dura kit (Pierce) and Chemidoc Bio-Rad. Western blots were analysed using ImageLab software v.6.1 from Bio-Rad. The full-size raw blot images are provided in the Supplementary Fig. 1.

ChIP–seq experiments

ChIP–seq on L3 EDs were performed as described previously41, with minor modifications, and 400 EDs were used per replicate. If necessary, several dissection and/or collection batches were frozen in liquid nitrogen and stored at −80 °C to collect sufficient material. Chromatin was sonicated using a Bioruptor Pico (Diagenode) for 10 min (30 s on, 30 s off). PH antibodies67 were diluted 1:100 for immunoprecipitation. After decrosslinking, DNA was purified using MicroChIP DiaPure columns from Diagenode. DNA libraries for sequencing were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina. Sequencing (paired-end sequencing 150 bp, roughly 4 Gb per sample) was performed by Novogene (https://en.novogene.com/). All experiments were performed in biological duplicates.

CUT&RUN experiments

CUT&RUN experiments were performed as described by Kami Ahmad in protocols.io (https://doi.org/10.17504/protocols.io.umfeu3n) with minor modifications. We dissected 50 EDs in Schneider medium, centrifuged them for 3 min at 700g and washed them twice with wash+ buffer before adding concanavalin A-coated beads. MNase digestion (pAG-MNase Enzyme from Cell Signaling) was performed for 30 min on ice. After ProteinaseK digestion, DNA was recovered using SPRIselect beads and eluted in 50 μl of Tris-EDTA. DNA libraries for sequencing were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina. Sequencing (paired-end sequencing 150 bp, roughly 2 Gb per sample) was performed by Novogene (https://en.novogene.com/). The following antibodies were used: H3K27me3 (1:100, Active Motif, catalogue no. 39155), H3K27Ac (1:100, Active Motif, catalogue no. 39133), H2AK118Ub (1:100, Cell Signaling, catalogue no. 8240). All experiments were performed in biological duplicates.

ATAC-Seq experiments

ATAC-Seq experiments were performed using the ATAC-Seq kit from Diagenode (catalogue no. C01080002). Ten EDs were used as starting material for each replicate and condition. Tagmentated DNA was amplified by PCR using 13 cycles and the purified DNA libraries were sequenced (paired-end sequencing 150 bp, roughly 2 Gb per sample) by Novogene (https://en.novogene.com/). All experiments were performed in biological duplicates.

Statistics and reproducibility

ChIP–seq, CUT&RUN and ATAC-Seq were performed in duplicates, following Encode’s standards (https://www.encodeproject.org/chip-seq/transcription_factor/#standards; https://www.encodeproject.org/atac-seq/#standards). RNA-seq were performed in triplicates, following Encode’s recommendations (https://www.encodeproject.org/data-standards/rna-seq/long-rnas/).

In general, immunostaining experiments were performed in biological duplicates. Each biological replicate was obtained from independent genetic crosses. The only exception was the phospho-H2AV staining shown in Fig. 1j and Extended Data Fig. 2c, which was performed once, but scoring tissues that came from six independent genetic crosses. For sample sizes of immunostaining experiments, see the sheet named ‘All IF sample numbers’ in Supplementary Table 6. For transcriptomic, RT–qPCR and western blot analysis, experiments were performed in biological triplicates. ATAC-Seq, CUT&RUN, ChIP–seq and immunostaining experiments were performed in biological duplicates. Each biological replicate was obtained from independent genetic crosses.

For experiments presented in Figs. 1 and 5, as well as Extended Data Figs. 1, 2, 3, 5 and 6, involving genetic crosses with different lines and in different conditions, followed by tissue area measurements and immunofluorescence, two independent biological replicates were performed with similar results. Measured areas and the number of tissues analysed in imaging are reported in Supplementary Table 6.

Allograft experiments were performed in two independent biological replicates. In the first replicate, one starting tumour obtained on constant PH depletion and one tumour obtained from transient PH depletion were used. In the second replicate, two constant PH depletion and two transient PH depletion tumours were injected. Results were similar for both replicates. The total number of injected host flies is reported in the graphs of the Extended Data Fig. 7b,c.

Bioinformatic analyses on Drosophila datasets

All in-house bioinformatic analyses were performed in R v.3.6.3 (https://www.R-project.org/). Computations on genomic coordinate files and downstream computations were conducted using the data.table R package (data.table: Extension of ‘data.frame’. https://r-datatable.com, https://Rdatatable.gitlab.io/data.table, https://github.com/Rdatatable/data.table, v.1.14.2). In all relevant panels of figures and Extended Data figures, box plots depict the median (line), upper and lower quartiles (box) ±1.5× interquartile range (whiskers) and outliers are not shown. For each relevant panel, the statistical test that was used is specified in the caption: NS denotes not significant (P > 0.05), *P < 5 × 10−2, **P < 1 × 10−2, ***P < 1 × 10−3, ****P < 1 × 10−5.

gDNA processing and mapping of somatic variants

gDNA variant calling was performed by Novogene (https://en.novogene.com/). Briefly, base calling was performed using Illumina pipeline CASAVA v.1.8.2, and subjected to quality control using fastp with the following parameters: -g -q 5 -u 50 -n 15 -l 150 –min_trim_length 10 –overlap_diff_limit 1–overlap_diff_percent_limit 10. Then, sequencing reads were aligned to the dm6 version of the Drosophila genome using Burrows–Wheeler aligner with default parameters and duplicate reads were removed using samtools and PICARD (http://picard.sourceforge.net). Raw SNP and InDel sets were called using GATK with the following parameters: –gcpHMM 10 -stand_emit_conf 10 -stand_call_conf 30. Then, SNPs were filtered using the following criteria: SNP QD < 2, FS > 60, MQ < 30, HaplotypeScore > 13, MappingQualityRankSum < −12.5, ReadPosRankSum < −8. For INDEL variants, the following criteria were used: QD < 2, FS > 200, ReadPosRankSum < −20. UCSC known genes were used for gene and region annotations. Finally, the variants were compared to a batch-matched control sample (no ph-KD), in the search for bona fide SNVs and InDels using the MuTect2 module of the GATK package. Only SNVs and InDels variants that passed Mutect2 filtering (FILTER = “PASS”) were considered for downstream analyses. Structural variants and CNVs were detected using breakdancer (https://github.com/genome/breakdancer) and CNVnator (https://github.com/abyzovlab/CNVnator) software packages, respectively.

Then, called variants were imported in R for downstream analyses. When looking at the fraction of tumour samples that contained a given alteration (Fig. 1h), we only retained SNVs or InDels with an allelic fraction greater than 0.2, structural variants that were supported by at least five reads and CNVs with an allelic fraction bigger than 1.5 (duplication) or smaller than 0.66 (deletion).

RNA-seq processing and differential analysis

After initial quality checks of the newly generated data using fastqc (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), the paired-end reads were aligned to a custom index consisting of the dm6 version of the Drosophila genome together with GFP, EGFP and mRFP1 sequences, using the align function from the Rsubread R package68 (v.2.0.1) with the following parameters: maxMismatches = 6, unique = TRUE. Next, aligned reads were counted for each D. melanogaster transcript (dmel_r6.36 annotation) using the featureCounts function from the Rsubread R package (v.2.0.1, isPairedEnd = TRUE) and differential expression analysis was performed using the DESeq2 R package69 (v.1.26.0, design = ~replicate + condition). The tables corresponding to the different comparisons are available in Supplementary Tables 1, 4 and 5.

For the differential analysis of the transcriptomes after no ph-KD (control), constant and transient ph-KD, each ph-RNAi sample was compared to temperature-matched w-RNAi controls (Fig. 2a and Extended Data Fig. 8b). DESeq2 outputs are available in Supplementary Tables 1 and 5. For the differential analysis of the transcriptomes after zfh1+w-KD, Stat92E+w-KD, gfp+ph-KD, zfh1+ph-KD and Stat92E+ph-KD, all were compared to temperature-matched gfp+w-KD (Supplementary Table 4).

Clustering of differentially expressed genes

For the clustering, we selected the genes that were differentially expressed (Padj < 0.05 and |log2fold2 fold change | >1) after constant or transient ph-KD (d9 or d11 AEL). In addition, we only considered the genes that did not show significant changes after no ph-KD (control). Then, log2 fold change values were clipped at the 5th and 95th percentiles and clustered using the supersom function from the kohonen R package70 (v.3.0.10). As day 9 and day 11 transient ph-KD yielded substantially similar transcriptomes, a two-layer self-organizing map was trained (layer 1, constant ph-KD; layer 2, D9 and D11 transient ph-KD) with similar weights for the two layers, using a 3 × 2 grid (topology = hexagonal, toroidal = TRUE). Clustering output is available in Supplementary Table 2.

CUT&RUN, ChIP–seq and ATAC-Seq processing, peak calling and differential analysis

After initial quality checks of the newly generated data using fastqc, the reads were aligned to the dm6 version of the Drosophila genome using bowtie 2 (ref. 71, v.2.3.5.1) with the following parameters: –local –very-sensitive-local –no-unal –no-mixed –no-discordant –phred33 -I 10 -X 700, and low mapping quality reads were discarded using samtools72 (-q 30, v.1.10, using htslib v.1.10.2-3).

PH, H3K27me3, H3K27Ac, H2AK118Ub and ATAC-Seq peaks and/or domains were called for each replicate separately and on merged reads using macs2 (ref. 73, v.2.2.7.1) with the following parameters: –keep-dup 1 -g dm -f BAMPE -B –SPMR. For PH ChIP–seq, the input sample was used as control. For H3K27me3, H3K27Ac and H2AK118Ub CUT&RUN, the IgG sample was used as control. Only peaks detected in both replicates (enrichment greater than 0 AND q value less than 0.05) and using merged replicates (enrichment greater than 2 AND q < 0.01) were retained for further analyses, after being merged with a minimum gap size of 250 bp for narrow peaks (PH, H3K27Ac and ATAC-Seq) and 2.5 kb for broad marks (H3K27me3 and H2AK118Ub). The macs2 bedgraph files were used for visualization purposes.

For the differential analysis of H3K27me3, H3K27Ac, H2AK118Ub CUT&RUN and ATAC-Seq, peaks and/or domains were first merged across all conditions (maximum gap of 250 bp for H3K27Ac and ATAC-Seq peaks; 2.5 kb for H3K27me3 and H2AK118Ub domains) and overlapping reads were counted using the featureCounts function from the Rsubread R package (v.2.0.1, isPairedEnd = TRUE). Differential analysis was then performed using the DESeq2 R package (v.1.26.0, size factors, total number of aligned reads; design, ~replicate + condition). The same procedure was used for the differential analysis of ATAC-Seq peaks between zfh1+ph-KD and gfp+ph-KD.

Clustering of differentially accessible ATAC-Seq peaks

For the clustering of ATAC-Seq peaks, we only considered the peaks showing a significant difference (Padj < 1 × 10−3 and |log2 fold change| >1) after constant or transient ph-KD (day 11 AEL) and with a minimum log10 base mean of 1.25 to avoid noisy peaks. The log2 fold change values were clipped at the 5th and 95th percentiles and clustered using the supersom function from the kohonen R package70 (v.3.0.10) using a four-layer self-organizing map (layer 1, log2fold change constant ph-KD; layer 2, log2fold change transient ph-KD; layer 3, Padj constant ph-KD; layer 4, Padj transient ph-KD) with similar weights for the four layers, using a 1 × 3 grid (topology = hexagonal, toroidal = TRUE). Full clustering output is available in Supplementary Table 3.

Classification of PcG target genes and peaks-to-gene assignment

To define PcG target genes, we defined a clean set of H3K27me3 domains in the control (no ph-KD) condition by removing artefactual splits due to sequencing gaps (github), resulting in 241 domains. Then, only the genes for which at least 50% of the gene body was overlapping with a H3K27me3 domain were considered as direct PcG target. When relevant, only irreversible, reversible and unaffected genes that were direct PcG targets when considered (Fig. 3). PcG target gene assignment is available in Supplementary Table 2 (PcG_bound and class columns).

To assess whether a gene was overlapping a H3K27me3 domain or a H3K27Ac peak in a given condition, we used different criteria. For H3K27me3 (Fig. 3b), only the genes for which at least 50% of the gene body was overlapping a confident H3K27me3 domain (‘CUT&RUN, ChIP–seq and ATAC-Seq processing, peak calling and differential analysis’ section above) were considered as hits. For H3K27Ac (Fig. 3c), only the genes containing a confident peak (‘CUT&RUN, ChIP–seq and ATAC-Seq processing, peak calling and differential analysis’ section above) in the gene body or up to 2.5 kb upstream of the TSS were considered as hits.

To assign PH peaks (Extended Data Fig. 6e) or ATAC-Seq peaks (Fig. 4b), peaks were assigned to the closest TSS with a maximum genomic separation of 25 kb (peaks that were located further away were not considered).

Gene Ontology terms enrichment

Gene Ontology terms associated with the genes of interest and a background set of genes, consisting of all the genes that passed DESeq2 initial filters, were retrieved using the AnnnotationDbi R package (https://bioconductor.org/packages/AnnotationDbi.html, v.1.48.0). For each Gene Ontology term, over-representation was assessed using a one-sided Fisher’s exact test (alternative = ‘greater’). Obtained P values were corrected for multiple testing using false discovery rate (FDR).

Motif enrichment

To search for DNA binding motifs enriched at each ATAC-Seq cluster, we used the centre of corresponding peaks ±250 bp (500 bp total). Resulting regions were analysed with the i-cisTarget online tool74, using v.6.0 of the position weight matrix database (consisting of 24,453 position weight matrics). Only top scoring motifs with a normalized enrichment score greater than 5.5 and a rank less than 50 were considered (Fig. 4e).

To search for motifs associated with increased or decreased accessibility after constant or transient ph-KD, we used a collection of non-redundant transcription factor motifs75 and counted their occurrences across all ATAC-Seq peaks ±250 bp, using the matchMotifs function from the motifmatch R package (v.1.18.0; https://doi.org/10.18129/B9.bioc.motifmatchr) with the following parameters: Pcutoff = 5 × 10−4, bg =  ‘genome’, genome = ‘dm6’. Of note, only motifs associated with a Drosophila transcription factor gene that passed initial DESeq2 initial filters were considered. Then, we fitted two LASSO regressions using the cv.glmnet and the glmnet functions from the glmnet package in R (v.4.1.4), with the following parameter: lambdas = 10seq(2, −3, by = −0.1), standardize, TRUE; nfolds, 5), aiming at predicting log2 fold changes after constant or transient ph-KD. The top 25 motifs with the strongest |s0| coefficients in any of the two models were used to train two linear models to predict log2 fold changes after transient or constant ph-KD. Only the motifs with a significant coefficient in at least one of the two linear models (P < 1 × 10−5) were considered (Fig. 4f).

Analysis of human solid tumours

The differential gene expression analysis was carried out by using a Mann–Whitney test and the TNMplot database, which contains transcriptome-level RNA-seq data for different tumour samples from The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) repositories76.

The survival analysis was carried out using the Pan-Cancer (Bladder, Lung adenocarcinoma and Rectum adenocarcinoma) or gene array (Breast, Ovarian and Prostate) datasets77,78 of the online tool www.kmplot.com (accessed on 22 December 2022). The Pan-Cancer dataset is based on TCGA data generated using the Illumina HiSeq 2000 platform with survival information derived from the published sources79. The gene-array samples were obtained using Affymetrix HGU133A and HGU133plus2 gene chips. The samples were MAS5 normalized and the mean expression in each sample was scaled to 1,000. The most reliable probe sets to represent single genes were identified usNAiing JetSet80.

In the survival analysis, each cut-off value between the lower and upper quartiles of expression was analysed by Cox proportional hazards regression and FDR was computed to correct for multiple hypothesis testing. Then, the best performing cut-off was used when drawing the Kaplan–Meier survival plots that were generated to visualize the survival differences. Hazard rates with 95% confidence intervals were computed to numerically assess the survival time difference between the two cohorts. The statistical analysis was performed in the R statistical environment (www.r-project.org). The analysis results for single genes can be validated using the platforms at www.kmplot.com and www.tnmplot.com.

Analysis of cohorts of patients with multiple myeloma

For gene expression profiling data from patients with multiple myeloma, we used six cohorts that included Affymetrix gene expression data (HGU133plus2) of purified multiple myeloma cells from the TT2 (ref. 81) (Gene Expression Omnibus, accession number GSE2658), TT3 (ref. 82) (accession number E-TABM-1138 accession number GSE4583) and Hovon83 (accession number GSE19784) cohorts (345, 158 and 282 newly diagnosed patients with multiple myeloma who were treated with high-dose melphalan and autologous haematopoietic stem cell transplantation); the Mulligan cohort84 (188 patients at relapse treated by proteasome inhibitor in monotherapy); the Mtp cohort non-eligible for HDT85 (63 newly diagnosed patients with multiple myeloma who were not eligible for high-dose melphalan and autologous haematopoietic stem cell transplantation) and the Mtp Dara cohort85,86 (51 patients at relapse treated by anti-CD38 monoclonal antibody (Daratumumab)). Gene expression data were normalized with the MAS5 algorithm and processing of the data was performed using the webtool genomicscape (http://www.genomicscape.com), as done previously87,88, using the R environment (www.r-project.org). The prognostic values of PHC1, PHC2, PHC3, CBX2, CBX7 and BMI1 gene expression was investigated using the Maxstat R function and Kaplan–Meier survival curves as previously described89. The differential gene expression analysis between normal bone marrow plasma cells from healthy donors and multiple myeloma cells from patients was carried out by using the Mann–Whitney test. The prognostic value of PHC1, PHC2, PHC3, CBX2, CBX7 and BMI1 genes was combined using our previously published methodology89 (sum of the Cox b coefficients of each of the six genes, weighted by ±1 if the patient’s multiple myeloma cell signal for a given gene is above or below the probe set Maxstat value of the gene). Clustering was performed using the Morpheus software (https://software.broadinstitute.org/morpheus) and violin plots using GraphPad Prism software (http://www.graphpad.com/scientific-software/prism/).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link