Nonlinear control of transcription through enhancer–promoter interactions - Nature

Culture of embryonic stem cells

All cell lines were derived from E14 mES cells provided by E. Heard’s laboratory. Cells were cultured on gelatin-coated culture plates in Glasgow minimum essential medium (Sigma-Aldrich, G5154) supplemented with 15% fetal calf serum (Eurobio Abcys), 1% l-glutamine (Thermo Fisher Scientific, 25030024), 1% MEM sodium pyruvate (Thermo Fisher Scientific, 11360039), 1% MEM non-essential amino acids (Thermo Fisher Scientific, 11140035), 100 µM β-mercaptoethanol and 20 U ml⁻¹ leukaemia inhibitory factor (Miltenyi Biotec, premium grade), at 37 °C in 8% CO₂. Cells were tested for mycoplasma contamination monthly, and no contamination was detected. After piggyBac-enhancer transposition, cells were cultured in standard E14 medium supplemented with 2i (1 µM MEK inhibitor PD0325901 (Axon, 1408) and 3 µM GSK3 inhibitor CHIR 99021 (Axon, 1386)).

Generation of enhancer–promoter piggyBac targeting vectors

Homology arms for the knock-in, the Sox2 promoter, the SCR and a truncated version of the SCR (Ei) were amplified from E14 mES cell genomic DNA with Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific, F549) using primers compatible with Gibson assembly cloning (NEB, E2611). The targeting vector was generated from the 3-SB-EF1-PBBAR-SB plasmid⁵⁰, a gift from Rob Mitra. To clone the homology arms into the vector, BspEI and BclI restriction sites were introduced using the Q5 Site-Directed Mutagenesis Kit (NEB, E0554). The left homology arm was cloned by Gibson assembly after linearizing the vector with BspEI (NEB, R0540), and the right homology arm after linearizing it with BclI (NEB, R0160). To clone the Sox2 promoter, the Ef1a promoter was first removed from the 3-SB-EF1-PBBAR-SB vector with NdeI (NEB, R0111) and SalI (NEB, R0138), and the Sox2 promoter was then inserted by Gibson assembly. The SCR and its truncated version (truncated SCR or Ei) were cloned between the piggyBac transposon-specific inverted terminal repeat sequences (ITRs) after linearizing the vector with BamHI (NEB, R3136) and NheI (NEB, R3131). A transcriptional pause sequence from the human α2-globin gene and an SV40 poly(A) sequence were inserted at both the 5′ and 3′ ends of the enhancers by Gibson assembly. A selection cassette carrying the puromycin resistance gene driven by the PGK promoter and flanked by FRT sites was cloned in front of the Sox2 promoter after linearizing the piggyBac vector with AsiSI (NEB, R0630). The primers used for cloning are listed in Supplementary Table 1.
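Each Gibson assembly step above relies on PCR inserts whose ends overlap the linearized vector. The helper below is a minimal sketch of how such primers can be composed from the vector ends and the insert. The 20-bp overlap and 20-bp annealing lengths, the function names and the toy sequences are illustrative assumptions, not the primers listed in Supplementary Table 1.

```python
# Sketch: compose Gibson-assembly primers for cloning an insert into a
# linearized vector. Overlap and annealing lengths are assumed values.
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gibson_primers(vector_left: str, vector_right: str, insert: str,
                   overlap: int = 20, anneal: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'.

    vector_left:  vector sequence ending at the left side of the cut site
    vector_right: vector sequence starting at the right side of the cut site
    insert:       top-strand sequence of the fragment to be cloned
    """
    if len(vector_left) < overlap or len(vector_right) < overlap:
        raise ValueError("vector ends are shorter than the requested overlap")
    if len(insert) < anneal:
        raise ValueError("insert is shorter than the annealing region")
    # Forward primer: vector overlap followed by the start of the insert.
    forward = (vector_left[-overlap:] + insert[:anneal]).upper()
    # Reverse primer: reverse complement of (end of insert + vector overlap).
    reverse = revcomp(insert[-anneal:] + vector_right[:overlap])
    return forward, reverse


if __name__ == "__main__":
    # Toy (hypothetical) sequences, for illustration only.
    fwd, rev = gibson_primers("TTGACAGGATCCGTACGATC", "GCTAGCAATTCCGGTTACCA", "ATGGCC" * 10)
    print(fwd)
    print(rev)
```

The exonuclease in the Gibson mix exposes these overlaps as single strands, which is why each primer only needs vector sequence immediately flanking the cut site plus a gene-specific annealing region.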

Generation of founder mES cell lines carrying the piggyBac transgene

The gRNA for knock-in of the piggyBac transgene on chromosome 15 was designed using the IDT online design tool (https://eu.idtdna.com/site/order/designtool/index/CRISPR_SEQUENCE) and purchased from Microsynth AG. The gRNA sequence was cloned into the PX459 plasmid (Addgene) using the BsaI restriction site. E14 mES cell founder lines carrying the piggyBac transgene were generated by nucleofection with the Amaxa 4D-Nucleofector X-Unit and the P3 Primary Cell 4D-Nucleofector X Kit (Lonza, V4XP-3024 KT). Cells (2 × 10⁶) were collected with accutase (Sigma-Aldrich, A6964), resuspended in 100 µl transfection solution (82 µl primary solution, 18 µl supplement, 1 µg piggyBac targeting vector carrying the SCR, the truncated SCR or the promoter alone, and 1 µg PX459 ch15_gRNA/Cas9) and transferred into a single Nucleocuvette (Lonza). Nucleofection was performed using protocol CG110. Transfected cells were seeded directly in E14 standard medium prewarmed to 37 °C. At 24 h after transfection, 1 µg ml⁻¹ puromycin (InvivoGen, ant-pr-1) was added to the medium for 3 days to select cells transfected with the PX459 gRNA/Cas9 vector. Cells were then cultured in standard E14 medium for an additional 4 days. To select cells carrying an insertion of the piggyBac targeting vector, a second puromycin pulse was applied by culturing cells in standard medium supplemented with 1 µg ml⁻¹ puromycin. After 3 days of selection, single cells were sorted by fluorescence-activated cell sorting (FACS) into 96-well plates. Sorted cells were kept for 2 days in standard E14 medium supplemented with 100 µg ml⁻¹ primocin (InvivoGen, ant-pm-1) and 10 µM ROCK inhibitor (STEMCELL Technologies, Y-27632). Cells were then cultured in standard E14 medium with 1 µg ml⁻¹ puromycin. Genomic DNA was extracted by lysing cells in lysis buffer (100 mM Tris-HCl pH 8.0, 5 mM EDTA, 0.2% SDS, 50 mM NaCl, proteinase K and RNase) followed by isopropanol precipitation.
Individual cell lines were analysed by genotyping PCR to identify heterozygous insertion of the piggyBac donor vector. Cell lines showing the correct genotyping pattern were selected and expanded. The primers used for genotyping are listed in Supplementary Table 1.

Puromycin resistance cassette removal

Cells (1 × 10⁶) were transfected with 2 µg of a pCAG-FlpO-P2A-HygroR plasmid encoding the flippase (Flp) recombinase using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) according to the manufacturer’s instructions. Transfected cells were cultured in standard E14 medium for 7 days. Single cells were then sorted by FACS into 96-well plates. Genomic DNA was extracted by lysing cells in lysis buffer (100 mM Tris-HCl pH 8.0, 5 mM EDTA, 0.2% SDS, 50 mM NaCl, proteinase K and RNase) followed by isopropanol precipitation. Individual cell lines were analysed by genotyping PCR to verify deletion of the puromycin resistance cassette. The primers used for genotyping are listed in Supplementary Table 1. Cell lines showing the correct genotyping pattern were selected and expanded. Selected cell lines were analysed by targeted Nanopore sequencing with Cas9-guided adapter ligation (nCATS)⁵¹, and only lines showing a single integration of the piggyBac donor vector were used as founder lines for the enhancer mobilization experiments.

Mobilization of the piggyBac-enhancer cassette

A mouse codon-optimized version of the piggyBac transposase (PBase) was cloned in frame with the red fluorescent protein tagRFPt (Evrogen) into a pBroad3 vector (pBroad3_hyPBase_IRES_tagRFPt) by Gibson assembly cloning (NEB, E2611). Cells (2 × 10⁵) were transfected with 0.5 µg pBroad3_hyPBase_IRES_tagRFPt using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) according to the manufacturer’s instructions. To increase the probability of enhancer transposition, 12 independent PBase transfections were typically performed in parallel in 24-well plates. Transfection efficiency and hyPBase_IRES_tagRFPt expression levels within the cell population were monitored by flow cytometry. Seven days after PBase transfection, individual eGFP⁺ cells were sorted by FACS into 96-well plates. Sorted cells were kept for 2 days in standard E14 medium supplemented with 100 µg ml⁻¹ primocin (InvivoGen, ant-pm-1) and 10 µM ROCK inhibitor (STEMCELL Technologies, Y-27632). Cells were cultured in E14 standard medium for an additional 7 days and then split into three plates for genomic DNA extraction, flow cytometry analysis and freezing.

Sample preparation for mapping piggyBac-enhancer insertion sites in individual cell lines

Enhancer insertion sites in individual cell lines were mapped by splinkerette PCR as described previously⁵², with minor modifications. Genomic DNA from individual eGFP⁺ cell lines was extracted from 96-well plates using the Quick-DNA Universal 96 Kit (Zymo Research, D4071) according to the manufacturer’s instructions. Purified genomic DNA was digested with 0.5 µl Bsp143I restriction enzyme (Thermo Fisher Scientific, FD0784) for 15 min at 37 °C, followed by heat inactivation at 65 °C for 20 min. Long (HMSpAa) and short (HMSpBb) splinkerette adapters were first resuspended in 5× NEBuffer 2 (NEB, B7002) to a concentration of 50 µM. Then, 50 µl of HMSpAa adapter was mixed with 50 µl of HMSpBb adapter (Aa+Bb) to a concentration of 25 µM. The adapter mix was denatured and annealed by heating to 95 °C for 5 min and cooling to room temperature. Next, 25 pmol of annealed splinkerette adapters was ligated to the digested genomic DNA with 5 U T4 DNA ligase (Thermo Fisher Scientific, EL0011) for 1 h at 22 °C, followed by heat inactivation at 65 °C for 10 min. For splinkerette amplification, PCR 1 combined 2 µl of the splinkerette sample, 1 U Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM HMSp1 and 0.1 µM PB5-1 (or PB3-1) primer; splinkerette PCR 2 used 2 µl of PCR 1 product, 1 U Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM HMSp2 and 0.1 µM PB5-5 (or PB3-2) primer. Amplification quality was checked by agarose gel electrophoresis. Products were Sanger sequenced (Microsynth AG) with the PB5-2 (or PB3-2) primer. The primers used for splinkerette PCRs and sequencing are listed in Supplementary Table 1.
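The adapter amounts above follow from C₁V₁ = C₂V₂ arithmetic, with 1 µM corresponding to 1 pmol µl⁻¹. The sketch below reproduces the numbers, assuming complete annealing of the two strands; the helper names are illustrative.

```python
def diluted_conc(c_stock_um: float, v_stock_ul: float, v_total_ul: float) -> float:
    """Final concentration (µM) after diluting v_stock_ul into v_total_ul (C1*V1 = C2*V2)."""
    return c_stock_um * v_stock_ul / v_total_ul


def volume_for_pmol(pmol: float, conc_um: float) -> float:
    """Volume (µl) holding `pmol` picomoles at `conc_um` (1 µM = 1 pmol per µl)."""
    return pmol / conc_um


# Equal volumes of the two 50 µM strands give 25 µM of each strand, i.e. at most
# 25 µM annealed duplex (assuming every strand finds its partner).
duplex_um = diluted_conc(50, 50, 50 + 50)

print(f"duplex: {duplex_um} µM")
print(f"25 pmol (individual lines): {volume_for_pmol(25, duplex_um)} µl")     # 1.0 µl
print(f"125 pmol (population samples): {volume_for_pmol(125, duplex_um)} µl")  # 5.0 µl
```

The same stock therefore covers both the per-clone ligation (25 pmol) and the larger population-scale ligation (125 pmol) described later.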
Insertion sites were then identified from the Sanger reads as described in the ‘Mapping of piggyBac-enhancer insertion sites in individual cell lines’ section.

Flow cytometry eGFP fluorescence intensity measurements and analysis

eGFP⁺ cell lines were cultured in serum + 2i medium for 2 weeks before flow cytometry measurements. eGFP levels of individual cell lines were measured on a BD LSRII SORP flow cytometer equipped with a BD High Throughput Sampler (HTS), which allowed acquisition directly from 96-well plates. Each clone was measured three times; mean eGFP fluorescence intensities were calculated for each clone in FlowJo, and the three replicates were averaged.

Normalization of mean eGFP fluorescence intensities

The mean eGFP fluorescence intensity of each cell line measured by flow cytometry was first background-corrected by subtracting the mean eGFP fluorescence intensity of wild-type E14 mES cells cultured in the same 96-well plate. The corrected intensities were then divided by the average mean intensity of all cell lines in which the SCR was located within a 40-kb window centred on the promoter, and multiplied by a common scaling factor.
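The two-step normalization can be written as a short function. This is a sketch only: the record layout and field names are hypothetical, the 40-kb window is read as ±20 kb around the promoter, the denominator is taken over background-corrected values, and the default scaling factor of 1 is an assumption because the value of the common factor is not given.

```python
from statistics import mean


def normalize_egfp(clones, wt_by_plate, window_bp=40_000, scale=1.0):
    """Background-correct and normalize per-clone mean eGFP intensities.

    clones: list of dicts with hypothetical keys
        'id'               clone identifier
        'plate'            96-well plate the clone was measured on
        'replicates'       mean eGFP intensity from each flow cytometry run
        'dist_to_promoter' signed distance (bp) of the SCR from the Sox2 promoter
    wt_by_plate: mean eGFP intensity of wild-type E14 cells on each plate
    scale: the common multiplicative factor (value assumed here)
    """
    # Step 1: average the replicate runs, then subtract same-plate wild-type signal.
    corrected = {
        c["id"]: mean(c["replicates"]) - wt_by_plate[c["plate"]] for c in clones
    }
    # Step 2: reference = clones whose SCR lies inside the window centred on the promoter.
    half_window = window_bp / 2
    ref_ids = [c["id"] for c in clones if abs(c["dist_to_promoter"]) <= half_window]
    if not ref_ids:
        raise ValueError("no clones with the SCR inside the promoter window")
    reference = mean(corrected[i] for i in ref_ids)
    return {i: scale * value / reference for i, value in corrected.items()}


if __name__ == "__main__":
    # Hypothetical example: two clones near the promoter, one far away.
    clones = [
        {"id": "a", "plate": 1, "replicates": [110, 120, 130], "dist_to_promoter": 5_000},
        {"id": "b", "plate": 1, "replicates": [60, 60, 60], "dist_to_promoter": -15_000},
        {"id": "c", "plate": 2, "replicates": [40, 40, 40], "dist_to_promoter": 500_000},
    ]
    print(normalize_egfp(clones, {1: 10, 2: 20}))
```

Subtracting the plate-matched wild-type signal removes autofluorescence and plate-to-plate offsets, while dividing by the promoter-proximal reference expresses every clone relative to the enhancer's activity at its native-like distance.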

Sample preparation for high-throughput sequencing of piggyBac-enhancer insertion sites

Cells (5 × 10⁵) were transfected with 2 µg PBase using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) according to the manufacturer’s instructions. Transfection efficiency and PBase expression levels within the cell population were monitored by flow cytometry. Five days after PBase transfection, genomic DNA was purified using the DNeasy Blood & Tissue Kit (Qiagen, 69504). To reduce the contribution of cells in which the piggyBac-enhancer had not been excised, we depleted eGFP sequences by in vitro Cas9 digestion. gRNAs for eGFP depletion were designed using the IDT online design tool (https://eu.idtdna.com/site/order/designtool/index/CRISPR_SEQUENCE) (Supplementary Table 1). Custom-designed Alt-R CRISPR-Cas9 crRNAs containing the gRNA sequences targeting eGFP (gRNA_1_3PRIME and gRNA_2_3PRIME), Alt-R CRISPR-Cas9 tracrRNA (IDT, 1072532) and Alt-R Streptococcus pyogenes Cas9 enzyme (IDT, 1081060) were purchased from IDT. In vitro cleavage of the eGFP fragment by Cas9 was performed according to the IDT protocol ‘In vitro cleavage of target DNA with ribonucleoprotein complex’. In brief, 100 µM Alt-R CRISPR–Cas9 crRNA and 100 µM Alt-R CRISPR–Cas9 tracrRNA were annealed by heating at 95 °C for 5 min and cooling to room temperature (15–25 °C). To assemble the RNP complex, 10 µM Alt-R guide RNA (crRNA:tracrRNA) and 10 µM Alt-R SpCas9 enzyme were incubated at room temperature for 45 min. For in vitro digestion of eGFP, 300 ng of genomic DNA extracted from the pool of PBase-transfected cells was incubated for 2 h with 1 µM Cas9 RNP. After digestion, 40 µg proteinase K was added and the sample was incubated at 56 °C for 10 min to release the DNA substrate from the Cas9 endonuclease.
After purification with AMPure XP beads (Beckman Coulter, A63881), genomic DNA was digested with 0.5 µl Bsp143I restriction enzyme (Thermo Fisher Scientific, FD0784) for 15 min at 37 °C, followed by heat inactivation at 65 °C for 20 min. Annealed splinkerette adapters (Aa+Bb; 125 pmol) were then ligated to the digested genomic DNA with 30 U T4 DNA ligase HC (Thermo Fisher Scientific, EL0013) for 1 h at 22 °C, followed by heat inactivation at 65 °C for 10 min. For splinkerette amplification, 96 independent PCR 1 reactions were performed, each combining 100 ng of the splinkerette sample, 1 U Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM HMSp1 and 0.1 µM PB3-1 primer; splinkerette PCR 2 used 4 µl of PCR 1 product, 1 U Platinum Taq polymerase (Thermo Fisher Scientific, 10966034), 0.1 µM HMSp2 and 0.1 µM PB3-2 primer. The primers used for splinkerette PCRs are listed in Supplementary Table 1. Splinkerette amplicons were processed with the NEB Ultra II kit according to the manufacturer’s protocol, using 50 ng of input material. Genome-wide insertions were mapped as described in the ‘Mapping of piggyBac-enhancer insertion sites in population-based splinkerette PCR’ section.

Sample preparation for tagmentation-based mapping of PiggyBac insertions

piggyBac integrations in pools of cells were mapped using a Tn5-transposon-based ITR mapping technique based on ref. ⁵³, with minor alterations. Cells (2 × 10⁵) were transfected with 0.5 µg PBase using Lipofectamine 3000 (Thermo Fisher Scientific, L3000008) in 24-well plates according to the manufacturer’s instructions. Eight independent transfections were performed in parallel. Transfection efficiency and PBase expression levels within the cell population were monitored by flow cytometry. Seven days after PBase transfection, six pools of 10,000 cells with low GFP levels (gates low 1 and low 2) and six pools of 337 cells with high GFP levels (gate high) were sorted into 24-well plates. Sorted cells were kept for 2 days in standard E14 medium supplemented with 100 µg ml⁻¹ primocin (InvivoGen, ant-pm-1) and 10 µM ROCK inhibitor (STEMCELL Technologies, Y-27632). Cells were cultured in E14 standard medium for either 1 passage (pools from gates low 1 and low 2) or 2 passages (pools from gate high), and genomic DNA from individual pools was extracted using the Quick-DNA Miniprep Plus Kit (Zymo Research, D4069) according to the manufacturer’s instructions. The Tn5 transposase was produced as described in ref. ⁵⁴. Tagmentation was performed as follows. Primers TAC0101 and TAC0102 (45 µl each of 100 µM) were mixed with 10 µl 10× Tris-EDTA (pH 8) and annealed by heating to 95 °C followed by a slow ramp (0.1 °C s⁻¹) down to 4 °C. The transposome was assembled by combining the adapters (1 µl of 1:2 diluted adapters) and Tn5 transposase (1.5 µl of a 2.7 mg ml⁻¹ stock) in 18.7 µl Tn5 dilution buffer (20 mM HEPES, 500 mM NaCl, 25% glycerol) and incubating the mix for 1 h at 37 °C. Tagmentation was performed by mixing 100 ng genomic DNA with 1 µl assembled transposome and 4 µl 5× TAPS-PEG buffer (50 mM TAPS-NaOH, 25 mM MgCl₂, 8% (v/v) PEG8000) in a final volume of 20 µl.
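For orientation, the adapter and transposome amounts above translate into the concentrations computed below. This back-of-the-envelope sketch assumes that '1:2 diluted' means a twofold dilution and that the two primers anneal completely; the variable names are illustrative.

```python
def diluted_conc(c_stock, v_stock, v_total):
    """C2 = C1 * V1 / V2 (units follow the stock concentration)."""
    return c_stock * v_stock / v_total


# Adapter annealing: 45 µl of each 100 µM primer + 10 µl 10x Tris-EDTA.
annealing_volume_ul = 45 + 45 + 10
adapter_um = diluted_conc(100, 45, annealing_volume_ul)   # duplex, assuming full annealing
adapter_working_um = adapter_um / 2                         # assumed twofold ("1:2") dilution

# Cooling ramp from 95 °C to 4 °C at 0.1 °C per second.
ramp_seconds = (95 - 4) / 0.1

# Transposome assembly: adapters + Tn5 + dilution buffer.
assembly_ul = 1 + 1.5 + 18.7
tn5_ug = 1.5 * 2.7                                          # µl x mg/ml = µg
adapter_in_assembly_um = diluted_conc(adapter_working_um, 1, assembly_ul)
tn5_in_assembly_mg_ml = tn5_ug / assembly_ul

print(f"annealed adapter stock: {adapter_um:.1f} µM, working dilution: {adapter_working_um:.2f} µM")
print(f"cooling ramp: {ramp_seconds / 60:.1f} min")
print(f"transposome mix: {assembly_ul:.1f} µl, {adapter_in_assembly_um:.2f} µM adapter, "
      f"{tn5_in_assembly_mg_ml:.3f} mg/ml Tn5")
```

Under these assumptions the annealed stock is 45 µM, the cooling ramp takes about 15 min, and the 21.2-µl assembly mix contains roughly 1 µM adapter and about 4 µg Tn5.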
The reaction was incubated at 55 °C for 10 min and then quenched with 0.2% SDS. For the best mapping results, both sides of the piggyBac transposon were processed to obtain 5′ ITR- and 3′ ITR-specific libraries. First, the target region was enriched by linear amplification PCR with 3′ ITR-specific (TAC0006) and 5′ ITR-specific (TAC0099) primers. The PCR mix contained 3 µl tagmented DNA, 1 µl of 1 µM enrichment primer, 2 µl dNTPs (10 mM), 4 µl 5× Phusion HF Buffer (NEB) and 0.25 µl Phusion HS Flex polymerase (2 U µl⁻¹, NEB) in a final volume of 20 µl, and was amplified as follows: 30 s at 98 °C; 45 cycles of 10 s at 98 °C, 20 s at 62 °C and 30 s at 72 °C; then 20 s at 72 °C. PCR 1 of the library preparation was performed using TAC0161 (3′ ITR) and TAC0110 (5′ ITR) in combination with N5xx primers (Illumina, Nextera Index Kit). The PCR mix contained 5 µl enrichment PCR product, 1 µl of 10 µM primers, 2 µl dNTPs (10 mM), 4 µl 5× Phusion HF Buffer and 0.25 µl Phusion HS Flex polymerase (NEB) in a final volume of 25 µl, and was amplified as follows: 30 s at 98 °C; 3 cycles of 10 s at 98 °C, 20 s at 62 °C and 30 s at 72 °C; and 8 cycles of 10 s at 98 °C and 50 s at 72 °C. In PCR 2, the N7xx adapters (Illumina, Nextera Index Kit) were added to the piggyBac-specific products as follows. PCR was performed with TAC0103 (both ITRs) and N7xx. The PCR mix contained 2 µl PCR 1 product, 1 µl of 10 µM primers, 2 µl dNTPs (10 mM), 4 µl 5× Phusion HF Buffer and 0.25 µl Phusion polymerase (Thermo Fisher Scientific) in a final volume of 22 µl, and was amplified as follows: 30 s at 98 °C; 10 cycles of 10 s at 98 °C, 20 s at 63 °C and 30 s at 72 °C. Then, 5 µl of each library was checked on a 1% agarose gel, and samples were pooled according to smear intensity. Finally, the pooled library was purified with CleanPCR (CleanNA) beads at a 1:0.8 sample-to-bead ratio and sequenced on an Illumina MiSeq (150-bp paired-end reads).
Mapping of genome-wide insertions was performed as described in the ‘Mapping of piggyBac-enhancer insertion sites by tagmentation’ section.

Deletion of genomic regions containing CTCF-binding sites

gRNAs for deletion of the genomic regions containing the CTCF-binding sites were designed using the IDT online design tool (https://eu.idtdna.com/site/order/designtool/index/CRISPR_SEQUENCE) and purchased from Microsynth AG (Supplementary Table 1). gRNA sequences were cloned into the PX459 plasmid (Addgene) using the BsaI restriction site. To remove the first forward CTCF-binding site (chromosome 15: 11520474–11520491), 3 × 10⁵ cells were transfected with 0.5 µg PX459 CTCF_KO_gRNA3/Cas9 and 1 µg PX459 CTCF_KO_gRNA10/Cas9 plasmids using Lipofectamine 2000 (Thermo Fisher Scientific, 11668019) according to the manufacturer’s instructions. To remove the second forward CTCF-binding site (chromosome 15: 11683162–11683179), 1 × 10⁶ cells were transfected with 1 µg PX459 gRNA2_CTCF_KO/Cas9 and 1 µg PX459 gRNA6_CTCF_KO/Cas9 plasmids using Lipofectamine 2000 (Thermo Fisher Scientific, 11668019) according to the manufacturer’s instructions. At 24 h after transfection, 1 µg ml⁻¹ puromycin was added to the medium for 3 days. Cells were then cultured in standard E14 medium for an additional 4 days. To isolate cell lines with homozygous deletions, single cells were sorted by FACS into 96-well plates. Sorted cells were kept for 2 days in E14 standard medium supplemented with 100 µg ml⁻¹ primocin (InvivoGen, ant-pm-1) and 10 µM ROCK inhibitor (STEMCELL Technologies, Y-27632). Cells were then cultured in standard E14 medium. Genomic DNA was extracted by lysing cells in lysis buffer (100 mM Tris-HCl pH 8.0, 5 mM EDTA, 0.2% SDS, 50 mM NaCl, proteinase K and RNase) followed by isopropanol precipitation. Individual cell lines were analysed by genotyping PCR to identify homozygous deletion of the genomic regions containing the CTCF-binding sites. Cell lines showing the correct genotyping pattern were selected and expanded. The primers used for genotyping are listed in Supplementary Table 1.
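Homozygous and heterozygous deletions can be told apart by the band sizes of the genotyping PCR. The sketch below computes the expected products for a primer pair flanking a deletion and calls a genotype from observed band sizes. The primer and deletion coordinates in the example and the 10% sizing tolerance are hypothetical (the actual primers are in Supplementary Table 1 and the exact deletion boundaries are not given here), and the sketch assumes both alleles amplify with the flanking pair.

```python
from typing import Dict, Iterable, Tuple


def amplicon_sizes(fwd_start: int, rev_start: int, deletion: Tuple[int, int]) -> Dict[str, int]:
    """Expected PCR product sizes (bp) for wild-type and deleted alleles.

    fwd_start: genomic coordinate of the forward primer's 5' end
    rev_start: genomic coordinate of the reverse primer's 5' end (plus-strand coordinate)
    deletion:  inclusive (start, end) coordinates of the deleted segment
    """
    d_start, d_end = deletion
    if not (fwd_start < d_start <= d_end < rev_start):
        raise ValueError("primers must flank the deletion")
    wild_type = rev_start - fwd_start + 1
    return {"wild_type": wild_type, "deleted": wild_type - (d_end - d_start + 1)}


def call_genotype(bands: Iterable[float], expected: Dict[str, int], tol: float = 0.1) -> str:
    """Call a genotype from observed band sizes (bp) using a relative tolerance."""
    bands = list(bands)

    def present(size: int) -> bool:
        return any(abs(b - size) <= tol * size for b in bands)

    wt, dl = present(expected["wild_type"]), present(expected["deleted"])
    if dl and not wt:
        return "homozygous deletion"
    if dl and wt:
        return "heterozygous deletion"
    if wt:
        return "wild type"
    return "no call"


if __name__ == "__main__":
    # Hypothetical primer and deletion coordinates, for illustration only.
    expected = amplicon_sizes(11_519_900, 11_521_100, (11_520_300, 11_520_700))
    print(expected)                              # {'wild_type': 1201, 'deleted': 800}
    print(call_genotype([800], expected))        # homozygous deletion
```

Only clones showing the deletion band alone would be kept as homozygous lines.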

smRNA-FISH

Cells were collected with accutase (Sigma-Aldrich, A6964) and allowed to adsorb onto coverslips precoated with poly-l-lysine (Sigma-Aldrich, P8920). Cells were then fixed with 3% PFA (EMS, 15710) in PBS for 10 min at room temperature, washed with PBS and stored in 70% ethanol at −20 °C. After at least 24 h in 70% ethanol, the coverslips were incubated for 10 min with freshly prepared wash buffer consisting of 10% formamide (Millipore Sigma, S4117) in 2× SSC (Sigma-Aldrich, S6639). The coverslips were hybridized overnight (around 16 h) at 37 °C in freshly prepared hybridization buffer consisting of 10% formamide and 10% dextran sulfate (Sigma-Aldrich, D6001) in 2× SSC, containing 125 nM of RNA-FISH probe sets against Sox2 labelled with Quasar 670 (Stellaris) and against eGFP labelled with Quasar 570 (Stellaris). After hybridization, the coverslips were washed twice for 30 min at 37 °C with shaking in wash buffer prewarmed to 37 °C, and then incubated for 5 min with 500 ng ml⁻¹ DAPI (Sigma-Aldrich, D9564) in PBS (Sigma-Aldrich, D8537). The coverslips were then washed twice in PBS, mounted on slides with ProLong Gold (Invitrogen, P36934) and cured at room temperature for 24 h. The coverslips were then sealed and imaged within 24 h.

RNA-FISH image acquisition

Images were acquired on a Zeiss Axio Observer Z1 microscope equipped with 100 mW 561 nm and 100 mW 642 nm HR diode solid-state lasers, an Andor iXon 885 EMCCD camera and an α Plan-Fluar ×100/1.45 NA oil-immersion objective. Quasar 570 signal was collected with the DsRed ET filter set (AHF Analysentechnik, F46-005), Quasar 670 signal with the Cy5 HC mFISH filter set (AHF Analysentechnik, F36-760) and DAPI signal with the Sp. Aqua HC-mFISH filter set (AHF Analysentechnik, F36-710). RNA-FISH probes were typically imaged with an exposure time of 300–500 ms, an EM gain of 15–20 and 100% laser intensity. DAPI was typically imaged with an exposure time of 20 ms, an EM gain of 3 and 50% laser intensity. Images had a pixel size of 0.080 × 0.080 µm and were acquired as z-stacks of around 55–70 planes with a z-step of 0.25 µm.
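The stated pixel size follows from the camera and objective, and the z-stack settings fix the imaged depth. The sketch below checks these numbers under two assumptions not stated in the text: an 8-µm physical pixel pitch for the EMCCD and no additional magnification in the light path. The sampling check uses the common rule of thumb of a pixel no larger than half the Abbe limit λ/(2·NA), with 570 nm and 670 nm taken as approximate emission wavelengths of the two dyes.

```python
CAMERA_PIXEL_UM = 8.0   # assumed EMCCD pixel pitch
MAGNIFICATION = 100     # x100 objective, no extra relay optics assumed
NA = 1.45

pixel_um = CAMERA_PIXEL_UM / MAGNIFICATION


def z_depth_um(n_planes: int, step_um: float = 0.25) -> float:
    """Axial extent covered by a stack of n_planes spaced step_um apart."""
    return (n_planes - 1) * step_um


def nyquist_pixel_um(wavelength_nm: float, na: float = NA) -> float:
    """Largest pixel that samples the Abbe limit lambda/(2 NA) at Nyquist."""
    abbe_um = wavelength_nm / 1000 / (2 * na)
    return abbe_um / 2


print(f"pixel size: {pixel_um:.3f} µm")
print(f"stack depth: {z_depth_um(55):.2f}-{z_depth_um(70):.2f} µm")
print(f"0.2 µm spot size = {0.2 / pixel_um:.1f} px")
for name, wl in (("Quasar 570", 570), ("Quasar 670", 670)):
    limit = nyquist_pixel_um(wl)
    print(f"{name}: Nyquist pixel {limit:.3f} µm, adequately sampled: {pixel_um <= limit}")
```

With these assumptions, 55–70 planes cover roughly 13.5–17.3 µm in z, and the 0.2-µm spot size used for detection spans about 2.5 pixels.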

Image processing and quantification of mRNA numbers

Raw images were processed in KNIME, Python and Fiji to extract the number of RNAs per cell. The KNIME workflow described below is based on a previously published workflow⁵⁵. z-stacks were first converted to maximum-intensity projections for each fluorescence channel. Nuclei were then segmented in the DAPI channel by Gaussian convolution (σ = 3), global thresholding with Otsu’s method, watershed and connected-component analysis. Cytoplasmic areas were then estimated with a seeded watershed. Cells with nuclei partially outside the field of view were automatically excluded. Cells containing obvious artifacts, wrongly segmented cells and cells not fully captured in xyz were manually excluded from the final analysis. Spot detection was based on the Laplacian of Gaussian method implemented in TrackMate⁵⁶. In the RNA-FISH channels, RNA spots were detected after background subtraction (rolling-ball radius 20–25 pixels) using a spot size of 0.2 µm and a detection threshold chosen by visual inspection of multiple representative images. RNA spots were localized with subpixel precision, and a list of spots per cell was generated for each experimental condition and replicate. Spots in each channel were then aggregated by cell in Python to obtain the number of RNAs per cell.
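As a rough Python illustration of three steps in this pipeline (maximum projection, global Otsu thresholding of the nuclear channel and per-cell aggregation of spot counts), the sketch below uses only NumPy. It is not the KNIME/TrackMate workflow: Gaussian smoothing, watershed splitting, cytoplasm estimation and LoG spot detection are omitted, and the label image and spot coordinates are assumed to come from those upstream steps. Function names are illustrative.

```python
import numpy as np


def max_projection(stack: np.ndarray) -> np.ndarray:
    """Maximum-intensity projection of a (z, y, x) stack along z."""
    return stack.max(axis=0)


def otsu_threshold(image: np.ndarray, nbins: int = 256) -> float:
    """Global Otsu threshold (bin centre that maximizes between-class variance)."""
    hist, edges = np.histogram(image.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(float)          # pixels in the lower class
    w1 = w0[-1] - w0                            # pixels in the upper class
    s0 = np.cumsum(hist * centers)              # summed intensity of the lower class
    s1 = s0[-1] - s0
    valid = (w0 > 0) & (w1 > 0)
    between = np.full(nbins, -1.0)
    mu0 = s0[valid] / w0[valid]
    mu1 = s1[valid] / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return float(centers[np.argmax(between)])


def count_spots_per_cell(labels: np.ndarray, spots_by_channel: dict) -> dict:
    """Count spots per labelled cell and channel.

    labels: 2D integer image, 0 = background, k > 0 = cell k
    spots_by_channel: {channel: [(y, x), ...]} subpixel spot coordinates
    Cells without spots are reported with a count of 0.
    """
    cells = [int(c) for c in np.unique(labels) if c != 0]
    counts = {cell: {ch: 0 for ch in spots_by_channel} for cell in cells}
    height, width = labels.shape
    for channel, coords in spots_by_channel.items():
        for y, x in coords:
            iy, ix = int(np.floor(y + 0.5)), int(np.floor(x + 0.5))  # nearest pixel
            if 0 <= iy < height and 0 <= ix < width and labels[iy, ix] != 0:
                counts[int(labels[iy, ix])][channel] += 1
    return counts
```

In use, `max_projection` would be applied to each channel, `projection > otsu_threshold(projection)` gives a foreground mask for the DAPI channel, and `count_spots_per_cell` turns the spot lists into per-cell RNA counts for each channel.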

Enhancer reporter assays

To generate vectors for the enhancer reporter assay, the Sox2 promoter, the SCR and truncated versions of the SCR (Ei and Eii) were amplified from E14 mES cell genomic DNA with Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific, F549) using primers compatible with Gibson assembly. The Sox2 promoter was cloned into the 3-SB-EF1-PBBAR-SB vector as described above. The SCR and the truncated versions Ei and Eii were cloned in front of the Sox2 promoter by linearizing the vector with AgeI (NEB, R3552) followed by Gibson assembly. A transcriptional pause sequence from the human α2-globin gene and an SV40 poly(A) sequence were inserted at both the 5′ and 3′ ends of the enhancers. To test enhancer activity, 3 × 10⁵ cells were co-transfected with 0.5 µg of the respective piggyBac vector and 0.5 µg pBroad3_hyPBase_IRES_tagRFPt using Lipofectamine 2000 (Thermo Fisher Scientific, 11668019) according to the manufacturer’s instructions. As a control, only 0.5 µg of the piggyBac vector carrying the Sox2 promoter was transfected. Cells were collected 24 h after transfection and analysed by flow cytometry.

Capture-C sample preparation

Cells (20 × 10⁶) were cross-linked with 1% formaldehyde (EMS, 15710) for 10 min at room temperature, and the reaction was quenched with glycine (final concentration, 0.125 M). Cells were lysed in 1 M Tris-HCl pH 8.0, 5 M NaCl and 10% NP40 with complete protease inhibitor (Sigma-Aldrich, 11836170001), and chromatin was digested with 1,000 U MboI (NEB, R0147). Digested chromatin was then ligated at 16 °C with 10,000 U T4 DNA ligase (NEB, M0202) in ligase buffer supplemented with 10% Triton X-100 (Sigma-Aldrich, T8787) and 240 µg BSA (NEB, B9000). Ligated samples were de-cross-linked with 400 µg proteinase K (Macherey Nagel, 740506) at 65 °C and purified by phenol–chloroform extraction. 3C library preparation and target enrichment, using a custom-designed collection of 6,979 biotinylated RNA ‘baits’ targeting single MboI restriction fragments within chromosome 15: 10283500–13195800 (mm9) (Supplementary Table 2; Agilent Technologies; designed as in ref. ⁵⁷), were performed according to the SureSelectXT Target Enrichment System for Illumina Paired-End Multiplexed Sequencing Library protocol. The only exceptions were the use of 9 µg of 3C input material (instead of 3 µg) and DNA shearing by Covaris sonication with the following settings: duty factor, 10%; peak incident power, 175 W; cycles per burst, 200; treatment time, 480 s; bath temperature, 4–8 °C.
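Because the baits target individual MboI fragments, the fragment map of the captured region determines what can be probed. MboI cuts immediately 5′ of GATC (^GATC), so an in silico digest can be sketched as below. The minimum length used to flag fragments long enough to carry a bait is an assumed parameter for illustration, not the design rule used for this study.

```python
import re
from typing import List, Tuple

SITE = "GATC"  # MboI recognition site; the cut lies 5' of the site (^GATC)


def mboi_fragments(seq: str, offset: int = 0) -> List[Tuple[int, int]]:
    """Half-open (start, end) coordinates of MboI fragments in `seq`.

    `offset` shifts local 0-based positions to genomic coordinates.
    GATC cannot overlap itself, so non-overlapping regex matches find every site.
    """
    seq = seq.upper()
    cuts = [m.start() for m in re.finditer(SITE, seq)]
    bounds = sorted(set([0, len(seq)] + cuts))
    return [(offset + a, offset + b) for a, b in zip(bounds, bounds[1:])]


def baitable(fragments: List[Tuple[int, int]], min_len: int = 120) -> List[Tuple[int, int]]:
    """Fragments at least `min_len` bp long (assumed minimum for one bait)."""
    return [f for f in fragments if f[1] - f[0] >= min_len]


if __name__ == "__main__":
    frags = mboi_fragments("AAAGATCTTTTGATCCC", offset=100)
    print(frags)                         # [(100, 103), (103, 111), (111, 117)]
    print(baitable(frags, min_len=7))    # [(103, 111)]
```

Applied to the mm9 sequence of the captured window, such a digest lists the candidate fragments from which single-fragment baits can be chosen.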

Targeted nCATS analysis

gRNAs targeting genomic regions of chromosome 15 outside the homology arms of the transgene were designed using the IDT online design tool (https://eu.idtdna.com/site/order/designtool/index/CRISPR_SEQUENCE) (Supplementary Table 1). Custom-designed Alt-R CRISPR–Cas9 crRNAs (5 crRNAs targeting the region upstream and 5 crRNAs targeting the region downstream of the integrated transgene), Alt-R CRISPR–Cas9 tracrRNA (IDT, 1072532) and Alt-R SpCas9 enzyme (IDT, 1081060) were purchased from IDT. Sample preparation and Cas9 enrichment were performed according to a previously described protocol⁵¹ with a few modifications. Genomic DNA from mES cell founder lines was extracted using the Gentra Puregene Cell Kit (Qiagen, 158745) according to the manufacturer’s instructions. The quality of the high-molecular-mass DNA was checked on a TapeStation (Agilent) system. Typically, 5 µg of high-molecular-mass DNA was treated with shrimp alkaline phosphatase (rSAP; NEB, M0371) for 30 min at 37 °C, followed by 5 min at 65 °C, to dephosphorylate free DNA ends. For Cas9 enrichment of the target region, all ten Alt-R CRISPR–Cas9 crRNAs were first pooled in equimolar amounts (100 µM) and then incubated with 100 µM Alt-R CRISPR–Cas9 tracrRNA at 95 °C for 5 min to assemble the Alt-R guide RNA duplex (crRNA:tracrRNA). To assemble the RNP complex, 4 pmol Alt-R SpCas9 enzyme was incubated with 8 pmol Alt-R guide RNA (crRNA:tracrRNA) at room temperature for 20 min. In vitro digestion and A-tailing of the DNA were performed by adding 10 µl of the RNP complex, 10 mM dATP (NEB, N0440) and 5 U Taq polymerase (NEB, M0267), and incubating the samples for 30 min at 37 °C followed by 5 min at 72 °C. Adapters for Nanopore sequencing were ligated using the Ligation Sequencing Kit (Nanopore, SQK-CAS109) according to the manufacturer’s instructions.
After purification with AMPure PB beads (Witec, 100-265-900), the samples were loaded onto a MinION device, selecting the SQK-CAS109 protocol.

Nanopore sequencing analysis

To map Nanopore sequencing reads, we first built a custom reference consisting of the transgene sequence flanked by ~10 kb of mouse genomic sequence upstream and downstream of the target integration site. The custom reference is available at GitHub (https://github.com/zhanyinx/Zuin_Roth_2021/blob/main/Nanopore/cassette/cassette.fa). Reads were mapped to the custom reference using minimap2 (v.2.17-r941) with the ‘-x map-ont’ parameter. The Nanopore sequencing analysis was implemented as a Snakemake workflow (v.3.13.3). Reads were visualized using IGV (v.2.9.4). The full workflow is available at GitHub (https://github.com/zhanyinx/Zuin_Roth_2021).
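One way to summarize such alignments is to ask which reads span the upstream and downstream junctions between the ~10-kb genomic flanks and the transgene on the custom reference. The sketch below reads minimap2 PAF records (query name in column 1, target start and end in columns 8 and 9) and labels each read. The cassette coordinates, the 500-bp anchor requirement and the assumption of one primary alignment per read are illustrative, and this is not the published Snakemake workflow.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Alignment:
    read_id: str
    ref_start: int  # 0-based start on the custom reference
    ref_end: int    # 0-based, end-exclusive


def parse_paf_line(line: str) -> Alignment:
    """Read name and target coordinates from one PAF record (columns 1, 8 and 9)."""
    fields = line.rstrip("\n").split("\t")
    return Alignment(fields[0], int(fields[7]), int(fields[8]))


def classify_reads(alignments: Iterable[Alignment], cassette_start: int,
                   cassette_end: int, min_anchor: int = 500) -> Dict[str, str]:
    """Label reads by the flank/transgene junctions they cross with >= min_anchor bp on each side.

    Assumes one (primary) alignment per read; later records overwrite earlier ones.
    """
    labels = {}
    for aln in alignments:
        spans_left = (aln.ref_start <= cassette_start - min_anchor
                      and aln.ref_end >= cassette_start + min_anchor)
        spans_right = (aln.ref_start <= cassette_end - min_anchor
                       and aln.ref_end >= cassette_end + min_anchor)
        if spans_left and spans_right:
            labels[aln.read_id] = "both junctions"
        elif spans_left:
            labels[aln.read_id] = "upstream junction"
        elif spans_right:
            labels[aln.read_id] = "downstream junction"
        else:
            labels[aln.read_id] = "no junction"
    return labels
```

Reads crossing both junctions support an intact insertion at the expected site, whereas reads that cross only one junction and then align elsewhere would point to rearrangements or additional integrations.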

RNA-sequencing sample preparation and analysis

Mouse embryonic stem cells were collected with accutase (5 min, 37 °C) and counted. Cells (3 × 10⁵) were lysed in 300 µl TRIzol reagent, and RNA was extracted using the Direct-zol RNA extraction kit (Zymo Research). Libraries were prepared with the Illumina TruSeq Stranded mRNA kit according to the manufacturer’s protocol. Reads were mapped to the Mus musculus genome (build mm9) using STAR⁵⁸ with the following options: --outSJfilterReads Unique --outFilterType BySJout --outFilterMultimapNmax 10 --alignSJoverhangMin 6 --alignSJDBoverhangMin 2 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonicalUnannotated --outSAMtype BAM SortedByCoordinate --seedSearchStartLmax 50 --twopassMode Basic. Gene expression was quantified with qCount from the QuasR package⁵⁹, using the ‘TxDb.Mmusculus.UCSC.mm9.knownGene’ database for gene annotation (Bioconductor package: Carlson M and Maintainer BP. TxDb.Mmusculus.UCSC.mm9.knownGene: Annotation package for TxDb object(s); R package v.3.2.2). Active promoters were defined as those of genes with log₂(RPKM + 0.1) > 1.5.
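The activity cut-off can be illustrated with a minimal RPKM calculation. This is a sketch: it assumes exonic gene lengths and a user-supplied library size (number of mapped reads), uses hypothetical counts, and is not a reimplementation of QuasR's qCount. Note that log₂(RPKM + 0.1) > 1.5 is equivalent to RPKM > 2^1.5 − 0.1 ≈ 2.73.

```python
import math
from typing import Dict, List


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Reads per kilobase of gene length per million mapped reads."""
    return {g: c * 1e9 / (lengths_bp[g] * library_size) for g, c in counts.items()}


def active_genes(values: Dict[str, float], threshold: float = 1.5,
                 pseudocount: float = 0.1) -> List[str]:
    """Genes whose log2(RPKM + pseudocount) exceeds the threshold."""
    return sorted(g for g, v in values.items() if math.log2(v + pseudocount) > threshold)


if __name__ == "__main__":
    # Hypothetical counts and exonic lengths for three genes.
    counts = {"geneA": 1000, "geneB": 2, "geneC": 30}
    lengths = {"geneA": 2000, "geneB": 1000, "geneC": 10000}
    values = rpkm(counts, lengths, library_size=1_000_000)
    print(values)                # {'geneA': 500.0, 'geneB': 2.0, 'geneC': 3.0}
    print(active_genes(values))  # ['geneA', 'geneC']
```

The pseudocount of 0.1 keeps genes with zero counts finite on the log scale without materially affecting genes near the cut-off.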

Capture-C analysis

Capture-C data were analysed using HiC-Pro⁶⁰ (v.2.11.4); the parameters are available at GitHub (https://github.com/zhanyinx/Zuin_Roth_2021). In brief, read pairs were mapped to the mouse genome (build mm9), and chimeric reads were rescued after detection of the ligation site. Only unique valid pairs mapping to the target regions were used to build contact maps. Iterative correction⁶¹ was then applied to the binned data. The target regions are available at GitHub (https://github.com/zhanyinx/Zuin_Roth_2021). For SCR_ΔΔCTCF, SCR_ΔCTCF and the derived clonal lines, data from replicate 1 were used for the quantifications and plots throughout the manuscript.
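Iterative correction repeatedly divides each row and column of a symmetric contact matrix by its relative coverage until all bins carry the same total signal. HiC-Pro ships its own implementation; the NumPy sketch below only illustrates the idea, leaves empty bins at zero and omits any filtering of low-coverage bins.

```python
import numpy as np


def iterative_correction(matrix, max_iter: int = 1000, tol: float = 1e-8):
    """Matrix balancing by iterative correction.

    Returns (corrected, bias) such that corrected * outer(bias, bias)
    reproduces the input. Rows/columns with zero coverage stay at zero.
    """
    m = np.array(matrix, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1] or not np.allclose(m, m.T):
        raise ValueError("expected a square, symmetric contact matrix")
    bias = np.ones(m.shape[0])
    for _ in range(max_iter):
        coverage = m.sum(axis=1)
        covered = coverage > 0
        # Relative coverage of each bin; uncovered bins are not rescaled.
        scale = np.ones_like(coverage)
        scale[covered] = coverage[covered] / coverage[covered].mean()
        m /= np.outer(scale, scale)
        bias *= scale
        if np.all(np.abs(scale[covered] - 1) < tol):
            break
    return m, bias
```

The returned bias vector records the cumulative correction per bin, so the raw matrix can always be recovered from the balanced one.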

Differential capture-C maps

To evaluate the structural perturbation induced by insertion of the transgene and mobilization of the enhancer (ectopic sequences), we accounted for the changes in genomic distance caused by the ectopic sequence. In the founder cell line (for example, SCR_ΔΔCTCF), insertion of the transgene increases the genomic distance between loci upstream and downstream of the insertion site. To account for these differences, we generated distance-normalized capture-C maps in which each entry corresponds to the interaction normalized to the corrected genomic distance between the interacting bins. Outliers (defined using the interquartile-range rule) and bins with no reported capture-C interactions were treated as noise and filtered out. Singletons, defined as entries in the top 0.1 percentile of Z-scores, were also filtered out. The Z-score is defined as (obs − exp)/stdev, where obs is the capture-C signal for a given interaction, and exp and stdev are the genome-wide average and standard deviation, respectively, of capture-C signals at the genomic distance separating the two loci. We next calculated the ratios between the distance-normalized, noise-filtered capture-C maps. Bilinear smoothing with a window of 2 bins was applied to the ratio maps to evaluate the structural perturbation induced by the ectopic sequence.
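The NumPy sketch below illustrates the main steps: insertion-aware genomic distances, distance normalization (observed/expected), Z-scores, removal of empty, outlier and singleton entries, and the ratio of two maps. It is a simplified reading of the procedure under stated assumptions: expected values and standard deviations at each distance are estimated from the map itself rather than genome-wide, the IQR rule is applied to the distance-normalized values, distances are grouped into bins of a user-chosen size, and the final bilinear smoothing is omitted. Function names are illustrative.

```python
import numpy as np


def corrected_distances(bin_starts, insert_pos, insert_len):
    """Pairwise genomic distances (bp) after an ectopic insertion.

    Bins on opposite sides of `insert_pos` (reference coordinates) are
    moved `insert_len` bp further apart.
    """
    b = np.asarray(bin_starts, dtype=float)
    dist = np.abs(b[:, None] - b[None, :])
    downstream = b >= insert_pos
    crosses = downstream[:, None] != downstream[None, :]
    return dist + crosses * float(insert_len)


def distance_normalize(contacts, dist, dist_bin):
    """Return (obs/exp, Z-score), with exp and stdev computed per distance group."""
    c = np.asarray(contacts, dtype=float)
    groups = np.rint(np.asarray(dist) / dist_bin).astype(int)
    norm = np.full(c.shape, np.nan)
    z = np.full(c.shape, np.nan)
    for g in np.unique(groups):
        sel = groups == g
        vals = c[sel]
        exp, sd = vals.mean(), vals.std()
        if exp > 0:
            norm[sel] = vals / exp
        if sd > 0:
            z[sel] = (vals - exp) / sd
    return norm, z


def filter_noise(contacts, norm, z, top_fraction=0.001):
    """Mask empty bins, IQR outliers and top-0.1% Z-score singletons as NaN."""
    out = np.array(norm, dtype=float)
    out[np.asarray(contacts) == 0] = np.nan
    finite = np.isfinite(out)
    if finite.any():
        q1, q3 = np.percentile(out[finite], [25, 75])
        iqr = q3 - q1
        outlier = finite & ((out < q1 - 1.5 * iqr) | (out > q3 + 1.5 * iqr))
        out[outlier] = np.nan
    z_ok = np.isfinite(z)
    if z_ok.any():
        cutoff = np.percentile(z[z_ok], 100 * (1 - top_fraction))
        out[z_ok & (z > cutoff)] = np.nan
    return out


def ratio_map(perturbed, reference):
    """Element-wise ratio of two filtered maps (NaN where undefined)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.asarray(perturbed, dtype=float) / np.asarray(reference, dtype=float)
    r[~np.isfinite(r)] = np.nan
    return r
```

Correcting distances before computing expected values matters because, without it, contacts across an insertion would be compared with the expectation for a shorter distance and would appear artificially depleted.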

Chromatin state calling with ChromHMM

Chromatin states were called using a four-state ChromHMM²⁸ model. The list of histone modification datasets used is provided in Supplementary Table 3. States enriched in H3K9me3 and H3K27me3 were merged, resulting in three chromatin states: active (enriched in H3K27ac, H3K36me3, H3K4me1 and H3K9ac), repressive (enriched in H3K9me3 and H3K27me3) and neutral (no enrichment).
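The merging of the four ChromHMM states into three categories can be sketched as a simple labelling rule on each state's emission probabilities. The enrichment threshold of 0.5, the function name `label_states`, and the choice to let repressive marks take precedence over active marks are assumptions for illustration, not the published procedure.

```python
ACTIVE_MARKS = {"H3K27ac", "H3K36me3", "H3K4me1", "H3K9ac"}
REPRESSIVE_MARKS = {"H3K9me3", "H3K27me3"}


def label_states(emissions, threshold=0.5):
    """Map each state to 'active', 'repressive' or 'neutral'.

    emissions: {state: {mark: emission probability}}, e.g. parsed from a
    ChromHMM emissions table (the threshold is an assumption).
    """
    labels = {}
    for state, probs in emissions.items():
        enriched = {mark for mark, p in probs.items() if p >= threshold}
        if enriched & REPRESSIVE_MARKS:
            labels[state] = "repressive"  # H3K9me3 and H3K27me3 states are merged here
        elif enriched & ACTIVE_MARKS:
            labels[state] = "active"
        else:
            labels[state] = "neutral"
    return labels
```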

Mapping of piggyBac-enhancer insertion sites by population-based splinkerette PCR

To identify true-positive enhancer re-insertion sites, we first filtered out reads containing eGFP fragments. We then retained only read pairs in which one side mapped to the ITR sequence and the other side mapped to the splinkerette adapter sequence. The ITR and splinkerette sides of each read pair were mapped separately to the mouse genome (build mm9) using BWA mem⁶² with default parameters. Only integration sites supported by more than 20 reads from both the ITR and the splinkerette sides were retained.
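The read filtering and the site-calling threshold can be sketched as follows. Exact substring matching stands in for the sequence detection actually used, the genome mapping with BWA mem is not reproduced, and the function names and input formats are hypothetical.

```python
from collections import defaultdict


def keep_pair(read1, read2, itr, adapter, egfp_fragments):
    """Keep a pair only if it lacks eGFP fragments and has ITR on one mate, adapter on the other."""
    if any(f in read1 or f in read2 for f in egfp_fragments):
        return False
    return (itr in read1 and adapter in read2) or (adapter in read1 and itr in read2)


def call_integration_sites(hits, min_reads=20):
    """hits: iterable of (site, side), side being 'ITR' or 'splinkerette' after mapping.

    Returns the sites supported by more than `min_reads` reads on BOTH sides.
    """
    counts = defaultdict(lambda: {"ITR": 0, "splinkerette": 0})
    for site, side in hits:
        counts[site][side] += 1
    return sorted(
        site for site, c in counts.items()
        if c["ITR"] > min_reads and c["splinkerette"] > min_reads
    )
```

Requiring support from both sides guards against sites called from only one PCR arm, which are more likely to be mapping artefacts.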

Mapping of piggyBac-enhancer insertion sites in individual cell lines

To map the enhancer position in individual cell lines, Sanger sequencing reads (Microsynth) lacking the adapter sequence were first filtered out. The first 24 bp of each remaining read after the adapter were then mapped to the mouse genome (mm9) using vmatchPattern (Biostrings v.2.58.0). The script used to map the Sanger reads can be found at GitHub (https://github.com/zhanyinx/Zuin_Roth_2021).
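The published script uses vmatchPattern from the R package Biostrings; the Python sketch below is a swapped-in exact-match search that mimics the logic (locate the adapter, take the next 24 bp, search the genome). Searching both strands, the toy in-memory genome dictionary and the function names are assumptions for illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def map_sanger_read(read, adapter, genome, k=24):
    """Return sorted exact hits (chrom, 0-based start, strand) of the k bp after the adapter.

    Returns None when the adapter is absent (such reads are filtered out)
    or when fewer than k bases follow it.
    """
    pos = read.find(adapter)
    if pos < 0:
        return None
    probe = read[pos + len(adapter): pos + len(adapter) + k]
    if len(probe) < k:
        return None
    hits = []
    for chrom, seq in genome.items():
        for strand, query in (("+", probe), ("-", revcomp(probe))):
            start = seq.find(query)
            while start != -1:
                hits.append((chrom, start, strand))
                start = seq.find(query, start + 1)
    return sorted(hits)
```

A 24 bp exact match is usually unique in a mammalian genome, which is why no mismatch tolerance is needed for this step.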

Mapping of piggyBac-enhancer insertion sites by tagmentation

Before aligning the paired-end sequencing reads, reads were filtered using an adaptation of cutadapt⁶³ that processes each read pair in multiple steps, removing sequence patterns originating from Tn5 and from each ITR. Read pairs originating from the two ITRs were treated identically. First, the unique part of the 5′ ITR or 3′ ITR sequence was searched for at the start of the second read of the pair and, if present, trimmed. Next, the sequence up to and including the TTAA site shared by the 5′ ITR and 3′ ITR was detected. This sequence only partly contained the respective primers used for each ITR, and was used to retain only reads containing the sequence expected for a correct PCR product starting at the transposon; the sequence up to, but not including, the TTAA was then removed. Finally, all other sequence patterns originating from either Tn5 or the ITR were removed from the 5′ end of the first read of the pair and from the 3′ end of both reads.
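The ITR-specific trimming of the second read can be sketched as below. The sequences passed in are placeholders (the real ITR sequences are not given here), discarding reads that lack the unique ITR part is an assumption, and the generic Tn5/ITR adapter removal from the 3′ ends, which cutadapt handles, is not reproduced.

```python
def trim_itr_read(read2, itr_uniques, itr_junction):
    """Trim ITR sequence from the start of read 2, keeping the TTAA of the insertion site.

    itr_uniques: unique parts of the 5' and 3' ITRs (both treated identically).
    itr_junction: ITR sequence shared by both ends, ending in TTAA (placeholder).
    Returns the trimmed read starting with TTAA, or None if the read does not
    look like a correct PCR product starting at the transposon.
    """
    if not itr_junction.endswith("TTAA"):
        raise ValueError("junction sequence must end with the TTAA site")
    for unique in itr_uniques:
        if read2.startswith(unique):
            read2 = read2[len(unique):]
            break
    else:
        return None  # no ITR-specific sequence found (assumed: discard)
    if not read2.startswith(itr_junction):
        return None  # not the expected transposon-derived product
    # Remove the sequence up to, but not including, the TTAA.
    return read2[len(itr_junction) - 4:]
```

Keeping the TTAA at the start of the trimmed read means that its first 4 aligned nucleotides directly mark the insertion site.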

After filtering and trimming, the reads were aligned to a reference genome containing an in silico insertion of the split-GFP construct, in which the piggyBac transposon was replaced by a single TTAA motif. To build this reference, the homology arms found in the plasmid were aligned against the mm10 reference genome, and the complete reference sequence spanning both arms was replaced by the inserted plasmid sequence.
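Building the in silico reference amounts to a single splice on one chromosome. In this sketch, exact string search replaces the alignment of the homology arms, and the function name and toy sequences are hypothetical; `insert` is assumed to run from the left arm to the right arm of the plasmid, with the transposon already replaced by TTAA.

```python
def build_in_silico_reference(chrom_seq, left_arm, right_arm, insert):
    """Replace the span from the start of left_arm to the end of right_arm with insert."""
    start = chrom_seq.find(left_arm)
    if start < 0:
        raise ValueError("left homology arm not found")
    end = chrom_seq.find(right_arm, start + len(left_arm))
    if end < 0:
        raise ValueError("right homology arm not found downstream of the left arm")
    return chrom_seq[:start] + insert + chrom_seq[end + len(right_arm):]
```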

Alignment was performed using Bowtie2 with the --very-sensitive preset and the fragment length set to a minimum of 0 bp and a maximum of 2,000 bp. After alignment, sambamba⁶⁴ was used to remove duplicates and samtools⁶⁵ was used to filter out read pairs that were not properly paired. For each read pair, the position of the first 4 nucleotides of the second read was then designated as a putative insertion site. To calculate the fraction of reads originating from the non-mobilized position, the number of read pairs overlapping the non-mobilized position (the TTAA replacing the piggyBac transposon in the in silico insert) was divided by the total number of reads originating from putative insertion sites supported by at least one read pair with a mapping quality higher than 2. Confident insertions were defined as those supported by at least one read from each of the 5′ and 3′ ITRs.
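The two summary statistics defined above can be sketched from a table of aligned read pairs. Each pair is assumed to be reduced to a tuple (site, ITR of origin, mapping quality); overlap with the non-mobilized position is simplified to equality of site coordinates, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def summarize_insertions(read_pairs, non_mobilized_site, min_mapq=2):
    """Return (fraction of reads at the non-mobilized site, sorted confident insertions).

    read_pairs: iterable of (site, itr, mapq), with itr being "5'" or "3'".
    A putative site counts only if at least one of its pairs has mapq > min_mapq.
    """
    pairs = list(read_pairs)
    valid = {site for site, _, mapq in pairs if mapq > min_mapq}
    counts = Counter(site for site, _, _ in pairs if site in valid)
    total = sum(counts.values())
    fraction = counts[non_mobilized_site] / total if total else float("nan")
    itrs = defaultdict(set)
    for site, itr, _ in pairs:
        if site in valid:
            itrs[site].add(itr)
    confident = sorted(site for site, seen in itrs.items() if {"5'", "3'"} <= seen)
    return fraction, confident
```

Note that once a site passes the mapping-quality criterion, all of its read pairs enter the denominator, including those with low mapping quality.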

Calibration of the mean number of mRNAs per cell with smRNA-FISH

A linear model was used to predict the average number of eGFP mRNAs from the mean eGFP intensity. The model was fitted on 7 data points, each pairing the average number of eGFP mRNAs obtained using single-molecule RNA fluorescence in situ hybridization (smRNA-FISH) with the mean eGFP intensity obtained by flow cytometry (Extended Data Fig. 1h; R² = 0.9749, P < 0.0001, t-test).
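A minimal ordinary-least-squares sketch of the calibration: fit mean mRNA number against mean eGFP intensity, report R², and use the line to convert intensities into mRNA numbers. The function names and the example values are illustrative only (they are not the measured data), and the t-test on the slope is not reproduced.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def predict_mrna(intensity, slope, intercept):
    """Convert a mean eGFP intensity into a predicted mean number of mRNAs per cell."""
    return intercept + slope * intensity
```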

Mathematical model and parameter fitting

The phenomenological two-state model (Fig. 2) and the apparent two-state model deduced from the mechanistic enhancer–promoter model (Fig. 3) were both fitted simultaneously to the mean eGFP levels measured in individual cell lines and to the distributions of mRNA numbers measured by smRNA-FISH in six cell lines in which the SCR was located at different distances from the promoter. The mean number of mRNAs was calculated analytically, and the steady-state distribution of the number of mRNAs per cell was approximated numerically (Supplementary Information, model description). The parameters of the phenomenological two-state model are the minimum on rate $k_{\mathrm{on}}^{0}$, the maximum on rate $k_{\mathrm{on}}^{1}$, the off rate $k_{\mathrm{off}}$, the initiation rate $\mu$, and the constant $c$ and Hill exponent $h$, which together control the nonlinear dependence of $k_{\mathrm{on}}$ on contact probability. The parameters of the apparent two-state model are the basal on rate $k_{\mathrm{on}}^{\mathrm{basal}}$, the enhanced on rate $k_{\mathrm{on}}^{\mathrm{enh}}$, the off rate $k_{\mathrm{off}}$, the initiation rate $\mu$, the ratio $\beta$ between the forward and backward rates of the regulatory steps, and the number of regulatory steps $n$. All of these parameters were free in the fitting procedure. The apparent two-state model was also fitted to the binned mean number of mRNA molecules inferred from the eGFP⁺ cell lines carrying the truncated version of the SCR (Fig. 4). In this case, three versions of the apparent two-state model were fitted: $\beta$ (model 1), $k_{\mathrm{on}}^{\mathrm{enh}}$ (model 2) or both (model 3) were free parameters, whereas the other parameters were fixed to the best-fit values obtained for the full-length SCR dataset. Using log-transformed likelihood ratios, the fit of each version was compared with the fit of the model in which all parameters were free.
The mathematical description of the enhancer–promoter communication model, the derivation of the apparent two-state model, and the fitting procedures are explained in detail in the Supplementary Information (model description).
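To make the roles of the phenomenological parameters concrete, the sketch below computes the steady-state mean mRNA number of a two-state (telegraph) promoter whose on rate depends on the contact probability p. The Hill form $k_{\mathrm{on}}(p) = k_{\mathrm{on}}^{0} + (k_{\mathrm{on}}^{1} - k_{\mathrm{on}}^{0})\,p^{h}/(c^{h} + p^{h})$ and the mRNA degradation rate δ (set to 1, i.e. time measured in mRNA lifetimes) are assumptions for illustration; the exact functional form used in the paper is given in the Supplementary Information. The standard telegraph-model mean is $\langle m \rangle = (\mu/\delta)\,k_{\mathrm{on}}/(k_{\mathrm{on}} + k_{\mathrm{off}})$.

```python
def k_on(p, k_on_min, k_on_max, c, h):
    """ASSUMED Hill-type dependence of the on rate on contact probability p."""
    hill = p ** h / (c ** h + p ** h)
    return k_on_min + (k_on_max - k_on_min) * hill


def mean_mrna(p, k_on_min, k_on_max, k_off, mu, c, h, delta=1.0):
    """Steady-state mean mRNA number of a two-state promoter.

    mu: initiation rate in the ON state; delta: mRNA degradation rate
    (ASSUMED, default 1 so that rates are in units of the mRNA lifetime).
    """
    kon = k_on(p, k_on_min, k_on_max, c, h)
    return (mu / delta) * kon / (kon + k_off)
```

With a Hill exponent h > 1, the mean stays close to its basal value at low contact probabilities and rises steeply around p ≈ c, which is one way a nonlinear response to contact probability can arise in such a model.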

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
