May 4, 2024

Cell-type specialization is encoded by specific chromatin topologies – Nature

Randomization, blinding, and sample size

Randomization and blinding were not relevant for the current study. The experiments and the subsequent analyses were performed on wild-type animals or cell lines, for which no clinical trial, treatment or disease comparison was performed. Samples were processed in different laboratories by different people, and there was no selection criteria for the wild-type mice used in the study. The appropriate number of samples for a GAM dataset varies and depends on multiple parameters such as nuclear volume, level of chromatin compaction, quality of DNA extraction, and so on. Because most of these parameters can be assessed only after the data have been collected and processed, we recommend that the optimal resolution is defined during the collection of each GAM dataset, rather than trying to estimate optimal sample size before data collection. GAM data can be collected in multiple batches from the same starting material, therefore the sample size can be increased until the desired resolution is achieved. For scRNA-seq experiments in mES cells, no statistical method was used to predetermine sample size. Libraries were generated twice, from mES cells from different biological replicates, to account for experimental variability. For scATAC-seq experiments, no statistical method was used to predetermine sample size.

Animal maintenance

Collection of GAM data from DNs was performed using one C57Bl/6NCrl (RRID: IMSR_CR:027; WT) mouse, which was purchased from Charles River, and from one tyrosine hydroxylase–green fluorescent protein (TH–GFP; B6.Cg-Tg(TH-GFP)21-31/C57B6) mouse, obtained as previously described50,51. All procedures involving WT and TH–GFP animals were approved by the Imperial College London’s Animal Welfare and Ethics Review Body. Adult male mice aged 2–3 months were used. All mice had access to food and water ad libitum and were kept on a 12-h light/12-h dark cycle at 20–23 °C and 45 ± 5% humidity. WT and TH–GFP mice received an intraperitoneal injection of saline 14 days or 24 h, respectively, before tissue collection, and they were part of a larger experiment for a different study. Collection of single-nucleus ATAC-seq (snATAC-seq) data from the midbrain VTA was performed using male C57Bl/6Nl (RRID: IMSR_CR:027; WT) mice, aged 7 and 9 weeks, which were a gift from M. Gotthardt. Mice for snATAC-seq were housed in a temperature-controlled room at 22 ± 2 °C with humidity of 55 ± 10% in individually ventilated cages with 12-h light/12-h dark cycles and with access to food and water ad libitum. All experiments involving snATAC-seq animals were carried out following institutional guidelines as approved by LaGeSo Berlin and following the Directive 2010/63/EU of the European Parliament on the protection of animals used for scientific purposes. Organ preparation was done under license X9014/11.

Collection of GAM data from somatosensory oligodendrocyte cells was performed using Sox10::cre-RCE::loxP-EGFP animals52, which were obtained by crossing Sox10::cre animals53 on a C57BL/6j genetic background with RCE::loxP-EGFP animals54 on a C57BL/6×CD1 mixed genetic background, both available from The Jackson Laboratory. The cre allele was maintained in hemizygosity, whereas the reporter allele was maintained in hemizygosity or homozygosity. Experimental procedures for Sox10::cre-RCE::loxP-EGFP animals were performed following the European directive 2010/63/EU, local Swedish directive L150/SJVFS/2019:9, Saknr L150 and Karolinska Institutet complementary guidelines for the procurement and use of laboratory animals, Dnr 1937/03-640. The procedures described were approved by the local committee for ethical experiments on laboratory animals in Sweden (Stockholms Norra Djurförsöksetiska nämnd), licence number 130/15. One male mouse was killed at post-natal day 21 (P21). Mice were housed to a maximum number of 5 per cage in individually ventilated cages with the following light/dark cycle: dawn 6:00–7:00, daylight 7:00–18:00, dusk 18:00–19:00, night 19:00–6:00. All mice had access to food and water ad libitum and were housed at 22 °C and 50% humidity.

Collection of GAM data from hippocampal CA1 PGNs was performed using two 19-week-old male Satb2flox/flox mice. C57Bl/6NCrl (RRID: IMSR_CR:027; WT) mice were purchased from Charles River, Satb2flox/flox mice that carry the loxP flanked exon 4 have been previously described55. The experimental procedures were done according to the AustrianAnimal Experimentation Ethics Board (Bundesministerium für Wissenschaft und Verkehr, Kommission für Tierversuchsangelegenheiten). All mice had access to food and water ad libitum and were kept on a 12-h light/12-h dark cycle at 22.5 °C and 55 ± 10% humidity.

Tissue fixation and preparation

WT, TH–GFP and Satb2flox/flox mice were anaesthetised under isoflurane (4%), given a lethal intraperitoneal injection of pentobarbital (0.08 μl, 100 mg ml–1 Euthatal) and transcardially perfused with 50 ml ice-cold PBS followed by 50–100 ml 4% depolymerized paraformaldehyde (PFA; electron microscopy grade, methanol-free) in 250 mM HEPES–NaOH (pH 7.4–7.6). Sox10::cre-RCE::loxP-EGFP animals were killed using an intraperitoneal injection of ketaminol and xylazine followed by transcardialperfusion with 20 ml PBS and 20 ml 4% PFA in 250 mM HEPES (pH 7.4–7.6). Brains from WT or TH–GFP mice were removed, and the tissue containing the VTA was dissected from each hemisphere at room temperature and rapidly transferred to fixative. For Satb2flox/flox mice, the CA1 field ippocampus was dissected from each hemisphere at room temperature. For Sox10cre/RCE mice, brain tissue containing the somatosensory cortex was dissected at room temperature. Following dissection, tissue blocks were placed in 4% PFA in 250 mM HEPES–NaOH (pH 7.4–7.6) for post-fixation at 4 °C for 1 h. Brains were then placed in 8% PFA in 250 mM HEPES and incubated at 4 °C for 2–3 h. Tissue blocks were then placed in 1% PFA in 250 mM HEPES and kept at 4 °C until tissue was prepared for cryopreservation (up to 5 days, with daily solution changes).

Cryoblock preparation and cryosectioning

Fixed tissue samples from different brain regions were further dissected to produce about 1.5 × 3 mm tissue samples suitable for Tokuyasu cryosectioning2 (Extended Data Fig. 1a) at room temperature in 1% PFA in 250 mM HEPES. For the hippocampus, the dorsal CA1 region was further isolated. Approximately 1–3 × 1–3 mm blocks were dissected from all brain regions and were further incubated in 4% PFA in 250 mM HEPES at 4 °C for 1 h. The fixed tissue was transferred to 2.1 M sucrose in PBS and embedded for 16–24 h at 4 °C, before being positioned at the top of copper stub holders suitable for ultracryomicrotomy and frozen in liquid nitrogen. Cryopreserved tissue samples are kept indefinitely immersed under liquid nitrogen.

Frozen tissue blocks were cryosectioned with an Ultracryomicrotome (Leica Biosystems, EM UC7), with an approximate 220–230 nm thickness2. Cryosections were captured in drops of 2.1 M sucrose in PBS solution suspended in a copper wire loop and transferred to 10-mm glass coverslips for confocal imaging or onto a 4.0-µm polyethylene naphthalate (PEN; Leica Microsystems, 11600289) membrane on metal framed slides for laser microdissection.

Immunofluorescence detection of GAM samples for confocal microscopy

For confocal imaging, cryosections were incubated in sheep anti-TH (1:500; Pel Freez Arkansas, P60101-0), mouse anti-pan-histone H11-4 (1:500; Merck, MAB3422) or chicken anti-GFP (1:500; Abcam, ab13970) followed by donkey anti-sheep or goat anti-chicken IgG conjugated with Alexa Fluor-488 (for TH and GFP; Abcam) or donkey anti-mouse IgG conjugatedwith Alexa Fluor-555 or Alexa Fluor-488 (for pan-histone; Invitrogen).

For PGNs, cryosections were washed (3 times, 30 min in total) in PBS, permeabilized (5 min) in 0.3% Triton X-100 in PBS (v/v) and incubated (2 h, room temperature) in blocking solution (1% BSA (w/v), 5% fetal bovine serum (FBS (w/v), Gibco, 10270), 0.05% Triton X-100 (v/v) in PBS). After incubation (overnight, 4 °C) with primary antibody in blocking solution, the cryosections were washed (3–5 times, 30 min) in 0.025% Triton X-100 in PBS (v/v) and immunolabelled (1 h, room temperature) with secondary antibodies in blocking solution followed by 3 washes (15 min) in PBS. Cryosections were then counterstained (5 min) with 0.5 µg ml–1 4′,6′-diamino-2-phenylindole (DAPI; Sigma-Aldrich, D9542) in PBS, and then rinsed in PBS and water. Coverslips were mounted in Mowiol 4-88 solution in 5% glycerol, 0.1 M Tris-HCl (pH 8.5).

The number of SATB2-positive cells present in the hippocampal CA1 area of the Satb2flox/flox control mice was determined by counting nuclei positive for SATB2 immunostaining (1:100; Abcam, ab10563678). To avoid counting the same nuclei, only every 30th ultrathin section cut through the tissue was collected, and the remaining sections discarded. Twenty-five nuclei were identified in the pyramidal neuron layer per image in the DAPI channel, and only SATB2-positive cells were counted. We confirmed that most cells (96%) within the CA1 layer were PGNs (data not shown).

For DNs and OLGs, cryosections were washed (3 times, 30 min in total) in PBS, quenched (20 min) in PBS containing 20 mM glycine, then permeabilized (15 min) in 0.1% Triton X-100 in PBS (v/v). Cryosections were then incubated (1 h, room temperature) in blocking solution (1% BSA (w/v), 0.2% fish-skin gelatin (w/v), 0.05% casein (w/v) and 0.05% Tween-20 (v/v) in PBS). After incubation (overnight, 4 °C) with the antibody in blocking solution, the cryosections were washed (3–5 times, 1 h) in blocking solution and immunolabelled (1 h, room temperature) with secondary antibodies in blocking solution, followed by 3 washes (15 min) in 0.5% Tween-20 in PBS (v/v). Cryosections were then counterstained with 0.5 µg ml–1 DAPI in PBS, then rinsed in PBS. Coverslips were mounted in Mowiol 4-88.

Digital images were acquired with a Leica TCS SP8-STED confocal microscope (Leica Microsystems) using a ×63 oil-immersion objective (numerical aperture of 1.4) or a ×2 oil-immersion objective, using a pinhole equivalent to 1 Airy disk. Images were acquired using 405-nm excitation and 420–480-nm emission for DAPI, 488-nm excitation and 505–530-nm emission for TH or GFP, and 555-nm excitation and 560-nm emission using a long-pass filter at 1,024 × 1,024 pixel resolution. Images were processed using Fiji (v.2.0.0-rc-69/1.52p), and adjustments included the optimization of the dynamic signal range with contrast stretching.

Immunofluorescence detection of GAM samples for laser microdissection

For laser microdissection, cryosections on PEN membranes were washed, permeabilized and blocked as for confocal microscopy, and incubated with primary and secondary antibodies as indicated above except for the use of higher concentrations of primary antibodies, as follows: anti-TH (1:50), anti-pan-histone (1:50) or anti-GFP (1:50). Secondary antibodies were used at the same concentration. Cell staining was visualized using a Leica laser microdissection microscope (Leica Microsystems, LMD7000) using a ×3 dry objective. Following detection of cellular sections of the cell types of choice containing nuclear slices (nuclear profiles (NPs)), individual NPs were laser microdissected from the PEN membrane and collected into PCR adhesive caps (AdhesiveStrip 8C opaque, Carl Zeiss, 415190-9161-000). We used multiplex-GAM9, for which three NPs were collected into each adhesive cap and the presence of NPs in each lid was confirmed with a ×5 objective using a 420–480-nm emission filter. Control lids not containing NPs (water controls) were included for each dataset collection to keep track of contamination and noise amplification of whole-genome amplification (WGA) and library reactions, and can be found in Supplementary Table 2.

WGA of NPs

WGA was performed using an in-house protocol. In brief, NPs were lysed directly in the PCR adhesive caps for 4 h (or 24 h for 160 out of 585 GAM samples from DN replicate 1) at 60 °C in 1.2× lysis buffer (30 mM Tris-HCl pH 8.0, 2 mM EDTA pH 8.0, 800 mM guanidinium-HCl, 5 % (v/v) Tween 20, 0.5 % (v/v) Triton X-100) containing 2.116 units ml–1 Qiagen protease (Qiagen, 19155). After protease inactivation at 75 °C for 30 min, the extracted DNA was amplified using random hexamer primers with an adaptor sequence. The pre-amplification step was done using 2× DeepVent mix (2× Thermo polymerase buffer (10×), 400 µm dNTPs, 4 mM MgSO4 in ultrapure water), 0.5 µM GAT-7N primers (5′-GTG AGT GAT GGT TGA GGT AGT GTG GAG NNN NNN N) and 2 units µl–1 DeepVent (exo-) DNA polymerase (New England Biolabs, M0259L) in the programmable thermal cycler for 11 cycles. Primers that annealed to the general adaptor sequence were then used in a second exponential amplification reaction to increase the amount of product. The exponential amplification was done using 2× DeepVent mix, 10 mM dNTPs, 100 µM GAM-COM primers (5′-GTGAGTGATGGTTGAGGTAGTGTGGAG) and 2 units µl–1 DeepVent (exo-) DNA polymerase in the programmable thermal cycler for 26 cycles. For a small number of NPs from DNs (Supplementary Table 2), WGA was performed using a WGA4 kit (Sigma-Aldrich) using the manufacturer’s instructions; the recent formulation of this kit is no longer suitable for GAM data production from subcellular nuclear slices.

GAM library preparation and high-throughput sequencing

Following WGA, the samples were purified using SPRI beads (0.725 or 1.7 ratio of beads per sample volume). The DNA concentration of each purified sample was measured using a Quant-iT Pico Green dsDNA assay kit (Invitrogen, P7589) according to the manufacturer’s instructions. GAM libraries were prepared using an Illumina Nextera XT library preparation kit (Illumina, FC-131-1096) following the manufacturer’s instructions with an 80% reduced volume of reagents. Following library preparation, the DNA was purified using SPRI beads (1.7 ratio of beads per sample volume) and the concentration for each sample was measured using a Quant-iT PicoGreen dsDNA assay. An equal amount of DNA from each sample was pooled together (up to 196 samples), and the final pool was additionally purified three times using the SPRI beads (1.7 ratio of beads per sample volume). The final pool of libraries was analysed using DNA High Sensitivity on-chip electrophoresis on an Agilent 2100 Bioanalyzer to confirm the removal of primer dimers and to estimate the average size and DNA fragment size distribution in the pool. NGS libraries were sequenced on an Illumina NextSeq 500 machine according to the manufacturer’s instructions using single-end 75 bp reads. The number of sequenced reads for each sample can be found in Supplementary Table 2.

Tn5-based libraries are preferred for GAM data sequencing to increase fragment sequence variation, which helps avoid the need for dark cycles in the current Illumina machines. This choice greatly reduces the cost of sequencing and decreases the frequency of noise reads from absent windows seen with the previous protocol3.

GAM data sequence alignment

Sequenced reads from each GAM library were mapped to the mouse genome assembly GRCm38 (December 2011, mm10) with Bowtie2 (v.2.3.4.3) using default settings56. All non-uniquely mapped reads, reads with mapping quality <20 and PCR duplicates were excluded from further analyses.

GAM data window calling and sample QC

Positive genomic windows present within ultrathin nuclear slices were identified for each GAM library. In brief, the genome was split into equal-sized windows (50 kb), and the number of nucleotides sequenced in each bin was calculated for each GAM sample with bedtools57. Next, we determined the percentage of orphan windows (that is, positive windows that were flanked by two adjacent negative windows) for every percentile of the nucleotide coverage distribution and we identified the percentile with the lowest percentage of orphan windows for each GAM sample in the dataset. The number of nucleotides that corresponds to the percentile with the lowest percentage of orphan windows in each sample was used as an optimal coverage threshold for window identification in each sample. Windows were called positive if the number of nucleotides sequenced in each bin was greater than the determined optimal threshold.

Each dataset was assessed for QC by determining the percentage of orphan windows in each sample, the number of uniquely mapped reads to the mouse genome and the correlations from cross-well contamination for every sample (Supplementary Table 2). Most GAM libraries passed the QC analyses (86–96% in each dataset; Extended Data Fig. 1b, c). To assess the quality of sampling in each GAM dataset, we measured the frequency with which all possible intrachromosomal pairs of genomic windows are found in the same GAM sample; we found that 98.8–99.9% of all mappable pairs of windows were sampled at least once at resolution 50 kb at all genomic distances. Each sample was considered to be of good quality if they had <70% orphan windows, >50,000 uniquely mapped reads and a cross-well contamination score determined per collection plate of <0.4 (Jaccard index). The number of samples in each cell type that passed QC is summarized in Extended Data Fig. 2a. Following QC analysis, we noted that the 160 (out of 585) DN replicate 1 samples incubated with lysis buffer for 24 h had decreases in orphan windows (median = 26% and 36% for 24 h and 4 h, respectively) and increases in total genome coverage (median = 9% and 6% for 24 h and 4 h, respectively). Although these differences were minor, we recommend 24 h lysis for future work.

Publicly available GAM datasets from mES cells

For mES cells, GAM datasets were downloaded from the 4D Nucleome portal (https://data.4dnucleome.org/). We used 249 × 3 NP GAM datasets from mES cells (clone 46C), which were grown at 37 °C in a 5% CO2 incubator in Glasgow modified Eagle’s medium (MEM), supplemented with 10% FBS, 2 ng ml–1 leukaemia inhibitory factor (LIF) and 1 mM 2-mercaptoethanol, on 0.1% gelatin-coated dishes. Cells were passaged every other day. After the last passage, 24 h before collection, mES cells were re-plated in serum-free ESGRO Complete Clonal Grade medium (Merck, SF001- B). The list of 4DN sample identity numbers is provided in Supplementary Table 1.

Visualization of pairwise chromatin contact matrices

To visualize GAM data, contact matrices were calculated using pointwise mutual information (PMI) for all pairs of windows genome-wide. PMI describes the difference between the probability of a pair of genomic windows being found in the same NP given both their joint distribution and their individual distributions across all NPs. PMI was calculated using the following formula, where p(x) and p(y) are the individual distributions of genomic windows x and y, respectively, and p(x,y) are their joint distribution:

$${rm{PMI}}={rm{log }}(p(x,y)/p(x)p(y))$$

(1)

PMI can be bounded between −1 and 1 to produce a normalized PMI (NPMI) value given by the following formula:

$${rm{NPMI}}={rm{PMI}}/(-{rm{log }}(p(x,y)))$$

(2)

For visualization of the contact matrices, scale bars are adjusted in each genomic region displayed to a range between 0 and the 99th percentile of NPMI values for each cell type.

Insulation score and topological domain boundary calling

TAD calling was performed by calculating insulation scores in NPMI GAM contact matrices at 50-kb resolution, as previously described2,9. The insulation square method was chosen as it was previously shown that the domain borders detected in GAM data are also found in Hi-C, for which they are the most robust (most insulated)2,9. The insulation score was computed individually for each cell type and biological replicate, with insulation square sizes ranging from 100 to 1,000 kb. TAD boundaries were called using a 500-kb insulation square size and based on local minima of the insulation score. This approach does not detect meta-TADs or sub-TADs, and results in numbers and lengths of domains were similar to previous reports6,58. Future work with higher resolution GAM datasets will enable further analyses of the reorganization of domains at finer genomic scales to investigate changes in sub-TADs, which have been previously shown to occur following cell commitment to neuronal lineages59.

Within each dataset, boundaries that were touching or overlapping by at least one nucleotide were merged. Boundaries were further refined to consider only the minimum insulation score within the boundary and one window on each side, to produce a 3-bin ‘minimum insulation score’ boundary. In comparisons of boundaries between different datasets, 150-kb boundaries were considered different when separated by at least one 50-kb genomic bin, that is, if the centre of the boundaries are separated by at least 200 kb (note chromosome Y was excluded from the analysis). In Fig. 2b, we considered the boundary coordinate as the genomic window within a boundary with the lowest insulation value. TAD border coordinates for all cell types can be found in Supplementary Table 3, and the full range of insulation scores (100–1,000 kb) for all cell types can be found in Supplementary Table 4. UpSet plots for TAD border overlaps, compartments and TF motif analyses were generated using either custom Python or R scripts or using the UpSetR package (v.1.4.0)60.

Identification of compartments A and B

For compartment analysis, matrices of co-segregation frequency were determined using the ratio of independent occurrence of a single positive window in each sample over the pairwise co-occurrence of pairs of positive windows in a given pair of genomic windows2. GAM co-segregation matrices at 250-kb resolution were assigned to either A or B compartments, as previously described2. In brief, each chromosome was represented as a matrix of observed interactions O(i,j) between locus i and locus j (co-segregation) and separately for E(i,j), whereby each pair of genomic window is the mean number of contacts with the same distance between i and j. A matrix of observed over expected values O/E(i,j) was produced by dividing O by E. A correlation matrix C(i,j) was produced between column i and column j of the O/E matrix. PCA was performed for the first three components on matrix C before extracting the component with the best correlation to GC content. Loci with PCA eigenvector values with the same sign that correlate best with GC content were called A compartments, whereas regions with the opposite sign were B compartments. For visualizations and Pearson’s correlations between datasets, eigenvector values on the same chromosome in compartment A were normalized from 0 to 1, whereas values on the same chromosome in compartment B were normalized from −1 to 0. Compartments were considered common if they had the same compartment definition within the same genomic bin. Compartment changes between cell types were computed after considering compartments that were common between biological replicates unless otherwise indicated.

To identify and visualize gene expression differences among genes in changing compartments, k-means clustering was performed on triplicate pseudo-replicates of each cell type using a custom Python script (Extended Data Fig. 12a, b). The number of clusters were determined using the elbow method, with k-means = 6 for genes in compartment B in mES cells and compartment A in brain cells, and k-means = 5 for compartment A in mES cells and compartment B in brain cells.

mES cell culture for scRNA-seq and scATAC-seq

mES cells from the 46C clone, derived from E14tg2a and expressing GFP under the Sox1 promoter61, were a gift from D. Henrique (Instituto de Medicina Molecular, Faculdade Medicina Lisboa, Lisbon, Portugal). mES cells were cultured as previously described62. In brief, cells were routinely grown at 37 °C, 5% (v/v) CO2, on gelatine-coated (0.1% v/v) Nunc T25 flasks in Gibco Glasgow’s MEM (Invitrogen, 21710082), supplemented with 10% (v/v) fetal calf serum (BioScience LifeSciences, 7.01, batch number 110006) for scRNA-seq or Gibco FBS (Invitrogen, 10270-106, batch number 41F8126K) for ATAC-seq, 2,000 units ml–1 LIF (Millipore, ESG1107), 0.1 mM β-mercaptoethanol (Invitrogen, 31350-010), 2 mM l-glutamine (Invitrogen, 25030-024), 1 mM sodium pyruvate (Invitrogen, 11360070), 1% penicillin–streptomycin (Invitrogen, 15140122) and 1% MEM non-essential amino acids (Invitrogen, 11140035). Medium was changed every day and cells were split every other day. mES cell batches tested negative for Mycoplasma infection, which was performed according to the manufacturer’s instructions (AppliChem, A3744,0020). Before collecting material for scRNA-seq or ATAC-seq, cells were grown for 48 h in serum-free ESGRO Complete Clonal Grade medium (Merck, SF001- B), supplemented with 1,000 units ml–1 LIF, on gelatine -coated (Sigma, G1393-100 ml, 0.1% v/v) Nunc 10-cm dishes, with a change in medium after 24 h.

46C E14tg2 mES cells are not listed in the ICLAC Register of Misidentified Cell Lines. The 46C E14tg2 mES cell line was generated by insertion of an eGFP cassette under the control of the Sox1 promoter in E14tg2 cells. Reads aligned with the GFP sequence were identified in the GAM sequencing data from mES cells. In addition, genome sequencing data from GAM mES cell samples was mined for single nucleotide polymorphisms (SNPs). Although GAM sequencing reads are sparsely distributed across the genome, there was a 64% overlap of GAM mES cell SNPs with SNPs identified from the parental E14tg2 genome sequencing data (https://www.ncbi.nlm.nih.gov/sra?term=SRX389523; data not shown).

Single-cell mRNA library preparation

Two batches (denoted batch A and B) of single-cell mRNA-seq libraries were prepared according to the Fluidigm manual “Using the C1 Single-Cell Auto Prep System to Generate mRNA from Single Cells and Libraries for Sequencing”. Cell suspension was loaded on 10–17 μm C1 Single-Cell Auto Prep IFCs (Fluidigm, 100-5760, kit 100-6201). After loading, the chip was observed under the microscope to score cells as singlets, doublets, multiplets, debris or other. The chip was then loaded again on Fluidigm C1 IFCs, and cDNA was synthesized and pre-amplified in the chip using a Clontech SMARTer kit (Takara Clontech, 634833). In batch B, we included Spike-In Mix 1 (1:1,000; Life Technologies, 4456740) as per the Fluidigm manual. Illumina sequencing libraries were prepared using a Nextera XT kit (Illumina, FC- 131-1096) and a Nextera Index kit (Illumina, FC-131-1002), as previously described63. Libraries from each microfluidic chip (96 cells) were pooled and sequenced on 4 lanes on Illumina HiSeq 2000, 2×100-bp paired-end (batch A) or 1 lane on Illumina HiSeq 2000, 2×125-bp paired-end (batch B) at the Wellcome Trust Sanger Institute Sequencing Facility (Supplementary Table 15).

scRNA-seq data processing, mapping and expression estimates

To calculate expression estimates, mRNA-seq reads were mapped with STAR (spliced transcripts alignment to a reference, v.2.4.2a)64 and processed with RSEM using the ‘single-cell-prior’ option (RNA-seq by expectation-maximization, v.1.2.25)65. The references provided to STAR and RSEM were the GTF annotation from UCSC Known Genes (mm10, v.6) and the associated isoform–gene relationship information from the Known Isoforms table (UCSC), adding information for ERCC sequences in samples from batch B. Tables were downloaded from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables) and for ERCCs, from the ThermoFisher website (http://www.thermofisher.com/order/catalog/product/4456739). Gene-level expression estimates in ‘Expected Counts’ from RSEM were used for the analysis.

scRNA-seq data processing QC

Cells scored as doublets, multiplets or debris during visual inspection of the C1 chip were excluded from the analysis. Datasets were also excluded if any of the following conditions were met: <500,000 reads (calculated using sam-stats from ea-utils.1.1.2-537)66; <60% of reads mapped (calculated with sam-stats); <50% reads mapped to mRNA (picard-tools-2.5.0, http://broadinstitute.github.io/picard/); >15% of reads mapped to chrM (sam-stats); if present, >20% of reads mapped to ERCCs (sam-stats). Following processing, 98 single cells passed quality thresholds in the final dataset. Correlations between previously published mES cells (clone 46C) mRNA-seq bulk62 and the scRNA-seq mES cell transcriptomes were performed to assess the quality of the single-cell data. Correlations were performed as previously described67. Average single-cell expression was highly correlated with bulk RNA-seq data (Extended Data Fig. 4c).

scRNA-seq analysis

To utilize published single-cell transcriptomes from brain cell types of interest, we selected P21–22 OLGs68, P22–32 CA1 PGNs69 and P21–26 VTA DNs70 on the basis of the cell type and subtype definitions provided in the respective publications. The matrices of counts provided in each publication, along with the single-cell mES cell transcriptomes produced that passed QC, were combined with no prior batch correction due to the lack of equivalent cell types across all single-cell datasets. The combined matrix of counts was normalized by applying the LogNormalize method and scaled using Seurat (v.3.1.4)71. The scaled data were used for a PCA, followed by processing through dimensionality reduction using uniform manifold approximation and projection (UMAP)72 for visualization purposes using the Seurat R package71, with default parameters. Visualization of known cell-type-specific marker genes confirmed that the different transcriptomes are grouped into cell-type-specific clusters (Extended Data Fig. 4e). Single mES cell transcriptomes from batch A and B clustered together, and were pooled for further analyses. Genes that could not be mapped to the chosen reference GTF were removed (UCSC; accessed from iGenomes July 17, 2015; https://support.illumina.com/sequencing/sequencing_software/igenome.html).

To generate bigwig tracks for visualization, raw fastq files from each single cell within the same cell type were pooled into one fastq file. Reads were mapped to the mouse genome (mm10) using STAR with default parameters but–outFilterMultimapNmax 10. BAM files were sorted and indexed using Samtools (v.1.3.1)73 and normalized (reads per kilobase of transcript per million (RPKM)) bigwigs were generated using Deeptools (v.3.1.3)74 bamCoverage. To account for differences in the number of technical replicates in OLG samples, cells were divided into groups by the number of runs (1, 2 and 6). The median of the reads for the group with the lowest sequencing depth was used as a threshold to normalize the other groups (that is, the rest of the fastq files were randomly downsampled to that number of reads). The three groups of raw reads were pooled together and processed by applying the same method as for the other cell types. Pseudobulk expression was determined using the regularized log (R-log) value for each gene (Extended Data Fig. 4f, g). In each cell type, only the genes with R-log values of ≥2.5 in all pseudobulk replicates were considered expressed.

Differential gene expression analysis

For differential expression analysis for all cell types, pseudobulk replicate samples were obtained by randomly partitioning the total number of single cells per dataset into three groups and pooling all unique molecular identifiers (UMIs) per gene of cells belonging to the same replicate. To determine differentially expressed genes, all six possible pairwise comparisons between samples were performed using DEseq2 (v.1.24.0) with default parameters75. In addition, shrunken log2 fold-changes were added with the lfcShrink function, using default parameters. Genes classified as differentially expressed in at least one comparison were considered for further analysis (adjusted P value < 0.05; Benjamini–Hochberg multiple testing correction method). A summary table for the differential expression analysis of all cell types can be found in Supplementary Table 12. For the TF motif analysis, only the differentially expressed genes obtained from the comparison between DNs and PGNs were considered for further analysis (Extended Data Fig. 9c, d).

Tn5 purification

The pTXB1 plasmid carrying the Tn5-intein-CBD fusion construct with the hyperactive Tn5 protein containing the E54K and L372P mutations was obtained from Addgene (plasmid 60240). Tn5 expression and purification was performed as previously described76, except that the final storage buffer was 50 mM HEPES-KOH pH 7.2, 0.8 M NaCl, 0.1 mM EDTA, 1 mM dithiothreitol and 55% glycerol.

Tn5 adapter mix preparation

To generate 100 μM adapter mix, 200 μM Tn5MErev (5′-[phos]CTGTCTCTTATACACATC) was mixed with of 200 μM Tn5ME-A (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG; Adapter_mixA, 1:1 ratio). Separately, 200 μM Tn5MErev was mixed with 1 volume of 200 μM Tn5ME-B (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG; Adapter_mixB, 1:1 ratio). The two mixtures were incubated for 5 min at 95 °C and gradually cooled to 25 °C at a ramp rate of 0.1 °C s–1. Finally, the Adapter_mixA was mixed with Adapter_mixB at a 1:1 ratio for a final 100 μM adapter mix.

mES cell ATAC-seq library preparation

ATAC-seq libraries were generated from approximately 75,000 mES cell nuclei following the Omni ATAC protocol77 with a modified transposition reaction: TAPS-DMF buffer (50 mM TAPS-NaOH, pH 8.5, 25 mM MgCl2, 50% DMF), 0.1% Tween-20, 0.1% digitonin, in 0.25x PBS. A total of 3 μl of the Tn5 mix (5.6 μg Tn5 and 0.143 volume of 100 μM adapter mix) was added to the transposition reaction mix. Libraries were prepared as described in the Omni ATAC protocol. The final library was sequenced with an Illumina NextSeq 500 machine according to manufacturer’s instructions, using paired-end 75 bp reads (150 cycles).

Isolation of the VTA for snATAC-seq

Male C57Bl/6Nl (RRID: IMSR_CR:027; WT) mice, aged 7 and 9 weeks, were killed by cervical dislocation. Brains were removed and the tissue containing the midbrain VTA was dissected from each hemisphere at room temperature and rapidly frozen on dry ice. Frozen midbrain samples were kept at −80 °C until further processing.

DN snATAC-seq library preparation

Two 10X Genomics scATAC-seq libraries from the midbrain VTA, VTA-1 and VTA-2 (from mice aged 7 or 9 weeks, respectively), were generated from midbrain VTA samples according to the 10X Genomics manual “Nuclei Isolation from Mouse Brain Tissue for Single Cell ATAC Sequencing Rev B” for flash-frozen tissue with minor adjustments. In brief, 500 μl 0.1× lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.01% Tween-20, 0.01% Nonidet P40 substitute, 0.001% digitonin, and 1× complete Mini, EDTA-free protease inhibitor cocktail, Millipore-Sigma, 11836170001) was added to the frozen samples and immediately homogenized using a pellet pestle (15 times), followed by 5 min incubation on ice. The lysate was pipette mixed 10 times, then incubated 10 min on ice. Finally, 500 μl of chilled wash buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 2 mM MgCl2, 1% BSA, 0.1% Tween-20) was added to the lysed cells, and the suspension was passed through a 30-μm CellTrics strainers (Th Geyer, 7648779). The final approximately 500 μl nuclei suspension was stained with DAPI (final concentration 0.03 μg ml–1) for about 5 min.

Around 200,000 DAPI-positive events were sorted using a BD FACSAria III flow cytometer with 70-µm nozzle configuration with sample and sort collection device cooling set to 4 °C into 300 μl Diluted Nuclei buffer (commercial buffer from 10X Genomics) in a 1.5-ml Eppendorf tube. A first gate excluded debris in a forward scatter/side scatter plot (see examples in Extended Data Fig. 4h, i). A consecutive, second gate in a DAPI-A/DAPI-H plot was used to exclude doublets and nuclei with incomplete DNA content (BD FACSDiva software, v.8.0.2). The collected nuclei were centrifuged at 500g for 5 min at 4 °C and resuspended in 20 µl Diluted Nuclei buffer. The nucleus concentration was determined using a Countess II FL Automated Cell Counter in DAPI fluorescence mode. snATAC-seq libraries were prepared per the Chromium Next GEM Single Cell ATAC Reagent kits v.1.1 User Guide. In brief, nuclei were loaded on a microfluidics chip together with transposition reagents, transposase enzyme, beads with oligo-dT tags and oil to create an emulsion. Afterwards, the transposase reaction takes place inside the droplets. The barcoded cDNA is recovered from the emulsion, amplified and cleaned using a bead purification process. The cDNA is then using for library construction, including enzymatic fragmentation, adapter ligation and sample index PCR. Libraries were sequenced with either an Illumina NextSeq 500 machine using paired-end 75 bp reads (for VTA-1, 150 cycles) or a NovaSeq 6000 using paired-end 75 bp reads (for VTA-2, 100 cycles).

ATAC-seq data processing, mapping, processing and QC

For bulk mES cell ATAC-seq, paired-end reads were mapped to the mouse genome (mm10) using Bowtie with the following parameters:–minins 25–maxins 2000–no-discordant–dovetail–soft-clipped-unmapped-tlen. Low-quality mapped reads (MQ < 30) and mitochondrial reads were removed. Duplicated reads were removed with Sambamba78 (v.0.6.8). Reads passing quality checks were converted to BAM format for further analyses.

For VTA snATAC-seq, paired-end reads were demultiplexed and mapped to the mouse genome (mm10) using the 10X Genomics Cellranger software (version cellranger-atac-1.2.0). The two VTA snATAC-seq libraries were analysed using ArchR software (v.0.9.1)79. Doublets were removed following default parameters in ArchR. Next, low-quality cells (identified as TSS enrichment score <4 and <2,500 unique fragments per cell) were removed for further analyses.

Next, dimensionality reduction was performed using the Latent semantic indexing (LSI) dimensionality reduction method from ArchR, with default parameters (except iterations = 10, resolution = 0.2, varFeatures = 60,000). The ArchR addHarmony function was used to run the Harmony algorithm for batch correction with default parameters, followed by clusters calling. Gene scores were determined as specified by ArchR79. DNs were identified as the cluster with higher gene scores for Th, a well-known DN marker, and confirmed by additional DN marker expression (for example, Lmx1b, Foxa2, Foxa1 and Slc6a3). The DN cluster is composed of 216 cells in total (113 from VTA-1 and 103 from the VTA-2). UMI duplicates were collapsed to one fragment. To visualize an approximation for gene expression, gene scores were calculated using the createArrowFiles (addGeneScoreMat = TRUE) function in ArchR.

Processing of published OLG and PGN scATAC-seq

scATAC-seq BAM files for OLGs were downloaded from the sciATAC-seq in vivo atlas of the mouse brain80. Next, reads were extracted from the BAM file that corresponded to cells from the cluster identified as oligodendrocytes from the prefrontal cortex (458 cells), to produce a pseudobulk ATAC BAM file. The original data, mapped to the mm9 genome, were converted to mm10 using the liftOver tool from UCSC utilities (https://genome.ucsc.edu/cgi-bin/hgLiftOver).

scATAC-seq datasets were obtained from hippocampal PGNs81. A BAM file containing all cell types was supplied by A. Adey (Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA). Reads were extracted from the BAM file that corresponded to the NR1 PGN population (270 cells) to produce a pseudobulk ATAC BAM file.

Generation of normalized ATAC-seq bigwig tracks

A size factor normalization was applied to generate ATAC-seq bigwig tracks comparable between mES cells, OLGs, PGNs and DNs. First, a count matrix was generated for all TSS regions (±250 bp), which contained reads from at least two of the four cell types. The TSS list was extracted from the genes.gtf file included in the cell ranger reference data (refdata-cellranger-atac-mm10-1.2.0l; https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/advanced/references). To calculate size factors, the TSS count matrix was processed through DESeqDataSetFromMatrix and estimateSizeFactors from the DESeq2 package75. For all cell types, the scale factor (SF) = (cell type size factor) ×  −1.

Each pseudobulk ATAC-seq BAM file from mES cells, PGNs and OLGs was converted to the bedGraph format using the genomeCoverageBed function from bedtools57 with the following parameters: -pc -bg -scale SF. For DNs, ATAC-seq fragment files were converted to the bedGraph format using the genomeCoverageBed function from bedtools57 with the following parameters: -g chrom.sizes -bg -scale SF. The mm10 chrom.sizes file was downloaded from UCSC using fetchChromSize from UCSC utilities (http://hgdownload.soe.ucsc.edu/admin/exe/). The bedGraph files were then converted to bigwig using the bedGraphToBigWig function from UCSC utilities.

DN and PGN ATAC-seq peak calling

ATAC-seq peaks were called in DNs following the iterative overlap peak merging procedure described in the ArchR package79. First, two pseudobulk replicates were generated by running the addGroupCoverages function and then reproducible peaks were called using the addReproduciblePeakSet function. For PGNs, peaks for the NR1 cluster were obtained from Sinnamon et al.81. For further analyses, peaks were considered positive if they were found in at least 10% of single nuclei (>10 nuclei in DNs; >13 cells in PGNs).

RNA and ATAC-seq length-scaled ATAC reads per million

To calculate length-scaled RNA reads per million (lsRRPM) for 479 long genes (>300 kb), the mES cell BAM file (paired-end) was read using the readGAlignmentPairs function from the GenomicAlignments function from the GenomicAlignments package in R (v.1.20.1; https://bioconductor.org/packages/release/bioc/html/GenomicAlignments.html). For published single-cell datasets (OLGs, PGNs, DNs; single-end libraries), BAM files were loaded using the readGAlignments function from the GenomicAlignment package. Owing to the very long length of some reads, all BAM fragments were resized to the 5′ end base pair to avoid overlapping with multiple features. Next, the following formula was used to compute lsRRPM values for each cell type and per gene:

$${rm{l}}{rm{s}}{rm{R}}{rm{R}}{rm{P}}{rm{M}}={rm{n}}{rm{u}}{rm{m}}{rm{b}}{rm{e}}{rm{r}},{rm{o}}{rm{f}},{rm{o}}{rm{v}}{rm{e}}{rm{r}}{rm{l}}{rm{a}}{rm{p}}{rm{s}},{rm{b}}{rm{e}}{rm{t}}{rm{w}}{rm{e}}{rm{e}}{rm{n}},{rm{R}}{rm{N}}{rm{A}},{rm{f}}{rm{r}}{rm{a}}{rm{g}}{rm{m}}{rm{e}}{rm{n}}{rm{t}}{rm{s}},{rm{a}}{rm{n}}{rm{d}},{rm{l}}{rm{o}}{rm{n}}{rm{g}},{rm{g}}{rm{e}}{rm{n}}{rm{e}},{rm{b}}{rm{o}}{rm{d}}{rm{y}},{rm{g}}{rm{e}}{rm{n}}{rm{e}},{rm{l}}{rm{e}}{rm{n}}{rm{g}}{rm{t}}{rm{h}},{(10}^{-6})times ,{rm{t}}{rm{o}}{rm{t}}{rm{a}}{rm{l}},{rm{n}}{rm{u}}{rm{m}}{rm{b}}{rm{e}}{rm{r}},{rm{o}}{rm{f}},{rm{R}}{rm{N}}{rm{A}},{rm{f}}{rm{r}}{rm{a}}{rm{g}}{rm{m}}{rm{e}}{rm{n}}{rm{t}}{rm{s}},{(10}^{-6}),$$

To calculate length-scaled ATAC reads per million (lsARPM) for 479 long genes (>300 kb), concordant paired-end fragments were extracted for all cell types using the readGAlignmentPairs function from the GenomicAlignments package in R with the following total number of fragments: 37,261,746 (mES cells), 2,121,258 (OLGs), 4,594,229 (PGNs) and 8,939,526 (DNs). Next, the following formula was used to compute lsARPM values for each cell-type and per gene:

$${rm{l}}{rm{s}}{rm{A}}{rm{R}}{rm{P}}{rm{M}}={rm{n}}{rm{u}}{rm{m}}{rm{b}}{rm{e}}{rm{r}},{rm{o}}{rm{f}},{rm{o}}{rm{v}}{rm{e}}{rm{r}}{rm{l}}{rm{a}}{rm{p}}{rm{s}},{rm{b}}{rm{e}}{rm{t}}{rm{w}}{rm{e}}{rm{e}}{rm{n}},{rm{A}}{rm{T}}{rm{A}}{rm{C}},{rm{f}}{rm{r}}{rm{a}}{rm{g}}{rm{m}}{rm{e}}{rm{n}}{rm{t}}{rm{s}},{rm{a}}{rm{n}}{rm{d}},{rm{l}}{rm{o}}{rm{n}}{rm{g}},{rm{g}}{rm{e}}{rm{n}}{rm{e}},{rm{b}}{rm{o}}{rm{d}}{rm{y}},{rm{g}}{rm{e}}{rm{n}}{rm{e}},{rm{l}}{rm{e}}{rm{n}}{rm{g}}{rm{t}}{rm{h}},{(10}^{-6})times ,{rm{t}}{rm{o}}{rm{t}}{rm{a}}{rm{l}},{rm{n}}{rm{u}}{rm{m}}{rm{b}}{rm{e}}{rm{r}},{rm{o}}{rm{f}},{rm{A}}{rm{T}}{rm{A}}{rm{C}},{rm{f}}{rm{r}}{rm{a}}{rm{g}}{rm{m}}{rm{e}}{rm{n}}{rm{t}}{rm{s}},{(10}^{-6}),$$

GO analysis

GO term enrichment analysis was performed using GOElite (v.1.2.4)82. In Extended Data Fig. 4n, DN snATAC-seq marker genes were extracted with the getMarkerFeatures function from ArchR with default parameters. Marker genes were selected as genes with log2 fold change values of >1 and false discovery rate of <0.01 in the DN cluster compared with all clusters from the VTA (total of 973 genes). All unique genes were used as the background GO dataset. In Fig. 2c, all genes expressed in at least one cell type, annotated to mm10, were used as the background dataset. In Fig. 4d, e, all genes expressed in PGNs or DNs were used as the background dataset, and in Fig. 5a, b, all unique genes were used. Default parameters were used for the GO enrichment: GO terms that were enriched above the background (significant permuted P values of <0.05, 2,000 permutations) were pruned to select the terms with the largest Z-score (>1.96) relative to all corresponding child or parent paths in a network of related terms (genes changed >2). GO terms which had a permuted P value of ≥0.01, contained fewer than 6 genes per GO term or from the ‘cellular_component’ ontology, were not reported in the main figures. A full list of unfiltered GO terms can be found in Supplementary Table 7.

MELTRON pipeline

To assess gene insulation differences, insulation square values at 10 length scales (100–1,000 kb) were calculated for genes >300 kb in length (n = 479; calculated for a minimum 8× 50-kb bins, that is, 400 kb minimum length). Cumulative probability distributions of insulation square values were calculated for each dataset, and the brain cells were compared to mES cell probability distributions for each gene by computing the maximum distance between the distributions and applying a Kolmogorov–Smirnov test. P values were corrected for multiple testing using the Bonferroni method, and –log10 transformed to obtain a domain melting score. Domain melting scores for each gene in each comparison can be found in Supplementary Table 8. For visualization, empirical cumulative probabilities and insulation score values were smoothed using a Gaussian kernel density estimate (adjust = 0.3).

Calculation of the transcis contact ratio

To determine the interaction strength of contacts to all (trans) somatic chromosomes relative to interaction strength to their own (cis) chromosome, cis and trans NPMI-normalized matrices were calculated at 250-kb resolution. Bins detected in less than 3%, or more than 75%, of 3 NP samples were removed from the analysis. To be sensitive to outliers, NPMI values of both cis (NPMIC) and trans (NPMIT) contacts for every bin were summarized with the arithmetic mean. The transcis contact ratio was then obtained using the following formula:

$$transmbox{–}cis,{rm{c}}{rm{o}}{rm{n}}{rm{t}}{rm{a}}{rm{c}}{rm{t}},{rm{r}}{rm{a}}{rm{t}}{rm{i}}{rm{o}}=frac{{sum {rm{N}}{rm{P}}{rm{M}}{rm{I}}}_{{rm{T}}}div{rm{g}}{rm{e}}{rm{n}}{rm{o}}{rm{m}}{rm{i}}{rm{c}},{rm{b}}{rm{i}}{rm{n}}{rm{s}}({n}_{{rm{T}}})}{{sum {rm{N}}{rm{P}}{rm{M}}{rm{I}}}_{{rm{C}}}div{rm{g}}{rm{e}}{rm{n}}{rm{o}}{rm{m}}{rm{i}}{rm{c}},{rm{b}}{rm{i}}{rm{n}}{rm{s}}({n}_{{rm{C}}})}$$

Transcis values of bins spanning long genes were summarized with the median.

Modelling and in silico GAM

To reconstruct 3D conformations of the Nrxn3 locus, we employed the Strings & Binders Switch (SBS) polymer model of chromatin83,84. In the SBS model, a chromatin region is modelled as a self-avoiding chain of beads, including different binding sites for diffusing, cognate, molecular binders. Binding sites of the same type can be bridged by their cognate binders, which then drives polymer folding. The optimal SBS polymers for the Nrxn3 locus in mES cells and DNs were inferred using PRISMR, a machine-learning-based procedure that finds the minimal arrangement of the polymer binding sites that best describe input pairwise contact data, such as Hi-C22 or GAM85. Here, PRISMR was applied to the GAM experimental data by considering the NPMI normalization on a 4.8 Mb region around the Nrxn3 gene (chromosome 12: 87,600,000–92,400,000; mm10) at 50-kb resolution in mES cells and DNs. The procedure returned optimal SBS polymer chains made of 1,440 beads, including 7 different types of binding sites, in both cell types. A full list of x, y and z coordinates for mES cell and DN polymer model structures can be found in Supplementary Tables 9 and 10, respectively.

Next, to generate thermodynamic ensembles of 3D conformations of the locus, molecular dynamics simulations were run of the optimal polymers, using the freely available LAMMPS software (v.5june2019)86. In these simulations, the system evolves according to the Langevin equation, with dynamics parameters derived from classical polymer physics studies87. Polymers are first initialized in self-avoiding conformations and then left to evolve to reach their equilibrium globular phase83. Beads and binders have the same diameter σ = 1, expressed in dimensionless units, and experience a hard-core repulsion by use of a truncated Lennard–Jones potential. Analogously, attractive interactions are modelled with short-ranged Lennard–Jones potentials83. A range of affinities between beads and cognate binders were sampled in the weak biochemical range, from 3.0 KBT to 8.0 KBT (where KB is the Boltzmann constant and T the system temperature). In addition, binders interact nonspecifically with the polymer with a lower affinity, sampled from 0 KBT to 2.7 KBT. For the sake of simplicity, the same affinity strengths were used for all different binding site types. The total binder concentration was taken above the polymer coil–globule transition threshold83. For each of the considered cases, ensembles of up to 450 distinct equilibrium configurations were derived. Full details about the model and simulations are discussed in Barbieri et al.83 and Chiariello et al.84.

In silico GAM NPMI matrices were obtained from the ensemble of 3D structures by applying the in silico GAM algorithm10, here generalized to simulate the GAM protocol with 3 NPs per GAM sample and to perform NPMI normalization. In silico GAM NPMI matrices can be obtained using previously published algorithms10, by aggregating the content of three in silico slices into one tube, and then applying the NPMI normalization formula (see the section ‘Visualization of pairwise chromatin contact matrices’, therein10). Specifically, the same number of slices were used as in the GAM experiments, 249 × 3 NPs for mES cellCs and 585 × 3 NPs for DNs. Pearson’s correlation coefficients were used to compare the in silico and experimental NPMI GAM matrices.

Example of single 3D conformations were rendered by a third-order spline of the polymer bead positions, with regions of interest highlighted in different colours. To quantify the size and variability of the 3D structures in mES cells and DNs, the average gyration radius (Rg) was measured from the selected domains encompassing and surrounding the Nrxn3 gene, expressed in dimensionless units σ in Fig. 3d, Extended Data Fig. 7e. Analyses and plots were produced with the Anaconda package v.4.7.12, and 3D structure visualizations were produced with POV Ray, v.3.7 (http://www.povray.org/download/).

Cryosections for FISH experiments

Fixed and cryopreserved hippocampal CA1 tissue and mES cells were cryosectioned as previously described (see ‘Cryoblock preparation and cryosectioning’ above) with an approximate thickness of 400 nm and transferred to glass coverslips (thickness number 1.5, diameter 10 mm) coated with laminin (Sigma-Aldrich, P8920) according to the manufacturer’s instructions for the three-colour FISH experiment (TSS, middle and TES), or washed in 100% ethanol and autoclaved for the immunofluorescence whole-gene FISH experiment (nucleolus, Rbfox1).

BAC probes labelling and precipitation

BACs targeting the Rbfox1 locus (Supplementary Table 11) were obtained from the BACPAC Resources Center (https://bacpacresources.org) and amplified from glycerol stocks using a MIDIprep kit (NucleoBond Xtra BAC purification kit, Machery-Nagel, 740436). Purified BACs were labelled using a nick translation kit (Abbott Molecular, 7J0001) according to the manufacturer’s instructions and the following fluorophores (all Invitrogen, Thermo Fisher Scientific): ChromaTide Alexa Fluor 488-5-dUTP (C11397), ChromaTide Alexa Fluor 568-5-dUTP (C11399) and Alexa Fluor 647-aha-dUTP (A32763). Labelled BAC probes were co-precipitated with yeast tRNA (20 μg μl–1 final concentration; Invitrogen, AM7119) and mouse Cot-1 DNA (3 μg μl–1 final concentration; Invitrogen, 18440-016) overnight at −20 °C. After clean up in 70% ethanol, the probes were dissolved in 100% deionized formamide (for 1 h; Sigma, F9037) before adding (1:1) a 2× hybridization mix (20% dextran sulfate, 0.1 M phosphate buffer in 4× saline-sodium citrate (SSC); mixing for 1 h), denatured (10 min, 80 °C), and reannealed (30 min, 37 °C) before hybridization.

Immunolabelling before FISH

Immunofluorescence labelling of the nucleolus was performed as described above (‘Immunofluorescence detection for confocal microscopy’) by incubating the cryosections overnight (at 4 °C) with a mouse monoclonal antibody anti-nucleophosmin B23 (a gift from H. Busch49), followed by incubation (1 h) with donkey antibodies raised against mouse IgG conjugated with Alexa Fluor-555 (Invitrogen). Before cryo-FISH, the bound antibodies were fixed (1 h, 4 °C) in 8% depolymerized PFA (EM-grade) in 250 mM HEPES–NaOH (pH 7.6) and rinsed in PBS.

Cryo-FISH

Cryo-FISH was performed as previously described2,23 with a few modifications. In brief, cryosections were washed (30 min) in 1× PBS, rinsed with 2× SSC (Sigma, S6639) and incubated (2 h, 37 °C) in 250 μg ml–1 RNase A (Sigma, R4642) in 2× SSC. After washing in 2× SSC, cryosections were treated (10 min) with 0.1 M HCl, dehydrated in ethanol (30%, 50%, 70%, 90%, 100% series, 3 min each on ice) and denatured (10 min) at 80 °C in 70% formamide, 2× SSC, 0.05 M phosphate buffer (pH 7.4). Cryosections were dehydrated as described above, and overlaid on hybridization mixture on HybriSlip (Invitrogen, H18202). After sealing with rubber cement and incubation (48 h, 37 °C) in a moist chamber, cryosections were washed (25 min, 42 °C) in 50% formamide in 2× SSC, (30 min, 60 °C) in 0.1× SSC and (10 min, 42 °C) in 0.1% Triton X-100 in 4× SSC. After rinsing with 1× PBS, coverslips were mounted in Vectashield mounting medium (anti-Fading) with DAPI (Vector Laboratories, H-1200).

Cryo-FISH microscopy

Cryo-FISH images were collected sequentially with a Leica TCS SP8-STED confocal microscope (Leica Microsystems DMI6000B-CS) using Leica Application Suite X v.3.5.5.19976 and a HC PL APO CS2 ×63/1.40 oil objective (numerical aperture of 1.4, Plan Apochromat) (see ‘Immunofluorescence detection for confocal microscopy’) using the following settings: 405-nm excitation and 420–500-nm emission (for DAPI), 488-nm excitation and 510–535-nm emission (for probes labelled with ChromaTide Alexa Fluor-488 and for nucleophosmin), 568-nm excitation and 586–620-nm emission (for probes labelled with ChromaTide Alexa Fluor-568), 647-nm excitation and 657–700-nm emission (for probes labelled with Alexa Fluor-647), and 555-nm excitation and 586–640-nm emission (for immunofluorescence labelling of nucleophosmin with Alexa Fluor-555). All images were collected with a ×4 zoom at 1,024 × 1,024 pixel resolution (pixel size of 0.0451 μm, resolution of 22.1760 pixels μm–1).

Cryo-FISH image analysis

Images were analysed using Fiji software (v.2.0.0-rc-69/1.52p)88. All images were pre-processed as previously described23. Genomic foci were visually identified, and areas of the manually defined objects were measured using the Fiji-Area tool. For the cryo-FISH experiment combined with immunofluorescence, the location of genomic loci in relation to the nuclear lamina or nucleolus was assessed on the basis of the overlap of foci with the nucleolus (identified by nucleophosmin immunolabelling) or the nuclear lamina (as defined by the periphery of the DAPI staining) by at least three pixels. To determine the distance between the TSS, middle and TES genomic foci, we took the centre of mass of the selected objects, as defined by Fiji-Center of mass function (the brightness-weighted average of the x and y coordinates of all pixels within the selected areas). Distances between the objects were measured using the Fiji-Line tool between the centres of mass defined for each object. Images for visualization in figure panels were processed using Fiji or Adobe Photoshop CS6, for which adjustments included the optimization of the dynamic signal range with contrast stretching.

Determination of differential contacts between GAM datasets

Significant differences in pairwise contacts between a pair of GAM datasets were determined as previously described with modifications9. In brief, genomic windows with low detection, defined as less than 2% of the distribution of all detected genomic windows for each chromosome, were removed from both datasets to be compared. Contacts were filtered to be within 0.5–5 Mb distance and above 0.15 NPMI, and NPMI contact frequencies at each genomic distance of each chromosome were normalized by computing a Z-score transformation, and a differential matrix (D) was derived by subtracting the two Z-score normalized matrices9.

TF-binding site analysis

To find TF-binding motifs present within specific contacts, significant differential contacts were determined for DNs and PGNs. Accessible regions within the differential contacts were determined using scATAC-seq for PGNs81 and DNs. To account for methodological differences, including lower sequencing depth in PGN scATAC-seq data (Extended Data Fig. 4l), we considered only the peaks that occurred in >10% of cells (>10 cells in DNs; >13 in PGNs). Motif finding within accessible regions in significant contacts was performed using the Regulatory Genomics Toolbox (v.0.12.3; https://www.regulatory-genomics.org/motif-analysis/introduction/) with TF motifs (from the HOCOMOCO database, v.11)89 obtained for TFs expressed in either DNs or PGNs (R-log ≥ 2.5) to determine the percentage of windows containing each TF motif. Next, TF motifs were filtered based on (1) the percentage of windows containing the motif (>5%) and (2) the differential expression in either PGNs or DNs (–log10(adjusted P value)  > 3, see ‘Differential gene expression analysis’ above), which resulted in 50 TF motifs for feature pair analysis (33 TF motifs from PGNs and 17 from DNs; Extended Data Fig. 9c, d).

Feature pairs associated with specific contacts were determined as previously described9 and testing the 1,275 combinations of motif pairs (1,225 heterotypic motif pairs and 50 homotypic motif pairs). The number of contacts containing each pair of selected TF motifs (PGNTF and DNTF), together with the percentage of total significant differential contacts in PGNs and DNs (PGN and DN), were used to determine the enrichment score for all TF feature pair interactions (that is, the ratio between frequencies of contacts in PGNs or DNs, (PGNTF/PGN)/(DNTF/DN)). The effectiveness of a TF pair for discriminating between contacts from PGNs and DNs was assessed by using the information gain measure90. Enrichment and information gain for all TF feature pair interactions, as well as differential expression values for TFs (DNs compared to PGNs), can be found in Supplementary Table 13. The top feature pairs were extracted on basis of the highest information gain (ten feature pairs), PGN enrichment (five feature pairs) and DN enrichment (five feature pairs) scores. Contact overlaps for top feature pairs were visualized using UpSet plots.

Network and community detection analysis of TF-binding sites in significant differential contacts

To determine the interconnectivity between different TF motifs found in accessible regions of significant differential contacts, the number of contacts for each pair of TF motifs (1,275 pairs) was determined. After filtering pairs of TF motifs involved in less than 20% of the total contacts (15,833 and 5,400 contacts minimum in PGNs and DNs, respectively), a network was built for each cell type with TF motifs as nodes and number of contacts as weighted edges. The Leiden algorithm was used to detect communities of strongly interconnected nodes, using the leiden package in R91,92, with a resolution of 1.01 for both PGNs and DNs (Extended Data Fig. 10f, Supplementary Table 14).

GAM aggregated contact plots

To visualize the average contact intensity for a set of genomic contacts, NPMI contact frequencies at each genomic distance of each chromosome were first normalized by computing a Z-score transformation. The resulting Z-score values were determined for each contact and for each contact in a 4-bin radius (50-kb bins). For each chromosome, Z-score values for each set of contacts and for the surrounding bins were summarized by the arithmetic mean. Mean values computed for each chromosome were added together and divided by the number of chromosomes.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Source link