May 28, 2024

Human neocortical expansion involves glutamatergic neuron diversification – Nature

Detailed descriptions of all experimental data collection methods in the form of technical white papers can also be found under ‘Documentation’ at http://celltypes.brain-map.org.

Human tissue acquisition

Surgical specimens were obtained from local hospitals (Harborview Medical Center, Swedish Medical Center and University of Washington Medical Center) in collaboration with local neurosurgeons. All patients (Supplementary Table 1) provided informed consent and experimental procedures were approved by hospital institute review boards before commencing the study. Tissue was placed in slicing artificial cerebral spinal fluid (ACSF) as soon as possible following resection. Slicing ACSF comprised52 (in mM): 92 N-methyl-d-glucamine chloride (NMDG-Cl), 2.5 KCl, 1.2 NaH2PO4, 30 NaHCO3, 20 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), 25 d-glucose, 2 thiourea, 5 sodium-l-ascorbate, 3 sodium pyruvate, 0.5 CaCl2.4H2O and 10 MgSO4.7H2O. Before use, the solution was equilibrated with 95% O2, 5% CO2 and the pH was adjusted to 7.3 by addition of 5N HCl solution.  Osmolality was verified to be between 295–305 mOsm kg−1.  Human surgical tissue specimens were immediately transported (15–35 min) from the hospital site to the laboratory for futher processing.

Mouse breeding and husbandry

All procedures were carried out in accordance with the Institutional Animal Care and Use Committee at the Allen Institute for Brain Science. Animals (<5 mice per cage) were provided food and water ad libitum and were maintained on a regular 12-h light:dark cycle; rooms were kept at 21.1 °C and 45–70% humidity. Mice were maintained on the C57BL/6J background, and newly received or generated transgenic lines were backcrossed to C57BL/6J. Experimental animals were heterozygous for the recombinase transgenes and the reporter transgenes.

Tissue processing

For mouse experiments, male and females were used between the ages of postnatal day (P)45 and P70 were anaesthetized with 5% isoflurane and intracardially perfused with 25 or 50 ml of 0–4 °C slicing ACSF. Human or mouse acute brain slices (350 μm) were prepared with a Compresstome VF-300 (Precisionary Instruments) or VT1200S (Leica Biosystems) vibrating microtome modified for block-face image acquisition (Mako G125B PoE camera with custom integrated software) before each section to aid in registration to the common reference atlas. Brains or tissue blocks were mounted for slicing with the optimal orientation for preserving intactness of apical dendrites of neocortical pyramidal neurons.

Slices were transferred to an oxygenated and warmed (34 °C) slicing ACSF for 10 min, then transferred to room temperature holding ACSF of the composition52 (in mM): 92 NaCl, 2.5 KCl, 1.2 NaH2PO4, 30 NaHCO3, 20 HEPES, 25 d-glucose, 2 thiourea, 5 sodium-l-ascorbate, 3 sodium pyruvate, 2 CaCl2.4H2O and 2 MgSO4.7H2O for the remainder of the day until transferred for patch clamp recordings. Before use, the solution was equilibrated with 95% O2, 5% CO2 and the pH was adjusted to 7.3 using NaOH. Osmolality was verified to be between 295–305 mOsm kg−1.

Patch clamp recording

Slices were bathed in warm (32–34 °C) recording ACSF containing the following (in mM): 126 NaCl, 2.5 KCl, 1.25 NaH2PO4, 26 NaHCO3, 12.5 d-glucose, 2 CaCl2.4H2O and 2 MgSO4.7H2O (pH 7.3), continuously bubbled with 95% O2 and 5% CO2. The bath solution contained blockers of fast glutamatergic (1 mM kynurenic acid) and GABAergic synaptic transmission (0.1 mM picrotoxin). Thick-walled borosilicate glass (Warner Instruments, G150F-3) electrodes were manufactured (Narishige PC-10) with a resistance of 4–5 MΩ. Before recording, the electrodes were filled with ~1.0–1.5 µl of internal solution with biocytin (110 mM potassium gluconate, 10.0 mM HEPES, 0.2 mM ethylene glycol-bis (2-aminoethylether)-N,N,N′,N′-tetraacetic acid, 4 mM potassium chloride, 0.3 mM guanosine 5′-triphosphate sodium salt hydrate, 10 mM phosphocreatine disodium salt hydrate, 1 mM adenosine 5′-triphosphate magnesium salt, 20 µg ml−1 glycogen, 0.5 U µl−1 RNAse inhibitor (Takara, 2313A)  and 0.5% biocytin (Sigma B4261), pH 7.3). The pipette was mounted on a Multiclamp 700B amplifier headstage (Molecular Devices) fixed to a micromanipulator (PatchStar, Scientifica).

The composition of bath and internal solution as well as preparation methods were made to maximize the tissue quality, to align with solution compositions typically used in the field (to maximize the chance of comparison to previous studies), and modified to reduce RNAse activity and ensure maximal recovery of mRNA content.

Electrophysiology signals were recorded using an ITC-18 Data Acquisition Interface (HEKA). Commands were generated, signals were processed and amplifier metadata were acquired using MIES (https://github.com/AllenInstitute/MIES/), written in Igor Pro (Wavemetrics). Data were filtered (Bessel) at 10 kHz and digitized at 50 kHz. Data were reported uncorrected for the measured (Neher 1992) –14 mV liquid junction potential between the electrode and bath solutions.

Before data collection, all surfaces, equipment and materials were thoroughly cleaned in the following manner: a wipe down with DNA away (Thermo Scientific), RNAse Zap (Sigma-Aldrich) and finally with nuclease-free water.

For human slices, pyramidal shaped neurons in L2-3 were targeted. For mouse experiments, pyramidal neurons in L2/3 were targeted, either tdTomato pyramidal neurons when recording from a transgenic line that labels interneurons, or tdTomato+ neurons when recording from a line that labels different populations of L2/3 glutamatergic neurons, specifically Oxtr-T2A-Cre and Penk-IRES2-Cre-neo, each crossed to the Ai14 tsTomato reporter line.

After formation of a stable seal and break-in, the resting membrane potential of the neuron was recorded (typically within the first minute). A bias current was injected, either manually or automatically using algorithms within the MIES data acquisition package, for the remainder of the experiment to maintain that initial resting membrane potential. Bias currents remained stable for a minimum of 1 s before each stimulus current injection.

To be included in analysis, a cell needed to have a >1 GΩ seal recorded before break-in and an initial access resistance <20 MΩ and <15% of the input resistance (Rinput). To stay below this access resistance cut-off, cells with a low input resistance were successfully targeted with larger electrodes. For an individual sweep to be included, the following criteria were applied: (1) the bridge balance was <20 MΩ and <15% of the Rinput; (2) bias (leak) current 0 ± 100 pA; and (3) root mean square noise measurements in a short window (1.5 ms, to gauge high frequency noise) and longer window (500 ms, to measure patch instability) <0.07 mV and 0.5 mV, respectively.

Each cell was recorded using a standardized stimulus paradigm, including square pulses, ramps and noisy current injections, with the goal of extracting features that could be compared across cells, rather than tailoring each stimulus to the physiological input of that neuron.

Upon completion of electrophysiological examination, the pipette was centered on the soma or placed near the nucleus (if visible). A small amount of negative pressure was applied (~−30 mbar) to begin cytosol extraction and attract the nucleus to the tip of pipette. After approximately one minute, the soma had visibly shrunk and/or the nucleus was near the tip of the pipette. While maintaining the negative pressure, the pipette was slowly retracted in the x and z direction. Slow, continuous movement was maintained while monitoring pipette seal. Once the pipette seal reached >1 GΩ and the nucleus was visible on the tip of the pipette, the speed was increased to remove the pipette from the slice. The pipette containing internal solution, cytosol and nucleus was removed from pipette holder and contents were expelled into a PCR tube containing the lysis buffer (Takara, 634894).

cDNA amplification and library construction

We performed all steps of RNA-processing and sequencing as described for mouse Patch-seq cells41. We used the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing (Takara, 634894) to reverse transcribe poly(A) RNA and amplify full-length cDNA according to the manufacturer’s instructions. We performed reverse transcription and cDNA amplification for 20 PCR cycles in 0.65 ml tubes, in sets of 88 tubes at a time. At least 1 control 8-strip was used per amplification set, which contained 4 wells without cells and 4 wells with 10 pg control RNA. Control RNA was either Universal Human RNA (UHR) (Takara 636538) or control RNA provided in the SMART- Seq v4 kit. All samples proceeded through Nextera XT DNA Library Preparation (Illumina FC-131-1096) using either Nextera XT Index Kit V2 Sets A-D(FC-131-2001,2002,2003,2004) or custom dual-indexes provided by Integrated DNA Technologies (IDT). Nextera XT DNA Library prep was performed according to manufacturer’s instructions except that the volumes of all reagents including cDNA input were decreased to 0.2× by volume.  Each sample was sequenced to approximately 1 million reads.

RNA-seq data processing

Fifty-base-pair paired-end reads were aligned to GRCh38.p2 using a RefSeq annotation gff file retrieved from NCBI on 11 December 2015 for human and to GRCm38 (mm10) using a RefSeq annotation gff file retrieved from NCBI on 18 January 2016 for mouse (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/all/). Sequence alignment was performed using STAR v2.5.353 in two pass Mode. PCR duplicates were masked and removed using STAR option bamRemoveDuplicates. Only uniquely aligned reads were used for gene quantification. Gene counts were computed using the R Genomic Alignments package summarizeOverlaps function using IntersectionNotEmpty mode for exonic and intronic regions separately54.  Expression levels were calculated as counts of exonic plus intronic reads.  For most analyses, log2(counts per million (CPM) + 1)-transformed values were used.

Morphological reconstruction

Biocytin histology

A horseradish peroxidase (HRP) enzyme reaction using diaminobenzidine (DAB) as the chromogen was used to visualize the filled cells after electrophysiological recording, and 4,6-diamidino-2-phenylindole (DAPI) stain was used to identify cortical layers as described previously9.

Imaging of biocytin-labelled neurons

Mounted sections were imaged as described previously9. In brief, operators captured images on an upright AxioImager Z2 microscope (Zeiss, Germany) equipped with an Axiocam 506 monochrome camera and 0.63× Optivar lens. Two-dimensional tiled overview images were captured with a 20× objective lens (Zeiss Plan-NEOFLUAR 20×/0.5) in bright-field transmission and fluorescence channels. Tiled image stacks of individual cells were acquired at higher resolution in the transmission channel only for the purpose of automated and manual reconstruction. Light was transmitted using an oil-immersion condenser (1.4 NA). High-resolution stacks were captured with a 63× objective lens (Zeiss Plan-Apochromat 63×/1.4 Oil or Zeiss LD LCI Plan-Apochromat 63x/1.2 Imm Corr) at an interval of 0.28 µm (1.4 NA objective) or 0.44 µm (1.2 NA objective) along the z axis. Tiled images were stitched in ZEN software and exported as single-plane TIFF files.

Morphological reconstruction

Reconstructions of the dendrites and the full axon were generated for a subset of neurons with good quality transcriptomics, electrophysiology and biocytin fill. Reconstructions were generated based on a 3D image stack that was run through a Vaa3D-based image processing and reconstruction pipeline55. Images were used to generate an automated reconstruction of the neuron using TReMAP56. Alternatively, initial reconstructions were created manually using the reconstruction software PyKNOSSOS (Ariadne-service) or the citizen neuroscience game57 Mozak (Mosak.science). Automated or manually-initiated reconstructions were then extensively manually corrected and curated using a range of tools (for example, virtual finger and polyline) in the Mozak extension (Zoran Popovic, Center for Game Science, University of Washington) of Terafly tools58,59 in Vaa3D. Every attempt was made to generate a completely connected neuronal structure while remaining faithful to image data. If axonal processes could not be traced back to the main structure of the neuron, they were left unconnected.

Before morphological feature analysis, reconstructed neuronal morphologies were expanded in the dimension perpendicular to the cut surface to correct for shrinkage17,60 after tissue processing. The amount of shrinkage was calculated by comparing the distance of the soma to the cut surface during recording and after fixation and reconstruction. A tilt angle correction was also performed based on the estimated difference (via CCF registration) between the slicing angle and the direct pia-white matter direction at the cell’s location9.

Slice immunohistochemistry

Immunohistochemistry

Tissue slices (350 µm-thick) designated for histological profiling were fixed for 2–4 days in 4% paraformaldehyde (PFA) in phosphate-buffered saline (PBS) at 4 °C and transferred to PBS, 0.1% sodium azide for storage at 4 °C. Slices were then cryoprotected in 30% sucrose, frozen and re-sectioned at 30 µm using a sliding microtome (Leica SM2000R). Sections were stored in PBS with azide at 4 °C in preparation for immunohistochemical and Nissl staining. Specific probes (vendor, dilution) used were: Neu-N (Millipore, MAB377, 1:2,000); SMI-32 (Biolegend, 801704, 1:2,000); GFAP (Millipore, MAB360, 1:1,500); parvalbumin (Swant, PV235, 1:2,000); IBA1 (Wako 019-19741, 1:1,000); Ki-67 (Dako M724001-2, 1:200). Full immunohistology protocol details available at http://help.brain-map.org/download/attachments/8323525/CellTypes_Morph_Overview.pdf?version=4&modificationDate=1528310097913&api=v2

Slide imaging

Colorimetric immunohistochemistry and other histologically-stained whole slides (that is, Nissl-stained preparations) for bright-field imaging were scanned using an Aperio ScanScope XT slide scanner (Leica Biosystems). The samples were illuminated using a 21DC Halogen Lamp (Techniquip). Bright-field images were acquired using ScanScope Console (v101.0.0.18) and controller (ve101.0.4.446) at 10× magnification (objective lens 20×/0.75 NA Plan Apo, 0.5× magnifier) resulting in a pixel size of 1.0 µm per pixel.

Pathology scoring

For every case, each set of images per histological marker were independently scored by three pathologists using a 4-point scale, where 0 is normal and 3 is overtly pathological. Well-established histological markers were used to evaluate cellularity (Nissl), neuronal density and layer orientation (NeuN), astrogliosis (GFAP), microglial activation state (IBA1), non-phosphorylated neurofilament-H (using antibody SMI-32), and cellular proliferation (Ki-67).

mFISH

Fresh-frozen human postmortem brain tissues were sectioned at 14–16 μm onto Superfrost Plus glass slides (Fisher Scientific). Sections were dried for 20 min at −20 °C and then vacuum sealed and stored at −80 °C until use. The RNAscope multiplex fluorescent v1 kit was used per the manufacturer’s instructions for fresh-frozen tissue sections (ACD Bio), except that fixation was performed for 60 min in 4% paraformaldehyde in 1× PBS at 4 °C and protease treatment was shortened to 10 min. For combined RNAscope and immunohistochemistry, primary antibodies were applied to tissues after completion of mFISH staining. Primary mouse anti-neurofilament H (SMI-32, Biolegend, 801701) was applied to tissue sections at a dilution of 1:250. Secondary antibodies (1:250) were goat anti-mouse IgG (H+L) Alexa Fluor conjugates (594 or 647, ThermoFisher Scientific A-11005 or 21235). Sections were imaged using a 60× oil-immersion lens on a Nikon TiE fluorescence microscope equipped with NIS-Elements Advanced Research imaging software (version 4.20). For all RNAscope mFISH experiments, positive cells were called by manually counting RNA spots for each gene. Cells were called positive for a gene if they contained ≥3 RNA spots for that gene. Lipofuscin autofluorescence was distinguished from RNA spot signal based on the larger size of lipofuscin granules and broad fluorescence spectrum of lipofuscin. The following probe combinations were applied to label cell types of interest: (1) LTK: LTK (NM_002344.5) and LAMP5 (NM_012261.3); (2) GLP2R: GLP2R (NM_004246.2) and CUX2 (NM_015267.3); (3) FREM3: RORB (NM_006914.3) and FREM3 (NM_001168235.2); (4) CARM1P1: RORB and CARTPT (NM_004291.3); (5) COL22A1: RORB and COL22A1 (NM_152888.3); (6) Adamts2: Cbr3 (NM_173047.3), Neurod1 (NM_010894.2) and Cdh13 (NM_019707.4); (7) Rrad: Nr4a3 (NM_015743.3), Cux1 (NM_009986.4) and Cdh13; (8) Agmat: Pou3f2 (NM_008899.2), Igfbp7 (NM_001159518.1) and Coch (NM_001198835). Experiments were repeated on at least n = 2 donors per probe combination for both mouse and human.

Quantification of human and mouse soma size

Images of NeuN+ stained sections from human MTG (1 section per donor for 5 donors) and mouse VISp (1 section per mouse for 3 mice) (described above) were imported into ImageJ for processing. Regions of interest (ROIs) were drawn around cell bodies and exported as .roi files for downstream processing. In both species, L4 is defined as a band of densely packed, small granular cells, and the upper bound of this band (which includes overlying large pyramidal cells) is treated as the border between L3 and L4.  The border between L1 and L2 is defined as the sharp boundary between the cell-sparse zone of L1 and the is a cell-dense zone of L2. In mouse, the border between L2 and L3 is indistinguishable and not defined. In human MTG, the boundary between L2 and L3 can be closely approximated as transition from densely packed small pyramidal cells to less densely packed larger pyramidal cells, which is largely consistent among donors.

Soma areas were defined as the number of pixels contained in each ROI, scaled by the number of pixels per µm. Cortical depth was defined for each cell as the position of that cell centroid relative to pia (absolute depth) or relative to the L1/2 and L3/4 boundaries (scaled depth) at that position in the tissue. The number of neurons per mm2 of L2-3 neocortex (absolute density) is the number of neurons per image scaled by the area of the image where cell counts were assessed.  For measuring surface density and cell area across L2-3 cortical depth, L2-3 was split into 20 evenly sized bins and the relevant measurements within each bin were calculated independently per section (one section per donor) and the average and standard deviation across sections were reported.  The first and last bins are omitted from plots as they display boundary effects.  Relative (scaled) neuron density scales to 1 for each donor and is defined as the fraction of total neuron count in each bin.  In human, a nadir of scaled density was identified at −0.575, which we define as a quantitative boundary between superficial and deep L3 in this manuscript.

Analysis of data from dissociated cells and nuclei

Reference data used in this study include dissociated excitatory cells (mouse) or nuclei (human) collected from human MTG3 and mouse VISp26, and are all publicly accessible at the Allen Brain Map data portal (https://portal.brain-map.org/atlases-and-data/rnaseq). In human, cells from the five previously identified L2-3 glutamatergic types were retained, subsampling to match the laminar distribution of neurons included in the Patch-seq dataset as closely as possible, leaving a total of 2,948 neurons from LTK, GLP2R, FREM3, CARM1P1 and COL22A1 t-types.  In mouse, all neurons from the three L2/3 glutamatergic t-types (Adamts2, Rrad and Agmat) were retained.  Datasets were visualized as follows.  First, the top 2,000 most binary genes by beta score3 were selected. Beta score  is defined as the squared differences in proportions of cells or nuclei in each cluster that expressed a gene above 1, normalized by the sum of absolute differences plus a small constant (ε) to avoid division by zero. Scores ranged from 0 to 1, and a perfectly binary marker had a score equal to 1.  Second, the Seurat pipeline39,40 (more details below) was used to scale the data, reduce the dimensionality using principal component analysis (PCA) (30 PCs).  These PCs were then used to generate a UMAP61. Finally, data and metadata such as cluster, subcluster, layer and gene expression were then overlaid onto this UMAP space using different colored or shaded points.

Cluster heterogeneity is defined as average observed variance explained by the first PC compared with permuted data after accounting for differences in the number of cells per cell type.  To get this, we (1) randomly selected 80 cells from each cell type, (2) identified the 80 most variable genes using the FindVariableFeatures Seurat function with selection.method=”vst”, (3) performed PCA after removing outlier cells, (4) calculated the percent of variance explained by the first PC, (5) repeated steps 1–4 for 100 sets of data where the expression levels for each gene are shuffled across the 80 cells to break gene correlations but retain other gene statistics, and (6) identified the average and standard deviation of PC1 for observed versus permuted data.  Cluster discreteness is defined as the average number of differentially expressed genes between a given type and each of the remaining homologous t-types (LTKGLP2R and FREM3 t-types in human; RradAgmat and Adamts2 t-types in mouse).  In this case pairwise differential expression is defined using the de_score function in the scrattch.hicat R library26 after subsampling each cluster to 80 cells, and only the genes with higher expression in the relevant cluster are considered.  The getMarkers function from the genesorteR R library (https://github.com/mahmoudibrahim/genesorteR)42 was used to identify genes differentially expressed genes between deep FREM3 (f73 subtype; collected from L3 or L4 dissection), COL22A1 and CARM1P1 neurons, using all default parameters except quant = 0.7.  To validate the cell selection for deep L3 (since sublaminar dissection was not performed on the dissociated nuclei data), this analysis was repeated on Patch-seq neurons from these three types collected in deep L3 (scaled depth < −0.575). GO enrichment analysis was performed using ToppGene62 with default settings, and Bonferroni-corrected P-values are reported unless stated otherwise.

Dataset curation

Patch-seq cells were included in this dataset if they met the following criteria.  All neurons: (1) had high-quality transcriptomic data, measured as the normalized summed expression (NMS, adapted from the single-cell quality control measures in ref. 63) of ‘on’-type marker genes greater than 0.4; and (2) retained a soma through biocytin processing and imaging such that an accurate laminar association could be made.  In addition, mouse neurons were: (1) located within VISp; (2) either tdTomato or tdTomato+ from a line known to label glutamatergic neurons (that is, tdTomato+ neurons from known inhibitory mouse lines were excluded); (3) mapped to L2/3 IT VISp Rrad, L2/3 IT VISp Agmat, or L2/3 IT VISp Adamts2 using Seurat mapping (as described below); and (4) mapped to L2/3 IT VISp Rrad, L2/3 IT VISp Agmat, L2/3 IT VISp Adamts2, or L4 IT VISp Rspo1 in a separate Seurat mapping analysis where only reads located within gene introns are considered for both datasets.  This final filter removes Patch-seq cells that jointly express markers for GABAergic and glutamatergic cells, probably representing L2/3 GABAergic neurons contaminated with adjacent glutamatergic cells.  We do not find examples of such cells in human, possibly owing to a much smaller sampling of GABAergic cells than in the mouse.

Identifying t-types

Due to the differences in gene expression between Patch-seq and dissociated cells (see Extended Data Fig. 3b and a companion mouse study41), we used transcriptomes of dissociated human nuclei3 or cells26 as reference datasets for human and mouse, respectively, and mapped Patch-seq transcriptomes to the reference data to identify their cell types.  Prior to data transfer, we filtered genes potentially related to technical variables.  X and Y chromosomes were excluded to avoid nuclei mapping based on sex. Many mitochondrial genes have expression correlated with RNA-seq data quality in dissociated nuclei data3, so nuclear and mitochondrial genes downloaded from Human MitoCarta2.064 were also excluded.  We also find that Patch-seq cells often have high expression of non-neuronal marker genes, so any genes most highly expressed in a non-neuronal cell type are excluded.  Finally, any genes showing at least four-fold higher expression in dissociated nuclei versus Patch-seq cells in the included cell types (or vice versa) were excluded as potentially platform dependent.  In total 23,129 of 50,281 genes (46%) remained in human and a comparable fraction for mouse.  Variable genes for mapping were selected as described above for dissociated nuclei data visualization, by using the top 2,000 remaining genes by beta score as input into the procedure described below.

For both species, we mapped Patch-seq datasets to the relevant dissociated cells or nuclei reference using Seurat V3 (https://satijalab.org/seurat/)39,40 following the tutorial for integration and label transfer with default parameters for all functions, except when they differed from those used in the tutorial, and replacing variable gene selection with the genes described above. More specifically, we first defined a low (30)-dimensional PCA space of the dissociated cells or nuclei dataset and then project this onto the Patch-seq dataset.  We then found transfer anchors (cells that are mutual nearest neighbours between datasets) in this subspace.  Each anchor is weighted on the basis of the consistency of anchors in its local neighbourhood, and these anchors were then used as input to guide label transfer (or batch correction), as described previously65. We then scaled the data, reduced the dimensionality using PCA, and visualized the results with UMAP61. This process is done using the FindTransferAnchors and TransferData R functions, which provide both the best mapping cell type and a confidence score.  For mouse data, the three homologous types did not provide a heterogenous enough reference dataset, and therefore a larger set of glutamatergic and GABAergic cell types was used as reference.  Cell-type assignments for most cells were robust to choice of reference dataset and to changes in parameter settings.  Some cells with expression levels intermediate to two cell types changed calls between different runs; however, the cell-type-level results presented are robust to these small changes.

Multiple variations of three different mapping strategies were tested: (1) Seurat (as described in this Article), (2) a variation of the tree-mapping strategy previously described41 for analysis of GABAergic cell types in mouse Patch-seq and (3) a correlation-based strategy comparing expression of marker genes in Patch-seq versus cluster medians of the scRNA-seq data.  In all cases, 75–95% of cells mapped to the same cell type when comparing pairs of methods, consistent with similar analyses performed using dissociated FACS-sorted cells and therefore does not reflect a quality issue with Patch-seq data per se, but rather the fact that cell-type definitions are not totally discrete, and single cell measurements are not totally accurate (for example, owing to dropouts).  As such, many cells show expression patterns that are not unambiguously mapped between highly similar transcriptomic cell types, although how much of this is biological and how much technical is difficult to assess.  Despite the imperfect agreement of specific cells, the statistical results regarding morphology and electrophysiology for each cell type remain relatively unchanged regardless of the mapping method used.

Gene expression of Patch-seq cells was visualized by projection into the UMAP space calculated from dissociated nuclei using a combination of Seurat and the R implementation of the UMAP library (https://github.com/tkonopka/umap). More specifically, the Seurat data integration pipeline (functions FindIntegrationAnchors and IntegrateData) was used to calculated scaled data for both datasets and PCA was performed on this integrated space.  The first 30 PCs from both datasets, as well as the UMAP coordinates calculated for dissociated nuclei above were input into the UMAP pipeline and the ‘predict’ function was used to project the Patch-seq cells into UMAP coordinates.  As above, data and metadata were then overlaid on these UMAP coordinates.

Since dissociated nuclei were not collected using sublaminar dissections, ‘deep FREM3’ neurons were defined as FREM3 neurons dissected from L3 or L4 that were assigned to subtype f73 (Extended Data Fig. 1), which colocalizes with deep FREM3 Patch-seq neurons in UMAP space (Figs. 1, 3).  Furthermore, 77 of these 219 marker genes (including four genes shown in Fig. 4e) were also defined as marker genes by Patch-seq, where cortical depth was explicitly measured, suggesting the selection of deep FREM3 neurons in dissociated nuclei was reasonable.

Assessing transcript contamination

To quantify the effect of contamination and gene dropout in the Patch-seq dataset, we compared median gene expression levels of homologous t-types between platforms (Fig. 3a).  Dissociated nuclei and Patch-seq cells from matched human t-types were highly correlated (R = 0.85, P ≈ 0).  Relatively few genes (177 genome-wide) showed enriched expression in dissociated nuclei relative to Patch-seq cells, suggesting that high quality transcriptomes collected in this dataset do not show the increased dropout rate reported in our previous study in mouse41.  This is likely to be because we compared our human Patch-seq cells to a reference of dissociated nuclei, rather than the reference based on dissociated cells in mouse.  By contrast, we identified 2,670 genes with at least fourfold enrichment in Patch-seq, including genes associated with extra-nuclear compartments such as the mitochondria (P < 10–12) and ribosome (P < 10−9), genes regulating cell death (P < 10–18), RNA-binding genes (P < 10−8) including immediate early genes, and markers for non-neuronal cells such as microglia (P < 10–20). Some of the top genes in these categories include COX3, FOS and IL1B, which all show >100-fold enrichment in Patch-seq cells.  These results indicated that Patch-seq cells probably contain some RNA collected from extranuclear compartments and from nearby contaminating cells (particularly microglia), and may show some activity-dependent transcription. However, these effects were minor compared to cell-type differences and we find overall high consistency and similar quality between Patch-seq cells and dissociated nuclei.

Comparison of gene expression between species

Gene orthologues between mouse and human were pulled from the gene orthologues table on NIH (https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_orthologs.gz) on 22 November 2019.  Only genes with unique orthologues between mouse and human were included in cross-species analyses.

Electrophysiology feature analysis

Electrophysiological features were measured from responses elicited by short (3 ms) current pulses and long (1 s) current steps as previously described9. In brief, action potentials were detected by first identifying locations where the smoothed derivative of the membrane potential (dV/dt) exceeded 20 mV ms−1, then refining on the basis of several criteria including threshold-to-peak voltage, time differences and absolute peak height. For each action potential, threshold, height, width (at half-height), fast after-hyperpolarization (AHP) and interspike trough were calculated (trough and AHP were measured relative to threshold), along with maximal upstroke and downstroke rates dV/dt and the upstroke/downstroke ratio (that is, ratio of the peak upstroke to peak downstroke). Additional features from supratheshold sweeps included the rheobase and slope of the firing rate versus current curve (fI slope); the first spike latency and initial firing rate (inverse of first inter-spike interval), measured at rheobase; and the mean firing rate and spike frequency adaptation ratio (mean ratio of consecutive inter-spike intervals), measured at ~50 pA above rheobase. Subthreshold features included the resting membrane potential (RMP), input resistance and membrane time constant (tau) from response across or before hyperpolarizing long steps, and sag ratio from response at ~−100 pA. All feature calculation used the IPFX package (https://github.com/AllenInstitute/ipfx).

Morphology feature analysis

Morphological features were calculated as previously described9. In brief, feature definitions were collected from prior studies10,66. Features were calculated using the version of neuron_morphology package (https://github.com/alleninstitute/neuron_morphology/tree/dev). Reconstructed neurons were aligned in the direction perpendicular to pia and white matter. Additional features, such as the laminar distribution of axon, were calculated from the aligned morphologies. Shrinkage correction was not performed (see above), features predominantly determined by differences in the z-dimension were not analysed to minimize technical artifacts due to z-compression of the slice after processing.

Analysis of features by t-type and species

Combined datasets of electrophysiological and morphological features across homologous t-types from mouse and human were visualized by an analysis pipeline of data imputation and standardization, followed by projection to two dimensions using UMAP or SPCA (sklearn and umap python packages)67,68. Cells with more than 3 out of 18 electrophysiological features missing were dropped, the remaining missing features were imputed as the mean of 5 nearest neighbours (KNNImputer), and features were centred about the median and scaled by interquartile range (RobustScaler). The SPCA regularization parameter was adjusted to minimize non-zero features while preserving dataset structure. All features with coefficients over 0.05 were reported directly in the case of electrophysiology or summarized by feature categories for morphology.

For each feature, differentiation by t-type was assessed by running a one-way ANOVA for the feature by t-type, using the statsmodels package69. This analysis was repeated separately for the three mouse and human homologous t-types, as well as the three deep human t-types (with the subset of deep FREM3 cells only). Results were reported as fraction of variance explained ((eta )2 or R2) and heteroscedasticity-robust F test P-value (HC3), corrected for FDR (Benjamini–Hochberg procedure) across all features for each data modality. Post-hoc Mann–Whitney U tests were run across pairs of t-types in each group (human and mouse homologous types and deep human types) for top-ranked features from ANOVA, and results FDR-corrected.

For classification of t-types, features were normalized using the standard scaler scalar in sklearn (StandardScalar), and the data were randomly assigned with stratification to training (70%) and testing sets (30%). The random forest classifier was trained using the sklearn package with 600 decision trees. The classification performance was estimated after averaging the results of the classifiers trained on 1,000 random data splits and compared against performance for data with shuffled t-type labels. Confusion matrices shown are for a single representative train–test split.

Analysis of features by depth for FREM3 t-type

For each electrophysiology, morphology, and gene feature, the depth-related variability was assessed by a linear regression of the feature against relative L2-3 depth, using the statsmodels package69. Results were reported as fraction of variance explained (R2), Pearson correlation r, and heteroscedasticity-robust F test P-value (HC3), corrected for FDR (Benjamini–Hochberg procedure) across all features for each data modality. Owing to the large number of morphology and genes tested, results were summarized by calculating GO-term enrichment in ToppGene62 for the set of depth-correlated genes (FDR < 0.05), followed by subselection of representative GO terms using REViGO70. Groups of features were ranked by the group’s highest R2, and the features with highest correlation shown for the top groups.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Source link