Primate gastrulation and early organogenesis at single-cell resolution - Nature - Alert Breaking News

Ethical statement

This study was conducted in accordance with the Principles for the Ethical Treatment of Non-Human Primates issued by the Institute of Zoology, Chinese Academy of Sciences (IOZ, CAS), and was approved in advance by the Institutional Animal Care and Use Committee of IOZ, CAS (no. IOZ-EU-20191113 for all monkey experiments, no. IOZ-IACUC-2021-037 for all mouse experiments). Both followed relevant guidelines and regulations. hESC experiments in this study were performed at the UT Southwestern Medical Center and followed the International Society for Stem Cell Research guidelines for Stem Cell Research and Clinical Translation, 2021 (https://www.isscr.org/policy/guidelines-for-stem-cell-research-and-clinical-translation). hESC work was reviewed and approved by the UT Southwestern Stem Cell Oversight Committee.

Experiment models and biological sample preparation

Collection of embryonic samples

All Macaca fascicularis were of Southeast Asian origin. The animals were maintained at around 25 °C on a 12/12-h light/dark schedule and raised at the Xieerxin Biology Resource with the accreditation of the laboratory animal care facility in Beijing. All animals were given a commercial diet twice per day with tap water ad libitum and were fed vegetables and fruits once daily under careful veterinary supervision. Before the experiment, none of the animals had a clinical or experimental history that would affect physiological ageing or increase susceptibility to diseases.

Oocyte collection, intracytoplasmic sperm injection, pre-implantation embryo culture and transfer of pre-implantation embryos to foster mothers were performed as described by Yamasaki et al.⁵¹. Briefly, female cynomolgus monkeys around 6–8 years of age were chosen for oocyte collection by superovulation with follicle-stimulating hormone, and an implantable and programmable microfusion device was implanted subcutaneously under ultrasound detection. The day when the collected ova were artificially fertilized by sperm injection was designated as embryonic day 0 (E0). When the embryos developed with blastocoel cavities around E6–7, five or six high-quality embryos were selected and transferred to appropriate recipient female cynomolgus monkeys. The implanted embryos were further monitored by ultrasound scanning from E14 to identify successful pregnancies. Ketamine hydrochloride (0.1–0.2 ml kg^–1) was administered by intramuscular injection for the anaesthesia of pregnant monkeys. The implanted uterus was surgically removed at different developmental stages asexperimentally designed, from which embryonic tissues could be obtained. The sample size of the study was determined based on the availability of highly regulated primate embryo samples. In compliance with the 3R guidelines we reduced the number of animals used to a minimum, which allowed us to obtain a high-coverage transcriptome for each cell type and confidently perform downstream analyses.

C57BL/6 mice were housed under a 12/12-h light/dark cycle at around 25 °C. Natural mating was established between males and 6–8-week-old females, with 12:00 on the day of vaginal plug insertion considered to be E0.5. Postimplantation embryos were dissected from uteri at E7.5–8.5 for the experiments described below.

Isolation of embryonic cells

Monkey embryonic tissues were transferred to DMEM/F12 (DF12) medium (Gibco, no. 21331020) containing 5% Penicillin-Streptomycin (Gibco) and stored at 4 °C for a short period. After washing in PBS (Gibco), tissues were digested with 0.125% TrypLE (Gibco) and 0.025% DNase (Gibco) in DF12 at 37 °C with stirring for 10 min. The disaggregated cell suspension was passed through 40-μm sterilesieve mesh and washed thoroughly with DF12 containing 10% fetal bovine serum (Invitrogen). Sieved cells were precipitated and collected by centrifugation at 300 g for 5 min. Precipitated cells were resuspended with 5 ml of red blood cell lysis buffer for 3 min and then diluted with an additional 25 ml of DF12 medium. After removal of red blood cells, cells were recentrifuged and transferred to short-term storage at 4 °C.

Preparation of scRNA-seq library and sequencing

Single-cell libraries were constructed using Single Cell 3 Library & Gel Bead Kit v.3 according to the manufacturer’s protocol (10X Genomics)⁵². In short, cell counts were assessed busing a haemocytometer (Luna-FL, Logos Biosystems) with cell concentration adjusted to 1,000 μl^–1. About 16,000 cells were added to each channel of a 10X loading chip and then around 8,000 were captured. Captured cells were lysed, and the isolated RNA was barcoded through reverse transcription in individual gel bead in the emulsion. cDNA was then amplified to construct the library and the qualities of cDNA and cDNA libraries were assessed using Agilent 2100. Finally, the libraries were sequenced on an Illumina Hiseq X Ten platform (Annoroad Gene Technology).

Single-cell transcriptomic analysis

scRNA-seq data preprocessing

Raw fastq files were processed using Cell Ranger 3.1.0 software with default mapping arguments⁵². Reads were mapped to the Macaca fascicularis 5.0 genome. Next, the CellRanger ‘aggr’ command was used to normalize the sequencing depth of different samples, with mean reads per cell above 30,220 post normalization.

Filtering of cells, integration, dimensionality reduction and clustering

The filtered expression matrix with cell barcodes and gene names was loaded with the ‘Read10X’ function of the Seurat (v.4.0.0) R package⁵³. First, single cells with the number of detected genes (nFeature_RNA) above 500 and detected transcripts (nCount_RNA) above 1,000 were retained to exclude apoptotic or dead cells. Next, doublet or multiplet cells were determined with Scrublet, according to the recommended multiplet rate reference table from 10X Genomics⁵⁴. Next, Seurat objects of different samples (seven samples, Supplementary Table 1) were created independently, with the expression matrix and metadata containing cell barcodes, and cell multiplet information inferred by Scrublet, followed by merging of these Seurat objects. For monkey genes poorly annotated, gene names annotated by Macaca fascicularis 5.0 were further converted to those of human-based genes on the published annotation information to better interpret the data⁴. After exclusion of doublet or multiplet cells, 56,636 embryonic cells remained. Next, we used the dataset integration function of Seurat⁵³ to exclude individual heterogeneities between different monkeys. In brief, after normalization of the Seurat object we selected highlyvariably expressed genes by the ‘mean.var.plot’ method at the FindVariableFeatures step, with 2,117 genes found to have highly variable features. These feature genes of anchor and default 30 dimensions of canonical correlation analysis were used for FindIntegrationAnchors, IntegrateData, RunPCA and so on. A tree number of 50 was set as as default when finding integration anchors. Subsequently, the Seurat pipeline was used for dimensionality reduction (UMAP) and unsupervised clustering. In most cases we used the default settings of Seurat during dimensionality reduction and unsupervised clustering. To construct the UMAP plot we selected the number of dimensions mainly according to the ‘ElbowPlot’ function. For UMAP of 56,636 embryo cells we used the first 16 principal component analysis dimensions at the RunUMAP procedure; the seed used was 42, minimum distance was 0.3 and n.neighbors was 30 as the default setting of Seurat v.4, except that ‘umap.method’ was ‘umap-learn’ and the metric was ‘correlation’. For clustering of the 56,636 embryo cells the ‘k.parameter’ of 20 and ‘n.trees parameter’ of 50 were the default settings during the neighbour-finding process; the number of dimensions used for neighbour finding was 16, as also used for UMAP construction. A resolution of 0.9 was used at the ‘FindClusters’ step, as shown in Fig. 1b, different types of single cells grouped well.

Differentially expressed gene (DEG) and Gene Ontology (GO) analyses

We computed the DEGs of each cell cluster with RNA assay using the FindAllMarkers function of the Seurat package⁵³. Heatmaps were plotted based on the top ten highly expressed genes (according to adjusted P values and fold change) of each cell cluster. The DEGs of each cell cluster from mouse and monkey were used for GO enrichment and analysed by the clusterProfiler R package⁵⁵. GO terms were enriched by the ‘compareCluster’ function, and ‘ont=BP’ was set.

Pseudotime analysis

The ‘monocle3’ R package⁵⁶ was used to calculate the developmental pseudotime of single cells. The Seurat object was converted to a monocle3 object by the ‘as.cell_data_set’ command of the SeuratWrappers R package⁵³. The developmental trajectory was then constructed with the ‘learn_graph’ function of the monocle3 R package. After setting the developmental starting point, the ‘order cells’ command was used to analyse developmental pseudotimes. Finally, pseudotime trajectory was visualized with the ‘plot_cells’ function.

RNA velocity analysis

Read annotations for sequenced samples were performed using the ‘velocity run 10X’ command-line tool with BAM, genome annotation and repeat annotation files⁸. BAM files were generated by the default parameters of Cell Ranger software (10X Genomics)⁵². Macaca fascicularis 5.0 genome annotations were used to count molecules while separating them into three categories: spliced, unspliced or ambiguous. Repeat annotation files were downloaded from the UCSC genome browser. We then used the UMAP embedding matrix computed by the Seurat pipeline to construct the velocity map with the scVelo python package⁸. Briefly, the loom file containing three categories of count value was loaded to the R environment by the ‘ReadVelocity’ function of the SeuratWrappers package when the Seurat pipeline was completed. These data were added to the Seurat object, after which the Seurat object was converted to the ‘h5ad file’ with the SeuratDisk R package⁵³ and the ‘h5ad’ file was loaded by the ‘scv.read’ function of the scVelo python package⁵⁷. After the h5ad file was further filtered and normalized, ‘pp.moments’, ‘tl.velocity’ and ‘tl.velocity_graph’ commands were executed to compute RNA velocities. Finally, using the function ‘pl.velocity_embedding_stream’, RNA velocity vectors were projected onto to the UMAP produced by the Seurat pipeline.

To address concerns about 10X sequencing depth, PS-mesoderm lineage cells (Extended Data Fig. 2b) were divided into 282 microclusters by the Seurat unsupervised clustering method (resolution, 50) based on transcriptomic similarities. Spliced and unspliced transcripts of each microcluster were further merged (the sum of corresponding transcript count values in all cells of each microcluster was computed separately), then each microcluster was treated as a ‘pseudocell’. After microclustering, the new Seurat object was recreated with the merged nCount data and pseudocells were annotated according to the maximum cell population of each microcluster. The total number of detected genes and UMI per cell were increased (nCount (UMI), 1 × 10⁵ unspliced and 4 × 10⁵ spliced; nFeature (genes), 10,000 unspliced and 12,000 spliced), which was helpful in regard to compensating for the depth shortage of 3’ sequencing. After the Seurat pipeline, UMAP coordinates were substituted with mean UMAP values of cells in each original microcluster. RNA velocity vectors were then computed with the scVelo python package⁸. The validation of velocity on endoderm lineage based on microclustering was performed using a similar method. In addition, ‘Velocity_True’and ‘Velocity_False’ genes were exported from the ‘velocity_genes’ of the scVelo object, and ‘Conflict’ genes were computed based on the methods of Barile et al.⁵⁸.

Pseudotime trajectory analysis of Gut

The Seurat object with scale data of Gut was converted to the h5ad file by the SeuratDisk (v.0.0.0.9013) R package⁵⁹, and the h5ad file was then loaded to the python environment by the ‘sc.read’ function of the Scanpy (v.1.8.2) python package⁶⁰. Thereafter, principal components were recomputed with the ‘tl.pca’ function of Scanpy. The Force-directed graph was constructed with the 14 nearest neighbours with default principal components of the scale data (using the Scanpy ‘tl.draw_graph’ function), and the layout was generated with the ForceAtlas2 algorithm⁶¹. Graph abstraction was computed with the ‘tl.paga’ function of Scanpy v.1.8.2. The PAGA plot was drawn with the ‘pl.paga_compare’ function for improved correlation of cell clusters to the Force-directed graph. The threshold for connection of clusters was set to 0.15, node size scale to 3 and edge width scale to 0.8. Diffusion pseudotime⁶² was computed using the ‘tl.dpt’ function of Scanpy, with cluster 1 set as the starting point.

TF analysis

The pySCENIC analysis in Docker was carried out following three steps⁶³. The gene expression matrix was converted to loom file by the ‘loompy’ in python, then the ‘pyscenic grn’, ‘pyscience ctx’, and ‘pyscience aucell’ were used to infer the gene regulatory network, regulon prediction and cellular enrichment (area under the curve, AUC) processes with the corresponding cells. After gene regulatory network was produced by ‘pyscenic grn’, the regulon specificity scores were computed based on the cell clusters identified by Seurat, and we chose top regulons for each cell cluster following ‘pyscience ctx’. The AUC matrix was used to score regulon activity of each cell. The AUCell scores identified important regulons in cells by “pyscience aucell”. The result was a binary regulon activity matrix (binarized activity matrix) that determined in which cells Regulon is ‘on’. The SCENIC AUC heatmap was plotted with binarized activity regulons of each cell cluster by the ‘pheatmap’ R package with the annotation information in the Seurat object.

Cell–cell communication analysis

Cell annotation information and raw count expression matrix were exported from the Seurat file with suggested scripts using the CellPhoneDB protocol^64,65. Cell annotation information and count expression matrix were then used as input for CellPhoneDB statistical analysis with default settings, and this step together with the following plotting step was executed at the Linux command-line interface supported by the protocol. The database of receptor–ligand interactions was generated for human proteins, and the genes of the monkey have been transferred to human genes at the maximum extent to minimize differences in receptor–ligand interactions that might vary between monkeys and humans. Finally, we showed some notable interactions between relevant cell types with the dot-plot function of CellPhoneDB.

Comparison of single-cell transcriptomic dataset among mouse, human and monkey

To project monkey single-cell data onto the mouse UMAP, the mouse single-cell reference dataset was first prepared (Fig. 4a). scRNA-seq data of early mouse embryogenesis⁶ were obtained from EMBL-EBI ArrayExpress under experiment code no. E-MTAB-6967. The count expression matrix and cell annotation files supplied were used to create the Seurat object with the Seurat (v.4.0.0) R package⁵⁹. Using the method of Blanca Pijuan-Sala et al.⁶, 116,312 single-cell transcriptomes remained. The mouse Seurat object (reference dataset) was created with 13,805 monkey/mouse shared genes. Following the RunUMAP procedure (return.model, TRUE), UMAP cell-embedding values were replaced by those supplied in the cell annotation file of no. E-MTAB-6967. Cell clusters in the UMAP plot, as shown in Fig. 4a (right), were also annotated according to the annotation files supplied. The monkey Seurat object (query dataset) was also created, with 13,805 monkey/mouse shared genes. The anchors between mouse and monkey data were found with the FindTransferAnchors function (reference.reduction, ‘pca’; dims, 1:50; k.filter, NA), and the function MapQuery (reference.reduction, ‘pca’; reduction.model, ‘umap’) was used to project monkey embryo single-cell data onto the mouse embryo single-cell data-based UMAP structure. The cell clusters shown in Fig. 1b are shown in the projected UMAP plot in Fig. 4a (left).

To do integration analysis for monkey and human single-cell data, the human CS7 embryo² and various embryoid datasets including gastruloids^42,43, HFOs⁴⁵, neuruloids^46,47,48,49 and somitoids⁵⁰ were prepared (Extended Data Figs. 8a–d,g, 9a,b and 10a,d,g). We used the ‘biomaRt’ package to convert genes from cynomolgus monkeys and mice to human homologous genes⁵³. Seurat lists were split by samples or species, and each list was normalized using ‘NormalizeData’ function. Next, 2,000 genes were selected as anchor features. Using the R package ‘Seurat’ with the functions ‘FindIntegrationAnchors’ and ‘IntegrateData’, based on canonical correlation analysis and mutual nearest-neighbours algorithms, we acquired the integration Seurat objects of cynomolgus monkeys with mice then set the default assay as ‘integrated’. UMAPs were calculated using the function of ‘RunUMAP’ with dimensions set as 30. The same methods were performed for integration between natural monkey and human embryos² (or human embryoids).

Developmental staging of monkey and mouse embryos

We selected EPI, rostral neuroectoderm, SE, forebrain/midbrain/hindbrain and NC from the E6.5–8.5 mouse embryo dataset⁶ and compared them with their counterpart cells in CS8–11 cynomolgus monkey embryos using the ‘scmap’ R package^4,66. Furthermore, we selected 1,000 genes by setting ‘n_featurre=1000’ in the function ‘selectFeatures’ with the parameter threshold set to 0 in ‘scmapCluster’. Default parameters were used for all other steps. We performed the same strategy for the developmental stage comparison of mesoderm and endoderm between cynomolgus monkeys and mice.

Comparison of cellular signalling pathways across species

We downloaded gene information for the WNT, FGF, TGF-β, SHH, Hippoand Notch signalling pathways from the MSigDB database⁶⁷. Gene expression of various cell types was detected in cynomolgus monkey, human and mouse embryos and in human stem cell embryo models. Gene expression was scaled from −1 to 1 from the integrated data, and average expression level was measured by cell type using the ‘AverageExpression’ function in the Seurat R package.

IF analysis on paraffin-fixed embryo sections

Embryonic samples (monkey E22 and mouse E7.5–8.5 embryos) were immediately fixed in 4% paraformaldehyde overnight at 4 °C and subsequently embedded in paraffin. Sections (5 µm) on slides were dewaxed and rehydrated with xylene and ethanol gradients. Slides were immersed in 0.01 mol l^–1 citric acid buffer solution (C₆H₈O₇.H₂O:C₆H₅Na₃O₇.2H₂O, 1:9, pH 6.0) and heated in a microwave oven at 92–98 °C for 15 min for antigen retrieval. After cooling to room temperature, the slides were washed three times with 1× PBS (5 min each), incubated with 1% Triton X-100 for 30 min and blocked with 2% bovine serum albumin (BSA) for 30 min at room temperature. Next, the slides were incubated with primary antibodies (Supplementary Table 14), diluted with blocking solution overnight at 4 °C and washed three times with PBST (1× PBS with 0.05% Tween-20, 5 min each). The slides were then incubated with secondary antibodies diluted with blocking solution and 1 mg ml^–1 DAPI (Invitrogen, no. D3571) for 1 h. Finally, after washing three times with PBST (5 min each) the slides were mounted with anti-fade mounting medium (Gibco). IF images were captured by laser-scanning confocal microscope LSM 880 (Carl Zeiss) and processed with Imaris 9.0.2 software (Bitplane) and Zen 7.0 (Carl Zeiss).

Validation by stem cell models

Pluripotent stem cell culture

Human embryonic stem cell line H9 (WA09) was obtained from WiCell and authenticated by short tandemrepeat profiling. Mouse epiblast stem cells (mEpiSCs) and rhesus macaque ES cells were generated and identified as described in a previous study^68,69. Mycoplasma testing for cell lines was negative. hESCs were maintained in mTeSR Plus medium (STEMCELL Technologies) in a 0.5% Matrigel (BD Biosciences)-coated culture dish at 37 °C and in 5% CO₂. hESCs were dissociated with accutase (STEMCELL Technologies) and split in a 1:10 ratio. A single-cell suspension was seeded into a Matrigel-coated dish in mTeSR Plus medium containing 10 µM ROCK inhibitor (no. Y27632, Sigma-Aldrich). mEpiSCs were maintained on mouse embryonic fibroblast (MEF) feeder cells in a gelatin-coated dish, in NBFR medium containing DF12 and Neurobasal medium (Invitrogen) mixed in a 1:1 ratio, 0.5× N2 supplement (Invitrogen), 0.5× B27 supplement (Invitrogen), 2 mM GlutaMax (Gibco), 1× nonessential amino acids (NEAA, Gibco), 0.1 mM 2-mercaptoethanol (Sigma-Aldrich), 20 ng ml^–1 FGF2 and 2.5 μM IWR1. mEpiSCs were dissociated with TrypLE (ThermoFisher) and split in a 1:30 ratio. Rhesus macaque ES cells were maintained on mouse embryonic fibroblast feeder cells in a gelatin-coated dish, in NBFR medium supplemented with 5 mg ml^–1 BSA (MP Biomedicals). Cells were dissociated with TrypLE and split in a 1:10 ratio. A single-cell suspension was seeded in NBFR (5 mg ml^–1 BSA) medium containing 10 µM ROCK inhibitor.

In vitro differentiation of PSM-like cells

Presomitic mesoderm differentiation was carried out as described in a previous study³⁷. On the day of differentiation (day 0), 100,000–150,000 cells (10,416–15,625 cells cm^–2) were seeded in a Matrigel-coated, 35 mm dish in pluripotency maintenance medium. Cells were maintained in incubator at 37 °C for about 2 h before changing to differentiation medium; differentiation medium contains DF12 with 1× N2 supplement and Neurobasal medium (Invitrogen) with 1× B27 supplement in a 1:1 ratio. The medium was also supplemented with 2 mM Glutamax, 0.1 mM nonessential amino acids, 1 mM sodium pyruvate (Gibco), Penicillin-Streptomycin (Gibco), 10 µM CHIR99021 (Selleckchem) and 0.5 μM LDN193189 (Selleckchem). Differentiation medium was changed daily. The same protocol was used for differentiation of human, monkey and mouse PSM-like cells. For Hippo inhibition experiments, cells were treated with 0.5 µM 1-Oleoyl LPA (OCRIS, no. 3854) on day 1.

IF staining and microscopy

For IF staining, 10,416–15,625 cells cm^–2 were initially seeded on Matrigel-coated µ-Slide eight-well chambered coverslips (ibidi, for high-end microscopy). The cells were fixed at the indicated time points (days) in 4% paraformaldehyde for 15 min at room temperature. Fixed cells were washed twice with PBS before permeabilization and blocking with 3% donkey serum in PBST (1× PBS with 0.1% Triton X-100) for 1 h at room temperature. Samples were incubated with primary antibody (Supplementary Table 14) and diluted in blocking buffer at room temperature for 2 h or 4 °C overnight followed by 30 min of PBST wash, repeated twice. Secondary antibodies (ThermoFisher) were diluted in blocking buffer in a 1:500 ratio. DAPI staining was performed, together with secondary antibodies, at room temperature for 1 h followed by three PBST washes. Samples were then soaked in PBS before imaging. Fluorescence imaging was performed on either (1) a Nikon CSU-W1 SoRa spinning-disk confocal microscope with objectives ×20/0.45 numerical aperture (NA), WD 8.9–6.9, air, ×40/0.6 NA, WD 3.6–2.85, air and ×100/1.45 NA, oil or (2) a Zeiss LSM 800 laser-scanning confocal microscope with a ×40/1.3 NA oil objective.

Imaging and statistical analysis

Statistical analyses were repeated at least twice, with consistent results. In the figure captions n denotes the number of biological replicates in the same experiment. Raw images were first processed in Fiji⁷⁰ to create maximal intensity projection (MIP) and export look-up table of representative images. For MLLT3 and FOSB data shown in Extended Data Fig. 7g,h, nuclear segmentation was performed in Ilastik⁷¹. MIP images and segmentation masks were processed in MATLAB (R2022a) using custom code, which is available in a public repository. Nuclear localized fluorescence intensity of transcription factors was computed for each cell in a given field, and the value was then normalized to the DAPI intensity of the same cell. Values of all cells were plotted as mean ± s.e.m. For data shown in Fig. 4f, total cell and TBX6-positive cell numbers were calculated with Imaris (v.9.9, Oxford Instruments) using the SPOTS function. Total cell number was calculated by counting the number of nuclei (DAPI). The same parameters for computation of the spots were applied to the DAPI and TBX6 channels. Data were shown as mean ± s.d. P values were determined by unpaired t-test. GraphPad Prism v.7.0 was used to plot the data shown in Fig. 4f and Extended Data Fig. 7g,h.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link