Integrated intracellular organization and its variations in human iPS cells - Nature

Cell lines, cell culturing and quality control

Each gene-edited cell line was created using the parental WTC-11 hiPS cell line³¹ and contains a fluorescent protein endogenously tagged to a protein representing a distinct cellular structure (Fig. 1a). Cell lines were generated using CRISPR–Cas9-mediated genome editing¹⁴. The tagging strategy for AAVS1 safe harbour targeting was altered for expression of CAAX-mTagRFP-T³²,³³. Fifteen additional Allen Cell Collection lines were generated using the same methods. The complete list of cell lines and reagents can be found in Supplementary Data 2. The cell lines are described at https://www.allencell.org/cell-catalog.html and are available through Coriell at https://www.coriell.org/1/AllenCellCollection. For all non-profit institutions, detailed MTAs for each cell line are listed on the Coriell website. Please contact Coriell regarding for-profit use of the cell lines as some commercial restrictions may apply. All cell lines were cultured on an automated cell-culture platform developed on a Hamilton Microlab STAR Liquid Handling System (Hamilton Company). Cells were cultured in a Cytomat 24 (Thermo Fisher Scientific) at 37 °C and 5% CO₂ in mTeSR1 medium with and without phenol red (STEMCELL Technologies), supplemented with 1% penicillin–streptomycin (Thermo Fisher Scientific). Cells were passaged every four days as single cells for up to ten passages post-thaw. For imaging, cells were plated on Matrigel-coated glass-bottom, black-skirt, 96-well plates with 1.5 optical grade cover glass (Cellvis). Cells were regularly assessed for morphology, cell stemness marker expression and outsourced cytogenetic analyses throughout the three years of data acquisition of the WTC-11 hiPSC Single-Cell Image Dataset v1 (ref. ³⁴). Standard protocols are available at https://www.allencell.org/. Further details are provided in the Supplementary Methods.

Microscopy

Imaging was performed on three identical ZEISS spinning-disk confocal microscopes with 10×/0.45 NA Plan-Apochromat or 100×/1.25 W C-Apochromat Korr UV Vis IR objectives (ZEISS) and ZEN 2.3 software (blue edition; ZEISS) unless otherwise specified. The spinning-disk confocal microscopes were equipped with a 1.2× tube lens adapter for a final magnification of 12× or 120×, respectively, a CSU-X1 spinning-disk scan head (Yokogawa) and two Orca Flash 4.0 cameras (Hamamatsu). Standard laser lines were used at the following laser powers measured with 10× objectives: 405 nm at 0.28 mW, 488 nm at 2.3 mW, 561 nm at 2.4 mW and 640 nm at 2.4 mW unless otherwise specified. An Acousto-Optic Tunable Filter (AOTF) was used to simultaneously modulate the intensity of the four laser lines. The following Band Pass (BP) filter sets (Chroma) were used to collect emission from the specified fluorophore: 450/50 nm for detection of DNA dye, 525/50 nm for detection of mEGFP tag, 600/50 nm for detection of mTagRFP-T tag and 706/95 nm for detection of cell-membrane dye. Images were acquired with an exposure time of 200 ms unless otherwise specified. Cells were imaged in phenol red-free mTeSR1 medium on the stage of microscopes outfitted with a humidified environmental chamber to maintain cells at 37 °C with 5% CO₂ during imaging. Transmitted light (bright-field) images were acquired using a white LED light source with broad emission spectrum (Pipelines 4.0–4.2) or a red LED light source with peak emission of 740 nm with narrow range and a BP filter 706/95 nm for bright-field light collection (Pipeline 4.4 only). A Prior NanoScan Z 100 mm piezo z stage (ZEISS) was used for fast acquisition in z (Pipeline 4.4 only). Optical control images were acquired daily at the start of each data acquisition to monitor microscope performance. Laser power was measured monthly and the corresponding percentage adjusted accordingly for each wavelength.

Image acquisition

The image acquisition workflow and experimental set-up evolved over the three years of dataset collection and were versioned into four pipelines. Adjustments included single versus dual camera, filter and light sources, as well as addition of a photoprotective cocktail (Supplementary Methods and Extended Data Fig. 1d). Low magnification (12×), 2D bright-field overview images of cells in wells were collected for cell morphology assessment and for selection of imaging positions for high-magnification (120×), 3D, multichannel imaging. Cells were imaged in three modes to sample a variety of locations within hiPS cell colonies. Selection of FOV position was performed manually using the stage function in ZEN software or using an automated method, depending on the mode and the cell line. After the selection of FOV position from the well overview acquisition, the DNA of cells was first stained for 20 min with NucBlue Live (Thermo Fisher Scientific). Then the cell membrane was stained with CellMask Deep Red (CMDR, Thermo Fisher Scientific) in the continued presence of NucBlue Live for an additional 10 min, and cells were washed once before imaging for a maximum of 2.5 h. Three-dimensional FOVs at 120× were acquired at the pre-selected positions. Four channels were acquired at each z-step (interwoven channels) in the following order: bright field, mEGFP or mTagRFP-T, CMDR and NucBlue Live. Further details are provided in the Supplementary Methods.

3D FOV image quality control

FOV images acquired with two cameras underwent a channel alignment procedure. All 3D FOV images underwent an image quality-control procedure, including three automated FOV quality-control steps. Typical FOV exclusion criteria were related to microscope acquisition system failures (laser, exposure time, z-slice positioning in relation to cell height, empty or out of order channels), analysis steps to identify outliers or any other issues that would cause downstream processing, such as cell, nuclear and cellular structure segmentation, to fail in a systematic batch manner. Total days of acquisition and FOV number per cellular structure are provided in Supplementary Data 1. Further details are provided in the Supplementary Methods.

3D cell and nuclear segmentation

To segment each individual cell and its corresponding DNA from the membrane dye and DNA dye channels of each 3D z-stack, we used the deep-learning-based cell and nuclear instance segmentation algorithm developed as part of Allen Cell & Structure Segmenter, an open-source, Python-based 3D segmentation software package¹⁵. We combined the Segmenter’s Iterative Deep Learning workflow and the Training Assay approach to ensure accurate and robust segmentation at scale (18,100 FOVs) for downstream quantitative analysis. We manually validated a subset of the cell and nuclear segmentation results and found that over 98% of individual cells were well-segmented and over 80% of images generated successful cell and nuclear segmentations for all cells in the entire FOV. On the basis of these validation results, we decided that the cell and nuclear instance segmentation algorithm was sufficiently reliable to be applied to all of the FOVs in the dataset. In addition, all cells in the final dataset were manually reviewed for basic quality criteria. Further details are provided in the Supplementary Methods.

3D cellular structure segmentation

We applied a collection of modular segmentation workflows from the Classic Segmentation component of the Segmenter, each optimized for the particular morphological features of the target cellular structures¹⁵. Representative examples for each of the 25 FP-tagged cellular structures are shown in Extended Data Fig. 2. For each structure, the results of the segmentation workflow were evaluated on sets of images representing the variation observed across imaged cells (for example, different regions of colonies) to ensure consistent segmentation quality across all images for each structure. We performed an additional validation step to determine whether a given target structure segmentation was sufficient for interpretation in the cellular structure volume analysis (Extended Data Fig. 8). We identified ten structures for which there were obvious caveats to the ability to use their target structure segmentation for biological interpretations of how much of the target structure was present in each cell and thus these ten structures were excluded from the structure volume analysis (Extended Data Fig. 2b–d). Further details are provided in the Supplementary Methods.

Single-cell datasets, feature extraction and quality control

To build the WTC-11 hiPSC Single-Cell Image Dataset v1, we extracted all complete individual cells in each FOV automatically from the cell segmentation results (around 12 complete cells per FOV, on average). All images were rescaled to isotropic voxel size (0.108333 µm in x, y and z). A cropping region of interest (ROI) was created for each cell and applied to each of the original intensity z-stacks and cell, nuclear and structure segmentations. Features that were calculated for each cell included FOV-based features (for example, the lowest and highest z position of all cells in the FOV), colony-based features (for example, size of the colony), single-cell-based features (for example, cell, nuclear, and cellular structure volume), and single-cell deep-learning-based annotations of cell-cycle stage (for example, interphase or mitotic). The baseline interphase dataset was created by removing all of the 11,190 mitotic cells, as well as approximately 0.5% of outlier cells. We performed an extensive analysis to identify and account for any potential experimental contributions to cell-shape variation (Extended Data Fig. 12). All of the results together confirmed that although cell line identity can contribute to variation in cell height because each cell line was imaged under a particular set of imaging conditions, which varied throughout the imaging pipeline timeline, cell line identity itself does not greatly contribute to the variation in cell height observed in the baseline interphase dataset. Total numbers of cells per cellular structure and per dataset can be found in Extended Data Fig. 1d and Supplementary Data 1. Further details are provided in the Supplementary Methods.
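The isotropic rescaling step can be illustrated with a minimal sketch. This is not the pipeline's own code: the input voxel size of 0.29 µm in z, the synthetic stack and the helper name rescale_isotropic are assumptions made for illustration, and scipy.ndimage.zoom is simply one common way to resample a volume.

```python
import numpy as np
from scipy.ndimage import zoom

TARGET_UM = 0.108333  # isotropic voxel size stated in the text

def rescale_isotropic(stack, voxel_zyx_um, order=1):
    """Resample a (z, y, x) stack so every voxel edge is TARGET_UM long.

    order=1 (linear) suits intensity images; order=0 (nearest neighbour)
    would be the safer choice for label or binary segmentations.
    """
    factors = [size / TARGET_UM for size in voxel_zyx_um]
    return zoom(stack, factors, order=order)

# Synthetic stand-in for an intensity z-stack; the 0.29 um z-step is hypothetical.
stack = np.random.default_rng(0).random((10, 32, 32))
iso = rescale_isotropic(stack, voxel_zyx_um=(0.29, 0.108333, 0.108333))
```

Only the z axis is stretched here because the assumed x and y voxel sizes already match the target.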

SHE of cell and nuclear shapes

We used SHE coefficients as shape descriptors for cell and nuclear shape¹⁸,³⁵. We created a publicly available Python package, aics-shparam (see Code availability), to extract SHE coefficients from segmented images of cells and nuclei. Cells and nuclei were first rotated in the xy plane such that the longest cell axis falls along the x axis. The z axis in the lab frame of reference was preserved as it represents the apical–basal axis of these epithelial-like cells. We carried out the expansion up to degree L_max = 16, which yields 289 coefficients for each input. Therefore, the shape of each cell in our dataset can be represented by a total of 578 coefficients (Fig. 2a). We could also do the reverse and recreate the 3D mesh representation of a particular set of SHE coefficients with aics-shparam. Further details are provided in the Supplementary Methods.
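The coefficient counts quoted above follow from the structure of a spherical harmonic expansion: each degree l contributes 2l + 1 orders, so degrees 0 to L_max give (L_max + 1)² terms. The short check below is only an arithmetic illustration; the function name is hypothetical and it makes no claim about how aics-shparam stores its coefficients internally.

```python
def n_she_coefficients(l_max):
    """Count expansion terms for degrees 0..l_max, with 2*l + 1 orders per degree."""
    return sum(2 * l + 1 for l in range(l_max + 1))

per_shape = n_she_coefficients(16)  # one shape: the cell or the nucleus
per_cell = 2 * per_shape            # cell and nucleus described together
print(per_shape, per_cell)          # prints "289 578"
```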

Building the cell and nuclear shape space

We used PCA to reduce the dimensionality of our joint vectors for all cells (578 SHE coefficients) down to eight principal components. We used the PCA implementation from the Python library scikit-learn³⁶ with default parameters (Fig. 2b). Because the sign of a given PC is arbitrary, we adjusted the signs where needed to match the naming of the shape modes (for example, larger cells have a more positive PC). We also translated the location of the nuclear mesh back to its correct location relative to the centre of the cell. To prevent cells with extreme shapes from affecting the interpretation of the PCs, we excluded all cells that fell into the range 0th to 1st or 99th to 100th percentiles of each PC from subsequent analysis (remaining n = 175,147 cells). We z-scored all PCs independently by dividing the PC values by the standard deviation (σ) of that PC. The combination of the first eight ‘shape modes’ (z-scored PCs) created the 8D shape space. We used the inverse of the PCA transform generated above to map coordinates from the shape space back into SHE coefficients, which, in turn, were used to reconstruct the corresponding 3D shape. For example, the eight-component vector (0,0,0,0,0,0,0,0) represents the origin of the shape space and its corresponding 3D shape is called the ‘mean cell and nuclear shape’ (Fig. 2c). In addition to the joint cell and nuclear shape space, we also generated independent cell-only and nucleus-only shape spaces for the baseline interphase dataset (Extended Data Fig. 3e–f), a joint cell and nuclear shape space for cells located at the edges of hiPS cell colonies, and one joint cell and nuclear shape space each for cells in prophase and for cells in early prometaphase. Finally, we created three joint cell and nuclear shape spaces for the three shape-matched datasets described below. Further details are provided in the Supplementary Methods.
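The sequence of steps in this section (PCA, percentile exclusion, z-scoring and the inverse mapping) can be sketched roughly as follows. The data are synthetic stand-ins, the variable and function names are hypothetical, and computing σ after the percentile exclusion is an assumption; the sign adjustment and nuclear translation steps are omitted.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
coeffs = rng.normal(size=(2000, 578))  # synthetic stand-in for joint SHE vectors

# Reduce 578 coefficients to eight principal components.
pca = PCA(n_components=8)
pcs = pca.fit_transform(coeffs)

# Drop cells in the bottom or top percentile of any single PC.
lo, hi = np.percentile(pcs, [1, 99], axis=0)
keep = np.all((pcs >= lo) & (pcs <= hi), axis=1)
pcs_kept = pcs[keep]

# z-score each PC by dividing by its standard deviation (assumed: after exclusion).
sigma = pcs_kept.std(axis=0)
shape_modes = pcs_kept / sigma

def shape_space_to_coeffs(point):
    """Map a point given in units of sigma back to SHE coefficients."""
    scaled = np.asarray(point, dtype=float) * sigma
    return pca.inverse_transform(scaled[None, :])[0]

# The origin of the shape space maps to the mean coefficient vector.
mean_shape_coeffs = shape_space_to_coeffs(np.zeros(8))
```

Because PCA centres the data, the origin inverts to the per-coefficient mean, which is why the shape at (0,0,0,0,0,0,0,0) acts as the mean shape.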

PILRs

The nuclear centroid of each cell was defined as the SHE coefficients representing a one-pixel radius (0.108 µm) 3D spherical mesh. Then, pre-computed SHE coefficients were interpolated to create a series of successive 3D concentric mesh shells from the centroid of the nucleus to the nuclear boundary and from the nuclear boundary to the cell boundary. The xyz coordinates of points in the 3D meshes map to corresponding xyz locations in the aligned segmented images that were used to generate the SHE coefficients in the first place. Thus, the presence or absence of a segmentation result at each mesh xyz coordinate could be organized as a matrix as shown in Fig. 3b. This matrix encodes a PILR of the cell. This process could also be performed using the intensity value at a given xyz location in the original FP image (Extended Data Fig. 4). A PILR could then be used to map the cellular structure locations from one cell and nuclear shape into the equivalent locations in any other cell and nuclear shape, thus generating a ‘morphed cell’ and its reconstructed image. Further details are provided in the Supplementary Methods.
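A rough sketch of the two ideas in this section is given below: blending coefficient vectors into a series of shells, and reading a segmentation at mesh coordinates to fill a matrix. It assumes plain linear interpolation and nearest-voxel lookup, uses random stand-ins for the coefficients and mesh vertices, and skips the reconstruction of meshes from coefficients. All names are hypothetical.

```python
import numpy as np

def interpolate_shells(coeffs_inner, coeffs_outer, n_shells):
    """Linearly blend two coefficient vectors into n_shells successive shapes."""
    t = np.linspace(0.0, 1.0, n_shells)[:, None]
    return (1.0 - t) * coeffs_inner + t * coeffs_outer

def sample_presence(segmentation, points_zyx):
    """Read a binary (z, y, x) segmentation at (n_shells, n_points, 3) coordinates."""
    idx = np.rint(points_zyx).astype(int)
    for axis, size in enumerate(segmentation.shape):
        idx[..., axis] = np.clip(idx[..., axis], 0, size - 1)
    return segmentation[idx[..., 0], idx[..., 1], idx[..., 2]]

rng = np.random.default_rng(0)

# Stand-ins for the centroid-sphere and nuclear-boundary coefficient vectors.
inner = rng.normal(size=289)
outer = rng.normal(size=289)
shells = interpolate_shells(inner, outer, n_shells=5)

# Stand-ins for a structure segmentation and for the mesh vertices of each shell.
segmentation = rng.random((20, 30, 30)) > 0.7
points = rng.uniform(0, 30, size=(5, 64, 3))
pilr = sample_presence(segmentation, points)  # rows: shells, columns: mesh points
```

Swapping the binary segmentation for the original FP intensity image in sample_presence would give the intensity-based variant mentioned in the text.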

Integrating average morphed cells in the mean cell and nuclear shape

We identified and grouped a set of cells by their absolute proximity in 8D space to the origin of the shape space, map point (0,0,0,0,0,0,0,0). We determined the radius of a sphere centred at this origin such that the number of cells per structure within this sphere was as similar as possible to the average number of cells found in the centre bins of all of the shape modes. A total of 35,633 cells across all 25 structures were found to be within this radius of 2.1σ (see Supplementary Data 1 for numbers of cells per structure). We computed the average of all the PILRs for each structure for all cells within the 8-dimensional sphere. We then morphed these average PILRs into the mean cell and nuclear shape, creating an integrated average morphed cell. Any cellular structures could be rendered simultaneously to illustrate the spatial relationships of different structures on the basis of their average location in cells of a particular shape.
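The selection and averaging described here can be sketched as below, assuming Euclidean distance in the z-scored 8D shape space. The shape-space coordinates, structure labels and PILR matrices are synthetic, and the variable names are hypothetical; only the 2.1σ radius is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 500
shape_modes = rng.normal(size=(n_cells, 8))            # z-scored PCs, synthetic
structure = rng.choice(["A", "B", "C"], size=n_cells)  # hypothetical structure labels
pilrs = rng.random((n_cells, 6, 10))                   # tiny stand-in PILR matrices

RADIUS = 2.1  # in units of sigma, as stated in the text

# Distance of every cell from the origin of the shape space.
dist = np.linalg.norm(shape_modes, axis=1)
inside = dist <= RADIUS

# Average the PILRs per structure over the cells inside the 8D sphere.
avg_pilr = {
    str(name): pilrs[inside & (structure == name)].mean(axis=0)
    for name in np.unique(structure)
}
```

Each averaged PILR would then be morphed into the mean cell and nuclear shape, a step not shown here.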

Pairwise average interaction map of cellular structures

We calculated the 2D pixel-wise Pearson correlation between the averaged PILRs for all pairs of cellular structures within the 8-dimensional sphere, representing a measure of the average location similarity between two structures (Extended Data Fig. 4g). All correlation values used throughout this paper were calculated using the function corrcoef from the Python package NumPy³⁷. The average location similarities were organized in a 25 × 25 matrix that represents an average pairwise spatial interaction map of cellular structures (Fig. 3d). This correlation matrix was used as input for a hierarchical clustering algorithm to cluster all 25 cellular structures according to their average location similarities. We used the function cluster.hierarchy.linkage of type ‘average’ from the Python package scipy³⁸ to produce the clustering represented by the dendrogram in Fig. 3d. We also computed the average location similarity for every map point along each shape mode. For a given map point, the correlations were computed between the averaged PILRs over all cells that fall into the corresponding map point bin. The heat maps of the resulting matrices for all shape modes and bins between −2σ and 2σ are shown in Fig. 3e and Extended Data Fig. 4h and the data can be found in Supplementary Data 1.
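A minimal sketch of this computation with the two named functions follows. The averaged PILRs are random stand-ins for five hypothetical structures, and passing the correlation matrix directly to linkage (so that each row is treated as an observation vector under the default Euclidean metric) is an assumption about how the matrix was used as input.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
names = [f"structure_{i}" for i in range(5)]  # hypothetical structure names
avg_pilrs = rng.random((len(names), 6, 10))   # stand-in averaged PILRs

# Flatten each PILR so that corrcoef compares them pixel by pixel.
flat = avg_pilrs.reshape(len(names), -1)
similarity = np.corrcoef(flat)                # one row and column per structure

# Average-linkage hierarchical clustering on the rows of the matrix.
tree = linkage(similarity, method="average")
order = dendrogram(tree, no_plot=True, labels=names)["ivl"]
```

The list order gives the leaf ordering that a dendrogram plot would use to arrange the rows and columns of the heat map.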

Location stereotypy and location concordance

We calculated the 2D pixel-wise Pearson correlation between the PILRs for all pairs of individual cells within the 8-dimensional sphere centred at the origin of our shape space. This computation results in a 35,633 × 35,633 correlation matrix (Extended Data Fig. 6a). Correlation values from this matrix were averaged within each pair of structures to create an average correlation matrix. Two distinct measurements of structure location and its variation were derived from this average correlation matrix. The diagonal values are the location stereotypy of a given structure and the off-diagonal values are the location concordance between two structures (Extended Data Fig. 6b). We also computed the average correlation matrices for every map point along each shape mode. For a given map point, the correlations were computed between PILRs over all cells that fall into the corresponding map point bin and then averaged. Heat maps and values of location stereotypy and location concordance for all shape modes and map points can be found in Extended Data Figs. 6c,d and 7c,d and Supplementary Data 1.
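The averaging of the cell-by-cell correlation matrix into stereotypy and concordance values can be sketched as follows. The PILRs and labels are synthetic, and excluding each cell's correlation with itself from the diagonal blocks is an assumption rather than something stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.array(["A"] * 4 + ["B"] * 3)  # hypothetical structure label per cell
pilrs = rng.random((len(labels), 60))     # flattened stand-in PILRs
corr = np.corrcoef(pilrs)                 # cell-by-cell Pearson correlations

names = np.unique(labels)
avg = np.zeros((len(names), len(names)))
for i, a in enumerate(names):
    for j, b in enumerate(names):
        block = corr[np.ix_(labels == a, labels == b)]
        if i == j:
            # Assumed: ignore self-correlations, which are always 1.
            off_diagonal = ~np.eye(block.shape[0], dtype=bool)
            avg[i, j] = block[off_diagonal].mean()
        else:
            avg[i, j] = block.mean()

stereotypy = np.diag(avg)  # one value per structure
concordance = avg[0, 1]    # value for the pair (A, B)
```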

Shape-matched datasets

To compare a second, distinct population of cells, such as cells at the edges of colonies or cells in early mitosis, with the baseline interphase cell dataset we created shape-matched datasets. We first mapped cell and nuclear shapes from the second population into the shape space of the baseline dataset by transforming the SHE coefficients from the second population using the same PCs obtained for the baseline dataset. Here we did not exclude cells that fell into the range 0th to 1st or 99th to 100th percentiles of each PC in the baseline dataset because these cells could have shapes more similar to the second population. We then calculated the distance in 8D shape space between every possible pair of cells in both datasets (Extended Data Fig. 9a). Finally, for every cell in the second dataset, we flagged its nearest neighbour within the baseline dataset. The same cell in the baseline dataset could be flagged more than once for multiple different cells within the second dataset. This occurred roughly 12% of the time. The resultant shape-matched dataset is the set of unique flagged cells in the baseline dataset combined with cells in the second dataset. The mean cell shape of this shape-matched dataset is the cell and nuclear shape corresponding to the origin of the corresponding shape-matched shape space. Further details are provided in the Supplementary Methods.
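The nearest-neighbour matching can be sketched as below, assuming Euclidean distance in the shared 8D shape space. Both populations are synthetic and the variable names are hypothetical; scipy.spatial.distance.cdist is used here simply as a convenient way to get all pairwise distances.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
baseline = rng.normal(size=(1000, 8))        # baseline cells in shape space
second = rng.normal(loc=0.5, size=(50, 8))   # second population, same PCs

# Distance between every cell of the second population and every baseline cell.
dists = cdist(second, baseline)

# Flag the nearest baseline neighbour of each cell in the second population.
nearest = dists.argmin(axis=1)

# A baseline cell can be flagged more than once, so keep unique indices only.
flagged = np.unique(nearest)
n_repeats = len(nearest) - len(flagged)

# Shape-matched dataset: unique flagged baseline cells plus the second population.
matched = np.vstack([baseline[flagged], second])
```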

LDA

We performed a PCA dimensionality reduction on all of the PILRs for a given cellular structure in a given shape-matched dataset. This reduced the initial dimensionality of 532,610 pixels in each PILR down to 32 dimensions (or the total number of cells available if fewer than 32). The dimensionally reduced data were then used as input for an LDA to identify the linear combination of reduced dimensions that best separated the two populations of cells within the shape-matched dataset. LDA generates a discriminant axis along which we could reconstruct corresponding PILRs using the inverse of the PCA transform (Extended Data Fig. 9c and Supplementary Methods). These PILR reconstructions were morphed into the mean cell and nuclear shape for that shape-matched dataset (for example, Supplementary Videos 4 and 5). These reconstructions represent the full range of the ALP for that structure. Each cell was also assigned a location along the discriminant axis (for example, histograms in Extended Data Fig. 9h and Supplementary Videos 4 and 5).
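A minimal sketch of the PCA-then-LDA procedure using scikit-learn is given below. The PILRs are small synthetic vectors with an artificial difference between two groups, and the way points are placed along the discriminant axis (steps along the unit-normalized LDA direction from the centre of the reduced space) is an assumption; the exact reconstruction used in the paper is described in its Supplementary Methods.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_per_group, n_pixels = 60, 400
group = np.repeat([0, 1], n_per_group)  # 0: baseline cells, 1: second population
pilrs = rng.normal(size=(2 * n_per_group, n_pixels))
pilrs[group == 1, :20] += 1.0           # artificial difference in localization

# Reduce to 32 dimensions, or to the number of cells if fewer are available.
n_components = min(32, pilrs.shape[0])
pca = PCA(n_components=n_components)
reduced = pca.fit_transform(pilrs)

# One discriminant axis separates the two populations.
lda = LinearDiscriminantAnalysis(n_components=1)
position = lda.fit_transform(reduced, group)[:, 0]  # location of each cell

# Assumed reconstruction: step along the LDA direction in the reduced space,
# then invert the PCA to get back to PILR pixels.
direction = lda.scalings_[:, 0] / np.linalg.norm(lda.scalings_[:, 0])
steps = np.linspace(-2.0, 2.0, 5)
reconstructions = pca.inverse_transform(steps[:, None] * direction[None, :])
```

The array position corresponds to the per-cell locations shown as histograms, and each row of reconstructions is one PILR along the axis.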

Workflow to flag significant changes in location stereotypy and concordance in early mitosis

To flag whether a difference in location stereotypy or concordance was significant, we first set a threshold cut-off value of Pearson correlation ρ = 0.03, below which a stereotypy or concordance value was too low to be used for the subsequent detection of a difference between the baseline dataset and its shape-matched comparison dataset. Next, we set a cut-off threshold for the Pearson correlation value of the difference (ρ_diff) in stereotypy or concordance of ρ_diff = 0.02 (Supplementary Methods). We next applied this workflow to flag all entries in the three early mitotic average correlation difference matrices that showed a significant change between interphase, prophase and early prometaphase (i1–m1, i2–m2 and m1–m2). The first cut-off, ρ = 0.03, was applied to the interphase cells when comparing to each early mitotic (i1 for i1–m1; i2 for i2–m2) and to prophase when comparing between the two early mitotic stages (m1 for m1–m2) as in Fig. 5c and Extended Data Fig. 10f. This flagging procedure resulted in three binarized versions of the matrix, in which each flagged entry is marked in black. The combined pattern of flags in these three matrices permits us to identify the TOC for each of the flagged entries (Fig. 5c,d). The four TOC categories included: (1) m1-only: changes that occurred from interphase to m1 but not any further in m2; (2) stepwise: changes that occurred both from interphase to m1 and from m1 to m2; (3) m2-change: changes that occurred from m1 to m2 only; and (4) no change or cases for which changes could not be determined for technical reasons (Fig. 5b and Supplementary Methods). We used all possible combinations of the TOC for the two stereotypies and single concordance for each pair of structures to assess the overall relationship between stereotypy and concordance in early mitosis, which we consolidated and summarized into three categories (top triangle; Fig. 5d and Supplementary Methods).
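The two-threshold flagging and the four TOC categories can be sketched as follows. The cut-off values come from the text, but several details are assumptions: the difference is taken as an absolute value, thresholds are treated as inclusive, and the category is derived from the i1–m1 and m1–m2 flags only, leaving out the role of the i2–m2 comparison, which is described in the Supplementary Methods. Function names are hypothetical.

```python
import numpy as np

RHO_MIN = 0.03       # reference values below this are too low to evaluate
RHO_DIFF_MIN = 0.02  # minimum difference treated as a change

def flag_changes(reference, comparison):
    """Flag entries whose reference value is usable and whose difference is large."""
    reference = np.asarray(reference, dtype=float)
    comparison = np.asarray(comparison, dtype=float)
    usable = reference >= RHO_MIN
    changed = np.abs(comparison - reference) >= RHO_DIFF_MIN
    return usable & changed

def toc_category(flag_i1_m1, flag_m1_m2):
    """Assumed mapping from two flags to the four categories named in the text."""
    if flag_i1_m1 and flag_m1_m2:
        return "stepwise"
    if flag_i1_m1:
        return "m1-only"
    if flag_m1_m2:
        return "m2-change"
    return "no change or undetermined"

# Toy values: a clear change, a reference that is too low, and a small difference.
flags = flag_changes([0.10, 0.01, 0.10], [0.20, 0.20, 0.105])
categories = [toc_category(a, b) for a, b in [(True, False), (True, True),
                                              (False, True), (False, False)]]
```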

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
