May 19, 2024
A time-resolved, multi-symbol molecular recorder via sequential genome editing – Nature

Plasmid cloning

Both pegRNA and DNA Tape constructs were cloned using either Gibson assembly (Gibson Assembly Master Mix, New England Biolabs) or ligation after restriction (T4 DNA ligase, New England Biolabs). For the Gibson assembly protocol, inserts of interest, usually ordered in the form of single-stranded DNA (IDT; Ultramer, up to 200 bp, or oPool, up to 350 bp), were amplified using PCR (KAPA HiFi polymerase) and converted into double-stranded DNA molecules. For ligation, single-stranded DNAs (IDT) were annealed with 4-bp overhangs on both ends of the double-stranded DNAs, with these overhangs acting as a substrate for T4 DNA ligase. Cloning backbones were digested with either BsaI-HFv2 or BsmBI-v2 (NEB), gel purified and mixed with inserts in the Gibson assembly reaction. A small amount (1–2 µl) of Gibson assembly reaction mix or T4 ligation mix was added to NEB Stable cells (C3040) for transformation, and cells were grown at 30 °C or 37 °C for plasmid DNA preparation (Qiagen Miniprep). The resulting plasmids were sequence verified using Sanger sequencing (Genewiz). The pegRNA plasmids used in transient transfection experiments were cloned using plasmid backbone pU6-pegRNA-GG-acceptor (Addgene, 132777), following the protocol outlined in ref. 2. The resulting pegRNA expression cassette had a U6 promoter and poly(T) terminator. For epegRNA cloning, another fragment including the evoPreQ1 sequence was added, with each strand of oligonucleotides purchased phosphorylated from IDT. The Lenti-TargetBC-5×TAPE-1-pegRNA-InsertBC construct was cloned on the basis of the CROP-seq vector (ref. 39; CROP-seq-guide-Puro; Addgene, 86708). The vector was modified to include a GFP-TargetBC-5×TAPE-1-CaptureSequence1 sequence, and the sequence downstream of the U6 promoter was modified to allow insertion of the InsertBC-pegRNA sequence.
Plasmids encoding DNA Typewriter constructs (piggyBac-5×TAPE-1-BlastR), lineage tracing constructs (Lenti-TargetBC-5×TAPE-1-pegRNA-InsertBC) and pegRNAs (pU6-CApegTAPE1) have been submitted to Addgene (accessions 175808, 183790 and 175809).

Tissue culture, transfection, lentiviral transduction and transgene integration

The HEK293T cell line was purchased from the American Type Culture Collection and maintained by following the recommended protocol from the vendor. Primary MEFs were purchased from Millipore-Sigma (PMEF-CFL; EmbryoMax Primary Mouse Embryonic Fibroblasts, strain CF1, not treated, passage 3). Both HEK293T and MEF cells were cultured in DMEM with high glucose (Gibco), supplemented with 10% FBS (Rocky Mountain Biologicals) and 1% penicillin-streptomycin (Gibco). mES cells (E14tg2a) were a gift from C. Schröter. mES cells were cultured in Ndiff 227 medium (Takara) supplemented with 1% penicillin-streptomycin, 3 µM CHIR99021 (Millipore-Sigma), 1 µM STEMGENT PD0325901 (Reprocell) and 1,000 units of ESGRO recombinant mouse LIF protein (Sigma-Aldrich). For culturing of both MEFs and mES cells, wells in the culture plates were coated with 0.1% gelatin in a 37 °C incubator for 1 h. Cells were grown with 5% CO2 at 37 °C. Cell lines were used as received without authentication or a test for mycoplasma.

For transient transfection, HEK293T cells were cultured to 70–90% confluency in a 24-well plate. For prime editing, 375 ng of Prime Editor-2 enzyme plasmid (Addgene, 132776) and 125 ng of pegRNA plasmid were mixed and prepared with transfection reagent (Lipofectamine 3000) following the recommended protocol from the vendor. Cells were cultured for 4 to 5 days after the initial transfection unless noted otherwise, and genomic DNA was collected following cell lysis and the protease protocol from ref. 2.

Both MEFs and mES cells were transfected using 4D-Nucleofector (Lonza Bioscience). For MEFs, about 200,000 cells were resuspended in 20 µl Nucleofector buffer with supplement, mixed with 800 ng of DNA plasmids (600 ng of pCMV-PEmax-P2A-hMLH1dn and 200 ng of epegRNA plasmid), loaded onto a 16-well strip cuvette and electroporated using programme CM137 in the 4D-Nucleofector. For mES cells, about 50,000 cells were resuspended in 20 µl Nucleofector buffer with supplement, mixed with 800 ng of DNA plasmids (600 ng of pCMV-PEmax-P2A-hMLH1dn and 200 ng of epegRNA plasmid), loaded onto a 16-well strip cuvette and electroporated using programme CG104 in the 4D-Nucleofector. Cells were cultured for four more days before genomic DNA collection or the subsequent transfection in the case of mES cells.

For lentivirus generation, approximately 300,000 HEK293T cells were seeded in each well of a six-well plate and cultured to 70–90% confluency. The lentiviral plasmid was transfected into cells along with the ViraPower lentiviral expression system (Thermo Fisher), following the recommended protocol from the vendor. Lentivirus was collected following the same protocol, concentrated overnight using Peg-it Virus Precipitation Solution (SBI) and used within 1–2 days to transduce HEK293T cells without a freeze–thaw cycle. To achieve high MOI, we used the MagnetoFection protocol (OZ Bioscience). For the lineage tracing experiments, transduced cells were serially diluted and seeded in 96-well plates to identify monoclonal lines. Dox concentrations were maintained by including 10 mg l⁻¹ Dox in the initial culture and replenishing it every 5 days, to account for the 24- to 48-hour half-life of Dox in culture medium.

For transposase integration, 500 ng of cargo plasmid and 100 ng of Super piggyBac transposase expression vector (SBI) were mixed and prepared with transfection reagent (Lipofectamine 3000) following the recommended protocol from the vendor and then transfected into confluent 24-well plates. A monoclonal cell line with Dox-inducible expression of PE2 was generated by integrating the coding sequence for PE2 using the piggyBac transposase system and selecting clones by prime editing activity, as previously described (ref. 27).

Genomic DNA collection and sequencing library preparation

The targeted region from collected genomic DNA was amplified using two-step PCR and sequenced using an Illumina sequencing platform (NextSeq or MiSeq). The first PCR (KAPA Robust polymerase) included 1.5 µl of cell lysate and 0.04 to 0.4 µM of forward and reverse primers in a final reaction volume of 25 µl. In the first PCR, samples were incubated for 3 min at 95 °C; 15 s at 95 °C, 10 s at 65 °C and 90 s at 72 °C for 25–28 cycles; and 1 min at 72 °C. Primers included sequencing adaptors at their 3′ ends, appending them to both termini of the PCR products amplified from genomic DNA. After the first PCR step, products were assessed on a 6% TBE gel, purified using 1.0× AMPure beads (Beckman Coulter) and added to the second PCR that appended dual sample index sequences and flow cell adaptors. The second PCR programme was identical to the first except that we ran it for only 5–10 cycles. Products were again purified using AMPure beads and assessed on a TapeStation (Agilent) before being denatured for the sequencing run.

To append 10-bp unique molecular identifiers (UMIs), we performed PCR in three steps: first, genomic DNA was linearly amplified in the presence of 0.04 to 0.4 µM of a single forward primer in two PCR cycles using KAPA Robust polymerase. Specifically, we programmed the UMI-appending linear PCR to incubate samples for 3 min and 15 s at 95 °C; 1 min at 65 °C followed by 2 min at 72 °C for 5 cycles; 15 s at 95 °C; and 1 min at 65 °C followed by 2 min at 72 °C for 5 cycles. Second, this reaction was cleaned up using 1.5× AMPure beads, followed by a second PCR with forward and reverse primers: 3 min at 95 °C; 15 s at 95 °C, 10 s at 65 °C and 90 s at 72 °C for 25–28 cycles; and 1 min at 72 °C. In this PCR, the forward primer bound upstream of the UMI sequence and was not specific to the genomic locus. Finally, after PCR amplification, products were cleaned up using AMPure magnetic beads (1.0×, following the protocol from Beckman Coulter) and added to the third and last PCR that appended dual sample index sequences and flow cell adaptors. The run parameters for the third PCR were the same as for the second PCR except that only 5–10 cycles were used. TAPE construct sequences and PCR primer sequences are provided in Supplementary Tables 4 and 5, respectively.

For long-read amplicon sequencing library preparation, we used a one-step PCR protocol: the first PCR (KAPA Robust polymerase) included 1.5 µl of cell lysate and 0.04 to 0.4 µM of forward and reverse primers with Pacific Bioscience sample index sequences in a final reaction volume of 25 µl. We programmed the first PCR to incubate samples for 3 min at 95 °C; 15 s at 95 °C, 10 s at 65 °C and 3 min at 72 °C for 25–28 cycles; and 1 min at 72 °C. After the first PCR step, products were purified using 0.6× AMPure beads (Beckman Coulter), assessed on a TapeStation (Agilent) and sequenced on the Sequel platform (Pacific Biosciences; Laboratory of Biotechnology and Bioanalysis, Washington State University).

Genomic DNA amplicon sequencing data processing and analysis

Sequencing reads from the Illumina MiSeq and NextSeq platforms were first demultiplexed using BCL2fastq software (Illumina). For the experiments shown in Fig. 1 (and Extended Data Figs. 1–3), sequencing libraries were single-end sequenced to cover the DNA Tape from one direction. For the experiments shown in Figs. 2 and 3 (and Extended Data Figs. 4 and 5), sequencing libraries were paired-end sequenced to cover the entire array from both directions. Paired reads were then merged using PEAR (ref. 40) with default parameters to reduce sequencing errors. Insertion sequences, in the form of NNGGA (5-mer) to NNNNNNGGA (9-mer), were extracted from sequencing reads of the TAPE arrays, including 2×TAPE-1, 3×TAPE-1 and 5×TAPE-1, using regular-expression pattern matching (package REGEX) in Python. Insertions (4–6 bp) in 3×TAPE-1 to 3×TAPE-48 were also extracted using REGEX pattern matching.
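
The pattern-matching step can be illustrated with a minimal sketch using Python's built-in re module. The flanking sequences below are hypothetical placeholders, not the real TAPE-1 monomer (the actual construct sequences are in Supplementary Table 4), and the function names are ours; the sketch only shows how an insertion of the form N...NGGA might be captured between known flanks.

```python
import re

# Hypothetical placeholder flanks, NOT the real TAPE-1 sequence.
LEFT_FLANK = "ACGTACGT"
RIGHT_FLANK = "TTGCAATG"


def insertion_pattern(n_random, key="GGA"):
    """Compile a pattern for n_random degenerate bases followed by the key.

    Lookbehind/lookahead keep the flanks out of the match, so adjacent
    monomers that share flanking bases can still be matched.
    """
    return re.compile(
        f"(?<={LEFT_FLANK})([ACGT]{{{n_random}}}{key})(?={RIGHT_FLANK})"
    )


def extract_insertions(read, n_random=2):
    """Return insertions in the order they appear along the read."""
    return [m.group(1) for m in insertion_pattern(n_random).finditer(read)]


# Toy read: two edited monomers followed by one unedited monomer.
toy_read = (
    LEFT_FLANK + "ACGGA" + RIGHT_FLANK
    + LEFT_FLANK + "TTGGA" + RIGHT_FLANK
    + LEFT_FLANK + RIGHT_FLANK
)
insertions = extract_insertions(toy_read, n_random=2)
```

Setting n_random to 2 through 6 would cover the 5-mer to 9-mer insertion forms described above; counting the extracted strings per site then gives the unigram frequencies used in later steps.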

For the sequential transfection epoch experiment shown in Fig. 2, we first extracted 5-mer insertions from 5×TAPE-1 sequencing reads and used a k-means clustering algorithm to filter out possible PCR and sequencing errors with low read counts. Such filtering removed all reads that had the wrong key sequence (GGA in the case of TAPE-1), leaving a set of 16 possible 5-mer sequences in the form of NNGGA. Across five repeats of insertion sites in 5×TAPE-1, we calculated the separate unigram frequencies for each site, which were used to build the unigram order as shown in Extended Data Fig. 4c. Bigram frequencies for adjacent insertion sites (site 1 and site 2, site 2 and site 3, site 3 and site 4, and site 4 and site 5) were combined, normalized across the row and column, and used to build the bigram transition matrices shown in Fig. 2c–g. For ordering of barcodes according to their transfection history, we first generated a unigram order by sorting relative frequency at site 1, with barcodes assumed to have been transfected earlier if they appeared more frequently in site 1 than in the other sites. Using the resulting unigram order as the initial order, we implemented an iterative algorithm where we passed through the order from early to late, swapped the order if a bigram frequency was inconsistent with the order and restarted the pass unless there had been no swaps in a single pass.
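
A minimal sketch of the iterative ordering step is given below. It assumes that only adjacent barcodes in the current order are compared, that a bigram (a, b) counts reads with a at one site and b at the next site, and that a pair is "inconsistent" when the reversed bigram is more frequent; the function name, the input layout and the max_passes safeguard are our assumptions, not details from the text.

```python
def order_barcodes(site1_freq, bigram_counts, max_passes=1000):
    """Order barcodes from earliest to latest transfection.

    site1_freq: dict barcode -> relative frequency at site 1.
    bigram_counts: dict (first, second) -> count of reads in which
        `first` occupies a site and `second` the next site.
    """
    # Initial unigram order: barcodes enriched at site 1 are assumed earlier.
    order = sorted(site1_freq, key=site1_freq.get, reverse=True)

    for _ in range(max_passes):
        swapped = False
        # One pass from early to late over adjacent pairs.
        for i in range(len(order) - 1):
            a, b = order[i], order[i + 1]
            # Inconsistent if b-then-a is seen more often than a-then-b.
            if bigram_counts.get((b, a), 0) > bigram_counts.get((a, b), 0):
                order[i], order[i + 1] = b, a
                swapped = True
        if not swapped:
            return order  # a full pass with no swaps: done
    raise RuntimeError("ordering did not converge (cyclic bigram evidence?)")


# Toy example with three barcodes; the unigram order starts as A, C, B.
toy_site1 = {"AAGGA": 0.5, "CCGGA": 0.3, "TTGGA": 0.2}
toy_bigrams = {
    ("AAGGA", "TTGGA"): 10, ("TTGGA", "AAGGA"): 1,
    ("TTGGA", "CCGGA"): 8, ("CCGGA", "TTGGA"): 2,
    ("AAGGA", "CCGGA"): 9, ("CCGGA", "AAGGA"): 1,
}
toy_order = order_barcodes(toy_site1, toy_bigrams)
```

In the toy data the bigram evidence places TTGGA before CCGGA, so the second and third barcodes are swapped in the first pass and the second pass ends with no swaps.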

For the short digital text encoding experiment shown in Fig. 3, we extracted 6-mer insertions, corrected the read counts of each 6-mer by editing efficiency (using separately measured insertion frequency and respective plasmid abundance, similarly to the process described in Extended Data Fig. 1d,f), used a k-means clustering algorithm to identify NNNGGA barcodes and built a bigram transition matrix as described in the paragraph above. We first analysed the bigram transition matrices using a hierarchical clustering algorithm with default parameters in R software (using a Euclidean distance measure and the complete linkage clustering method, as described in Extended Data Fig. 5). Putative sets of barcodes (cotransfection sets with generally 2–4 barcodes) were visually identified on the basis of the dendrogram and used to group barcodes in the output bigram order of the algorithm used above. The order within the cotransfection sets was determined using corrected unigram counts combined across all five sites, where more abundant barcodes were assigned to be earlier within the set. Barcodes were mapped back to the text following the encoding table (Supplementary Table 2).

For the long-read sequencing experiment described in Extended Data Fig. 7, 12×TAPE-1 and 20×TAPE-1 sequences were isolated from Pacific Biosciences CCS reads. The number of TAPE monomers and insertions was calculated using sequential text matching around insertions and the expected length of the array based on insertion counts. Reads without a match between expected length and observed length were filtered out. Each 12×TAPE-1 and 20×TAPE-1 construct is associated with an 8-bp degenerate barcode sequence (TargetBC). Assuming that the integration sites for each TargetBC were different, we grouped reads from any given replicate that had the same TargetBC. On the basis of our observation that array collapse is more frequent than array expansion, we selected the read with the maximum number of TAPE monomers from each set of reads that shared a TargetBC. If multiple reads were in a tie by this criterion, we selected the one (or one of the ones) with the most edits for presentation in Extended Data Fig. 7g,h. For presentation in Extended Data Fig. 7c–h, we selected reads that had at least three insertions and at most 12 or 20 TAPE-1 monomers (Extended Data Fig. 7c–f) or at most 25 TAPE-1 monomers (Extended Data Fig. 7g,h).
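
The read-selection rule (most TAPE monomers per TargetBC, ties broken by most edits) can be sketched as follows. The TapeRead record and its field names are hypothetical; they stand in for whatever per-read summary the upstream text matching produces.

```python
from typing import NamedTuple


class TapeRead(NamedTuple):
    """Hypothetical per-read summary after text matching and length filtering."""
    target_bc: str
    n_monomers: int
    n_edits: int


def pick_representatives(reads):
    """Keep one read per TargetBC: most monomers, then most edits."""
    best = {}
    for read in reads:
        current = best.get(read.target_bc)
        # Tuple comparison ranks by monomer count first, edits second.
        if current is None or (read.n_monomers, read.n_edits) > (
            current.n_monomers,
            current.n_edits,
        ):
            best[read.target_bc] = read
    return best


toy_reads = [
    TapeRead("AAAACCCC", 10, 4),
    TapeRead("AAAACCCC", 12, 3),
    TapeRead("AAAACCCC", 12, 5),
    TapeRead("GGGGTTTT", 20, 7),
]
representatives = pick_representatives(toy_reads)
```

If several reads remain tied on both criteria, this sketch keeps the first one encountered, which corresponds to "one of the ones" in the description above.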

Single-cell lineage tracing experiment and analysis

Monoclonal HEK293T cells containing 5×TAPE-1, iPE2 and multiple TargetBC-5×TAPE-1-pegRNA constructs were cultured for 25 days in the presence of 10 mg l⁻¹ Dox. Dox was replenished every 5 days, to account for the 24- to 48-hour half-life of Dox in culture medium. The initial culture in a 96-well plate was moved to a 24-well plate and subsequently to a 6-well plate, when the culture was 80–90% confluent. Once the monoclonal cell line reached confluence in the six-well plate (estimated to be 1.2 million cells), cells were frozen and thawed for a single-cell experiment in the absence of Dox. For preparation of cells for the single-cell experiment, cells were dissociated, pelleted by centrifugation at 200 RCF for 5 min and resuspended in a single-cell suspension in 0.04% BSA (NEB) in 1× PBS at a concentration of 1,000 cells per µl following the Cell Preparation Guide from 10x Genomics (manual part no. CG00053 Rev C). Cell numbers and the single-cell suspension were checked using both a manual haemocytometer and a Countess II FL Cell Counter (Thermo Fisher).

Single-cell suspensions of cells were directly used in the 10x Genomics experimental protocol (Chromium Next GEM Single-Cell 3′ Reagent Kit v3.1 with Feature Barcoding Technology for CRISPR Screening; manual part no. CG000205 Rev D). We strictly followed the protocol with recovery of 20,000 targeted cells (10,000 per reaction) until step 2.3. The protocol is written for the CRISPR Screening library, where Feature Barcode components including CRISPR gRNA sequences would be collected in step 2.3B, owing to their smaller size compared with the 3′ Gene Expression library (collected in step 2.3A). In our case, we expected our Feature Barcode components including TargetBC-5×TAPE-1 constructs tagged with 16-nucleotide 10x single-cell barcodes (CBC) and 12-bp UMIs from reverse transcription to be greater than 1 kb in length and therefore collected along with the 3′ Gene Expression library. Nonetheless, we collected both components (the eluates from steps 2.3A and 2.3B) and detected TargetBC-5×TAPE-1 constructs in both using quantitative PCR. Detection of TargetBC-5×TAPE-1 constructs from step 2.3B was unexpected but could have resulted from non-processive reverse transcription that generated shorter cDNA products. We combined the TargetBC-5×TAPE-1 constructs and used paired-end sequencing to obtain CBC, UMI and TargetBC-5×TAPE-1 sequences for each read, along with the 3′ Gene Expression library.

For the initial analysis, we used the CellRanger pipeline from 10x Genomics, which filtered out single-cell barcodes (CBC) and UMIs, recovering about 12,000 cells. We selected reads that contained approved CBC and UMI sequences and extracted TargetBC-5×TAPE-1 sequences from the CellRanger output BAM file. Reads with different UMIs were collapsed on the basis of shared CBC-TargetBC-5×TAPE-1 sequences, and any CBC-TargetBC-5×TAPE-1 reads that had fewer than two UMI sequences associated with them were removed. In cases where we observed the same CBC-TargetBC pairs but with different 5×TAPE-1 sequences, we took the consensus sequence with a larger number of associated UMIs.
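
The UMI collapsing and consensus steps can be sketched as below. The sketch assumes reads have already been reduced to (CBC, UMI, TargetBC, 5×TAPE-1 sequence) tuples; the function name and record layout are ours, and parsing of the CellRanger BAM file is not shown.

```python
from collections import defaultdict


def collapse_umis(records, min_umis=2):
    """Collapse reads by UMI and pick one TAPE sequence per CBC-TargetBC pair.

    records: iterable of (cbc, umi, target_bc, tape_seq) tuples.
    Returns dict (cbc, target_bc) -> (tape_seq, number of distinct UMIs).
    """
    # Distinct UMIs supporting each CBC-TargetBC-TAPE combination.
    umis = defaultdict(set)
    for cbc, umi, target_bc, tape_seq in records:
        umis[(cbc, target_bc, tape_seq)].add(umi)

    consensus = {}
    for (cbc, target_bc, tape_seq), umi_set in umis.items():
        n_umis = len(umi_set)
        if n_umis < min_umis:
            continue  # drop combinations supported by fewer than two UMIs
        key = (cbc, target_bc)
        # Conflicting TAPE sequences: keep the one with more UMIs.
        if key not in consensus or n_umis > consensus[key][1]:
            consensus[key] = (tape_seq, n_umis)
    return consensus


toy_records = [
    ("cell1", "u1", "BC1", "seqA"),
    ("cell1", "u2", "BC1", "seqA"),
    ("cell1", "u3", "BC1", "seqA"),
    ("cell1", "u4", "BC1", "seqB"),
    ("cell1", "u5", "BC1", "seqB"),
    ("cell1", "u6", "BC2", "seqC"),  # single UMI, removed
]
toy_consensus = collapse_umis(toy_records)
```

In the toy input, seqA wins over seqB for the cell1-BC1 pair because it is supported by three UMIs rather than two, and the cell1-BC2 pair is dropped.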

For the monoclonal lineage tracing experiment, we corrected the observed TargetBC if it contained a single-nucleotide mismatch with respect to the approved list of the 19 most frequent 8-bp sequences. If the TargetBC differed from the list of sequences by more than 2 nucleotides, we removed the corresponding reads from further analysis. For detection of the 14-bp TAPE-1 sequence, a single-base-pair mismatch or substitution error was corrected to the TAPE-1 sequence. We also filtered out TargetBC-5×TAPE-1 arrays that included InsertBCs other than the top 19 most frequent ones. This resulted in a table where each row contained a CBC, TargetBC and up to five InsertBCs (unedited positions left blank) (Supplementary Data).
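
The single-mismatch correction of TargetBCs can be sketched with a Hamming-distance lookup against the approved list. This sketch makes two assumptions beyond the text: barcodes at distance 2 or more from every approved sequence are discarded, and an observed barcode that is one mismatch away from more than one approved sequence is treated as ambiguous and discarded. The whitelist entries are made-up examples.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_target_bc(observed, whitelist):
    """Return the approved barcode for `observed`, or None to discard it."""
    if observed in whitelist:
        return observed
    hits = [
        bc for bc in whitelist
        if len(bc) == len(observed) and hamming(observed, bc) == 1
    ]
    # Assumption: ambiguous or distant barcodes are dropped.
    return hits[0] if len(hits) == 1 else None


# Made-up 8-bp barcodes standing in for the approved list.
toy_whitelist = ["AAAACCCC", "GGGGTTTT"]
corrected = correct_target_bc("AAAACCCG", toy_whitelist)
discarded = correct_target_bc("AAAAGGGG", toy_whitelist)
```

The same helper could in principle be applied to the 14-bp TAPE-1 sequence with a one-element whitelist, since the text describes the same single-mismatch tolerance there.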

For lineage tree reconstruction, only cells (CBC) that included the top 13 most frequent TargetBCs were selected (3,257 cells). This ‘top 13’ list excluded the corrupted ATAAGCGG TargetBC (where the second TAPE-1 monomer appeared to have been contracted by 6 bp, inactivating the type guide). We calculated a 3,257 × 3,257 distance matrix by counting the number of shared InsertBCs across 13 × 5 = 65 sites, only counting them if they had the same InsertBC at previous sites (five possible sites per TargetBC; unedited sites were excluded), and then subtracting the count from the maximum number of shared InsertBCs (59, excluding 6 missing sites from three 4×TAPE-1 arrays and one 2×TAPE-1 array) to calculate the distance between a pair of cells. The resulting distance matrix was used as an argument in the UPGMA and neighbour-joining clustering functions in the R phangorn package (ref. 41). Tree visualizations, bootstrapping analysis and parsimony analysis were performed using the R ape package (ref. 42) and its included functions. Bootstrap resampling was done on blocks of sites within the same TargetBC-TAPE-1 array (that is, resampling with replacement of the intact TAPE-1 arrays associated with the 13 TargetBCs). We then used the same function to calculate the distance matrix as described above, counting InsertBCs as shared only if they had the same InsertBC at previous sites within the TargetBC-TAPE-1 array.
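
The pairwise distance described above can be sketched as a prefix comparison within each TargetBC array. The sketch assumes each cell is represented as a dict mapping TargetBC to its InsertBCs in site order, with None marking unedited sites; this layout and the function names are ours. Tree building (UPGMA or neighbour joining) and bootstrapping are not reproduced here.

```python
def shared_insertions(cell_a, cell_b):
    """Count InsertBCs shared by two cells, array by array.

    A site counts only if every earlier site in the same array also
    matched, so the scan stops at the first mismatch or unedited site.
    """
    shared = 0
    for target_bc in cell_a.keys() & cell_b.keys():
        for x, y in zip(cell_a[target_bc], cell_b[target_bc]):
            if x is None or y is None or x != y:
                break
            shared += 1
    return shared


def pair_distance(cell_a, cell_b, max_shared=59):
    """Distance = maximum possible shared InsertBCs minus the observed count."""
    return max_shared - shared_insertions(cell_a, cell_b)


# Toy cells with two TargetBC arrays each (real data use 13 arrays).
toy_a = {
    "T1": ["AAA", "CCC", None, None, None],
    "T2": ["GGG", None, None, None, None],
}
toy_b = {
    "T1": ["AAA", "TTT", "CCC", None, None],
    "T2": ["GGG", "AAA", None, None, None],
}
toy_shared = shared_insertions(toy_a, toy_b)
toy_distance = pair_distance(toy_a, toy_b)
```

Stopping at the first mismatch reflects the sequential nature of the recording: an identical InsertBC at a later site is not evidence of shared ancestry if the earlier sites already diverged. The default of 59 is the maximum stated in the text for the 13 selected arrays.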

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
