April 26, 2024
Graph pangenome captures missing heritability and empowers tomato breeding – Nature
