April 1, 2023
FinnGen provides genetic insights from a well-phenotyped isolated population – Nature

To benchmark our register-based phenotyping and to explore the value of the isolated setting of Finland, we selected 15 diseases with more than 1,000 cases in FinnGen and for which well-powered GWAS data have been published. We evaluated the accuracy of our phenotyping by comparing the genetic correlations and effect sizes with the previous GWAS results (Supplementary Table 6). None of the genetic correlations were significantly lower than 1 (the lowest genetic correlation was 0.89 (standard error = 0.07) in age-related macular degeneration (AMD); Supplementary Table 6). For diseases with a large number of cases in FinnGen, the effect sizes of lead variants in known loci were largely consistent between FinnGen and previously published meta-analyses. This result demonstrates that our register-based phenotyping is comparable to existing disease-specific GWASs (Fig. 1e, Supplementary Information and Supplementary Table 6). The effect sizes varied more in some diseases that have a smaller number of cases in FinnGen (for example, ankylosing spondylitis, n = 1462, r2 = 0.62).

GWAS of these 15 diseases identified 235 loci (that is, regions selected for fine-mapping; Methods) and 275 independent genome-wide significant associations (here onwards, ‘association’ means an independent signal) outside the human leukocyte antigen (HLA) region (GRCh38, chromosome 6: 25–34 Mb). A phenome-wide association study (PheWAS) of FinnGen imputed classical HLA gene alleles has been previously reported8. Overall, 44 of the non-HLA associations were driven by low-frequency lead variants (we define ‘low frequency’ as an AF of <5% in non-Finnish, Swedish or Estonian European (NFSEE) individuals in the Genome Aggregation Database (gnomAD; v.2.0.1)9) that were more than twice as frequent in Finnish individuals compared with NFSEE individuals. We use NFSEE as a general continental European reference point, excluding individuals from Finland, Sweden and Estonia. As there were large-scale migrations from Finland to Sweden in the twentieth century, many of the chromosomes from sequencing studies of Swedish individuals are of recent Finnish origin. Moreover, the geographically close and linguistically and genetically similar9 population of Estonia is likely to share elements of the same ancestral founder effect.
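The two criteria used above (low frequency in the reference population, and more than twofold enrichment in Finland) can be written as a small check. This is only a sketch under stated assumptions: the function names, the example frequencies and the handling of variants absent from the reference population are illustrative and are not taken from the paper's pipeline.

```python
import math

def enrichment(af_fin, af_nfsee):
    """Fold enrichment of the Finnish AF over the NFSEE AF.

    Returns infinity when the variant is absent from NFSEE
    (an illustrative choice, not the paper's convention).
    """
    if af_nfsee == 0:
        return math.inf
    return af_fin / af_nfsee

def is_enriched_low_frequency(af_fin, af_nfsee, max_ref_af=0.05, min_fold=2.0):
    """True if AF < 5% in NFSEE and the variant is > 2x as frequent in Finland."""
    return af_nfsee < max_ref_af and enrichment(af_fin, af_nfsee) > min_fold

# Hypothetical variants: (Finnish AF, NFSEE AF)
print(is_enriched_low_frequency(0.02, 0.004))   # low frequency and about 5-fold enriched
print(is_enriched_low_frequency(0.30, 0.10))    # enriched, but common in NFSEE
print(is_enriched_low_frequency(0.012, 0.010))  # low frequency, but not enriched
```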

Replication of many such enriched variant associations in the Finnish population is hindered by low AFs or missingness in other European populations. People from Finland are genetically more similar to people from Estonia than to people from other European countries9. Therefore, we first conducted replication using data from 136,724 individuals from the Estonian Biobank (EstBB) and then extended the analysis to individuals from the UKBB (Methods and see Supplementary Table 7 for definitions of end points and case–control numbers). The effect sizes in genome-wide significant hits in FinnGen were mostly concordant with the EstBB (average inverse variance weighted slope of 1.5 (with FinnGen higher) and r2 = 0.69) and the UKBB (slope = 1.1, r2 = 0.84) (Extended Data Fig. 3). FinnGen had a higher case prevalence in the 15 disease diagnoses than in the UKBB, which is probably due to slightly different ascertainment schemes. By contrast, the EstBB had the highest case prevalence in ophthalmic diseases (AMD and glaucoma) and inflammatory skin conditions (atopic dermatitis and psoriasis) (Fig. 2a).

Fig. 2: Comparison of previously unknown and known lead variants in loci identified in the 15 studied diseases.

a, Case prevalence and counts in FinnGen, the EstBB and the UKBB. The phenotypes are sorted on the basis of FinnGen prevalence. b, Distribution of minor AFs in known (red) and new (blue) loci in the NFSEE population. c, Distribution of AF enrichment between Finland and other Northwestern European populations in gnomAD (excluding Estonia and Sweden). The x axis represents enrichment bins. d, AFs of 25 replicated genome-wide significant (in FinnGen discovery) new low-frequency (<5% in NFSEE populations) variants in FinnGen, the EstBB and the UKBB. The dotted line indicates the same variants and no line means absence of the variant in other biobanks.

After a meta-analysis of the EstBB and UKBB data, 241 of the 275 associations remained genome-wide significant (Supplementary Table 8). We performed a further meta-analysis of 232 associations that did not meet the genome-wide significance threshold in FinnGen (5 × 10−8 < P < 1 × 10−6), and 57 of those were genome-wide significant after meta-analysis. This meta-analysis resulted in 298 genome-wide significant associations (see also Supplementary Table 8 for results after multiple testing correction for 15 end points).

To determine whether the observed associations have been previously reported, we queried the GWAS Catalog association database (and largest recent relevant GWAS) for genome-wide significant (P < 5 × 10−8) variants that are in linkage disequilibrium (LD) (r2 > 0.1 in the FinnGen imputation panel) with observed lead variants in FinnGen. As the new findings included variants with AFs as low as 0.15%, we also checked, in addition to published GWASs, whether credible set variants in these loci have been previously reported in ClinVar. We observed six known pathogenic or likely pathogenic variants, such as a frameshift variant in PALB2 (p.Leu531fs; AF of 0.1%, not observed outside Finland in gnomAD; Supplementary Table 8) associated with breast cancer. Thirty out of the 298 associations have not been previously reported in the largest published meta-analysis so far (Supplementary Table 6), in a manual literature search, the GWAS Catalog or in ClinVar (Table 1). As expected, we observed that lead variants in novel loci were mostly of low frequency and enriched in Finland compared with known loci from previous GWASs. Specifically, 27 lead variants had minor allele frequency (MAF) values of <5% in gnomAD NFSEE individuals, and 88% of novel and 11% of known loci (after LD pruning, see below) had gnomAD NFSEE MAF values of <5% (Fisher’s exact test, P = 4.29 × 10−17). In most cases, the AFs of lower frequency variants (MAF < 5% in gnomAD NFSEE population) were the highest in FinnGen followed by the EstBB and lowest in NFSEE individuals in gnomAD (Fig. 2d).
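The Fisher's exact test quoted above compares the share of low-frequency lead variants in novel versus known loci. A minimal standard-library sketch of a two-sided test on a 2 × 2 table follows. The counts passed to it are an assumption: 24 of 27 novel loci and 29 of 254 known loci with MAF < 5% are illustrative values chosen to roughly match the reported 88% and 11%, and they are not figures stated in the paper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, row1)

    def pmf(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = pmf(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Illustrative counts (assumed, not from the paper):
# rows = novel / known loci, columns = MAF < 5% / MAF >= 5% in NFSEE.
p_value = fisher_exact_two_sided(24, 3, 29, 225)
print(f"{p_value:.2e}")
```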

Table 1 A total of 30 previously unreported associations identified in a GWAS of 15 selected, previously extensively studied phenotypes

Next we performed statistical fine-mapping (Methods) on all 298 genome-wide significant associations (each association is independent; that is, 298 credible sets). Coding variants (missense, frameshift, canonical splice site, stop gained, stop lost or inframe deletion) with posterior inclusion probability (PIP) values of ≥0.05 were observed in 44 (18.7%) out of the 95% credible sets (17 coding variants had PIP > 0.5). Here onwards, we report coding variants with PIP > 0.05 as putatively causal. We recognize that there may be occasions in which assignment of the causal variant to a coding variant is incorrect (see our accompanying paper10 for discussions on fine-mapping calibration and replicability). In addition to identifying putative causal coding variants, we sought to identify potential gene expression regulatory mechanisms by colocalizing credible sets with fine-mapped expression quantitative trait locus (eQTL) datasets from the eQTL Catalogue (Methods).
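As a rough illustration of how a 95% credible set and the PIP > 0.05 coding-variant rule fit together, the sketch below builds a credible set from per-variant PIPs and then flags coding variants inside it. The variant identifiers, PIP values and consequence labels are hypothetical, and the greedy construction shown is a generic one; the fine-mapping procedure actually used is the one described in the paper's Methods.

```python
CODING = {"missense", "frameshift", "splice_site",
          "stop_gained", "stop_lost", "inframe_deletion"}

def credible_set(pips, coverage=0.95):
    """Smallest set of variants, taken in decreasing PIP order,
    whose PIPs sum to at least `coverage`."""
    chosen, total = [], 0.0
    for variant, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(variant)
        total += pip
        if total >= coverage:
            break
    return chosen

def putative_causal_coding(pips, consequences, min_pip=0.05, coverage=0.95):
    """Coding variants in the credible set with PIP above `min_pip`."""
    return [v for v in credible_set(pips, coverage)
            if pips[v] > min_pip and consequences.get(v) in CODING]

# Hypothetical signal with four candidate variants.
pips = {"v1": 0.60, "v2": 0.30, "v3": 0.06, "v4": 0.04}
consequences = {"v1": "intron", "v2": "missense",
                "v3": "frameshift", "v4": "missense"}
print(credible_set(pips))                          # ['v1', 'v2', 'v3']
print(putative_causal_coding(pips, consequences))  # ['v2', 'v3']
```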

We then wanted to describe the AF spectrum and putative mechanisms of action of risk variants. To do so, we LD pruned the 298 genome-wide significant associations and prioritized the most significant phenotype among the same hits to represent a single putative causal variant (LD r2 value between lead variants of <0.2). This process resulted in 281 unique associations (27 new).
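The pruning step can be sketched as a greedy pass over associations sorted by significance, keeping an association only if its lead variant has r2 below 0.2 with every lead variant already kept. This is one plausible reading of the description and not necessarily the authors' exact procedure; the variant names, phenotypes, P values and the toy LD lookup are all hypothetical.

```python
def ld_prune(associations, r2, max_r2=0.2):
    """Greedy LD pruning.

    associations: list of (lead_variant, phenotype, p_value) tuples.
    r2: function (variant_a, variant_b) -> LD r2 between two lead variants.
    Keeps the most significant association per group of correlated leads.
    """
    kept = []
    for assoc in sorted(associations, key=lambda a: a[2]):
        if all(r2(assoc[0], k[0]) < max_r2 for k in kept):
            kept.append(assoc)
    return kept

# Toy LD table: rsA and rsB are in strong LD, everything else is unlinked.
toy_r2 = {frozenset(("rsA", "rsB")): 0.8}

def lookup(a, b):
    return 1.0 if a == b else toy_r2.get(frozenset((a, b)), 0.0)

hits = [
    ("rsA", "type 2 diabetes", 1e-12),
    ("rsB", "glaucoma", 1e-9),
    ("rsC", "asthma", 1e-10),
    ("rsA", "psoriasis", 1e-8),
]
pruned = ld_prune(hits, lookup)
print(pruned)
```

In this toy input, the rsB hit and the second rsA hit are dropped because they tag the same signal as the more significant rsA association.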

Most of the 281 unique associations were common variant associations. However, 53 of these had a lead variant frequency of less than 5% in NFSEE individuals, and 38 of them were enriched by more than two times in the Finnish population compared with the NFSEE population. We observed a coding variant more often in the credible sets of associations that were enriched by more than twofold (19 out of 38; 50%) than in non-enriched associations (6 out of 15; 40%) at lower frequencies (MAF < 5%).

Following the discovery of 27 new associations, we sought to determine potential mechanisms of action through the identification of coding variants in their credible sets and potential regulatory effects by colocalization with eQTL associations from the eQTL Catalogue. We identified putative causal coding variants in 9 out of 27 loci and eQTL colocalization in 4 out of 27 loci. In two out of the four eQTL loci, we observed a coding variant in credible sets (IL4R and MYH14; the eQTLs point to different genes than the coding variants). The two remaining eQTL colocalizations were breast cancer loci colocalizing with H2BP2 eQTL in lung tissue and type 2 diabetes colocalizing with PRRG4 in lipopolysaccharide-stimulated monocytes. The disease relevance of these eQTLs is currently not evident.

No credible coding variants or eQTLs were identified in 16 out of 27 loci (Supplementary Table 8). The fraction of associations in which we observed eQTLs was small (14.8%). Most of the new associations were driven by variants with low AFs in NFSEE populations (Table 1 and Fig. 2b,d). The low fraction of observed eQTL colocalizations is probably explained by the low AF of 25 of the 27 variants in available eQTL studies (such as GTEx), for which the majority of the samples do not have Finnish or Estonian ancestry.

We next aimed to explore the benefits of the FinnGen dataset in GWAS discovery. We extrapolated observed meta-analysis results in FinnGen, the UKBB and the EstBB to match the sample size of the UKBB in 14 demonstration diseases (excluding Alzheimer’s disease; Supplementary Methods). The distribution of extrapolated P values was shifted towards greater significance in FinnGen compared with those of the UKBB and the EstBB in a matched total sample size scenario for the 14 demonstration diseases (Supplementary Methods and Supplementary Fig. 11). Moreover, frequency enrichment was a major driver in the gain of power in low-frequency variants (Supplementary Fig. 12). In individual end points with similar sample prevalence in FinnGen and the UKBB, such as inflammatory bowel disease (IBD), the greatest gain in power was in variants in which the AFs are <0.5% in the UKBB (see Supplementary Fig. 13 for a comparison for each end point and biobank).

The identification of a new signal for IBD mapping to a single variant in an intron of TNRC18 highlights the value of FinnGen for discovery, even when the case sample size is below that of existing meta-analyses. This variant has a strong risk-increasing effect (AF = 3.6%, odds ratio (OR) = 3.2, P = 2.4 × 10−61), which eclipses the significance of signals at IL23R, NOD2 and the major histocompatibility complex. The variant is enriched by 114-fold in the Finnish population compared with the NFSEE population, in whom the AF is too low (0.04%) to have been identified in previous GWASs (this FinnGen association was also reported in ref. 11). We were, however, able to replicate this association in the EstBB (AF = 1.3%, OR = 3.9, P = 2.8 × 10−6) owing to the relatively higher frequency in the genetically related Estonian population. This variant was also associated with risk for multiple other inflammatory conditions evaluated in FinnGen, including interstitial lung disease (OR = 1.43, P = 6.3 × 10−26), ankylosing spondylitis (OR = 4.2, P = 1.8 × 10−34), iridocyclitis (OR = 2.3, P = 1.2 × 10−27) and psoriasis (OR = 1.6, P = 1.1 × 10−13). However, the same allele appears to be protective for an end point that combines multiple autoimmune diseases (https://r5.risteys.finngen.fi/phenocode/AUTOIMMUNE) (OR = 0.84, P = 6.2 × 10−12; for example, type 1 diabetes (OR = 0.64, P = 2.7 × 10−7) and hypothyroidism (OR = 0.85, P = 7.8 × 10−7)).

The highest number (eight loci) of new and enriched low-frequency associations were identified in type 2 diabetes, which is probably due to the large number of patients with type 2 diabetes in FinnGen release 5 (29,193). Other noteworthy observations from this set of 30 findings for 15 well-studied diseases are described in Supplementary Note 1.

Coding variant associations

Motivated by the identification of high-effect coding variant associations within the selected 15 diseases, we performed a PheWAS followed by fine-mapping to identify putative causal coding variants enriched in the Finnish population.

In a GWAS of 1,932 distinct end points and 16,387,711 variants (Supplementary Table 4; case overlap < 50% and n cases > 80), we identified 2,733 independent associations in 2,496 loci across 807 end points (Supplementary Table 9) at a genome-wide significance threshold (P < 5 × 10−8). Moreover, 893 signals in 771 loci across 247 end points were identified at the phenome-wide significance (PWS) threshold (P < 2.6 × 10−11). The HLA region was excluded here, and a PheWAS of imputed classical HLA gene alleles in FinnGen is reported in ref. 8.
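The stringent threshold quoted above is numerically consistent with dividing the genome-wide threshold by the number of end points tested, that is, a Bonferroni-style correction. The short sketch below shows that arithmetic; reading the threshold this way is an assumption made here for illustration, and the function name is hypothetical.

```python
def bonferroni(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

GENOME_WIDE = 5e-8   # genome-wide significance threshold from the text
N_END_POINTS = 1932  # distinct end points analysed in the PheWAS

pws_threshold = bonferroni(GENOME_WIDE, N_END_POINTS)
print(f"{pws_threshold:.1e}")  # 2.6e-11
```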

Using statistical fine-mapping, we observed a coding variant (missense, frameshift, canonical splice site, stop gained, stop lost or inframe deletion; PIP > 0.05) in 369 associations (13.5% of all associations) spanning 202 end points. Full results with all 2,803 end points (including end points with a case overlap of >50% that are excluded here) are publicly available from a customized browser based on the PheWeb code base (https://r5.finngen.fi) and as summary statistic files (https://www.finngen.fi/en/access_results).

To put the frequency spectrum and putative mechanisms of action in an interpretable context, we chose a single most-significant association per signal by LD-based merging (r2 > 0.3 lead variants merged), which resulted in 1,838 unique associations in 681 end points (Supplementary Table 10). Overall, 493 of the associations in 112 end points were PWS (P < 2.6 × 10−11). Although most of the 493 PWS unique associations were driven by common variants, 143 and 97 had a lead variant frequency of <5% and <1%, respectively, in gnomAD NFSEE populations. We observed that 82 (57.3%) of the 143 low-frequency (MAF < 5%) lead variants were enriched by more than twofold in Finland compared with NFSEE populations. To estimate the number of putative new associations, we searched for known significant associations using the Open Targets API platform (GWAS Catalog and the UKBB) and ClinVar for each of the 1,838 associations. Among these, 864 (47%) were not associated with any phenotype in those databases (75 out of 493 (15%) of the stringent P < 2.6 × 10−11 associations). The fraction of previously unreported associations among genome-wide significant (702 out of 841 (84%)) and stringent (69 out of 143 (48%)) associations was notably higher among low-frequency variants (MAF < 5% in NFSEE individuals).

After statistical fine-mapping of the 493 unique PWS associations, we identified a coding variant (PIP > 0.05) in 73 (14.8%) of the credible sets associated with 42 end points (Supplementary Table 10). Most (43) of the fine-mapped coding variants had PIP values of >0.5 and 28 had PIP values of >0.9 (Fig. 3a). The highest proportion and the majority (54 out of 73) of associated coding variants had NFSEE MAF < 10% (Fig. 3b,c). The coding variant associations were more enriched in Finland than noncoding associations in associations driven by variants with AFs of <5% in NFSEE people (Fig. 3d; Wilcoxon rank sum test P = 3.6 × 10−3). For example, we observed a coding variant in 42% (34 out of 89) of the associations with a lead variant that was enriched by more than two times in Finland compared with NFSEE people among low-frequency associations (NFSEE MAF < 5%). By contrast, the proportion of coding variants was lower at 21.7% (13 out of 60) in non-enriched associations (see Extended Data Fig. 4 for enrichment in various NFSEE MAF bins). The higher proportion of coding variants in those that were enriched by more than two times persisted when the PIP threshold was increased to 0.2 (enriched, 30 out of 77 (35.8%); non-enriched, 11 out of 58 (18.9%)).

Fig. 3: Characteristics of unique associations in end points identified in FinnGen.

Characteristics of 493 (73 with coding variants in the credible set) specific associations in 112 (42 end points with coding variants in the credible set) end points identified in FinnGen release 5. Note that 25 of the associations with a coding variant with PIP < 0.05 in credible sets were removed from plots as ‘uncertain to contain coding variant’. a, Distribution of fine-mapping PIP values of the 73 coding variants. b, AF spectrum in associations with and without coding variants in credible sets (CS). c, Proportion of coding variants identified in different AFs (in NFSEE individuals in gnomAD). The numbers above the bars indicate the number of associations within a bin, the y axis indicates the proportion of associations with coding variants in their credible sets. d, Enrichment in Finland as a function of AF in the gnomAD NFSEE population (enrichment value for variants with AF values of 0 in NFSEE individuals in gnomAD was set to maximum observed enrichment value of log2(166) = 7.38). The smoothed regression lines of local average enrichment are estimated by local polynomial fitting (loess) and the shaded areas represent 95% confidence intervals of the model fit.

The fine-mapping properties and replicability of 67 FinnGen traits across diverse biobanks (FinnGen, Biobank Japan and the UKBB) are explored in detail in another manuscript10, and functional variant associations in the UKBB and FinnGen are described in ref. 12.

We next wanted to quantify the benefits of population isolates such as Finland in GWAS discovery. To this end, we assessed whether lower frequency (MAF < 5% in NFSEE people) variants enriched in the Finnish population were more likely to be associated with a phenotype than would be expected by chance. We randomly sampled 1,000,000 times the number of genome-wide significant variants observed (143) from a set of frequency-matched variants (MAF NFSEE < 5%) that were not associated with any end point (P > 0.001). None of the 1 million random draws had a higher proportion of variants enriched by more than twofold in the Finnish population than was observed in the significant associations (57.3% observed versus 33% expected; P = 1.0 × 10−16).
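The resampling test described above can be sketched as follows: repeatedly draw the same number of variants from a frequency-matched null set and count how often the drawn fraction of enriched variants reaches the observed one. The null pool below is synthetic (10,000 variants, 33% flagged as enriched), the function name is hypothetical and the number of iterations is kept small for speed; none of these are the paper's actual inputs. Note that an empirical P value estimated this way is bounded below by 1/(n_iter + 1).

```python
import random

def empirical_enrichment_p(null_flags, n_draw, observed_fraction,
                           n_iter=10_000, seed=0):
    """Empirical P value that a random draw of `n_draw` null variants has
    a fraction of enriched variants >= `observed_fraction`.

    null_flags: list of booleans, True if a null variant is > 2x enriched.
    Uses the (count + 1) / (n_iter + 1) estimator so the result is never 0.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_iter):
        sample = rng.sample(null_flags, n_draw)  # draw without replacement
        if sum(sample) / n_draw >= observed_fraction:
            exceed += 1
    return (exceed + 1) / (n_iter + 1)

# Synthetic null pool (assumed): 33% of frequency-matched variants enriched.
null_pool = [True] * 3300 + [False] * 6700
p_emp = empirical_enrichment_p(null_pool, n_draw=143,
                               observed_fraction=82 / 143, n_iter=2000)
print(p_emp)
```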

Known pathogenic variant associations

Among the genome-wide significant coding variant associations, we identified 13 variant associations (AF range of 0.04–2%) classified as pathogenic or likely pathogenic in ClinVar (Supplementary Table 10). Nine out of the 13 variants were enriched by more than 20-fold in Finland compared with NFSEE populations. Some of these variants have previously been primarily considered recessive. Here, however, we observed that some acted as risk variants in the heterozygous state. An example is a rare frameshift variant at NPHS1 associated with nephrotic syndrome, including the congenital form (ICD-10: N04; p.Leu41fs; AF FinnGen = 0.9%; gnomAD NFSEE = 0.009%; OR = 185, P = 4.3 × 10−27). Congenital nephrotic syndrome in Finnish individuals is a recessively inherited rare disease, and is in the Finnish Disease Heritage database4. The pathogenic variant associations listed in ClinVar include a stop-gained variant in XPA (xeroderma pigmentosum) associated with non-melanoma neoplasm of skin (‘other malignant neoplasm of skin’) (p.Arg228Ter; AF FinnGen = 0.02%, gnomAD NFSEE = 0%; OR = 4.4, P = 8.3 × 10−18), and the abovementioned frameshift variant in PALB2 associated with breast cancer (p.Leu531fs, ‘malignant neoplasm of breast’; p.Ala82Pro; AF FinnGen = 0.2%, gnomAD NFSEE = 0%; OR = 28.8, P = 3.7 × 10−33). Furthermore, a known pathogenic recessively acting missense variant in CERKL was associated with hereditary retinal dystrophy (p.Cys125Trp; AF FinnGen = 0.6%, gnomAD NFSEE = 0%; OR = 98,716, P = 5.15 × 10−25). This association is, however, driven by compound heterozygotes, as previously detailed13. These associations demonstrate that imputation using a population-specific genotyping array and an imputation panel combined with national-registry-based phenotyping in the isolated Finnish population can successfully identify associations and fine-map causal variants even in rare variants and phenotypes.
An extended study of ClinVar variants and variants with specific biallelic Mendelian effects in FinnGen is provided in a companion paper13.

Associations in known disease genes

In the remaining 135 genome-wide significant coding variant associations not reported as pathogenic in ClinVar, 77 had NFSEE MAF values of <5%. Of the 77 variants, 54 were more than 5 times more common in Finland than in NFSEE populations, and 19 had not been previously observed in NFSEE people (Supplementary Table 2). Nine out of the 19 variants are in a gene in which other variants are pathogenic for various traits, 3 of which are for the same or related traits. These FinnGen associations include the following variants: a RFX6 frameshift variant associated with type 2 diabetes (p.His293LeufsTer7; AF = 0.15%, OR = 3.7, P = 1.2 × 10−10; ClinVar, ‘monogenic diabetes and others’); a TERT missense variant (AF = 0.15%, OR = 1,032, P = 6.5 × 10−21) associated with idiopathic pulmonary fibrosis (ClinVar, ‘idiopathic pulmonary fibrosis’); a missense variant in MYH14 associated with sensorineural hearing loss (p.Ala1156Ser; AF = 0.04%, OR = 19.9, P = 1 × 10−15; ClinVar, ‘non-syndromic hearing loss’ and others); and a stop gained variant in TG associated with autoimmune hypothyroidism (p.Gln655Ter; AF = 0.1%, OR = 3.2, P = 3.9 × 10−11). These variants in RFX6, TERT and TG have been previously observed in Finnish and Nordic cohorts14,15,16, but had uncertain significance (single carrier in TG) or conflicting interpretation (TERT) in ClinVar. Pathogenic variants in RFX6 cause Mitchell–Riley syndrome with recessive inheritance (characterized by neonatal diabetes). However, heterozygote enrichment of RFX6-truncating variants has been observed in maturity-onset diabetes of the young14, for which the same variant observed here was identified in a replication in Finnish data. RFX6 is a regulator of transcription factors involved in beta-cell maturation and has a specific role in releasing gastric inhibitory peptide (GIP) and GLP1 in response to meals.
Our results suggest that around 1:700 individuals in Finland carry a frameshift variant that has been previously shown to reduce incretin levels and to lead to isolated diabetes14. It is tempting to speculate that early administration of GLP1 analogues would benefit carriers of this diabetes-associated variant.

New disease associations

Among the previously undescribed genome-wide significant coding variant associations without previous associations in Open Targets (GWAS Catalog and the UKBB) or ClinVar, we observed 29 that had NFSEE MAF values of <5% and were more than 2 times more frequent in Finland, 9 of which had no copies in NFSEE populations (Supplementary Table 11). We summarize selected new discoveries and biological knowledge gained in Supplementary Table 12. A missense variant not observed outside Finland (p.Val70Phe; AF = 0.2%, OR = 3.0, P = 2.1 × 10−9) in PLTP was associated with coronary revascularization (n = 12,271 coronary angioplasty or bypass grafting). PLTP is a lipid-transfer protein in human plasma that transfers phospholipids from triglyceride-rich lipoproteins to high-density lipoprotein, and its activity is associated with atherogenesis in humans and mice17. Noncoding variations near PLTP independent of p.Val70Phe are associated with lipid levels (high-density lipoprotein and triglycerides)18 and coronary artery disease19. The identification of a coding variant in this gene provides support for PLTP as the causal gene for symptomatic atherosclerosis in this locus. Other variants associated with coronary artery disease included a missense variant (p.Gly567Arg; AF = 0.9%, OR = 2.0, P = 5.2 × 10−12) in HHIPL1, which was associated with coronary revascularization (n = 12,271), and a splice acceptor variant (c.7325-2A>G; AF = 0.7%, OR = 2.5, P = 2.9 × 10−8) in NBEAL1, which was associated with coronary artery bypass grafting (n = 5,779). Both genes are susceptibility loci for coronary artery disease19 and have been suggested as causal, although for NBEAL1 the evidence is inconsistent20. HHIPL1 encodes a secreted sonic hedgehog regulator that modulates atherosclerosis-relevant smooth muscle cell phenotypes and promotes atherosclerosis in mice21.
NBEAL1 regulates cholesterol metabolism by modulating low-density lipoprotein (LDL) receptor expression, and genetic variants in NBEAL1 are associated with decreased expression of NBEAL1 in arteries22. Our results strengthen the evidence that both these genes are causal in the loci.

A missense variant in LAG3 (p.Pro67Thr; AF = 0.08%, gnomAD NFSEE = 0%) was associated with autoimmune hypothyroidism (n = 22,997, OR = 3.2, P = 4.6 × 10−8, lead variant P = 4.57 × 10−8). LAG3 encodes an immune checkpoint protein that is involved in inhibitory signalling of immune response, especially in T cells23. LAG3 has been a target of active immune checkpoint inhibitor cancer immunotherapy development. One such immunotherapy was recently approved by the US Food and Drug Administration as a combination treatment for unresectable or metastatic melanoma24. Immune checkpoint inhibition therapies aim to enhance immune responses against tumour cells. Excessive immune responses, however, can exert deleterious effects on healthy tissue and lead to autoimmune disease. A common side effect of immune checkpoint inhibitors, including those that target LAG3, is hypothyroidism. The p.Pro67Thr variant could be acting as an inhibitor of LAG3 immunoregulatory activity, which in turn leads to susceptibility to hypothyroidism. In a PheWAS of p.Pro67Thr, we observed a nominally increased risk for other immune-related conditions (for example, psoriatic arthropathies (M13_PSORIARTH_ICD10) n = 1,455, OR = 7.8, P = 3.3 × 10−3; urticaria and erythema (L12_URTICARIAERYTHEMA), n = 6,328, OR = 3.7, P = 2.7 × 10−4; and streptococcal septicaemia (AB1_STREPTO_SEPSIS), n = 1,090, OR = 15, P = 2.2 × 10−3), but we did not observe protective effects with any cancers. It should be noted, however, that owing to the rarity of the variant, the data were not sufficiently powered to detect more subtle effects.

We found a missense variant (p.Tyr212Phe, rs35937944) in COLGALT2 that was enriched by >20-fold in the Finnish population. This variant was associated with a reduced risk for arthrosis (OR = 0.79, P = 2.57 × 10−10), coxarthrosis (OR = 0.68, P = 1.34 × 10−19) and gonarthrosis (OR = 0.80, P = 7.5 × 10−7). A noncoding variant near COLGALT2 has recently been described as a GWAS locus for osteoarthritis25. COLGALT2 encodes the procollagen galactosyltransferase 2, which initiates post-translational modification of collagens by transferring β-galactose to hydroxylysine residues, an important step to ensure structure and function of bone and connective tissue. Modulating COLGALT2 enzymatic activity with drugs could be a potential strategy to reduce arthritis risk.

CD63 is a cell surface protein involved in basophil activation and mast cell degranulation. We identified a missense variant in CD63 (rs148781286) that was enriched by >42-fold in the Finnish population. This variant was associated with childhood asthma (OR = 3.5, P = 3.37 × 10−9). In a combined analysis with data from the EstBB and the UKBB, this variant was also associated with atopic dermatitis26. Mediators secreted by basophils and mast cells correlate with asthma severity in the clinic, and a CD63-based basophil activation test has been reported to predict asthma outcome in young children with wheezing episodes27. The observation of a putative causal relationship between genetic variations in CD63, basophil activation and childhood asthma risk and severity may point to a new intervention point for targeted asthma therapies.

A missense variant in TUBA1C (p.Ala331Val; AF = 0.2%, OR = 35.2, P = 1.4 × 10−10) was associated with sudden idiopathic hearing loss (n = 1,491). No relevant phenotype has previously been reported for variants in TUBA1C. TUBA1C encodes an α-tubulin isotype. The precise roles of α-tubulin isotypes are unknown, but mutations in other tubulins can cause various neurodevelopmental disorders28. The p.Ala331Val variant was also associated with vestibular neuritis (inflammation of the vestibular nerve; n = 1,224, OR = 40.9, P = 3.2 × 10−10). Pure vestibular neuritis presents acutely with vertigo but not hearing loss, and accurate diagnosis of vertigo in acute settings is challenging and misdiagnosis is possible.

A >30-fold-enriched missense variant, p.Thr155Met (rs145955907), in ZAP70 was associated with sarcoidosis (OR = 2.05, P = 1.03 × 10−8). Previously, homozygote or compound heterozygote mutations in ZAP70 have been described in cell-mediated combined immunodeficiency caused by abnormal T cell receptor signalling29. Heterozygote variants have not been associated with any disease so far. Given its crucial role in cell signalling, the ZAP70 association with sarcoidosis seems in line with its key role in immunity.

A 75-fold-enriched missense variant, p.Ala777Thr (rs199680517), in PPP1R26 was associated with endometriosis (OR = 1.97, P = 3.41 × 10−8). PPP1R26 (protein phosphatase 1 regulatory subunit 26) has been associated with tumour formation and has been observed to be upregulated in various malignancies. Cellular GWAS analyses have identified one variant to be associated with carboplatin-induced toxicity30. In one study, a copy number variant has been associated with endometriosis, but how this gene contributes to endometriosis susceptibility remains speculative31.

We also report several of these coding associations in separate manuscripts. One such new observation is a missense variant (p.Arg20Gln; AF = 3%, gnomAD NFSEE = 0.7%) in SPDL1 with a pleiotropic association. It is associated with a strongly increased risk of idiopathic pulmonary fibrosis (OR = 3.1, P = 1.0 × 10−15) but is protective for an end point that combines all cancers (OR = 0.82, P = 2.1 × 10−15)32. Other associations between variants and disease described in separate manuscripts include the following: an inframe insertion in MFGE8 and coronary atherosclerosis (p.Asn239dup; AF = 2.9%, gnomAD NFSEE = 0%, OR = 0.74, P = 5.4 × 10−15)33; a frameshift variant in MEPE (p.Lys101IlefsTer26; AF = 0.3%, gnomAD NFSEE = 0.07%, OR = 18.9, P = 1.5 × 10−11) and otosclerosis34; and a missense variant in ANGPTL7 (p.Arg220Cys; AF = 4.2%, gnomAD NFSEE = 0.06%, OR = 0.7, P = 7.2 × 10−16) and glaucoma35.

Coding variants associated with drug use

A notable registry available in FinnGen is a prescription medication purchase registry (KELA; Supplementary Table 1), which links all prescription medication purchases for all FinnGen participants since 1995. Using prescription records from this registry, we identified two enriched low-frequency coding variants that were associated with drug purchase of statin medications (three or more purchases per individual) (Supplementary Table 11). A missense variant in TM6SF2 (p.Leu156Pro, rs187429064) was associated with a decreased likelihood of being prescribed statins (AF = 5.2%, gnomAD NFSEE = 1.2%; OR = 0.86, P = 3.8 × 10−13) but with an increased likelihood for insulin medication for diabetes (OR = 1.17, P = 8.2 × 10−11) and type 2 diabetes (OR = 1.15, P = 2.6 × 10−8). In addition, the same variant was associated with a strongly increased risk of hepatocellular carcinoma (ICD-10 C22 ‘hepatic and bile duct cancer’; OR = 3.7, P = 5.9 × 10−10). The hepatic and bile duct cancer association did not change after conditioning on statin medication (OR = 3.7, P = 7.1 × 10−10). Consistent with a decrease in the likelihood of being prescribed statins, TM6SF2 p.Leu156Pro and another independent (r2 = 0.003) missense variant (p.Glu167Lys, rs58542926) have previously been associated with decreased LDL and total cholesterol levels36. In a mouse model, both p.Glu167Lys and p.Leu156Pro lead to increased protein turnover and reduced cellular TM6SF2 levels37. TM6SF2 p.Glu167Lys leads to decreases in hepatic secretion of large very-low-density lipoprotein particles and increases in intracellular lipid accumulation38. These effects probably explain its associations with non-alcoholic fatty liver disease39, alcohol-related cirrhosis40, hepatocellular carcinoma41 and incident type 2 diabetes42. Our results provide, in a single PheWAS analysis, strong evidence of a previously unknown p.Leu156Pro variant that has similar consequences of decreasing circulating lipid levels and increasing the risk of diabetes, cirrhosis and liver cancer, as observed for p.Glu167Lys. Such pleiotropy of the variant can be explored in the custom PheWeb browser (http://r5.finngen.fi/variant/19-19269704-A-G).
