May 19, 2024
The intrinsic substrate specificity of the human tyrosine kinome – Nature

Plasmids

For expression and purification from bacteria, DNA sequences for the human Tyr kinases His6–PKMYT1 (full length), BMPR2–His6 (amino acids 172–504)20, His6–TESK1 (amino acids 1–345) and the C. elegans Tyr kinase His6–ABL1 (amino acids 297–584) were codon-optimized for Escherichia coli expression using the GeneSmart prediction software (Genscript). Optimized coding sequences were synthesized as gBlocks (Integrated DNA Technologies) carrying 16 bp overhangs at the 5′ and 3′ ends to facilitate In-Fusion cloning (Clontech) into pET expression vectors (EMD Millipore).

Coding sequences for 12 C. elegans kinases were PCR-amplified out of a cDNA library (provided as a gift from B. Emerling and M. Hansen). PCR products for src-1 (full length), csk-1 (full length) and sid-3 (amino acids 93–498) were subcloned into the pcDNA 3.4 mammalian expression vector for expression in Expi293 cells. PCR products for daf-2 (amino acids 1234–end), let-23 (amino acids 848–end), egl-15 (amino acids 550–end), cam-1 (amino acids 493–end), ddr-2 (amino acids 407–end), ver-3 (amino acids 788–end), scd-2 (amino acids 930–end) and vab-1 (amino acids 582–end) were subcloned into the pFastBac Dual baculoviral expression vector for expression in Sf9 cells.

The coding sequence for CSF1R (amino acids 539–end) was PCR-amplified out of a pTag mammalian expression vector construct (a gift from M. E. Ross, C. Wang, V. Aguiar-Pulido and S. Kholmanskikh) and subcloned into pFastBac Dual.

Coding sequences for EGFR (amino acids 668–end), IGF1R (amino acids 960–end) and FAK (full length) were PCR-amplified out of constructs obtained from Addgene (82906, 98344 and 23902, respectively), and subcloned into pcDNA 3.4. Amino acid substitutions in the kinase domains were generated using the QuikChange II Site-Directed Mutagenesis kit (Agilent).

Expression and purification from bacteria

Transformations were performed with BL21 Star cells (Thermo Fisher Scientific) unless specified otherwise. Antibiotic concentrations used were as follows: carbenicillin (100 mg l−1), kanamycin (50 mg l−1), spectinomycin (25 mg l−1) and chloramphenicol (25 mg l−1 in ethanol, prepared fresh). Transformed cells were grown in 1 l Terrific broth by shaking at 190 rpm at 37 °C until the optical density (λ = 600 nm) reached 0.7–0.8, at which point 1 mM IPTG was added to induce expression. The cells were then transferred to a refrigerated shaker and shaken at 220 rpm at 18 °C for 16–20 h. Cells were then centrifuged at 6,000g, and the pellets were snap-frozen in liquid nitrogen and stored at −80 °C.

All of the steps in the protein purification were performed at 4 °C. Cell pellets were solubilized in lysis buffer (the contents of which are described below), using a spatula to disperse, and lysed by probe sonication. The lysates were centrifuged at 20,000g for 1 h, and the supernatants were combined with affinity purification resin, nickel NTA (Qiagen) or glutathione Sepharose (GE Health) that had been rinsed in base buffer. The supernatant–bead slurries were agitated using a rotisserie for 30 min. Resin was washed with 1 l base buffer and eluted in 10 bed volumes of elution buffer. Eluted proteins were concentrated using the Ultra Centrifugal Filter Units (Amicon), supplemented with 1 mM DTT and 25% glycerol, and snap-frozen in liquid nitrogen and stored at −80 °C.

Standard lysis buffer was 50 mM Tris pH 8.0, 100 mM NaCl, 2 mM MgCl2, 2% glycerol, HALT EDTA-free phosphatase and protease inhibitor cocktail (Life technologies), 5 mM β-mercaptoethanol and 1–3 grams of lysozyme (Sigma-Aldrich). Standard base buffer was 50 mM Tris pH 8.0, 100 mM NaCl, 50 mM imidazole, 2 mM MgCl2 and 2% glycerol. Standard wash buffer was 50 mM Tris pH 8.0, 500 mM NaCl, 50 mM imidazole, 2 mM MgCl2 and 2% glycerol. Polyhistidine-tag elution buffer was 50 mM Tris pH 8.0, 100 mM NaCl, 2 mM MgCl2, 2% glycerol and 350 mM imidazole.

PDHK1, PDHK3 and PDHK4 were co-expressed with Gro-EL/Gro-ES protein chaperones61,62 and purified with the following buffers: lysis buffer (100 mM potassium phosphate pH 7.5, 10 mM l-arginine (stock pH-adjusted to 7.5), 500 mM KCl, 0.1 mM EDTA, 0.1 mM EGTA, 0.2% Triton X-100, lysozyme), wash buffer (50 mM potassium phosphate pH 7.5, 10 mM arginine, 500 mM NaCl, 0.1% Triton X-100, 2 mM MgCl2), and elution buffer (25 mM Tris pH 7.5, 120 mM KCl, 0.02% Tween-20, 50 mM arginine, 350 mM imidazole).

PKMYT1 was co-expressed with untagged HSP90–CDC37 complex63.

Protein expression in insect cells

Spodoptera frugiperda (Sf9) cells (Thermo Fisher Scientific) were cultured in Grace’s Insect Cell Culture Medium containing 10% fetal bovine serum (Thermo Fisher Scientific) and shaken at 120 rpm at 27 °C in a humidified incubator. Following the protocols provided in the Bac-to-Bac Baculovirus Expression System manual (Thermo Fisher Scientific), Sf9 cells were infected with the recombinant baculoviruses derived from the pFastBac constructs described above. At 3 days after transfection, the cells were centrifuged at 500g for 5 min, snap-frozen in liquid nitrogen and stored at −80 °C.

Protein expression in mammalian cells

Expi293 cells (Thermo Fisher Scientific) were cultured in 500 ml Expi293 Expression Medium (Thermo Fisher Scientific) in 2 l spinner flasks on a magnetic stirring platform at 100 rcf at 36.8 °C under 8% CO2. For transfection, 500 μg of expression constructs were diluted in Opti-MEM I Reduced Serum Medium (Thermo Fisher Scientific). ExpiFectamine 293 Reagent (Thermo Fisher Scientific) was diluted with Opti-MEM separately and then combined with diluted plasmid DNA for 10 min at room temperature. The mixture was then transferred to the cells (3 × 106 cells per ml) and stirred. Then, 20 h after transfection, ExpiFectamine 293 Transfection Enhancer 1 and Enhancer 2 (Thermo Fisher Scientific) were added to the cells. Then, 2 days later, the cells were centrifuged at 300g for 5 min, snap-frozen in liquid nitrogen and stored at −80 °C (3 days after transfection).

Purification from insect and mammalian cells

All steps of protein purification were performed at 4 °C. Cell pellets were solubilized in lysis buffer, using a spatula to disperse, and lysed by Dounce homogenization (20 strokes). The lysates were centrifuged at 100,000g for 1 h and the supernatants were combined with affinity purification resin, nickel NTA (Qiagen), glutathione Sepharose (GE Health) or Anti-Flag M2 affinity gel (Sigma-Aldrich), and agitated on a rotisserie for 30 min (nickel and glutathione beads) or for 1 h (anti-Flag beads). The resin was washed with 1 l base buffer and eluted in 10 bed volumes of elution buffer. For elution of Flag-tagged proteins, beads were immersed in elution buffer (0.15 μg ml−1 3× Flag peptide (Sigma-Aldrich)) and agitated on a rotisserie for 1 h before elution. The eluted proteins were concentrated using Ultra Centrifugal Filter Units (Amicon), supplemented with 1 mM DTT and 25% glycerol, snap-frozen in liquid nitrogen and stored at −80 °C.

Standard lysis buffer was 50 mM Tris pH 8.0, 150 mM NaCl, 2 mM MgCl2, 5% glycerol, 1% Triton X-100, 5 mM β-mercaptoethanol and HALT protease inhibitors. Standard base buffer was 50 mM Tris pH 8.0, 100 mM NaCl, 2 mM MgCl2 and 2% glycerol. Standard wash buffer was 50 mM Tris pH 8.0, 500 mM NaCl, 2 mM MgCl2 and 2% glycerol. Elution buffer was 50 mM Tris pH 8.0, 100 mM NaCl, 2 mM MgCl2 and 2% glycerol. Glutathione (10 mM) pH 8.0 was included for GST affinity purifications. Imidazole (250 mM) was included for polyhistidine affinity purifications. 3× Flag peptide (0.15 μg ml−1) was included for Flag affinity purifications.

Recombinant active SRMS was a gift from D. Gurbani and K. Westover64.

PSPA experiments

Each recombinant kinase was distributed across a 384-well plate, mixed with a customized Tyr peptide substrate library (Anaspec) in solution phase and 50 μM ATP (50 μCi ml−1 γ-32P-ATP, Perkin-Elmer), and incubated for 90 min. Assay conditions63 for each kinase are described in Supplementary Table 1. Each well contains a mixture of peptides with a centralized Tyr phospho-acceptor and one fixed amino acid in an otherwise randomized background mixture of all natural amino acids except Tyr and Cys. All 20 natural amino acids, plus two PTM residues (pThr and pTyr), were substituted into positions −5 to +5 to generate 220 unique peptide mixtures (22 amino acids × 10 fixed positions). All peptides were amidated at their C termini. N- and C-terminal flanking sequences of all peptides were G-A-[phosphorylation site sequence]-A-G-K-K(biotin)-NH2, where K(biotin) represents a lysine sidechain modified with an aminohexanoic acid spacer attached to biotin. After the phosphorylation reactions, peptides were spotted onto Streptavidin-conjugated membranes (Promega, V2861), where they associated through their C-terminal biotinylations. The membranes were rinsed to remove free ATP and kinase and imaged using the Typhoon FLA 7000 phosphorimager (GE). Raw data (GEL file) was quantified using ImageQuant (GE). Images of the raw data are presented in Supplementary Fig. 1. For 24 kinases, the +5 position peptides were profiled in separate experiments, and their results are shown as separate images in Supplementary Fig. 1. Dual-specificity kinases (NEK10, PINK1, BMPR2, LIMK1, LIMK2, TESK1, MYT1, MKK4, MKK6, MKK7, PDHK1, PDHK3 and PDHK4) and a subset of the canonical kinases (IRR, JAK3, MST1R (RON), TXK and VEGFR1) were profiled using a second customized Tyr peptide substrate library lacking Ser, Thr, Tyr and Cys, at randomized positions.
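The layout of the peptide library described above can be enumerated in a few lines. This is a minimal sketch for illustration only: the labels `pT` and `pY` and the tuple representation are assumptions made here, not the authors' naming.

```python
# Enumerate the peptide mixtures: one fixed residue at one flanking position,
# with the central Tyr phospho-acceptor at position 0.
NATURAL = list("ACDEFGHIKLMNPQRSTVWY")           # 20 natural amino acids
PTM = ["pT", "pY"]                                # phosphothreonine, phosphotyrosine
POSITIONS = [p for p in range(-5, 6) if p != 0]   # -5..-1 and +1..+5


def library_mixtures():
    """Return one (position, fixed_residue) pair per peptide mixture."""
    return [(pos, res) for pos in POSITIONS for res in NATURAL + PTM]


mixtures = library_mixtures()
print(len(mixtures))  # 22 residues x 10 positions = 220
```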

Together, substrate motifs were obtained from a total of 109 distinct kinases, comprising 92 human kinases, 12 Caenorhabditis elegans kinase orthologues, 1 arthropod Tribolium castaneum kinase orthologue (PINK1) and 4 phosphopriming selection mutant kinases (Extended Data Figs. 5 and 6).

Kinetic analysis

Peptide phosphorylation assays to determine the kinetic parameters of JAK1 and ZAP70 were performed at room temperature in 20 μl containing the corresponding kinase reaction buffer (Supplementary Table 1). Each reaction contained 100 ng of kinases and 500 μM, 250 μM, 50 μM or 25 μM of biotinylated substrate peptide (Anaspec). Then, 2 μl of each reaction was transferred to 18 μl quenching buffer (500 mM EDTA pH 8.0) at 0, 3, 6, 9, 12, and 15 min. A total of 1.5 μl of quenched reaction mixtures was spotted onto Streptavidin-conjugated membranes (Promega, V2861). The membranes were rinsed to remove free ATP and kinase and imaged alongside ATP standards using the Typhoon FLA 7000 phosphorimager (GE) and quantified using ImageQuant (GE). From these kinase assays, the KM and Vmax values were determined by curve fitting using the Michaelis–Menten equation (GraphPad Prism v.10.1).
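To illustrate how KM and Vmax follow from initial rates at several substrate concentrations, the sketch below uses a Hanes–Woolf linearization ([S]/v plotted against [S]). This is a swapped-in technique chosen because it needs only the standard library; the fits reported here were nonlinear Michaelis–Menten fits in GraphPad Prism. The function name and the synthetic rates are hypothetical.

```python
def hanes_woolf_fit(substrate_conc, initial_rates):
    """Estimate (Km, Vmax) by least squares on [S]/v versus [S].

    Michaelis-Menten: v = Vmax*[S] / (Km + [S])
    Rearranged:       [S]/v = [S]/Vmax + Km/Vmax
    so the slope is 1/Vmax and the intercept is Km/Vmax.
    """
    xs = list(substrate_conc)
    ys = [s / v for s, v in zip(substrate_conc, initial_rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Synthetic, noise-free rates at the four peptide concentrations (in uM);
# the "true" parameters below are made up for the demonstration.
conc = [25.0, 50.0, 250.0, 500.0]
true_km, true_vmax = 100.0, 2.0
rates = [true_vmax * s / (true_km + s) for s in conc]
km, vmax = hanes_woolf_fit(conc, rates)
```

With real, noisy data a direct nonlinear fit is generally preferable, because linearizations weight the measurement errors unevenly.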

Matrix processing

The raw spot-intensity matrices of the canonical kinases and the non-canonical kinases TNNI3K and WEE1 were column-normalized (at each position) by the sum of the 18 randomized amino acids (excluding Tyr and Cys) to yield PSSMs. The raw spot-intensity matrices of all other non-canonical kinases and the canonical kinases IRR, JAK3, MST1R (RON), TXK and VEGFR1 were normalized by the sum of the 16 randomized amino acids (excluding Ser, Thr, Tyr and Cys), corresponding to the uniquely customized peptide library that was used to profile these kinases. The cysteine row was scaled to fix its median as 1/18 for the 18 amino acid library or 1/16 for the 16 amino acid library, depending on the library used as described above. The Tyr values in each position were set to be identical to the phenylalanine value at that position. For kinases displaying dual specificity (PDHK1, PDHK4, BMPR2, LIMK2, MKK7 and PINK1), the serine and threonine values in each position were set to be the median of that position.
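The normalization steps above can be sketched as follows for the 18 amino acid library. This is an illustrative reading of the text, not the authors' code: the dictionary layout (residue mapped to a list of per-position values) and the function name are assumptions.

```python
from statistics import median

# Residues present at the randomized positions of the 18 amino acid library
RANDOMIZED_18 = [aa for aa in "ACDEFGHIKLMNPQRSTVWY" if aa not in "YC"]


def normalize_matrix(raw, randomized=RANDOMIZED_18):
    """raw: dict residue -> list of spot intensities, one per position."""
    n_pos = len(next(iter(raw.values())))
    # Column-normalize every row by the sum over the randomized residues
    col_sums = [sum(raw[aa][i] for aa in randomized) for i in range(n_pos)]
    pssm = {aa: [raw[aa][i] / col_sums[i] for i in range(n_pos)] for aa in raw}
    # Scale the Cys row so that its median equals 1/18 (or 1/16)
    scale = (1.0 / len(randomized)) / median(pssm["C"])
    pssm["C"] = [v * scale for v in pssm["C"]]
    # Tyr values are copied from Phe at each position
    pssm["Y"] = list(pssm["F"])
    return pssm


# Toy raw matrix with three positions (values are made up)
raw = {aa: [1.0, 2.0, 3.0] for aa in RANDOMIZED_18}
raw["C"] = [2.0, 4.0, 12.0]
raw["Y"] = [9.0, 9.0, 9.0]
pssm = normalize_matrix(raw)
```

For the 16 amino acid library the same function could be called with a `randomized` list that also omits Ser and Thr.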

Substrate scoring

For scoring substrates, the PSSM values of the corresponding amino acids in the corresponding positions were scaled by 18 or 16, depending on the library used, to calculate the selectivity of that amino acid relative to the mean randomized amino acid, which has a value of 1. These values are rounded to the nearest 10,000th and multiplied to generate a raw score for each kinase–substrate pair20,34,35 (Supplementary Note 1). To calculate the percentile score of a substrate for a given kinase, we first computed the a priori reference score distribution of that kinase PSSM by scoring a reference Tyr phosphoproteome comprising 5,431 identified sites with localization probability above 0.75 (ref. 3), using the method discussed above (Fig. 2a). The percentile score of a kinase–substrate pair is defined as the percentile ranking of the substrate within the reference score distribution for the kinase.
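A minimal sketch of the raw and percentile scores described above. The data layout, the function names and the handling of ties in the percentile (reference scores less than or equal to the query count as below it) are assumptions made for illustration.

```python
from bisect import bisect_right


def raw_score(pssm, site, n_random=18):
    """Multiply the scaled, rounded PSSM values for one substrate.

    pssm: dict residue -> dict position -> normalized value
    site: dict position -> residue (flanking positions only)
    n_random: 18 or 16, depending on the library used
    """
    score = 1.0
    for pos, aa in site.items():
        # Selectivity relative to the mean randomized residue (value 1),
        # rounded to four decimal places
        score *= round(pssm[aa][pos] * n_random, 4)
    return score


def percentile_score(score, reference_scores):
    """Percentile rank of `score` within the reference distribution."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect_right(ranked, score) / len(ranked)


# Toy example: Glu at -1 is favoured twofold, Pro at +1 is disfavoured twofold
toy_pssm = {"E": {-1: 2.0 / 18}, "P": {1: 0.5 / 18}}
toy_site = {-1: "E", 1: "P"}
toy_raw = raw_score(toy_pssm, toy_site)
toy_pct = percentile_score(toy_raw, [0.1, 0.5, 2.0, 4.0])
```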

For scores displayed at the Kinase Library websites, we log2-transform and sum PSSM values such that a substrate preferred over random has a positive value and a substrate selected against has a negative value.
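The log-space variant can be sketched in the same way. It is assumed here that the values are scaled by the library size before the log2 transform, so that a neutral residue contributes 0; the function name and data layout are hypothetical.

```python
import math


def log2_score(pssm, site, n_random=18):
    """Sum of log2 selectivities; positive means preferred over random.

    pssm: dict residue -> dict position -> normalized value
    site: dict position -> residue
    """
    return sum(math.log2(pssm[aa][pos] * n_random) for pos, aa in site.items())


# Fourfold preference at -1 and twofold disfavour at +1 gives 2 + (-1)
toy_pssm = {"E": {-1: 4.0 / 18}, "P": {1: 0.5 / 18}}
toy_log_score = log2_score(toy_pssm, {-1: "E", 1: "P"})
```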

Matrix clustering

The dendrograms in Figs. 1 and 5 were generated using the normalized matrices with all the unmodified amino acids excluding Tyr (which was fixed as identical to phenylalanine), as well as phosphothreonine and phosphotyrosine. Linkage matrices were computed using the SciPy package in Python (v.3.7.6), using the ‘ward’ method. The results were converted to the Newick tree format and plotted using FigTree (v.1.4.4).
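As a rough sketch of this step, flattened matrices can be clustered with SciPy's Ward linkage and the resulting tree written as a Newick string. The helper names, the toy vectors and the omission of branch lengths are simplifications assumed here; the authors' exact conversion is not described.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def to_newick(node, labels):
    """Recursively serialize a SciPy ClusterNode (topology only)."""
    if node.is_leaf():
        return labels[node.id]
    left = to_newick(node.get_left(), labels)
    right = to_newick(node.get_right(), labels)
    return f"({left},{right})"


def cluster_to_newick(vectors, labels):
    """vectors: one flattened normalized matrix per kinase."""
    z = linkage(np.asarray(vectors, dtype=float), method="ward")
    return to_newick(to_tree(z), labels) + ";"


# Toy input: two tight pairs of made-up profiles
toy_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
toy_tree = cluster_to_newick(toy_vectors, ["A", "B", "C", "D"])
```

A string of this form can be opened in a tree viewer such as FigTree.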

Comprehensive analysis of substrate sequence selectivity

In Extended Data Figs. 3 and 4d,e, for each of the 78 canonical human Tyr kinases, the selectivities at each position for each of the 20 natural amino acids, relative to a mixed pool of natural amino acids, were calculated as described above. These values were log-transformed and plotted in v.4.2.3 of R65 using v.3.4.2 of the package ggplot266. As a proxy for the variability among kinases in degree of selectivity, the s.d. of log-transformed selectivity values was calculated and plotted for each amino acid at each position using the same software.

Comparison to literature PSSMs

The log2-enrichment of each amino acid at each position among phosphorylated peptides versus the unphosphorylated library, using the subset of the library containing only one Tyr residue, was calculated previously7 for each of the five kinases screened against a degenerate library. The Pearson correlation coefficient of these quantifications was calculated against the log2 selectivity for each amino acid at each position in all 78 canonical human Tyr kinases screened here. Extended Data Fig. 1 shows the correlation coefficients sorted from lowest to highest for each of the five kinases screened7, with the five best-matching kinase selectivities in our study explicitly labelled in each plot.
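The comparison amounts to correlating one flattened vector of literature log2-enrichments with each kinase's flattened log2 selectivities and sorting the results. A minimal sketch, with hypothetical function names and made-up toy values:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_kinases(literature_vec, kinase_vecs):
    """Return (kinase, r) pairs sorted from lowest to highest r."""
    pairs = [(k, pearson_r(literature_vec, v)) for k, v in kinase_vecs.items()]
    return sorted(pairs, key=lambda kv: kv[1])


# Toy vectors; kinase names are placeholders
literature = [1.0, -0.5, 2.0, 0.0]
ours = {"kinase_a": [0.9, -0.4, 2.1, 0.1], "kinase_b": [-1.0, 0.5, -2.0, 0.0]}
ranking = rank_kinases(literature, ours)
```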

Kinase enrichment analysis

The single phosphorylation sites (not including multiply-phosphorylated peptides) in the analysed phosphoproteomics studies were scored for each of the characterized canonical kinases (78 Tyr kinases), and their ranks in the reference phosphoproteome score distributions were determined as described above. For every non-duplicated, singly phosphorylated site, the kinases that ranked within the top eight of the Tyr kinases were considered to be biochemically favoured kinases for that phosphorylation site. For assessing kinase motif enrichment in phosphoproteomics datasets, we compared the percentage of phosphorylation sites for which each kinase was predicted among the upregulated/downregulated (increased/decreased, respectively) phosphorylation sites (sites with |log2[fold change]| equal to or greater than our log2[fold change] threshold of 1), versus the percentage of biochemically favoured phosphorylation sites for that kinase within the set of unregulated (unchanged) sites in this study (sites with |log2[fold change]| less than our log2[fold change] threshold of 1). Contingency tables were corrected using Haldane correction (adding 0.5 to the cases with zero in one of the counts). Statistical significance was determined using a one-sided Fisher’s exact test. Kinases that were significant (P ≤ 0.05) for both upregulated and downregulated analysis were excluded from the downstream analysis. Then, for each kinase, the direction of most significant enrichment (upregulated or downregulated) was selected based on the P values and presented in the volcano plots.
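The enrichment calculation for one kinase in one direction can be sketched as a 2 × 2 contingency test. Two points are assumptions made here rather than details given in the text: the Haldane correction adds 0.5 to all four cells when any cell is zero and is applied only to the enrichment ratio, and the P value is computed from the uncorrected integer counts. Function names are hypothetical.

```python
from math import comb, log2


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P value, P(X >= a), for [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, col1)


def log2_frequency_factor(a, b, c, d):
    """log2 of (fraction predicted among regulated / among unregulated)."""
    if 0 in (a, b, c, d):
        # Haldane correction (assumed here to apply to every cell)
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return log2((a / (a + b)) / (c / (c + d)))


def kinase_enrichment(reg_hits, reg_total, unreg_hits, unreg_total):
    """Return (log2 enrichment, one-sided P) for one kinase and direction."""
    a, b = reg_hits, reg_total - reg_hits
    c, d = unreg_hits, unreg_total - unreg_hits
    return log2_frequency_factor(a, b, c, d), fisher_greater(a, b, c, d)


# Toy counts: kinase predicted for 10 of 20 regulated and 5 of 20 unchanged sites
enrichment, p_value = kinase_enrichment(10, 20, 5, 20)
```

Running this once for the upregulated set and once for the downregulated set, then keeping the direction with the smaller P value, mirrors the selection described above.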

Sequence logos

Sequence logos were generated using the Logomaker package in Python67. For individual kinases, the normalized matrix was used, where the height of every letter is the ratio of its value to the median value for that position. The Tyr height in the central position (position zero) was set to the maximal height in the peripheral positions. For clustered groups of kinases, the average matrix was calculated and presented as a sequence logo as described above.
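The letter heights described above can be computed before any plotting. In this sketch, the central Tyr is given the height of the tallest single letter found at the peripheral positions, which is one possible reading of "maximal height"; the data layout and function names are assumptions, and the plotting call itself is omitted.

```python
from statistics import median


def position_heights(column):
    """column: dict residue -> normalized value at one position."""
    med = median(column.values())
    return {aa: value / med for aa, value in column.items()}


def logo_heights(pssm):
    """pssm: dict position -> dict residue -> value (peripheral positions)."""
    heights = {pos: position_heights(col) for pos, col in pssm.items()}
    tallest = max(h for col in heights.values() for h in col.values())
    heights[0] = {"Y": tallest}   # central phospho-acceptor
    return heights


# Toy matrix with two peripheral positions and three residues
toy = {-1: {"A": 1.0, "E": 4.0, "G": 2.0}, 1: {"A": 2.0, "E": 2.0, "G": 6.0}}
toy_heights = logo_heights(toy)
```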

Comparative analyses between amino acids in the kinase domains and their substrate specificities

For Extended Data Fig. 6, kinases were sorted by the +1 pTyr signal in their PSSM. For the sequence logo, kinase domains of the 78 canonical Tyr kinases were obtained from previously aligned kinase sequences68. The alignments to residue Ala920 in EGFR (Protein Data Bank (PDB): 5CZH) were obtained for each kinase, and the frequencies of amino acids were calculated and plotted.

Known kinase–substrate pairs

Experimentally validated kinase–substrate relationships were obtained from PhosphoSitePlus (April 2022)2. The number of reports for each pair was determined by the sum of the in vivo and in vitro reports.

Performance analysis

Experimentally validated kinase–substrate relationships were obtained from PhosphoSitePlus2. We selected Tyr sites on human proteins and filtered out sites with an additional phosphorylated residue within 5 amino acids or sites with reported upstream kinase not characterized in this study. The number of reports for each pair was determined by the sum of the in vivo and in vitro reports.

SH2-binding specificity matrix processing

The raw binding matrices of 76 SH2 domains were obtained from previously published work53. Values of zero were replaced with the minimal value at that position. Matrices were then position-normalized by the sum of the 19 randomized amino acids (excluding cysteine), to yield PSSMs34. The cysteine specificity was then added and set to 1/19 to represent neutral specificity as it was not included in the original data. The PSSM for PIK3R2_C was also used to represent PIK3R3_C.
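A sketch of the SH2 matrix processing under stated assumptions: zeros are replaced by the smallest non-zero value at the same position (the text says "minimal value", and a literal minimum would itself be zero), and the matrix is stored as a dictionary of per-position lists. The function name is hypothetical.

```python
def process_sh2_matrix(raw):
    """raw: dict residue -> list of binding values (19 residues, no Cys)."""
    n_pos = len(next(iter(raw.values())))
    filled = {aa: list(values) for aa, values in raw.items()}
    for i in range(n_pos):
        # Replace zeros with the smallest non-zero value at this position
        floor = min(filled[aa][i] for aa in filled if filled[aa][i] > 0)
        for aa in filled:
            if filled[aa][i] == 0:
                filled[aa][i] = floor
    # Position-normalize by the sum of the 19 measured residues
    col_sums = [sum(filled[aa][i] for aa in filled) for i in range(n_pos)]
    pssm = {aa: [filled[aa][i] / col_sums[i] for i in range(n_pos)]
            for aa in filled}
    # Cys was not measured; assign it the neutral value 1/19
    pssm["C"] = [1.0 / len(raw)] * n_pos
    return pssm


# Toy matrix with two positions and one zero entry (values are made up)
residues_19 = [aa for aa in "ACDEFGHIKLMNPQRSTVWY" if aa != "C"]
toy_raw = {aa: [1.0, 2.0] for aa in residues_19}
toy_raw["D"] = [0.0, 2.0]
toy_pssm = process_sh2_matrix(toy_raw)
```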

SH2 enrichment for different kinase motifs

First, we scored the Tyr phosphoproteome3 with each kinase motif and, for each, divided the data into favoured sites (top 20%), neutral sites (middle 60%) and disfavoured sites (bottom 20%). SH2 enrichment was then calculated similarly to the kinase enrichment process described above. SH2-binding PSSMs53 (Supplementary Table 5) that ranked within the top eight SH2s were considered to be biochemically favoured SH2s for binding that phosphorylation site. For assessing SH2 motif enrichment in the Tyr phosphoproteome distribution for a given kinase, we compared the percentage of phosphorylation sites for which each SH2 PSSM was predicted among the favoured/disfavoured phosphorylation sites (top 20% and bottom 20%, respectively) versus the percentage of biochemically favoured phosphorylation sites for that SH2 within the set of neutral phosphorylation sites in this study (middle 60%). Contingency tables were corrected using Haldane correction (adding 0.5 to the cases with zero in one of the counts). Statistical significance was determined using one-sided Fisher’s exact test, and the corresponding P values were adjusted using the Benjamini–Hochberg procedure. Finally, for every SH2 domain, the most significant direction of enrichment (favoured or disfavoured) was selected based on the adjusted P value and presented in the volcano plots.
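Two pieces of this procedure are easy to illustrate: splitting the scored phosphoproteome into favoured, neutral and disfavoured sites, and adjusting P values with the Benjamini–Hochberg procedure. The sketch below uses hypothetical function names; how ties and non-divisible site counts are handled at the 20% boundaries is an assumption.

```python
def split_sites(scored_sites):
    """scored_sites: list of (site_id, score).

    Returns (favoured, neutral, disfavoured) as the top 20%,
    middle 60% and bottom 20% by score.
    """
    ranked = sorted(scored_sites, key=lambda s: s[1], reverse=True)
    cut = len(ranked) // 5
    return ranked[:cut], ranked[cut:len(ranked) - cut], ranked[len(ranked) - cut:]


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy inputs (made up)
sites = [(f"site{i}", float(i)) for i in range(10)]
favoured, neutral, disfavoured = split_sites(sites)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```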

Illustrations

Experimental schema and illustrative models were generated using BioRender (https://biorender.com/). Kinome tree images were generated and modified using Coral (http://phanstiel-lab.med.unc.edu/CORAL/)69. Structural illustrations were generated with ChimeraX70 or PYMOL71. Generic kinase domains in Figs. 1 and 4 and Extended Data Fig. 7: INSR (PDB: 1IRK)72. Kinase and substrate structures in Fig. 2: INSR (structural chimera of PDB 1IRK (ref. 72) and AlphaFold AF-P06213-F1 (https://alphafold.ebi.ac.uk/entry/P06213) (ref. 73)), IRS1 (AlphaFold: AF-P35568-F1) (https://alphafold.ebi.ac.uk/entry/P35568)73, JAK1 (PDB: 7T6F)74, STAT1 (PDB: 1BF5)75 and CSK–SRC complex (PDB: 3D7T)49. RTK in Fig. 3: EGFR transmembrane domain (PDB: 2M20)76 and ECD (PDB: 3NJP)77. Kinase–drug complex in Fig. 3: ABL–imatinib (PDB: 1IEP)78. Generic SH2 domain structures in Fig. 4: SRC (PDB: 1SHB)79. Kinase domain of DDR2 in Extended Data Fig. 2 (AlphaFold: AF-Q16832-K1A, based on https://alphafold.ebi.ac.uk/entry/Q16832)80.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
