May 6, 2024
De novo design of modular peptide-binding proteins by superhelical matching – Nature

Generation of DHR scaffolds

Each designed helical repeat (DHR) scaffold has a helix-loop-helix-loop topology that is repeated four or more times18,35,36. The helices range from 18 to 30 residues and the loops from 3 to 4 residues. The DHR design process comprises backbone design, sequence design and computational validation by energy-landscape exploration. The geometry of a repeat protein can be described by three superhelical parameters: the radius of the superhelix, the axial displacement (rise) and the twist (omega)37,38. To match the peptides, the designs were required to have a twist between 0.6 and 1.0 radians, a radius of 0 to 13 Å and a rise between 0 and 10 Å.
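The parameter window above can be expressed as a simple pre-filter. This is a minimal sketch that assumes the three superhelical parameters have already been computed for each scaffold; the class, function and scaffold names are hypothetical, and treating the bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Superhelix:
    twist_rad: float  # omega, rotation per repeat (radians)
    radius: float     # superhelical radius (Å)
    rise: float       # axial displacement per repeat (Å)


# Ranges stated in the text; inclusive bounds are an assumption
TWIST_RANGE = (0.6, 1.0)
RADIUS_RANGE = (0.0, 13.0)
RISE_RANGE = (0.0, 10.0)


def in_range(value, bounds):
    lo, hi = bounds
    return lo <= value <= hi


def matches_peptide_window(s: Superhelix) -> bool:
    """True if a scaffold's superhelical parameters fall inside the peptide-matching window."""
    return (in_range(s.twist_rad, TWIST_RANGE)
            and in_range(s.radius, RADIUS_RANGE)
            and in_range(s.rise, RISE_RANGE))


# Hypothetical scaffolds: dhr_b is over-twisted and is rejected
scaffolds = {"dhr_a": Superhelix(0.8, 11.0, 9.5), "dhr_b": Superhelix(1.2, 11.0, 9.5)}
kept = [name for name, s in scaffolds.items() if matches_peptide_window(s)]
print(kept)  # ['dhr_a']
```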

The backbone is designed using Rosetta fragment assembly guided by motifs21. Backbone coordinates are built up through 3,200 Monte Carlo fragment-assembly steps, with fragments taken from a non-redundant set of structures from the Protein Data Bank (PDB). After each fragment insertion, the resulting rigid-body transform is propagated to the downstream repeats. The score that guides fragment assembly is composed of van der Waals interactions, packing, backbone dihedral angles and residue-pair-transform (RPX) motifs21. RPX motifs provide a fast way to estimate the full-atom hydrophobic packability of the backbone before side chains are assigned. After design, backbones are screened for native-like features. Each loop is required to be within 0.4 Å of a naturally occurring loop; otherwise, it is rebuilt. Structures with helices above 0.14 Å appear bent or kinked and are discarded. Finally, poorly packed structures, in which fewer than four helices are in contact with one another, are filtered out.
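The 3,200-step Monte Carlo search above is assumed here to follow the standard Metropolis pattern. The sketch below shows only that acceptance logic on a toy list of torsion angles: `insert_toy_fragment` and `toy_score` are hypothetical stand-ins for PDB fragment insertion and the motif-guided score, and the propagation of rigid-body transforms to downstream repeats is not modelled.

```python
import math
import random


def metropolis_assembly(initial, propose, score, n_steps=3200, kT=1.0, seed=0):
    """Generic Metropolis Monte Carlo loop; returns the lowest-scoring state seen."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    for _ in range(n_steps):
        candidate = propose(current, rng)
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept uphill moves with Boltzmann probability
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_score = candidate, cand_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


def insert_toy_fragment(torsions, rng, frag_len=3):
    """Hypothetical move: overwrite a random window with random angles (a toy 'fragment')."""
    new = list(torsions)
    start = rng.randrange(len(new) - frag_len + 1)
    for i in range(start, start + frag_len):
        new[i] = rng.uniform(-180.0, 180.0)
    return new


def toy_score(torsions):
    """Hypothetical score: squared deviation from a helical phi of -60 degrees, scaled down."""
    return sum((t + 60.0) ** 2 for t in torsions) / 1000.0


start = [0.0] * 12
best, best_score = metropolis_assembly(start, insert_toy_fragment, toy_score)
```

Tracking the best state separately from the current one means an uphill move late in the run cannot lose the lowest-scoring backbone found so far.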

The sequence is designed using Rosetta for each backbone that passes filtering. Design begins in a symmetrical mode, in which each repeat is identical, using the RepeatProteinRelax mover. Core residues are restricted to hydrophobic amino acids and surface residues to hydrophilic amino acids using the layer design task operations. The sequence is biased toward natural proteins with a similar local structure using the structure profile mover. After symmetrical design is complete, the N-terminal and C-terminal repeats are redesigned to eliminate exposed hydrophobic residues. Designs with poor core packing are then filtered out, using Rosetta Holes < 0.5 as the packing criterion39.

The designs are computationally validated using Rosetta ab initio structure prediction on Rosetta@home40. Rosetta ab initio tests whether the design is lower in energy than the thousands of alternative conformations sampled. Simulating a protein on Rosetta@home can take several days on hundreds of CPUs. To speed this up, we used machine learning to filter out the designs that were most likely to fail37,38.

Backbone generation of curved repeat-protein monomers in polyproline II conformation

A second round of designs was made to ensure that the distance between helices matches the 10.9 Å distance between prolines in the polyproline II conformation. To design these backbones, we used atom-pair constraints between the first helices of each repeat, set to 10.9 Å with a tolerance of 0.5 Å. For these designs, we found that the topologies that most efficiently produced structures satisfying the atom-pair constraints had helices of 20 or 21 residues and loops of three residues.
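A 10.9 Å target with a 0.5 Å tolerance is naturally expressed as a flat-bottomed harmonic penalty. In this sketch the functional form and the width `sd` are assumptions (the text gives only the target and tolerance), and `repeat_spacing_penalty`, along with the choice of one point per helix, is hypothetical.

```python
import math


def flat_harmonic(d, x0=10.9, tol=0.5, sd=1.0):
    """Zero penalty within x0 ± tol; quadratic growth outside (sd is an assumed width)."""
    excess = max(0.0, abs(d - x0) - tol)
    return (excess / sd) ** 2


def repeat_spacing_penalty(helix_points):
    """Sum the penalty over consecutive repeats; helix_points[i] is an (x, y, z) point
    on the first helix of repeat i (which atom to use is a hypothetical choice)."""
    return sum(
        flat_harmonic(math.dist(a, b))
        for a, b in zip(helix_points, helix_points[1:])
    )


# Spacings of 11.0 Å (inside tolerance, no penalty) and 12.4 Å (1.0 Å outside it)
points = [(0.0, 0.0, 0.0), (11.0, 0.0, 0.0), (23.4, 0.0, 0.0)]
penalty = repeat_spacing_penalty(points)
```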

Design of peptide binders

Modular peptide docking and hashing

To construct hash tables storing the pre-computed privileged residue interactions, we first surveyed the non-redundant PDB and extracted the intended interacting residues as seeds. For each seeding residue pair, random perturbations were applied to search for alternative relative conformations of the interacting residues. For side-chain–backbone bidentate interactions, random rigid-body perturbations were applied to the backbone residues, using a random set of Euler angles drawn from a normal distribution (mean 0°, standard deviation 60°) and a random set of translations in three-dimensional (3D) space drawn from a normal distribution (mean 0 Å, standard deviation 1 Å). At the same time, the backbone torsion angles φ and ψ of the backbone residue were randomly reset to values drawn from a Ramachandran density based on structures from the PDB. Transformed residue sets that lost the intended interactions were discarded, and those that kept them were collected. The side chains of the side-chain residues were then replaced with all reasonable rotamers to further diversify the sampled sets of interacting residues. Finally, the geometric relationship of each set of residues that kept the intended interactions was passed through an 8D hash function (6D rigid-body transformation plus two torsion angles) and represented as a 64-bit unsigned integer, which served as the key of an entry in the hash table. The identity and side-chain torsion angles (χ) of the side-chain residues were stored as the value of that entry. Similar processes, with minor alterations, were used to build hash tables for other interaction types. For example, for π–π and cation–π interactions, only a 6D hash function was used, because the backbone torsions need not be perturbed or considered.
For Asn, Gln, Asp or Glu interacting with two backbone residues, a 10D hash was used to represent the geometric relationship; in these cases, the backbone N–H and C=O groups were treated as 5D rays.
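To make the 8D-to-64-bit idea concrete, here is a minimal sketch that bins each of the eight dimensions into 8 bits and packs them into one integer key. The text specifies only the dimensionality and the 64-bit key; the bin widths, the bit layout and the function names are hypothetical, and the hash used in the actual work may differ.

```python
import math

CART_RES = 1.0  # hypothetical translation bin width (Å)
ANG_RES = 15.0  # hypothetical angular bin width (degrees); 360 / 15 = 24 bins


def _bin_cart(x):
    # Offset by 128 bins so modest negative values map to non-negative indices;
    # values beyond ±128 bins wrap around (a limitation of this toy scheme)
    return (math.floor(x / CART_RES) + 128) & 0xFF


def _bin_ang(a):
    return math.floor((a % 360.0) / ANG_RES) & 0xFF


def hash8d(tx, ty, tz, rx, ry, rz, phi, psi):
    """Pack a 6D rigid-body transform plus two backbone torsions into one 64-bit key.
    Each of the 8 dimensions gets 8 bits, so the key fits in an unsigned 64-bit integer."""
    bins = [_bin_cart(tx), _bin_cart(ty), _bin_cart(tz)]
    bins += [_bin_ang(a) for a in (rx, ry, rz, phi, psi)]
    key = 0
    for i, b in enumerate(bins):
        key |= b << (8 * i)
    return key


# Hash table: key -> (side-chain residue identity, chi angles); values are illustrative
table = {hash8d(1.2, -0.3, 2.9, 10.0, 20.0, 31.0, -60.0, -45.0): ("ASN", (-65.0, 120.0))}

# A slightly different geometry that falls in the same bins retrieves the same entry
query = hash8d(1.4, -0.1, 2.5, 11.0, 21.0, 31.0, -58.0, -44.0)
print(table.get(query))  # ('ASN', (-65.0, 120.0))
```

Lookup is then a single dictionary access per candidate residue set, which is what makes scanning every docked pose cheap.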

To sample repeat peptides that match the superhelical parameters of the DHRs, we randomly generate a set of backbone torsion angles φ and ψ, for example, [φ1, ψ1, φ2, ψ2, φ3, ψ3] for a tripeptide repeat. If any φ/ψ pair has a Rosetta Ramachandran score above the threshold of −0.5, that pair is likely to introduce intra-peptide steric clashes, and a new pair is drawn until its Rosetta Ramachandran score is acceptable. Next, this set of φ and ψ angles is applied repetitively across eight repeats of the peptide backbone, and the superhelical parameters are calculated from the 3D coordinates of adjacent repeat units. Repeat peptides matching the superhelical parameters of any one of the curated DHRs are saved for the docking step.
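One standard way to obtain twist, rise and radius from adjacent repeat units is to superimpose repeat i onto repeat i + 1 (Kabsch) and decompose the resulting rigid transform as a screw motion. This is a generic sketch, not necessarily how the authors computed the parameters; using the centroid of a repeat as the reference point for the radius is an assumption.

```python
import numpy as np


def kabsch(P, Q):
    """Rigid transform (R, t) that best maps matched (N, 3) point sets P onto Q."""
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc


def superhelix_params(R, t, ref_point):
    """Twist (rad), rise and radius of the screw motion x -> R @ x + t (0 < twist < pi)."""
    twist = float(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
    # Rotation axis from the antisymmetric part of R
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis /= np.linalg.norm(axis)
    rise = float(t @ axis)  # translation along the screw axis
    # A point p on the axis solves (I - R) p = t - rise * axis; I - R is singular along
    # the axis, so least squares returns one valid solution
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t - rise * axis, rcond=None)
    d = ref_point - p
    radius = float(np.linalg.norm(d - (d @ axis) * axis))
    return twist, rise, radius


# Toy repeat: 4 atoms, copied by a known screw (0.8 rad twist, 5 Å rise) about an axis
# parallel to z through (1, 2, 0)
c, s = np.cos(0.8), np.sin(0.8)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
p_axis = np.array([1.0, 2.0, 0.0])
repeat0 = np.array([[10.0, 0.0, 0.0], [11.0, 1.0, 0.5], [10.5, -1.0, 1.0], [9.0, 0.5, -0.5]])
repeat1 = (repeat0 - p_axis) @ Rz.T + p_axis + np.array([0.0, 0.0, 5.0])

R, t = kabsch(repeat0, repeat1)
twist, rise, radius = superhelix_params(R, t, repeat0.mean(axis=0))
```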

To dock cognate repeat proteins and repeat peptides with matching superhelical parameters, both are first aligned to the z axis by their own superhelical axes. Next, a 2D grid search (rotation around and translation along the z axis) is carried out to sample compatible positions of the repeat peptide in the binding groove of the repeat protein. Once a dock without steric clashes is generated, the relevant hash function is applied to every potential peptide–protein interacting residue set to calculate its hash key. If a hash key exists in the hash table, the stored side-chain identities and torsion angles are retrieved and installed at all equivalent positions of the repeat peptide–repeat protein docked conformation. The docked peptide–DHR pair is saved for the interface design step if the peptide–DHR hydrogen-bond interactions are satisfied.
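The 2D grid search can be sketched as two nested loops over rotation about z and translation along z, with a simple clash test. The step sizes, the z range and the 3.0 Å clash cutoff are hypothetical values, and atoms are treated as bare points; the hashing step that follows each accepted pose is not shown.

```python
import numpy as np


def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def grid_dock(peptide_xyz, protein_xyz, angle_step=10.0, z_min=-10.0, z_max=10.0,
              z_step=1.0, clash_cutoff=3.0):
    """Yield (angle_deg, dz, placed_xyz) for clash-free peptide poses.

    Both (N, 3) arrays are assumed to be pre-aligned so that their superhelical
    axes coincide with the z axis; clash_cutoff (Å) is a hypothetical value."""
    for angle in np.arange(0.0, 360.0, angle_step):
        rotated = peptide_xyz @ rot_z(np.radians(angle)).T
        for dz in np.arange(z_min, z_max + 1e-9, z_step):
            placed = rotated + np.array([0.0, 0.0, dz])
            # All peptide-protein atom distances via broadcasting: shape (N_pep, N_prot)
            dists = np.linalg.norm(placed[:, None, :] - protein_xyz[None, :, :], axis=-1)
            if dists.min() >= clash_cutoff:
                yield float(angle), float(dz), placed


# Toy check: one protein atom and one peptide atom at the same starting point
protein = np.array([[5.0, 0.0, 0.0]])
peptide = np.array([[5.0, 0.0, 0.0]])
poses = {(a, dz) for a, dz, _ in grid_dock(peptide, protein)}
```

Because both partners share the z axis, only two degrees of freedom remain, which keeps the search exhaustive at fine spacing.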

Design of the peptide-binding interface

If a single dock was accepted with the designed repetitive peptide–DHR hydrogen bonds, the peptide was first trimmed to the same number of repeats as the DHR (for example, four or six repeats). Then, on both the peptide and the DHR, each amino acid was linked to the amino acids at the same position in every other repeat unit. This ensured that all subsequent design steps were carried out with exactly the same internal symmetry in both the DHR and the peptide.

During our design cycles, residues within an interface neighbour distance of 9 Å of the DHR–peptide binding interface were designable, and residues within 11 Å were included in minimization. Three rounds of full hydrophobic FastDesign21 followed by hydrophilic FastDesign were carried out, with each hydrophobic or hydrophilic FastDesign step repeated twice. The Rosetta score function beta_nov16 was used in all design cycles. Designed complexes were rejected outright if the peptide alone had an averaged score (over three calculations) larger than 20.0 or if the complex score was larger than −10.0.

After the preliminary design, we performed two types of sanity check to further optimize the designed peptide sequence and the designed DHR interface. On the peptide side, the non-proline amino acids in the tripeptide repeat units were scanned for mutation to each of the twenty amino acids except cysteine, unless the originally designed amino acid made a hashed side-chain–backbone, side-chain–side-chain or side-chain–side-chain–backbone hydrogen bond with the DHR interface. The DDG (binding energy of the peptide–DHR complex) was compared before and after each peptide mutation, and the mutation was accepted if the delta DDG (DDG_after − DDG_before) was larger than 1.0. Similarly, we checked the designed DHR interface by mutation, scanning the whole DHR. For designed hydrophobic amino acids at positions that were originally hydrophilic, a delta DDG of −5.0 was set as the threshold for accepting the residue as a necessary design that made a sufficient contribution to binding. For designed hydrophilic amino acids, a delta DDG of −2.0 was used as the threshold.

For experimental characterization, we selected designed complexes with near-ideal bidentate hydrogen bonds between protein and peptide, favourable protein–peptide interaction energies (DDG ≤ −35.0), interface shape complementarity (Iface_SCval ≥ 0.65), tolerable interface unsatisfied hydrogen bonds (Iface_HbondsUnsatBB ≤ 2, Iface_HbondsUnsatSC ≤ 4) and low peptide apo energies (ScoreRes_chainB ≤ 0.9).
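The numeric part of this selection can be written as a table of threshold checks. The thresholds and metric names are taken from the text; the design records and their values are hypothetical. The near-ideal bidentate hydrogen-bond requirement is a structural judgement and is not encoded here.

```python
# Thresholds from the text, keyed by the metric names used there
CRITERIA = {
    "DDG": lambda v: v <= -35.0,
    "Iface_SCval": lambda v: v >= 0.65,
    "Iface_HbondsUnsatBB": lambda v: v <= 2,
    "Iface_HbondsUnsatSC": lambda v: v <= 4,
    "ScoreRes_chainB": lambda v: v <= 0.9,
}


def passes_selection(metrics):
    """True if every numeric criterion is met (hydrogen-bond geometry not checked)."""
    return all(check(metrics[name]) for name, check in CRITERIA.items())


# Hypothetical design records
designs = [
    {"name": "d1", "DDG": -41.2, "Iface_SCval": 0.70, "Iface_HbondsUnsatBB": 1,
     "Iface_HbondsUnsatSC": 3, "ScoreRes_chainB": 0.4},
    {"name": "d2", "DDG": -30.0, "Iface_SCval": 0.72, "Iface_HbondsUnsatBB": 0,
     "Iface_HbondsUnsatSC": 2, "ScoreRes_chainB": 0.2},
]
selected = [d["name"] for d in designs if passes_selection(d)]
print(selected)  # ['d1']
```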

Forward docking

For the designed complexes selected in our second-round experiments, forward docking was performed to assess specificity in silico. For each designed complex, 10,000 arbitrary peptide conformations with the designed sequence were generated as described above. The same docking protocol as in the docking stage was then run against the unmodified designed DHR. FastRelax41 was performed on all 10,000 docks, and DDG versus peptide-backbone RMSD was plotted to check the convergence of the complex. Only ‘converged’ complexes were selected for experimental characterization, that is, those for which (i) the peptide backbone RMSD was < 2.0 Å among the top 20 designs with the lowest DDG during forward docking and (ii) the averaged peptide backbone of these top 20 designs was close to the original design model (RMSD < 1.5 Å).
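The two convergence criteria can be checked from docked peptide-backbone coordinates. This sketch interprets criterion (i) as "every pair of the top-20 peptide backbones lies within 2.0 Å RMSD", which is an assumption. RMSD is computed without superposition on the assumption that all docks share the fixed DHR frame. The toy data are hypothetical.

```python
import numpy as np


def rmsd(a, b):
    """Coordinate RMSD without superposition (poses share the fixed DHR frame)."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def is_converged(docks, design_xyz, top_n=20, spread_cut=2.0, mean_cut=1.5):
    """docks: list of (ddg, (N, 3) peptide-backbone coordinates)."""
    top = sorted(docks, key=lambda d: d[0])[:top_n]
    coords = [xyz for _, xyz in top]
    # (i) interpreted as: all pairs of top-ranked peptide backbones agree within spread_cut
    spread_ok = all(
        rmsd(coords[i], coords[j]) < spread_cut
        for i in range(len(coords)) for j in range(i + 1, len(coords))
    )
    # (ii) the averaged backbone of the top docks stays close to the design model
    mean_xyz = np.mean(coords, axis=0)
    return spread_ok and rmsd(mean_xyz, design_xyz) < mean_cut


rng = np.random.default_rng(0)
design = rng.normal(size=(12, 3)) * 5.0
near_design = [(-50.0 + 0.1 * i, design + rng.normal(scale=0.2, size=design.shape))
               for i in range(20)]
decoys = [(-20.0, design + 10.0) for _ in range(100)]
```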

Preparation of SSM libraries

We performed site-saturation mutagenesis (SSM) studies for some of the designed peptide–protein binding pairs to better understand the peptide-binding modes and to search for improved peptide binders. For each designed repeat protein, we ordered an SSM library covering a central span of 65 amino acids of the repeat protein, owing to the size limitation of chip-synthesized DNA. This span roughly equals one and a half repeat units, across three helices. The chip-synthesized DNA oligos for the SSM library were amplified and transformed into EBY100 yeast together with a linearized pETCON3 vector encoding the rest of the designed repeat protein. Each SSM library was first subjected to an expression sort, which removed low-quality sequences resulting from chip synthesis defects or recombination errors. The collected yeast population, which expresses the designed repeat-protein mutants, was regrown and subjected to the next round of peptide-binding sorts. The next-generation sequencing results of this population also served as the reference data for SSM analysis. The subsequent without-avidity peptide-binding sorts used target peptide concentrations ranging from 1 nM to 1,000 nM, depending on the initial peptide-binding ability. The peptide-bound yeast populations were collected and sequenced using an Illumina NextSeq kit, and the mutants identified were compared with those in the expression libraries. Enrichment analysis was used to identify beneficial mutants and to help interpret the peptide-binding modes. For each mutant, the enrichment value is calculated by dividing its frequency in the peptide-bound population by its frequency in the expression population. The enrichment value is then log10-transformed and plotted in heat maps for SSM analysis.

Design of binders against endogenous targets

To evaluate which endogenous proteins could currently be targeted with our method (Fig. 6), we developed Python code to search databases for sub-sequences that match permutations of the set of amino acid triplets for which we designed binders in this study (that is, LRP, PEW, PLP, IYP, PKW, IRP, LRT, LRN, LRQ, RRN, PSR and PRQ). This code is freely available (https://github.com/tjs23/prot_pep_scan). We then ranked all outputs to find the longest possible sub-sequences and manually inspected the candidates to find sub-sequences located in disordered regions. This analysis of the human proteome suggested that ZFC3H1 could be a good target for two main reasons: (1) it contains the sequence (PLP)x4 within a large disordered domain, with a downstream sequence (PEDPEQPPKPPF) within the reach of our binder design method; and (2) it is well studied and, in particular, highly specific, validated commercial antibodies against it exist.
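The following sketch shows one way to find runs of consecutive triplets drawn from the designed set in any order, scanning all three reading frames and ranking the runs by length. It is not the released prot_pep_scan code. The minimum run length of two triplets and the function name are hypothetical choices.

```python
TRIPLETS = {"LRP", "PEW", "PLP", "IYP", "PKW", "IRP",
            "LRT", "LRN", "LRQ", "RRN", "PSR", "PRQ"}


def triplet_runs(seq, triplets=TRIPLETS, min_triplets=2):
    """Return (start, subsequence) for maximal runs of consecutive triplets from the set,
    scanning all three frames; longest runs first."""
    hits = []
    for frame in range(3):
        run_start = None
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in triplets:
                if run_start is None:
                    run_start = i
            else:
                if run_start is not None and (i - run_start) // 3 >= min_triplets:
                    hits.append((run_start, seq[run_start:i]))
                run_start = None
            i += 3
        # Close a run that reaches the end of the scanned frame
        if run_start is not None and (i - run_start) // 3 >= min_triplets:
            hits.append((run_start, seq[run_start:i]))
    return sorted(hits, key=lambda h: len(h[1]), reverse=True)


print(triplet_runs("AAPLPPLPPLPPLPEDK"))  # [(2, 'PLPPLPPLPPLP')]
```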

Synthetic gene constructs

All genes in this work were ordered from either Integrated DNA Technologies (IDT) or GenScript. For both the first- and the second-round designs, a His tag containing a TEV protease cleavage site and short linkers was added to the N terminus of the protein sequences. For proteins lacking a tryptophan residue, a single tryptophan was added to the short N-terminal linker following the TEV protease cleavage site to aid quantification of protein concentration by A280. The protein sequence, along with the linker (MGSSHHHHHHHHSSGGSGGLNDIFEAQKIEWHEGGSGGSENLYFQSG or LEHHHHHH), was reverse-translated into DNA using a custom Python script that attempts to maximize the host-specific codon adaptation index42 and IDT synthesizability, which includes optimizing whole-gene and local GC content and removing repetitive sequences. Finally, a TAATCA stop codon was appended to the end of each gene. Genes were delivered cloned into pET-29b+ between the NdeI and XhoI restriction sites. For the second-round designs, the designed amino acid sequences were inserted directly into pET-29b+ between the NdeI and XhoI restriction sites.
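The codon adaptation index (CAI) that the reverse-translation script maximizes is the geometric mean of each codon's relative adaptiveness, where relative adaptiveness is the codon's frequency divided by that of the most-used synonymous codon. This sketch uses a tiny, hypothetical usage table rather than real host values. By a common convention, it skips amino acids that have a single codon. It does not reproduce the authors' script, which also balances GC content and repeat removal.

```python
import math

# Illustrative relative codon frequencies (hypothetical values, not a real host table)
CODON_USAGE = {
    "GCG": ("A", 0.36), "GCC": ("A", 0.27), "GCA": ("A", 0.21), "GCT": ("A", 0.16),
    "AAA": ("K", 0.76), "AAG": ("K", 0.24),
    "ATG": ("M", 1.00),
}


def relative_adaptiveness(usage):
    """w = codon frequency / frequency of the most-used synonymous codon."""
    best = {}
    for aa, freq in usage.values():
        best[aa] = max(best.get(aa, 0.0), freq)
    return {codon: freq / best[aa] for codon, (aa, freq) in usage.items()}


def cai(dna, usage=CODON_USAGE):
    """Geometric mean of w over codons, skipping amino acids with a single codon."""
    w = relative_adaptiveness(usage)
    n_codons = {}
    for aa, _ in usage.values():
        n_codons[aa] = n_codons.get(aa, 0) + 1
    logs = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if n_codons[usage[codon][0]] > 1:
            logs.append(math.log(w[codon]))
    return math.exp(sum(logs) / len(logs))


print(cai("ATGAAAGCG"))  # 1.0: every codon is the preferred synonym
```

Always choosing the top codon gives a CAI of 1.0, but it also produces repetitive, GC-skewed DNA, which is why a synthesizability-aware script trades CAI off against those constraints.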

For the disordered region of ZFC3H1, the 103 amino acids containing the key targeting sequence (LPPPPQVSSLPPLSQPYVEGLCVSLEPLPPLPPLPPLPPEDPEQPPKPPFADEEEEEEMLLREELLKSLANKRAFKPEETSSNSDPPSPPVLNNSHPVPRSNL) were cloned into a customized vector with sfGFP at the N terminus and His6 at the C terminus, with a linker (GGSGSG) in between.

Protein expression and purification

Plasmids were transformed into Lemo21(DE3) E. coli from New England Biolabs (NEB), and proteins were expressed in 50-ml cultures in 250-ml flasks using Studier's M2 autoinduction medium with 50 μg ml−1 kanamycin. The cultures were grown either at 37 °C for around 6–8 h and then at around 18 °C overnight (around 14 h), or at 37 °C for the entire time (around 14 h). Cells were pelleted at 4,000g for 10 min, and the supernatant was discarded. Pellets were resuspended in 30 ml lysis buffer (25 mM Tris-HCl pH 8, 150 mM NaCl, 30 mM imidazole, 1 mM PMSF, 0.75% CHAPS, 1 mM DNase and 10 mM lysozyme, with a Thermo Fisher Scientific Pierce protease inhibitor tablet). Cell suspensions were lysed by microfluidizer or sonication, and the lysate was clarified at 20,000g for around 30 min. The His-tagged proteins were bound to Ni-NTA resin (Qiagen) by gravity flow and washed with wash buffer (25 mM Tris-HCl pH 8, 150 mM NaCl and 30 mM imidazole). Protein was eluted with elution buffer (25 mM Tris-HCl pH 8, 150 mM NaCl and 300 mM imidazole). For the first-round designs, the His tag was removed by TEV cleavage, followed by IMAC purification to remove the TEV protease. The flowthrough was collected and concentrated before further purification by size-exclusion chromatography (SEC) using fast protein liquid chromatography (FPLC) on a Superdex 200 Increase 10/300 GL column in Tris-buffered saline (TBS; 25 mM Tris pH 8.0 and 150 mM NaCl).

Circular dichroism

Circular dichroism spectra were measured with an AVIV Model 420 DC or Jasco J-1500 circular dichroism spectrometer. Samples were 0.25 mg ml−1 in TBS (25 mM Tris pH 8.0 and 150 mM NaCl), and a 1-mm path-length cuvette was used. The circular dichroism signal was converted to mean residue ellipticity by dividing the raw spectra by N × C × L × 10, in which N is the number of residues, C is the concentration of protein and L is the path length (0.1 cm).
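The conversion above can be written as a one-line function. The units are assumptions, because the text does not state them: the raw signal θ is taken in millidegrees and C in mol l−1, which gives mean residue ellipticity in deg cm2 dmol−1. Converting the 0.25 mg ml−1 sample concentration to molar requires the protein's molecular weight, so the example values are hypothetical.

```python
def mg_per_ml_to_molar(conc_mg_ml, mw_da):
    """mg/ml is numerically g/l, so dividing by molecular weight (g/mol) gives mol/l."""
    return conc_mg_ml / mw_da


def mean_residue_ellipticity(theta_mdeg, n_residues, conc_molar, path_cm=0.1):
    """[theta]_MRE = theta / (N * C * L * 10), assuming theta in mdeg and C in mol/l."""
    return theta_mdeg / (n_residues * conc_molar * path_cm * 10.0)


# Hypothetical protein: 25 kDa at 0.25 mg/ml is 1e-5 M; 100 residues, -10 mdeg signal
conc = mg_per_ml_to_molar(0.25, 25000.0)
mre = mean_residue_ellipticity(-10.0, 100, conc)
```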

SEC with multi-angle light scattering

Purified samples from the initial SEC run were pooled, concentrated or diluted as needed to a final concentration of 2 mg ml−1, and 100 μl of each sample was then run on a high-performance liquid chromatography system (Agilent) using a Superdex 200 10/300 GL column. These fractionation runs were coupled to a multi-angle light scattering detector (Wyatt) to determine the absolute molecular weight of each designed protein, as described previously21.

SAXS

SAXS data were collected at the SIBYLS High Throughput SAXS facility at the Advanced Light Source in Berkeley, California43,44. Exposures of 0.3 s over a total of 10.2 s yielded 33 frames per sample. Data were collected at low (around 1.5 mg ml−1) and high (around 2–3 mg ml−1) protein concentrations in SAXS buffer (25 mM Tris pH 8.0, 150 mM NaCl and 2% glycerol). The SIBYLS website (SAXS FrameSlice) was used to analyse the data for the high- and low-concentration samples and to average the best dataset. If there was obvious aggregation over the 33 frames, only the data points collected before aggregation arose were used in the Guinier region; otherwise, all data were included for the Guinier region. All data were used for the Porod and wide-angle regions. The averaged file was processed with scatter.jar to remove data points with outlier residuals in the Guinier region. Finally, the data were truncated at q = 0.25. This dataset was then compared with the SAXS profile predicted from the design model using the FoXS server, and the volatility of ratio (Vr) was calculated to quantify how well the predicted data matched the experimental data. Proteins with a Vr below 2.5 were considered to be folded into the designed quaternary shape.
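For readers unfamiliar with the Guinier region, the sketch below fits the Guinier approximation ln I(q) = ln I0 − (Rg²/3) q² by linear regression on q². It iteratively restricts the fit to q·Rg < 1.3, a common rule of thumb for globular proteins that the text itself does not state. Units (q in Å−1, Rg in Å) are assumed, and the test data are synthetic.

```python
import numpy as np


def guinier_fit(q, intensity, qrg_max=1.3, max_iter=20):
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2, iteratively restricting to q*Rg < qrg_max."""
    mask = np.ones(q.shape, dtype=bool)
    rg = i0 = float("nan")
    for _ in range(max_iter):
        if mask.sum() < 3:
            break
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            break  # no Guinier decay, e.g. aggregation or bad data
        rg, i0 = float(np.sqrt(-3.0 * slope)), float(np.exp(intercept))
        new_mask = q * rg < qrg_max
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, i0


# Synthetic, noise-free profile for a particle with Rg = 20 Å and I0 = 100
q = np.linspace(0.005, 0.05, 50)
intensity = 100.0 * np.exp(-((20.0 * q) ** 2) / 3.0)
rg, i0 = guinier_fit(q, intensity)
```

Aggregation shows up as an upturn at the lowest q, which is why frames collected after aggregation sets in are excluded from this region.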

Bio-layer interferometry

Bio-layer interferometry binding data were collected on an Octet RED96 (ForteBio) and processed using the instrument's integrated software. To measure the affinity of peptide binders, N-terminally biotinylated (biotin-Ahx) target peptides with a short linker (GGS) were loaded onto streptavidin-coated biosensors (ForteBio SA) at 50–100 nM in binding buffer (10 mM HEPES (pH 7.4), 150 mM NaCl, 3 mM EDTA, 0.05% surfactant P20 and 0.5% non-fat dry milk) for 120 s. Analyte proteins were diluted from concentrated stocks into the binding buffer. After a baseline measurement in binding buffer alone, binding kinetics were monitored by dipping the biosensors into wells containing the analyte protein at the indicated concentration (association step) and then back into baseline buffer (dissociation step).
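Kinetic traces like these are commonly interpreted with a 1:1 (Langmuir) binding model. The text does not state which model the instrument software fitted, so the sketch below is illustrative only. During association, the signal approaches R_eq = Rmax·C/(C + KD) at rate k_obs = kon·C + koff. During dissociation, it decays as exp(−koff·t). KD = koff/kon. The rate constants used are hypothetical.

```python
import numpy as np


def association(t, conc, kon, koff, rmax):
    """1:1 association phase: R(t) = R_eq * (1 - exp(-(kon*C + koff) * t))."""
    k_obs = kon * conc + koff
    r_eq = rmax * conc / (conc + koff / kon)
    return r_eq * (1.0 - np.exp(-k_obs * t))


def dissociation(t, r0, koff):
    """1:1 dissociation phase: R(t) = R0 * exp(-koff * t)."""
    return r0 * np.exp(-koff * t)


def fit_koff(t, signal):
    """koff from a log-linear fit of a noise-free, baseline-subtracted dissociation trace."""
    slope, _ = np.polyfit(t, np.log(signal), 1)
    return -slope


# Hypothetical rates: kon = 1e5 M^-1 s^-1, koff = 2e-3 s^-1, so KD = 20 nM
t = np.linspace(0.0, 600.0, 61)
koff_fit = fit_koff(t, dissociation(t, 0.8, 2e-3))
kd = koff_fit / 1e5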

Yeast surface display

Saccharomyces cerevisiae EBY100 strain cultures were grown in C-Trp-Ura medium and induced in SGCAA medium following the protocol in ref. 45. Cells were washed with PBSF (phosphate-buffered saline (PBS) with 1% BSA) and labelled with biotinylated designed proteins using one of two methods: with-avidity or without-avidity labelling. For the with-avidity method, cells were incubated with biotinylated RBD together with anti-Myc fluorescein isothiocyanate (FITC, Miltenyi Biotec) and streptavidin–phycoerythrin (SAPE, Thermo Fisher Scientific), with SAPE at one-quarter the concentration of the biotinylated RBD. The with-avidity method was used in the first few rounds of screening against the repeat-peptide library to enrich weak binder candidates. For the without-avidity method, cells were first incubated with biotinylated designed proteins, washed and then secondarily labelled with SAPE and FITC.

Crystallization and structure determination

RPB_PEW3_R4–PAWx4

Purified RPB_PEW3_R4 protein plus PAWx4 peptide at a concentration of 36 mg ml−1 was used to conduct sitting-drop, vapour-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PEW3_R4–PAWx4 grew from drops consisting of 100 nl protein plus 100 nl of a reservoir solution consisting of 0.1 M MES pH 5.0 and 30% (w/v) PEG 6000 at 4 °C, and were cryoprotected by supplementing the reservoir solution with 5% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-D, indexed to P212121 and reduced using XDS46 (Supplementary Table 1). The structure was phased by molecular replacement using Phaser46. A set of around 50 of the lowest-energy predicted models from Rosetta were used as search models. Several of these models gave clear solutions, which were adjusted in Coot47 and refined using PHENIX48. Model refinement in P212121 initially resulted in an unacceptably large gap between Rfree and Rwork. Refinement was therefore first performed in lower-symmetry space groups (P1 and P21). In the late stages of refinement, these P1 and P21 models were refined against the P212121 data, which ultimately yielded acceptable, albeit somewhat higher, R-factors.

RPB_PLP3_R6–PLPx6

Purified RPB_PLP3_R6 protein plus PLPx6 peptide at a concentration of 70 mg ml−1 was used to conduct sitting-drop, vapour-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP3_R6–PLPx6 grew from drops consisting of 100 nl protein plus 100 nl of a reservoir solution consisting of 2.4 M (NH4)2SO4 and 0.1 M sodium citrate pH 4 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 2.2 M sodium malonate pH 4. Native diffraction data were collected at APS beamline 23-ID-D, indexed to I422 and reduced using XDS49 (Supplementary Table 1). The structure was phased by molecular replacement using Phaser46. A set of around 28 of the lowest-energy predicted models from Rosetta were used as search models. Several of these models gave clear solutions, which were adjusted in Coot47 and refined using PHENIX48.

RPB_LRP2_R4–LRPx4

Purified RPB_LRP2_R4 protein plus LRPx4 peptide at a concentration of 21.4 mg ml−1 was used to conduct sitting-drop, vapour-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_LRP2_R4–LRPx4 grew from drops consisting of 100 nl protein plus 100 nl of a reservoir solution consisting of 0.1 M HEPES pH 7 and 10% (w/v) PEG 6000 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 25% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P3221 and reduced using XDS49 (Supplementary Table 1). The structure was phased by molecular replacement using Phaser46. The coordinates of apo-RPB_LRP2_R4 from the proteolysed or filament structure were used as a search model. The resulting model was adjusted in Coot47 and refined using PHENIX48. Like the apo structure, this crystal structure of RPB_LRP2_R4 also contained infinitely long filaments in the crystal, this time with peptide bound.

RPB_PLP1_R6–PLPx6

Purified RPB_PLP1_R6 protein plus PLPx6 peptide at a concentration of 143 mg ml−1 was used to conduct sitting-drop, vapour-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP1_R6–PLPx6 grew from drops consisting of 100 nl protein plus 100 nl of a reservoir solution consisting of 0.2 M NaCl and 20% (w/v) PEG 3350 at 4 °C, and were cryoprotected by supplementing the reservoir solution with 15% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed to H32 and reduced using XDS49 (Supplementary Table 1). The structure was phased by molecular replacement using Phaser46. A set of around 230 of the lowest-energy predicted models from Rosetta were used as search models. Several of these models gave clear solutions, which were adjusted in Coot47 and refined using PHENIX48. In the later stages of refinement, two copies of the 6xPLP peptide were built into clearly defined electron density in the asymmetric unit. The first copy occupies the location expected from the design and makes the designed interactions with RPB_PLP1_R6. The density for this peptide and the final atomic model (19 amino acid residues) are slightly longer than the peptide used in crystallization (18 residues); this is probably due to ‘slippage’ or misregistration of the peptide relative to R6PO11 in many unit cells, resulting in density longer than the peptide itself. A second copy of the peptide lies across a twofold symmetry axis at around 50% occupancy, so that it is superimposed on a symmetry-related copy of itself running in the opposite direction. Despite this, the positions of the individual Pro and Leu side chains were reasonably well defined. However, binding of the peptide at this second site seems unlikely to occur readily in solution.

RPB_PLP1_R6, alternative conformation 1

Purified RPB_PLP1_R6 protein plus PLPx6 peptide at a concentration of 166 mg ml−1 was used to conduct sitting-drop, vapour-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP1_R6–PLPx6 grew from drops consisting of 100 nl protein plus 100 nl of a reservoir solution consisting of 0.02 M CaCl2, 30% (v/v) MPD and 0.1 M sodium acetate pH 4.6 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 5% MPD. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P22121 and reduced using XDS49 (Supplementary Table 1). The structure was phased by molecular replacement using Phaser46, using the coordinates for R6PO11 (alternative conformation 1) as a search model. The model was adjusted in Coot47 and refined using PHENIX48. In the later stages of refinement, one copy of the 6xPLP peptide was modelled at a site of crystal contact, where it is sandwiched between adjacent subunits in a way that is likely to occur only in the crystal lattice.

RPB_PLP1_R6, alternative conformation 2

Purified RPB_PLP1_R6 protein plus PLPx6 peptide at a concentration of 166 mg ml−1 was used to conduct sitting-drop, vapour-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_PLP1_R6–PLPx6 grew from drops consisting of 100 nl protein plus 100 nl of a reservoir solution consisting of 40% (v/v) MPD and 0.1 M sodium phosphate-citrate pH 4.2 at 18 °C, and were cryoprotected by supplementing the reservoir solution. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P22121 and reduced using XDS49 (Supplementary Table 1). Initial attempts to phase by molecular replacement using Phaser46 and around 500 predicted models from Rosetta and RoseTTAFold failed to yield any clear solutions. Similarly, several thousand truncations of these models (containing all combinations of 1, 2, 3, 4 or 5 of the 6 repeat units) also failed to give clear solutions. To try to identify correct but low-scoring solutions in the output of these trials, we ran SHELXE autobuilding and density modification on a large number of these potential solutions. Ultimately, we identified an MR solution with two of the six repeats correctly placed, which allowed autobuilding of a polyalanine model and gave an interpretable map; this was further improved by iterative rounds of rebuilding in Coot47 and refinement using PHENIX48. The final model revealed that, in this crystal form and in a similar crystallization condition (RPB_PLP1_R6, alternative conformation 1, above), RPB_PLP1_R6 adopted an alternative fold.

RPB_LRP2_R4

Purified RPB_LRP2_R4–LRPx4 protein at a concentration of 33 mg ml−1 was used to conduct sitting-drop, vapour-diffusion crystallization trials using the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of RPB_LRP2_R4 grew from drops consisting of 100 nl protein plus 100 nl of a reservoir solution consisting of 0.2 M K2HPO4 and 20% (w/v) PEG 3350 at 18 °C, and were cryoprotected by supplementing the reservoir solution with 15% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed to P3221 and reduced using XDS49 (Supplementary Table 1). The structure was phased by molecular replacement using Phaser46. A set of around 50 of the lowest-energy predicted models from Rosetta, as well as a variety of truncated models, were used as search models. Several of these models gave clear solutions, which were adjusted in Coot47 and refined using PHENIX48. Four helical-repeat modules were present in the asymmetric unit. Unexpectedly, however, the side-chain densities for all four repeats were very similar to one another and matched the sequence of the internal helical repeats, but not that of the N- and C-terminal capping repeats, which differ slightly from the internal ones. In addition, these four repeat units pack tightly against adjacent, symmetry-related molecules such that they form an ‘infinitely long’ repeat protein running throughout the crystal. Careful examination of the junction between each repeat unit revealed no clear breaks in electron density; the backbone density is continuous through the asymmetric unit, and continuous with the symmetry-related molecules near the N and C termini of the molecule in the asymmetric unit.
Rather than truly forming an infinitely long polymer, we suspect that proteolytic cleavage of the RPB_LRP2_R4 (either during purification or crystallization) led to the removal of the N- and/or C-terminal caps in many molecules, which could allow the internal repeats from separate molecules to polymerize to form fibres in the crystal. Heterogeneity in these cleavage products and how they assemble into the crystal lattice (misregistration) could consequently explain the ‘continuous’ filaments of this repeat protein that we observe in these crystals.

Cell studies

Plasmids

For expression in cells, constructs were synthesized by GenScript and cloned into a modified pUC57 plasmid (GenScript) allowing mammalian expression under an EF1α promoter. Target peptides were cloned as C-terminal fusions with a linker (GAGAGAGRP) followed by EGFP. Binders were expressed as fusions with an N-terminal Mito-Tag (the first 34 residues of the Mas70p protein, shown to efficiently relocalize proteins to mitochondria in mammalian cells50) and a C-terminal mScarlet tag51. Plasmids encoding the GFP-tagged peptide and the mScarlet-tagged binder were then cotransfected into cells.

Alternatively, for an in vivo demonstration of multiplexed binding between different peptides and their cognate binders (Fig. 3f,g), bicistronic plasmids were generated. One expressed the binder fused to a Mito-Tag, followed by a stop codon, an internal ribosome entry site (IRES) sequence and the EGFP-tagged target peptide. In the other, the binder was fused to a PEX tag (the first 66 residues of human PEX3, which targets proteins to peroxisomes52) and the target peptide was tagged with mScarlet. Cells were then cotransfected with both bicistronic plasmids to express all four proteins.

Cells

U2OS FlpIn Trex cells (a gift from S. C. Blacklow) and HeLa FlpIn Trex cells (a gift from S. Bullock) were cultured in DMEM (Corning) supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin–streptomycin (Gibco) at 37 °C with 5% CO2. Cells were transfected with Lipofectamine 3000 (Invitrogen) according to the manufacturer’s instructions and imaged after one day of expression. Cell lines were not authenticated. Cells were routinely screened for mycoplasma by DAPI staining.

Live-cell imaging

For live-cell imaging (Fig. 3), U2OS FlpIn Trex cells were plated for 1 h at 37 °C in DMEM with 10% serum on glass-bottom dishes (World Precision Instruments, FD35) coated with fibronectin (Sigma, F1141, 50 μg ml−1 in PBS). The medium was then changed to Leibovitz’s L-15 medium (Gibco) supplemented with 20 mM HEPES (Gibco) for live-cell imaging. Imaging was performed using a custom spinning disk confocal instrument composed of a Nikon Ti stand equipped with a perfect focus system, a fast z piezo stage (ASI), a Plan Apo Lambda 1.45 NA 100× objective and a spinning disk head (Yokogawa CSU-X1). Images were recorded with a Photometrics Prime 95B back-illuminated sCMOS camera run in pseudo global shutter mode and synchronized with the spinning disk wheel. Excitation was provided by 488-nm and 561-nm lasers (Coherent OBIS mounted in a Cairn laser launch), and emission was collected through dedicated single-bandpass filters for each channel mounted on a Cairn Optospin wheel (Chroma 525/50 for GFP and Chroma 595/50 for mScarlet). To enable fast 4D acquisitions, an FPGA module (National Instruments sbRIO-9637 running custom code) was used for hardware-based synchronization of the instrument, in particular to ensure that the piezo z stage moved only during the readout period of the sCMOS camera. The temperature was kept at 37 °C using a temperature control chamber (MicroscopeHeaters.Com). The system was controlled with MetaMorph.
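The FPGA synchronization described above confines piezo motion to the camera readout, so the stage is stationary whenever the sensor is exposing. The actual FPGA code is not described. This is a minimal timing sketch with hypothetical function names and made-up durations (exposure, readout and piezo settle times are not the instrument's settings). It builds one z-stack schedule and checks that no piezo move overlaps an exposure:

```python
# Hypothetical timing sketch of one z-stack: each plane is exposed, then read
# out, and the step to the next plane happens inside that readout window.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    t_start_ms: float
    t_end_ms: float
    kind: str      # "exposure", "readout" or "piezo_move"
    z_index: int


def schedule_stack(n_slices: int, exposure_ms: float, readout_ms: float,
                   piezo_settle_ms: float) -> list[Event]:
    """Build exposure/readout/move events; moves must fit inside readout."""
    if piezo_settle_ms > readout_ms:
        raise ValueError("piezo move must fit inside the camera readout window")
    events: list[Event] = []
    t = 0.0
    for z in range(n_slices):
        events.append(Event(t, t + exposure_ms, "exposure", z))
        t += exposure_ms
        events.append(Event(t, t + readout_ms, "readout", z))
        if z < n_slices - 1:
            # Step to the next plane while this frame is being read out.
            events.append(Event(t, t + piezo_settle_ms, "piezo_move", z))
        t += readout_ms
    return events


def motion_during_exposure(events: list[Event]) -> bool:
    """True if any piezo move overlaps (in time) any exposure."""
    moves = [e for e in events if e.kind == "piezo_move"]
    exposures = [e for e in events if e.kind == "exposure"]
    return any(m.t_start_ms < x.t_end_ms and x.t_start_ms < m.t_end_ms
               for m in moves for x in exposures)
```

For example, `schedule_stack(5, 20.0, 10.0, 8.0)` gives a 150-ms stack with four piezo moves, none of which overlaps an exposure.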

Immunofluorescence

For immunofluorescence of mitochondria (Extended Data Fig. 2b), U2OS FlpIn Trex cells (a gift from S. C. Blacklow) were spread on glass-bottom dishes coated with fibronectin as above. Cells were washed with PBS then fixed in 4% PFA for 20 min at room temperature. After fixation, cells were washed with PBS and then permeabilized with 0.1% Triton X-100 in PBS for 5 min at room temperature. Cells were washed again with PBS and blocked in 1% BSA in PBS for 15 min. Cells were then incubated with TOM20 antibody (Santa Cruz, sc-17764, used at 1:200 dilution), diluted in 1% BSA in PBS, for 1 h at room temperature. Cells were washed three times with PBS and then incubated with DAPI (Roche, 10236276001) and anti-mouse Alexa Fluor 488, diluted at 1:400 in 1% BSA in PBS, for 1 h at room temperature. Cells were washed a final three times in PBS and then imaged using the spinning disk confocal described above.
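The antibody dilutions above (1:200 for TOM20, 1:400 for the secondary) translate directly into pipetting volumes. This is a small sketch with hypothetical function names. It assumes "1:N" means 1 volume of stock in N total volumes, a common convention that some labs read instead as 1 part stock plus N parts diluent. It also uses an assumed final volume of 1 ml, which the text does not state:

```python
# Volume of antibody stock for a "1:N" working dilution, reading 1:N as
# 1 volume of stock in N total volumes (assumed convention).


def stock_volume(final_volume_ul: float, n: float) -> float:
    """µl of antibody stock needed for final_volume_ul of a 1:n dilution."""
    if n < 1:
        raise ValueError("dilution factor must be >= 1")
    return final_volume_ul / n


def diluent_volume(final_volume_ul: float, n: float) -> float:
    """µl of diluent (here 1% BSA in PBS) to make up the rest of the volume."""
    return final_volume_ul - stock_volume(final_volume_ul, n)


# Example for an assumed 1 ml of each solution (volume not given in the text):
primary_ul = stock_volume(1000.0, 200)    # TOM20 at 1:200 -> 5.0 µl
secondary_ul = stock_volume(1000.0, 400)  # anti-mouse Alexa Fluor 488 at 1:400 -> 2.5 µl
```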

Pull-down of endogenous proteins from extracts using designed binders

For the pull-down of endogenous ZFC3H1 from human cell extracts, HeLa FlpIn Trex cells were lysed in lysis buffer (25 mM HEPES, 150 mM NaCl, 0.5% Triton X-100, 0.5% NP-40 and 20 mM imidazole, pH 7.4, supplemented with Roche EDTA-free protease inhibitor tablets). The lysate was incubated on ice for 10 min to continue lysis and then centrifuged at 4,000g for 15 min at 4 °C. The supernatant was incubated with pre-washed Ni-NTA agarose (Qiagen, 30210 318/AV/01) for 1 h with rocking at 4 °C to deplete proteins that bind the resin non-specifically. For each condition, 50 µl of fresh Ni-NTA agarose resin was washed twice in lysis buffer. Equimolar amounts of purified His-tagged binder, or, as a control, an equal volume of buffer, were added to the Ni-NTA agarose. The pre-cleared HeLa lysate was split evenly between the three conditions. An input sample was taken from each condition, and the tubes were incubated for 2 h at 4 °C with rocking. Beads were then washed twice in lysis buffer and twice in wash buffer (25 mM HEPES, 150 mM NaCl and 20 mM imidazole, pH 7.4). Proteins were then eluted from the beads in elution buffer (25 mM HEPES, 150 mM NaCl and 500 mM imidazole, pH 7.4). Inputs and elutions were run on a NuPAGE 3–8% Tris-Acetate gel (Invitrogen, EA0375) and transferred to a nitrocellulose membrane using the iBlot system (Thermo Fisher Scientific). Membranes were blocked in 5% (w/v) milk in TBS-TWEEN (10 mM Tris-HCl, 120 mM NaCl and 1% (w/v) TWEEN20, pH 7.4) for 30 min at room temperature with gentle shaking. Rabbit anti-ZFC3H1 (Sigma, HPA007151, used at 1:250) and mouse anti-α-tubulin (clone DM1A, Sigma T6199; directly labelled with Abberior STAR 488 NHS ester to a degree of labelling of 4.5 dyes per antibody and used at 0.1 µg ml−1 final concentration) were diluted in 1% (w/v) milk in TBS-TWEEN and incubated with the membrane overnight at 4 °C with gentle shaking.
The membrane was washed three times in TBS-TWEEN then incubated with goat anti-rabbit Alexa 555 (Invitrogen, A32732, 1:2,000) for 1 h at room temperature with gentle shaking. The membrane was washed twice with TBS-TWEEN, followed by a final wash with TBS-TWEEN with 0.001% SDS. Membranes were imaged using a ChemiDoc system (BioRad). Alternatively, the same samples were analysed using 4–12% Bis-Tris gels (Invitrogen NP0323BOX) and stained with InstantBlue Coomassie stain (Sigma ISB1L). Note that αZFC-high was also able to pull down endogenous ZFC3H1 from human cell extracts when 50 mM rather than 150 mM NaCl was used in all buffers (Extended Data Fig. 7b).
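The directly labelled anti-α-tubulin antibody above carried 4.5 dyes per antibody. The text does not say how this degree of labelling (DOL) was measured. A common approach is the absorbance-based calculation sketched here, which corrects A280 for the dye's own absorbance at 280 nm. It is a sketch under that assumption, with a hypothetical function name. The extinction coefficients and the correction factor are inputs to be taken from the dye and antibody data sheets, and the numbers in the example are invented for illustration:

```python
# Absorbance-based degree of labelling (dye molecules per protein molecule),
# assuming a 1 cm path length and Beer-Lambert behaviour for dye and protein.


def degree_of_labelling(a_max: float, a280: float, eps_dye: float,
                        eps_protein: float, cf280: float) -> float:
    """a_max: conjugate absorbance at the dye's maximum; a280: at 280 nm.

    eps_dye / eps_protein in M^-1 cm^-1; cf280 = dye A280 / dye A_max.
    """
    protein_m = (a280 - a_max * cf280) / eps_protein  # dye-corrected protein conc.
    if protein_m <= 0:
        raise ValueError("corrected A280 must be positive")
    dye_m = a_max / eps_dye                           # dye concentration
    return dye_m / protein_m
```

With invented inputs `a_max=0.45`, `a280=0.245`, `eps_dye=1e5`, `eps_protein=2e5` and `cf280=0.1`, the function returns approximately 4.5.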

Mass spectrometry

Each lane of the polyacrylamide gel presented in Fig. 6c was cut into six pieces (1–2 mm) and prepared for mass spectrometric analysis by manual in situ enzymatic digestion (the gel area containing the binder was omitted from the analysis to avoid saturating the detector with binder peptides). In brief, the excised gel pieces were placed in a well of a 96-well microtitre plate, destained with 50% (v/v) acetonitrile and 50 mM ammonium bicarbonate, reduced with 10 mM DTT and alkylated with 55 mM iodoacetamide. After alkylation, proteins were digested with 6 ng µl−1 trypsin (Promega) and 0.1% ProteaseMAX (Promega) overnight at 37 °C. Peptides were extracted from the gel pieces with ammonium bicarbonate (100 μl, 100 mM) and ammonium bicarbonate/acetonitrile (50/50, 100 μl) and then dried under vacuum. Peptide digests were cleaned up with HyperSep SpinTip P-20 C18 columns (Thermo Fisher Scientific), using 80% acetonitrile as the elution solvent, and dried again. The resulting peptides were resuspended in 0.1% (v/v) trifluoroacetic acid and 2% (v/v) acetonitrile. The digest was analysed by nano-scale capillary liquid chromatography–tandem mass spectrometry (LC–MS/MS) using an Ultimate U3000 HPLC (Dionex, Thermo Fisher Scientific) delivering a flow of 250 nl min−1. Peptides were trapped on a C18 Acclaim PepMap100 column (5 μm, 100 μm × 20 mm nanoViper; Thermo Fisher Scientific) before separation on a PepMap RSLC C18 EasySpray column (2 μm, 100 Å, 75 μm × 75 cm; Thermo Fisher Scientific). Peptides were eluted on a 90-min acetonitrile gradient and introduced through an EasySpray ionization source into a quadrupole Orbitrap mass spectrometer (Q-Exactive HFX, Thermo Fisher Scientific).
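Trypsin digestion determines which peptides the database search can match. This section also states that one missed enzyme cleavage was allowed. Here is a minimal in silico digest using the classic trypsin rule (cleave C-terminal to K or R, except before P). This rule is an assumption about the enzyme definition used by the search engine, whose exact settings are not given here, and the function name is illustrative:

```python
# In silico trypsin digest with up to `max_missed` missed cleavages.
# Rule assumed: cut after K or R unless the next residue is P.
from __future__ import annotations

import re


def tryptic_peptides(protein: str, max_missed: int = 1, min_len: int = 1) -> list[str]:
    # Cleavage positions (index just after each K/R not followed by P).
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides: list[str] = []
    for i in range(len(bounds) - 1):
        # A peptide from bounds[i] to bounds[j] spans (j - i - 1) missed cleavages.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides
```

For the toy sequence `"AKPGRCDKE"`, the K before P is not cleaved. With `max_missed=0` the digest gives `["AKPGR", "CDK", "E"]`. Allowing one missed cleavage adds `"AKPGRCDK"` and `"CDKE"`.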
Mass spectrometry data were acquired in data-dependent mode with a top-25 method: high-resolution full mass scans (R = 120,000, m/z 350–1,750) were followed by higher-energy collision dissociation with a normalized collision energy of 27%. The corresponding tandem mass spectra were recorded at R = 30,000 (isolation window m/z 1.6, dynamic exclusion 50 s). LC–MS/MS data were then searched against the UniProt human proteome database using the Mascot search engine (Matrix Science)53. Database search parameters were set with a precursor tolerance of 10 ppm and a fragment ion mass tolerance of 0.1 Da. One missed enzyme cleavage was allowed, and oxidation, carboxymethylation and phosphorylation were set as variable modifications. MS/MS data were validated using the Scaffold programme (Proteome Software)54. All data were additionally inspected manually. To generate the Venn diagram in Fig. 6f, a protein was considered identified only if at least five peptides were detected. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium through the PRIDE55 partner repository with the dataset identifier PXD038492 (DOI 10.6019/PXD038492). See also Source Data for the annotated full dataset.
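The Venn diagram in Fig. 6f counts a protein as identified only when at least five peptides were detected. This is a minimal sketch of that bookkeeping, assuming per-sample peptide counts are available as a simple mapping. The function names, sample names and protein IDs are hypothetical, and the code does not reproduce the authors' Scaffold workflow. It applies the threshold and then assigns each protein to the Venn region of the samples in which it passed:

```python
# Apply the >= 5 peptide identification threshold, then group proteins by the
# exact set of samples in which they were identified (one Venn region each).
from __future__ import annotations

MIN_PEPTIDES = 5  # threshold stated in the text


def identified(counts: dict[str, int], min_peptides: int = MIN_PEPTIDES) -> set[str]:
    """Proteins with at least `min_peptides` peptides in one sample."""
    return {prot for prot, n in counts.items() if n >= min_peptides}


def venn_regions(samples: dict[str, dict[str, int]]) -> dict[frozenset[str], set[str]]:
    """Map each combination of samples to proteins found in exactly those samples."""
    ids = {name: identified(counts) for name, counts in samples.items()}
    regions: dict[frozenset[str], set[str]] = {}
    for prot in set().union(*ids.values()):
        key = frozenset(name for name in ids if prot in ids[name])
        regions.setdefault(key, set()).add(prot)
    return regions
```

With invented counts `{"binder_A": {"P1": 7, "P2": 5, "P3": 2}, "control": {"P1": 6, "P4": 9}}`, P1 falls in the shared region, P2 and P4 are unique to their samples, and P3 is dropped by the threshold.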

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
