May 29, 2024

The structural basis of odorant recognition in insect olfactory receptors – Nature

Expression and purification of MhOR5

The coding sequence of M. hrabei OR5 (MhOR5) was synthesized as a gene fragment (Twist). Residues Lys2 to Pro474 were cloned into a pEG BacMam vector³⁷ containing N-terminal tags of Strep II, superfolder GFP³⁸ and an HRV 3C protease cleavage site (N-CACCatg-ST2-SGR-sfGFP-PPX-AscI-MhOR5-taa-NotI-C). The AscI/NotI restriction sites enable efficient cloning of different OR sequences. Sf9 cells (ATCC CRL-1711) were used to produce baculovirus containing the MhOR5 construct, and after three rounds of amplification the virus was used to infect HEK293S GnTI⁻ cells (ATCC CRL-3022)³⁷. Cell lines were not authenticated beyond the authentication performed by the vendor. HEK293S GnTI⁻ cells were grown at 37 °C with 8% carbon dioxide in FreeStyle 293 medium (Gibco) supplemented with 2% (v/v) fetal bovine serum (Gibco). Cells were grown to 3 × 10⁶ cells/ml and infected at a multiplicity of infection (MOI) of about 1. After 8–12 h, 10 mM sodium butyrate (Sigma-Aldrich) was added and the temperature was lowered from 37 °C to 30 °C for the remainder of the incubation. Seventy-two hours after infection, cells were collected by centrifugation, washed with phosphate-buffered saline (pH 7.5; Gibco), weighed and flash frozen in liquid nitrogen. Pellets were stored at −80 °C until they were thawed for purification.
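Infecting at a given multiplicity of infection comes down to one calculation: MOI is infectious particles per cell, so the virus volume is cells × MOI / titer. The sketch below is illustrative only; the culture volume and virus titer are hypothetical values, since neither is given in the text.

```python
def virus_volume_ml(culture_ml: float, cells_per_ml: float, moi: float, titer_per_ml: float) -> float:
    """Volume of baculovirus stock needed to reach a target multiplicity of infection (MOI).

    MOI = infectious particles / cells, so particles needed = cells * MOI,
    and the stock volume = particles / titer (infectious particles per ml).
    """
    cells = culture_ml * cells_per_ml
    return cells * moi / titer_per_ml


# Hypothetical values: an 800 ml culture at 3e6 cells/ml infected at MOI ~1
# with a virus stock assumed to contain 1e8 infectious particles per ml.
print(f"{virus_volume_ml(800, 3e6, 1.0, 1e8):.1f} ml of virus stock")  # 24.0 ml of virus stock
```

In practice the titer of each amplified stock has to be measured; the function simply makes the proportionality explicit.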

For purification, cell pellets were thawed on ice and resuspended in 20 ml lysis buffer per gram of cells. Lysis buffer was composed of 50 mM HEPES/NaOH (pH 7.5), 375 mM NaCl, 1 μg/ml leupeptin, 1 μg/ml aprotinin, 1 μg/ml pepstatin A, 1 mM phenylmethylsulfonyl fluoride (PMSF; all from Sigma-Aldrich) and about 3 mg DNase I (Roche). MhOR5 was extracted with 0.5% (w/v) n-dodecyl β-D-maltoside (DDM; Anatrace) and 0.1% (w/v) cholesterol hemisuccinate (CHS; Sigma-Aldrich) for 2 h at 4 °C. The mixture was clarified by centrifugation at 90,000g, and the supernatant was added to 0.1 ml StrepTactin Sepharose resin (GE Healthcare) per gram of cells and rotated at 4 °C for 2 h. The resin was collected and washed with 10 column volumes (CV) of 20 mM HEPES/NaOH, 150 mM NaCl, 0.025% (w/v) DDM and 0.005% (w/v) CHS (together, SEC buffer). MhOR5 was eluted with 2.5 mM desthiobiotin (DTB) and cleaved overnight at 4 °C with HRV 3C protease (EMD Millipore). The sample was concentrated to about 5 mg/ml and injected onto a Superose 6 Increase column (GE Healthcare) equilibrated in SEC buffer. Peak fractions containing MhOR5 were concentrated until the absorbance at 280 nm reached 5–6 (approximately 5 mg/ml) and used immediately for grid preparation and data acquisition. For the eugenol-bound structure, peak fractions were pooled, eugenol (CAS 97-53-0) dissolved in dimethylsulfoxide (DMSO; both Sigma-Aldrich) was added to a final concentration of 0.5 mM, and the complex was incubated at 4 °C for 1 h. The final DMSO concentration was kept below 0.07%. The complex was then concentrated to approximately 5 mg/ml and used for grid preparation. For the DEET-bound structure, the sample from the overnight cleavage step was concentrated to about 5 mg/ml and injected onto the Superose 6 Increase column equilibrated in SEC buffer containing 1 mM DEET (Sigma-Aldrich, CAS 134-62-3). Peak fractions were concentrated to about 5 mg/ml and used immediately for grid preparation.
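Lysis buffer and resin are scaled per gram of cells, and the DMSO limit constrains how concentrated the eugenol stock must be. The arithmetic can be sketched as follows. This is an illustrative sketch: the pellet mass is hypothetical, and the text does not state the eugenol stock concentration, so the second function only computes the minimum stock implied by a 0.5 mM final concentration with DMSO below 0.07% (about 0.71 M).

```python
def purification_scale(pellet_g: float) -> dict:
    """Scale lysis buffer and StrepTactin resin to pellet mass using the per-gram ratios in the text."""
    return {
        "lysis_buffer_ml": 20.0 * pellet_g,  # 20 ml lysis buffer per gram of cells
        "strep_resin_ml": 0.1 * pellet_g,    # 0.1 ml StrepTactin Sepharose per gram of cells
    }


def min_stock_mM(final_mM: float, max_dmso_percent: float) -> float:
    """Lowest DMSO stock concentration that keeps the final DMSO content below a limit.

    Adding a stock at concentration C_stock as a volume fraction f of the sample gives
    C_final = C_stock * f, so C_stock = C_final / f (volume changes on mixing are ignored).
    """
    return final_mM / (max_dmso_percent / 100.0)


print(purification_scale(5.0))               # a hypothetical 5 g pellet
print(f"{min_stock_mM(0.5, 0.07):.0f} mM")   # 714 mM
```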

Cryo-EM sample preparation and data acquisition

Cryo-EM grids were frozen using a Vitrobot Mark IV (FEI) as follows: 3 μl of the concentrated sample was applied to a glow-discharged Quantifoil R1.2/1.3 holey carbon 400 mesh gold grid, blotted for 3–4 s in >90% humidity at room temperature, and plunge frozen in liquid ethane cooled by liquid nitrogen.

Cryo-EM data were recorded on a Titan Krios (FEI) operated at 300 kV and equipped with a Gatan K2 Summit camera. SerialEM³⁹ was used for automated data collection. Movies were collected at a nominal magnification of 29,000× in super-resolution mode, giving a calibrated pixel size of 0.51 Å/pixel, with a defocus range of approximately −1.0 to −3.0 μm. Fifty frames were recorded over a 10-s exposure at a dose rate of 1.22 electrons per Å² per frame.

Movie frames were aligned and binned 2 × 2 using MotionCor2⁴⁰ as implemented in RELION 3.0⁴¹, and the contrast transfer function (CTF) parameters of each motion-corrected image were estimated using CTFFIND4⁴².
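The acquisition settings imply a total dose, a dose rate and a final pixel size that are not stated explicitly. Assuming that the 2 × 2 binning during motion correction doubles the 0.51 Å super-resolution pixel, the numbers work out as below. The resulting Nyquist limit of about 2.04 Å lies beyond the 2.9–3.3 Å resolutions reported for the maps.

```python
def exposure_summary(frames: int, dose_per_frame: float, exposure_s: float,
                     pixel_A: float, bin_factor: int) -> dict:
    """Derive total dose, dose rate and pixel sampling from acquisition settings."""
    total = frames * dose_per_frame              # electrons per Å^2 over the whole movie
    binned_pixel = pixel_A * bin_factor          # pixel size after binning
    return {
        "total_dose_e_per_A2": total,
        "dose_rate_e_per_A2_per_s": total / exposure_s,
        "frame_time_s": exposure_s / frames,
        "binned_pixel_A": binned_pixel,
        "nyquist_A": 2 * binned_pixel,           # finest resolvable spacing after binning
    }


summary = exposure_summary(frames=50, dose_per_frame=1.22, exposure_s=10.0,
                           pixel_A=0.51, bin_factor=2)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
# total_dose_e_per_A2: 61.00
# dose_rate_e_per_A2_per_s: 6.10
# frame_time_s: 0.20
# binned_pixel_A: 1.02
# nyquist_A: 2.04
```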

Apo structure

Two datasets were collected: 4,050 micrographs (dataset A) and 3,748 micrographs (dataset B). Each dataset was processed independently as follows. Particles were picked with a 3D template derived from an initial model built from 5,000 manually picked particles. A total of 562,794 (dataset A) and 536,145 (dataset B) particles were subjected to 2D classification in RELION 3.0⁴¹. Particles from the best 2D classes (210,833 for dataset A and 183,061 for dataset B) were selected and subjected to 3D classification with C4 symmetry imposed, with a soft mask excluding the detergent micelle applied after 25 iterations. One class from each dataset, containing 44,884 (dataset A) and 43,788 (dataset B) particles, was clearly superior in completeness and in the definition of the transmembrane domains. These particles were subjected to 3D refinement with C4 symmetry, followed by Bayesian polishing and CTF refinement. The polished particles from both datasets were exported to cryoSPARC v2⁴³, and processing continued with the combined dataset of 88,672 particles. In cryoSPARC, further heterogeneous refinement yielded a single class of 49,832 particles, which were subjected to particle subtraction with a micelle mask. Non-uniform refinement of the subtracted particles with C4 symmetry imposed yielded the final map at an overall resolution of 3.3 Å, as estimated in cryoSPARC using the Fourier shell correlation (FSC) = 0.143 criterion⁴⁴.
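To make the resolution criterion concrete, the sketch below computes an FSC curve from two half maps and reports the resolution at which it first falls below 0.143. It is a simplified illustration with unmasked cubic maps and no interpolation between shells. cryoSPARC additionally applies masking and a correction for mask-induced correlation, so its estimates will differ.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray, pixel_A: float):
    """Fourier shell correlation between two cubic half maps of equal size.

    Returns (resolution in Å for each shell, FSC value for each shell).
    Shell s collects Fourier voxels whose radius rounds to s, i.e. a spatial
    frequency of s / (n * pixel_A) in 1/Å.
    """
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    freq = np.fft.fftfreq(n)  # cycles per voxel in [-0.5, 0.5)
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell_index = np.rint(np.sqrt(kx**2 + ky**2 + kz**2) * n).astype(int)

    shells = np.arange(n // 2 + 1)  # shells up to Nyquist
    fsc = np.zeros(len(shells))
    for s in shells:
        m = shell_index == s
        num = np.sum(f1[m] * np.conj(f2[m])).real
        den = np.sqrt(np.sum(np.abs(f1[m]) ** 2) * np.sum(np.abs(f2[m]) ** 2))
        fsc[s] = num / den if den > 0 else 0.0

    resolution = np.full(len(shells), np.inf)  # shell 0 (DC) has no finite resolution
    resolution[1:] = n * pixel_A / shells[1:]
    return resolution, fsc


def resolution_at(resolution: np.ndarray, fsc: np.ndarray, threshold: float = 0.143) -> float:
    """Resolution of the last shell before the FSC first falls below `threshold`."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return float(resolution[-1])  # never drops below: limited by Nyquist
    return float(resolution[below[0]])  # shell below[0] + 1 is the first failing shell


rng = np.random.default_rng(0)
half = rng.standard_normal((32, 32, 32))
res, f = fsc_curve(half, half, pixel_A=1.0)
print(resolution_at(res, f))  # 2.0 (identical maps reach Nyquist)
```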

Ligand-bound structures

The eugenol-bound and DEET-bound datasets were processed with the following pipeline. A total of 4,410 (eugenol) and 4,365 (DEET) micrographs were collected and used to pick 461,254 (eugenol) and 787,448 (DEET) particles, which were extracted without binning and exported to cryoSPARC v2. In cryoSPARC, several rounds of 2D classification yielded 221,339 (eugenol) and 180,874 (DEET) particles, which were used to generate initial models (four classes, no imposed symmetry). These models were used as references for heterogeneous refinement with no imposed symmetry, from which one (eugenol) or two (DEET) classes were selected, containing 129,031 (eugenol) and 121,441 (DEET) particles. These particles were refined and exported to RELION 3.0 for a round of 3D classification with no imposed symmetry. The best class from this 3D classification contained 54,900 (eugenol) and 56,191 (DEET) particles, which were subjected to Bayesian polishing and CTF refinement. The polished particles were then imported into cryoSPARC v2 and subjected to particle subtraction. Final non-uniform refinement with C4 symmetry imposed yielded maps with an overall resolution of 2.9 Å in both cases, estimated using the FSC = 0.143 criterion. In all cases, the four-fold symmetry of the channel was evident in the initial 2D classes, obtained without imposed symmetry, and refinements without imposed symmetry produced four-fold symmetric maps.
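The observation that maps refined without imposed symmetry were four-fold symmetric can be quantified, for example, by correlating a map with rotated copies of itself. The sketch below shows one simple way to do this. It assumes the channel axis is aligned with the first array axis and passes through the rotation centre used by `np.rot90`. It is not the procedure described in the text, which relied on 2D classes and refinements without imposed symmetry.

```python
import numpy as np


def c4_self_correlation(vol: np.ndarray) -> float:
    """Mean correlation between a map and its 90°, 180° and 270° rotations.

    Assumes the putative four-fold axis runs along array axis 0 through the
    rotation centre used by np.rot90 (the middle of the box).
    """
    v = vol - vol.mean()
    scores = []
    for k in (1, 2, 3):
        r = np.rot90(v, k=k, axes=(1, 2))  # rotate in the plane perpendicular to axis 0
        scores.append(np.sum(v * r) / np.sqrt(np.sum(v * v) * np.sum(r * r)))
    return float(np.mean(scores))


rng = np.random.default_rng(1)
base = rng.standard_normal((16, 32, 32))
symmetric = sum(np.rot90(base, k=k, axes=(1, 2)) for k in range(4))  # C4-symmetric by construction
print(round(c4_self_correlation(symmetric), 3))  # 1.0
print(abs(c4_self_correlation(base)) < 0.1)      # True for unstructured noise
```

Values near 1 indicate a map that is close to four-fold symmetric about the chosen axis; real maps would first need to be centred and aligned to that axis.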

Model building

The cryo-EM structure of Orco (Protein Data Bank (PDB) accession 6C70) was used as a template for homology modelling of MhOR5 using Modeller⁴⁵, followed by manual building in Coot⁴⁶. The 3.3 Å density map of apo MhOR5 was of sufficient quality to build most of the protein, with the exception of the S3–S4 and S4–S5 loops, the 13 N-terminal residues and the 5 C-terminal residues. The models were refined using real-space refinement in PHENIX⁴⁷ for five macro-cycles, with four-fold non-crystallographic symmetry and secondary-structure restraints applied. The eugenol- and DEET-bound models were refined including the ligands, which were initially placed within the corresponding density in poses obtained by docking (described below), with restraints generated by the electronic Ligand Builder and Optimization Workbench⁵⁸ (eLBOW) in PHENIX. Model statistics were obtained using MolProbity. Models were validated by randomly displacing the atoms in the original model by 0.5 Å and refining the resulting model against the half maps and the full map⁴⁸. Model–map correlations were determined using phenix.mtriage. Images of the maps were created using UCSF ChimeraX⁴⁹, and images of the model using PyMOL⁵⁰ and UCSF ChimeraX⁴⁹.

Docking analysis

All compounds were docked using Glide²⁰,⁵¹ in Maestro (Schrödinger, suite 2020). Briefly, the model was imported into Maestro and prepared for docking. A cubic search grid, 20 Å on each side, was centred on the region of observed ligand density. Ligand structures were imported into Maestro from their SMILES strings and prepared using Epik⁵² to generate their possible tautomeric and ionization states at pH 7.0 ± 2. All ligands were docked within this grid, and the top poses that best fit the density are presented in Extended Data Fig. 8. The top activators had docking scores between −7.4 and −4. Although all activators docked with negative scores, some non-activators also docked with favourable scores; for example, caffeine docked favourably even though it did not activate the channel in our functional experiments. Because docking does not account for receptor dynamics, docking scores are not expected to correlate uniformly or monotonically with the experimentally determined activity of ligands; at most, qualitative agreement can be expected.

Structure analysis

Residues at subunit interfaces were identified in PyMOL as any residue within 5 Å of a neighbouring subunit (Extended Data Fig. 5d). Pore diameters along the central axis and the lateral conduits were calculated using HOLE⁵³, which models atoms as solid spheres with van der Waals radii (Fig. 2a–c, Extended Data Fig. 10d, e). Two calculations were performed for each structure: one along the central four-fold axis (central pore) and another between subunits near the cytosolic membrane interface (lateral conduits). The plots in Fig. 2b and Extended Data Fig. 10e show the diameter along the central axis of the main conduit and along the lateral conduit. The distances between pore-lining residues in Fig. 2d and Extended Data Fig. 10f were measured between atom centres in PyMOL. Electrostatic surface representations were generated with Coulombic surface colouring in ChimeraX v1.1 using default parameters (Extended Data Fig. 7). Morph videos were created in ChimeraX v1.1 by direct interpolation between states.
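The 5 Å interface criterion is a simple distance search. In PyMOL it corresponds to a selection such as `byres (chain A within 5 of chain B)`, where the chain names here are hypothetical. The NumPy sketch below applies the same logic to plain coordinate arrays and uses toy coordinates rather than MhOR5 atoms. For full structures, a spatial index such as a k-d tree would avoid building the all-pairs distance matrix.

```python
import numpy as np


def interface_residues(coords_a, resids_a, coords_b, cutoff=5.0):
    """Residue numbers in subunit A with any atom within `cutoff` Å of any atom in subunit B.

    coords_a: (N, 3) atom coordinates of subunit A; resids_a: N residue numbers.
    coords_b: (M, 3) atom coordinates of the neighbouring subunit B.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M) squared distances
    in_contact = np.any(d2 <= cutoff**2, axis=1)
    return sorted(set(np.asarray(resids_a)[in_contact].tolist()))


# Toy coordinates (Å): residues 1 and 3 each have an atom within 5 Å of subunit B.
a_xyz = [[0, 0, 0], [10, 0, 0], [20, 0, 0]]
b_xyz = [[4, 0, 0], [20, 4.9, 0]]
print(interface_residues(a_xyz, [1, 2, 3], b_xyz))  # [1, 3]
```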

Electrophysiology

HEK293 cells were maintained in high-glucose Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% (v/v) fetal bovine serum (FBS) and 1% (v/v) GlutaMAX (all Gibco) at 37 °C with 5% (v/v) carbon dioxide. Cells were plated on 35-mm tissue-culture-treated Petri dishes 48–72 h before recording and infected with the same pEG BacMam GFP-tagged MhOR5 construct used for expression 24–48 h before recording. Electrodes were pulled from borosilicate patch glass (Sutter Instruments) and polished (MF-83, Narishige Co.) to a resistance of 3–6 MΩ when filled with pipette solution. Analogue signals were digitized at 20 kHz (Digidata 1440A, Molecular Devices) and filtered at 1 kHz (whole-cell) or 2 kHz (patch recordings) with the built-in four-pole Bessel filter of a Multiclamp 700B patch-clamp amplifier (Molecular Devices). Whole-cell recordings were baseline-subtracted offline, and patch recordings were additionally resampled offline for display.

Whole-cell and single-channel recordings in Fig. 1c and Extended Data Fig. 2e were performed using an extracellular (bath) solution composed of 135 mM NaCl, 5 mM KCl, 2 mM MgCl₂, 2 mM CaCl₂, 10 mM glucose and 10 mM HEPES-Na/HCl (pH 7.3, 310 mOsm/kg) and an intracellular (pipette) solution composed of 150 mM KCl, 10 mM NaCl, 1 mM EDTA-Na and 10 mM HEPES-Na/HCl (pH 7.45, 310 mOsm/kg). Single-channel recordings were made in the excised outside-out configuration. A 150 mM eugenol stock was prepared in DMSO, and working solutions were prepared by diluting the stock to 3 μM in extracellular solution. Solutions were applied locally using a microperfusion system (ALA Scientific Instruments).
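The eugenol working concentration corresponds to a large single dilution. The sketch below computes the dilution factor, the stock volume and the resulting DMSO content (about 0.002%), assuming a hypothetical 10 ml final volume. A stock volume this small would typically be reached through an intermediate dilution; the text does not say how the dilution was performed.

```python
def working_dilution(stock_M: float, working_M: float, final_ml: float):
    """Dilution factor, stock volume (μl) and final DMSO (% v/v) for a working solution."""
    factor = stock_M / working_M
    stock_ul = final_ml * 1000.0 / factor
    dmso_percent = 100.0 / factor  # the stock is 100% DMSO
    return factor, stock_ul, dmso_percent


# 150 mM eugenol stock diluted to 3 μM in a hypothetical 10 ml of extracellular solution
factor, stock_ul, dmso = working_dilution(150e-3, 3e-6, 10.0)
print(f"{factor:,.0f}-fold, {stock_ul:.2f} μl stock, {dmso:.4f}% DMSO")
# 50,000-fold, 0.20 μl stock, 0.0020% DMSO
```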

Cell-based GCaMP fluorescence calcium flux assay

All DNA constructs used in this assay were cloned into a modified pME18s vector lacking a fluorescent marker, flanked by AscI/NotI restriction sites for efficient cloning. Each transfection condition contained 0.5 μg of a plasmid encoding GCaMP6s (Addgene #40753) and 1.5 μg of the plasmid encoding the appropriate olfactory receptor, diluted in 250 μl OptiMEM (Gibco). In experiments with heteromeric olfactory receptors, the total amount of DNA was 1.5 μg, at a 1:1 ratio of Orco:OR. The DNA was mixed with a solution containing 7 μl Lipofectamine 2000 (Invitrogen) in 250 μl OptiMEM, followed by a 20-min incubation at room temperature. HEK293 cells were maintained in high-glucose DMEM supplemented with 10% (v/v) FBS and 1% (v/v) GlutaMAX at 37 °C with 5% (v/v) carbon dioxide. Cells were detached using trypsin and resuspended at 1 × 10⁶ cells/ml. Cells were added to each transfection condition, mixed and dispensed into 2 × 16 wells of a 384-well plate (Greiner CELLSTAR). Four to six hours later, the transfection medium was removed using a 16-port vacuum manifold on low vacuum and replaced with fresh FluoroBrite DMEM (Gibco) supplemented with 10% (v/v) FBS and 1% (v/v) GlutaMAX. Twenty-four hours later, this medium was replaced with 20 μl reading buffer (20 mM HEPES/NaOH (pH 7.4), 1× HBSS (Gibco), 3 mM Na₂CO₃, 1 mM MgSO₄ and 2 or 5 mM CaCl₂) in each well. The calcium concentration was optimized for each receptor to account for differences in baseline activity: reading buffer contained 2 mM CaCl₂ for MhOR5 and MhOR5 mutants, and 5 mM CaCl₂ for MhOR1, Orco and Orco–AgOR28 heteromers. Fluorescence emission at 527 nm (excitation at 480 nm) was read continuously on a Hamamatsu FDSS plate reader. After 30 s of baseline recording, an optimized volume of odorant solution (10 μl for all MhOR-containing experiments and 20 μl for all Orco-containing experiments) was added to the cells, and fluorescence was read for 2 min. All solutions were warmed to 37 °C before the experiment.

Seven ligand concentrations, prepared as threefold serial dilutions, were used for each transfection condition, alongside a control well containing only reading buffer. Ligands were dissolved in DMSO to 150 mM and then diluted with reading buffer to a highest final-well concentration of 0.5 mM (DMSO never exceeded 0.5%). Water-soluble ligands (arabinose, caffeine, denatonium, glucose, MSG, sucrose) were dissolved directly in reading buffer. If a receptor responded at lower concentrations than this range could resolve, the concentrations were adjusted accordingly. Ligand concentrations for mutants were the same as for the corresponding wild-type OR. Each plate contained a negative control of cells transfected with GCaMP6s alone and exposed to eugenol (MhOR5 experiments) or VUAA1 (Orco experiments). Each plate also included the corresponding wild-type OR with its cognate ligand (MhOR5 and MhOR1 with eugenol, Orco with VUAA1, and Orco–AgOR28 with acetophenone) as a positive control to account for plate-to-plate variation in transfection efficiency and cell count. DMSO alone was also tested to confirm that responses were not caused by the solvent. Each ligand concentration was applied to four technical replicates, which were averaged and treated as a single biological replicate.
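The concentration series follows directly from a top final-well concentration of 0.5 mM and threefold steps. The sketch below lists the seven concentrations. It also computes the concentration the added odorant solution would need, assuming that the stated 0.5 mM refers to the well after the 10 μl (MhOR) or 20 μl (Orco) addition to 20 μl reading buffer. That volume interpretation is an assumption, not something the text states.

```python
def dilution_series(top_mM: float = 0.5, fold: float = 3.0, n: int = 7) -> list:
    """Final-well concentrations (mM) for an n-point serial dilution starting at top_mM."""
    return [top_mM / fold**i for i in range(n)]


def addition_concentration(final_mM: float, well_ul: float, added_ul: float) -> float:
    """Concentration the added odorant solution must have to give final_mM after mixing."""
    return final_mM * (well_ul + added_ul) / added_ul


series = dilution_series()
print([f"{c * 1000:.1f} μM" for c in series])
# ['500.0 μM', '166.7 μM', '55.6 μM', '18.5 μM', '6.2 μM', '2.1 μM', '0.7 μM']
print(addition_concentration(0.5, well_ul=20, added_ul=10))  # 1.5 (mM, MhOR plates)
print(addition_concentration(0.5, well_ul=20, added_ul=20))  # 1.0 (mM, Orco plates)
print(f"{0.5 / 150 * 100:.2f}% DMSO in the top well")         # 0.33% DMSO in the top well
```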

The baseline fluorescence (F) was calculated as the mean fluorescence over the 30 s before odour was added to the plate. Within each well, ΔF was calculated as the difference between the mean fluorescence over the last 10 s and the baseline F, and ΔF/F as ΔF divided by the baseline F. To account for unavoidable variation in transfection efficiency and cell count between plates, the ΔF/F at each concentration was then normalized to the maximum ΔF/F of the corresponding positive control on the same plate (MhOR5 and MhOR1 with eugenol, Orco with VUAA1, and Orco–AgOR28 with acetophenone). The normalized ΔF/F averaged across all experiments for a given condition was used to construct the dose–response curves in all plots (Figs. 1b, 2e–g, Extended Data Figs. 2d, 9a–c, 10c, 11b). All wild-type curves come from the same plates as the experimental data in the same plot. Baseline values for wild-type and mutant channels were obtained by normalizing each F value to the negative GCaMP6s-only control on the same plate (Extended Data Figs. 1c, 9a, e).
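The per-well calculation can be written out as follows. The FDSS sampling rate is not given in the text, so `sample_hz` is a hypothetical parameter and the trace is synthetic. Only the 30 s baseline window, the 10 s end window and the normalization to the positive control come from the description above.

```python
def delta_f_over_f(trace, sample_hz, baseline_s=30.0, tail_s=10.0):
    """ΔF/F for one well: (mean of the last tail_s seconds - baseline F) / baseline F.

    trace: fluorescence samples; the first baseline_s seconds precede odour addition.
    """
    n_base = int(round(baseline_s * sample_hz))
    n_tail = int(round(tail_s * sample_hz))
    f0 = sum(trace[:n_base]) / n_base          # baseline F
    f_tail = sum(trace[-n_tail:]) / n_tail     # mean of the last 10 s
    return (f_tail - f0) / f0


def normalize_to_control(dff_values, control_max_dff):
    """Normalize ΔF/F values to the maximum ΔF/F of the plate's positive control."""
    return [v / control_max_dff for v in dff_values]


# Synthetic 150 s trace at a hypothetical 1 Hz: 30 s at F = 100, then a rise to 150.
trace = [100.0] * 30 + [150.0] * 120
dff = delta_f_over_f(trace, sample_hz=1.0)
print(dff)                                      # 0.5
print(normalize_to_control([dff, 0.25], 0.5))  # [1.0, 0.5]
```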

For all experiments, GraphPad Prism 8 was used to fit the dose–response curves to the Hill equation, from which the EC50 was extracted. Three metrics were used to characterize the dose–response curve for each ligand: activity index, log(EC50) and max ΔF/F. For conditions in which the EC50 was too high for the curve to reach saturation, so that it could not be fitted to a Hill equation, log(EC50) was assigned a value of −2 (EC50 = 10 mM), more than an order of magnitude higher than the highest concentration used. Max ΔF/F is the maximum response at the highest concentration. The activity index is defined as the negative product of log(EC50) and max ΔF/F:

 Activity index = −log(EC50) × max ΔF/F
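The summary metrics can be sketched as below. The Hill function uses the four-parameter log-agonist form that Prism applies for variable-slope dose–response fits; this is an assumption about which Prism model was chosen, since the text says only "the Hill equation". Concentrations are in molar, so log(EC50) is negative and the activity index is positive. The fallback value of −2 encodes the rule for unsaturated curves. Fitting itself is left to Prism or a least-squares routine such as `scipy.optimize.curve_fit`.

```python
import math

UNFITTED_LOG_EC50 = -2.0  # assigned when a curve does not saturate (EC50 = 10 mM)


def hill(conc_M: float, bottom: float, top: float, log_ec50: float, slope: float) -> float:
    """Four-parameter Hill response at a molar concentration (log-agonist form)."""
    return bottom + (top - bottom) / (1 + 10 ** ((log_ec50 - math.log10(conc_M)) * slope))


def activity_index(log_ec50, max_dff: float) -> float:
    """Activity index = -log(EC50) x max ΔF/F; pass log_ec50=None if no fit was possible."""
    if log_ec50 is None:
        log_ec50 = UNFITTED_LOG_EC50
    return -log_ec50 * max_dff


print(round(hill(1e-5, bottom=0.0, top=1.0, log_ec50=-5.0, slope=1.0), 6))  # 0.5 at the EC50
print(round(activity_index(-4.0, 0.8), 2))  # 3.2 (hypothetical EC50 = 100 μM)
print(round(activity_index(None, 0.3), 2))  # 0.6 (curve did not saturate)
```

Because −log(EC50) is the pEC50, the index weights potency by the maximal response, so a potent but weak activator and a less potent but strong one can score similarly.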

Gels and small-scale transfections

For western blots and fluorescence-detection size-exclusion chromatography (FSEC) traces (Extended Data Figs. 1a, b, 9g), HEK293 cells were maintained in high-glucose DMEM supplemented with 10% (v/v) FBS and 1% (v/v) GlutaMAX at 37 °C with 5% (v/v) carbon dioxide. Cells were detached using trypsin and plated in six-well plates at 0.4 × 10⁶ cells per well. Twenty-four hours later, cells were transfected with 2 μg of DNA (in the same superfolder GFP-containing pEG BacMam vector used for large-scale purification) mixed with 9 μl Lipofectamine 2000 (Invitrogen) in 700 μl OptiMEM; after a 5-min incubation, the mixture was added dropwise to the cells. Twenty-four hours later, cells were checked for GFP fluorescence, rinsed with phosphate-buffered saline and collected by centrifugation. Cells were either frozen at −20 °C or used immediately.

Cell pellets were rapidly thawed and resuspended in 200 μl lysis buffer containing 50 mM HEPES/NaOH (pH 7.5), 375 mM NaCl, an EDTA-free protease inhibitor cocktail (Roche) and 1 mM PMSF. After 10 s of sonication in a water bath, the protein was extracted for 2 h at 4 °C with 0.5% (w/v) DDM and 0.1% (w/v) CHS. The mixture was then clarified by centrifugation and filtered. The supernatant was loaded into a Shimadzu autosampler connected to a Superose 6 Increase column equilibrated in SEC buffer. An aliquot of the supernatant was also used for SDS–PAGE (Bio-Rad, 12% Mini-PROTEAN TGX) and Blue Native (BN)-PAGE (Invitrogen, 3–12% Bis-Tris) gels. Proteins were transferred from the gels using the Trans-Blot Turbo Transfer Pack (Bio-Rad), and the membranes were blocked overnight. The following day, membranes were probed with a rabbit anti-GFP polyclonal antibody (Life Technologies; 1:20,000), washed, incubated with an anti-rabbit secondary antibody (1:10,000) and imaged with ImageLab.

Lifetime sparseness calculation

The lifetime sparseness⁵⁴,⁵⁵ measure in Extended Data Fig. 1d was used to quantify olfactory receptor tuning breadth and was calculated as follows:

$$\mathrm{Lifetime}\,\mathrm{sparseness}=\left(\frac{1}{1-\frac{1}{n}}\right)\times\left(1-\frac{\left(\sum_{i=1}^{n}\frac{\mathrm{res}_{i}}{n}\right)^{2}}{\sum_{i=1}^{n}\frac{\mathrm{res}_{i}^{2}}{n}}\right),$$

in which n is the number of ligands in the set and resᵢ is the receptor’s response to ligand i. All inhibitory responses (values below 0) were set to 0 before the calculation⁵⁴,⁵⁵. The Drosophila melanogaster OR dataset comes from the DoOR database⁵⁶.
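The formula translates directly into code. The sketch below applies the clipping of inhibitory responses first, as described, and then evaluates the expression term by term; the response vectors are toy values, not receptor data. A value of 1 means the receptor responds to a single ligand, and 0 means it responds equally to all of them.

```python
def lifetime_sparseness(responses):
    """Lifetime sparseness of one receptor across n ligands (0 = broadly tuned, 1 = narrowly tuned)."""
    res = [max(0.0, r) for r in responses]  # inhibitory (negative) responses set to 0
    n = len(res)
    if n < 2:
        raise ValueError("need at least two ligands")
    mean = sum(res) / n                      # (sum res_i / n)
    mean_sq = sum(r * r for r in res) / n    # (sum res_i^2 / n)
    if mean_sq == 0:
        raise ValueError("all responses are zero; sparseness is undefined")
    return (1 - mean**2 / mean_sq) / (1 - 1 / n)


print(lifetime_sparseness([1.0, 0.0, 0.0, 0.0]))   # 1.0: responds to a single ligand
print(lifetime_sparseness([1.0, 1.0, 1.0, 1.0]))   # 0.0: equal responses to all ligands
print(lifetime_sparseness([1.0, -0.4, 0.0, 0.0]))  # 1.0: negative value clipped to 0
```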

Multiple regression analysis

A set of 11 molecular descriptors was compiled for all 54 ligands tested, from PubChem, Sigma-Aldrich, ChemSpider, EPA and The Good Scents Company; the values used are listed in Supplementary Table 9. Multiple regression analysis using the scikit-learn LinearRegression module was used to assess how accurately receptor activity could be predicted by individual descriptors (1-dimensional analysis) or by combinations of two descriptors (2-dimensional analysis) (Extended Data Table 2). Because descriptor values were not reported for six ligands (acetic acid, citric acid, MSG, sucrose, denatonium and VUAA1), the analysis was performed on the remaining 48 ligands. For the 1-dimensional analysis, a single-variable linear regression was performed for each descriptor independently. Each analysis fitted a linear model with coefficients w1, …, wn+1, in which n is the dimension of the input data. The optimal coefficients were determined by minimizing the residual sum of squares between the observed activity index values and those predicted by the linear model. This process was repeated for the 2-dimensional case using every unique pair of the 11 descriptors. To assess the predictive power of each combination, the R² value, the square of the correlation coefficient between observed and modelled activity index values, was calculated for each linear model and reported in Extended Data Table 2. This allowed descriptor sets to be ranked by prediction accuracy.
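The 1- and 2-descriptor screens can be reproduced in outline with ordinary least squares over every combination of descriptor columns. This sketch uses NumPy's least-squares solver in place of scikit-learn's `LinearRegression`; both minimize the residual sum of squares for a linear model with an intercept. It uses synthetic data instead of the values in Supplementary Table 9. For a least-squares fit with an intercept, 1 − SS_res/SS_tot equals the squared correlation between observed and fitted values, which matches the R² definition above.

```python
from itertools import combinations

import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R² of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])     # n + 1 coefficients, as in the text
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes the residual sum of squares
    resid = y - A @ coef
    return float(1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2))


def rank_descriptor_sets(D: np.ndarray, y: np.ndarray, names, k: int):
    """Fit every unique k-descriptor combination and rank them by R² (highest first)."""
    results = [
        (r_squared(D[:, list(cols)], y), [names[j] for j in cols])
        for cols in combinations(range(D.shape[1]), k)
    ]
    return sorted(results, key=lambda item: item[0], reverse=True)


# Synthetic data: 48 ligands x 11 descriptors, activity driven by descriptor "d3".
rng = np.random.default_rng(0)
D = rng.standard_normal((48, 11))
y = 2.0 * D[:, 3] + 0.1 * rng.standard_normal(48)
names = [f"d{j}" for j in range(11)]
print(rank_descriptor_sets(D, y, names, 1)[0][1])  # ['d3']
print(len(rank_descriptor_sets(D, y, names, 2)))   # 55 unique pairs
```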

Sequence alignments

For Extended Data Fig. 11a, the sequences of MhOR1 and MhOR5 were aligned using MAFFT in Jalview⁵⁷, with minimal manual adjustment based on the structure of MhOR5. For Extended Data Fig. 5a, the sequence alignment between A. bakeri Orco and MhOR5 was obtained by structurally aligning the published structure of A. bakeri Orco (PDB 6C70) with the structure of MhOR5 in PyMOL. All sequence alignments were visualized and plotted using Jalview⁵⁷.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
