Complex peptide macrocycle optimization: combining NMR restraints with conformational analysis to guide structure-based and ligand-based design

Results for applying two computational workflows for prioritizing analogs of lead-compound Pep-01 will be described: (1) a structure-based method requiring a crystal structure of PD-L1 in a compliant conformation to bind macrocycles in this series and (2) a purely ligand-based method. Both approaches make use of information to partially constrain the conformational space required to be searched to make predictions of bound poses. The information can be derived from experimental NMR data for Pep-01 in its solution state, a co-crystal structure of Pep-01 bound to PD-L1, or a structure-based prediction of the bound state of Pep-01 to a non-cognate protein conformation.

Correspondence of Pep-01 NMR solution ensemble to its bound state

The NMR experimental analysis of Pep-01 yielded 50 distance restraints between single proton pairs, 115 distance restraints where one/both ends contained chemically equivalent protons and six torsional restraints consisting of 1 omega and 5 psi angles. Very thorough conformational search was performed using the deep ForceGen approach [15]. Refer to the Methods and Data for additional details on the NMR experimental aspects and conformational search methods.

Figure 4A shows the comparison between the PD-L1 bound state of Pep-01 (green carbons) [1, 2] and the closest-matching conformer from the ensemble that came from the NMR-restrained conformational search procedure. The particular conformation shown from the NMR-based ensemble (magenta carbons) was 6 kcal/mol above the lowest-energy example, and it was a very close match to the xGen re-fit bound ligand state (0.9 Å RMSD for all non-hydrogen atoms and 0.4 Å RMSD for ring backbone atoms). Figure 4B shows the set of non-redundant conformers from the lowest 5 kcal/mol energy window. The single lowest energy conformer was 1.4 Å RMSD from the bound state (0.4 Å RMSD for ring backbone atoms).

Fig. 4

Solution ensemble for Pep-01 from NMR restrained conformational search: A the best-matching conformation from the ensemble (magenta carbons) and the bound state by crystallography (green carbons), with sidechains of specific residues numbered in red; B the 24 lowest energy non-redundant conformers (within 5 kcal/mol of the minimum, magenta) superimposed on the bound state (green), C the single lowest energy conformer (viewed from the solvent-exposed side) with H-bonds labeled; D five alternative molecular subfragments derived from the lowest energy conformer

Clearly, the solution-state of Pep-01 is pre-organized for binding PD-L1. In particular, buried sidechains (residues 8, 1 and 10 especially) showed relatively little movement in the solution ensemble. By contrast, solvent-exposed residues (e.g. 13 and 5) with little protein contact exhibited more movement. Within the backbone itself, there are five hydrogen bonds between amide carbonyl oxygen atoms and amide protons, with an additional one between the indole N-H of Trp\(_8\) and a backbone carbonyl oxygen (see Fig. 4C). Note that these H-bonds do not form a topologically detectable beta-hairpin-like structure [9] but form a rather unique stabilizing framework.

Figure 4D shows five alternative molecular subfragments derived from the lowest energy conformer of the NMR-restrained solution ensemble. These are used to establish conformational preferences for analogs by employing graph matching. Given an analog, the subfragments are matched in order (left to right, top to bottom), and the first match is used to instantiate a set of torsional preferences for the analog during conformational search (as described earlier in the discussion of Fig. 3). The fragments are ordered from most restrained to least, with the fifth alternative allowing matches to variants at Pro\(_4\) that retain both Trp residues (of which there are a few among the patent peptides).

An underappreciated, but critical, aspect of structure-based design in the context of peptidic macrocycles is the difficulty in fitting large molecules into X-ray density correctly. The tools available for X-ray crystallography model refinement are better developed for protein modeling than for ligand modeling. Very often, the modeled ligand coordinates yield very high energy values, and modeled coordinates often contain serious errors. This has been established in a number of studies concerned with estimating bound ligand strain energy [15,16,17,18,19,20,21,22] and studies and perspectives involving X-ray model accuracy [12, 13, 23,24,25]. Figure 5A shows the comparison between the deposited PDB ligand coordinates for Pep-01 (gray carbons) and the re-fit coordinates using the xGen approach [12, 13]. Overall, the ligand had been well-modeled, but one of the chiral centers of the ligand was incorrect (red arrow), causing a distortion to the ring-closing thioether linkage.

Figure 5B shows a much more serious set of problems with the deposited structure of Pep-57 (orange carbons) compared to the xGen re-fit (yellow). Note that Pep-57 differs only at position 7 from the lead compound Pep-01, lacking the N-methyl and replacing Gly with Ser. Three cis-amide bond configurations are highlighted (red arrows), but the structure contains numerous high-strain features. If we consider the overlay of the two deposited peptide variants in Fig. 5C, we would conclude that the macrocyclic backbone took on substantially different conformations despite only two minor differences between the ligands (both at position 7). However, as is clear in Fig. 5D, the two variants adopt nearly identical backbone configurations when correctly fit into the X-ray density of PDB code 6PV9 and 5O4Y.

Fig. 5

Comparative overlays of different ligand fits to X-ray density: A original deposited Pep-01 structure (gray) and corrected real-space fit (green); B original deposited Pep-57 structure (orange) and corrected real-space fit (yellow); C original deposited Pep-01 (gray) and Pep-57 structures (orange); D corrected xGen re-fit Pep-01 (green) and Pep-57 structures (yellow)

The importance of the above comparison for the purpose of predictive modeling is that the NMR solution ensemble of Pep-01 and both Pep-01 and Pep-57 bound crystal structures agree extremely closely with respect to their conformations. They are nearly identical for the macrocyclic backbone and for the large, common substituents that make strong contact with PD-L1. High-quality fitting of low-energy conformational ensembles, whether to a set of NMR-determined restraints from a pre-organized solution ensemble or to X-ray density, is required in order to accurately model the likely bound states of analogs.

Results from the scheme presented in Fig. 3 do not vary substantially whether making use of the lowest energy conformer from the NMR ensemble (Fig. 4C) or deriving analogous molecular subfragments from either the 6PV9 or 5O4Y structures. Because an NMR ensemble can be obtained regardless of having a protein target structure, in what follows, all results reflect the conformational restraints that were derived from the experimental NMR data on Pep-01.

Figure 6 shows the low-energy conformational ensemble for Pep-57 superimposed onto its crystallographic pose. The ensemble was derived using the torsional restraints from the lowest energy conformer of Pep-01’s NMR-derived solution ensemble (recall Fig. 4C). The full ensemble contained conformers with 1.0Å RMSD to the bound state and the low energy pool depicted in Fig. 6 contained conformers with 1.4Å RMSD to the bound state. Deviations from the crystallographic pose were in the solvent-facing residues, with very tight correspondence among residues involved in protein binding. The NMR-derived torsional restraints provide an effective means to identify conformers close to the bound states of Pep-01 analogs.

Fig. 6

Non-redundant low energy (within 5 kcal/mol of the minimum) pool of conformers of Pep-57 (cyan) superimposed on the crystallographic pose of Pep-57 from 5O4Y (yellow)

Relationship of estimated binding enthalpies and experimentally measured binding affinities

Figure 7 (top) shows the relationship between experimental (X-axis) and structure-based-protocol predicted binding for 63 patent peptides (violet) and 9 subsequently made and tested project compounds (green). Note that the assays were slightly different (e.g. for Pep-01, the difference was roughly sixfold, with poorer nominal binding for the HTRF patent assay), but are generally comparable. The points labeled 1, 2, and 3 correspond, respectively, to BMT-174900, BMT-153099, and BMT-139699 from Fig. 1. Kendall’s Tau (\(\tau\)) was 0.25 (p < 0.001) with ties being counted as exact values, increasing to \(\tau = 0.50\) with prediction value ties defined as being within 5.0 kcal/mol of one another (p \(\ll\) 0.001). Given two analogs whose predicted binding enthalpy values differed by 5 kcal/mol, the likelihood that they were ranked correctly was 75%. Pearson’s correlation (r) was 0.48. Of note, the clinical candidate (BMT-174900) and an analog with similar activity (BMT-153099) were among the best scoring 6 of 72 compounds. The mean pK\(_d\) of the ten best predicted analogs was 8.1.

Fig. 7

Relationship of predicted binding enthalpy to binding free energy (calculated from experimentally determined IC\(_\) values) for the structure-based protocol (top) and ligand-based protocol (middle), and comparison of bound ligand strain estimates between the two protocols (bottom)

Figure 7 (middle) shows the relationship between experimental (X-axis) and ligand-based-protocol predicted binding for 63 patent peptides and nine subsequently made and tested project compounds, with points colored as above. Kendall’s Tau (\(\tau\)) was 0.25 (p \(=\) 0.001) with ties being counted as exact values, increasing to \(\tau = 0.47\) with prediction value ties defined as being within 5.0 kcal/mol of one another (p < 0.001). Given two analogs whose predicted binding enthalpy values differed by 5 kcal/mol, the likelihood that they were ranked correctly was 73%. Pearson’s correlation (r) was 0.42. Of note, the clinical candidate (BMT-174900) was not among the best-scoring compounds in the ligand-based protocol. The ligand-based protocol has a fundamental lack of information regarding the new favorable interactions of BMT-174900 with PD-L1 that are evident in the structure-based protocol. However, five highly active analogs from the lead optimization effort were among the top 11 predictions. The mean pK\(_d\) of the ten best predicted analogs by the purely ligand-based protocol was 8.0.

The direct correlation between the structure-based and ligand-based strain estimates was high (Fig. 7, bottom, with \(\tau = 0.55\), p \(\ll\) 0.001, \(r = 0.86\), and mean absolute difference being 2.2 kcal/mol). This reflects the degree to which the ligand-based predictions of bound ligand pose matched those from docking (discussed below). Bound ligand strain, by itself, was the major predictive factor of experimentally measured analog activity, which is why the purely ligand-based approach exhibited a similar level of predictive value to the structure-based approach.

Expectations for ligand strain

It is not clear the extent to which the predictive value of ligand strain is a general property for macrocyclic peptides that result from the type of intensive affinity-based screening used to identify Pep-01 [1], but it is possible to quantify the likelihood that ligand strain can be leveraged in lead optimization. We have recently shown that bound ligand strain follows a size-dependent probability distribution [15]. For Pep-01, the expected bound strain is roughly 24 kcal/mol and the expectation is that 95% of cases will fall between in the range of 14–34 kcal/mol. From the real-space refined coordinates of Pep-01, we obtained an estimated bound strain of 12.7 kcal/mol—clearly very low. From re-docking Pep-01 into its cognate protein structure, an analogous process to that used for the analog compounds, we obtained an even lower strain estimate: 2.3 kcal/mol. Note that because the conformer pool for Pep-01 was derived from the torsional preferences of its own solution-state, it is likely that the strain estimate from docking is systematically lower than that of the analog compounds.

Whether considering the strain estimate from crystallography (very low for its size) or from docking (extremely low), one should expect that many changes to the lead compound’s structure will result in significant increases in strain. So, maintaining low strain in the design process is clearly indicated based on where the lead compound falls within the expected strain distribution. Here, using either the structure-based protocol or the ligand-based protocol, we see that the most active analogs have extremely low strain compared with expectations: an average of 7.6 kcal/mol for those with activity \(\ge\) 8.0 pIC\(_\) units. Conversely, the least active analogs have approximately double the strain: an average of 14.7 kcal/mol for those with activity \(\le\) 6.0 pIC\(_\) units (still quite low, but the changes were modest).

Recall Pep-05 from Fig. 3, which was a Phe to Ala change at position 1, resulting in a decrease in activity of nearly 3 log units. The change resulted in a loss of less than 0.5 kcal/mol in intermolecular binding energy compared with Pep-01. However, the bound strain estimate increased by roughly 8 kcal/mol for Pep-05. The predicted loss in activity between Pep-01 and Pep-05 based on intermolecular energy and strain is overestimated, but it correctly deprioritizes Pep-05 as an analog for synthesis and testing. This type of effect is likely to be general. Large, rigid substituents such as phenylalanine create conformational constraint by excluding possible conformational states. Changes that decrease either the size or rigidity of such substituents are likely to reveal different (and lower) global minima relative to the bound conformational energy.

Pep-50 from Fig. 3 is interesting for similar reasons. The deletion of a single methylene from the Pro residue at position 4 makes a small change to the interaction footprint, leading to a decrease in intermolecular binding energy of 0.7 kcal/mol. The impact of the change on estimated strain was larger: an increase of just under 5 kcal/mol. As with Pep-05, the predicted degree of loss in activity was overestimated, but the important aspect is that the rankings of the compounds were correct: Pep-01 > Pep-50 > Pep-05. Further, while the gap between Pep-01 and Pep-05/Pep-50 was overestimated, the gap between Pep-50 and Pep-05 was quite closely predicted (\(\Delta \Delta\)G of 3.0 kcal/mol predicted vs. 2.7 experimental).

Because of the dominant effect of estimated strain, both the structure-based and ligand-based protocols agreed on the ranking of these compounds.

Protein structure adds exploitable information

In the structure-based protocol, docking score is defined as the estimated intermolecular binding enthalpy ignoring ligand strain. By itself, the correlation between this score and experimentally determined binding affinity was weak (\(\tau = 0.12\), p \(=\) 0.07). However, despite strain being the largest explanatory component of analog activity, predicted interactions between analogs and PD-L1, both quantitatively and qualitatively, had significant value.

Figure 8A shows the predicted pose of Pep-57 (a variant of Pep-01 differing slightly at position 7) when docked into the cognate protein conformation of Pep-01. The predicted pose (cyan) was just 1.0 Å RMSD from the experimentally determined pose (yellow). Figure 8B shows the top-scoring docked poses for all 72 of the analogs. The torsionally restrained search procedure yielded compliant conformers for all compounds, which exhibited largely congruent binding interactions at the protein interface.

Fig. 8

Docking of analogs to PD-L1 (PDB Code 6PV9): A comparison of predicted (cyan) and experimentally determined (yellow, PDB Code 5O4Y) bound pose of Pep-57; B top-scoring docked poses of all analogs (seen from the protein interface) superimposed on the bound pose of Pep-01 (green carbons); C predicted bound pose for BMT-174900

Figure 8C shows the predicted binding mode of BMT-174900, with two salt-bridges to the protein at positions 5 and 10 (marked with red arrows). The structure-based protocol predicted a marked improvement (3.7 kcal/mol in intermolecular score) progressing from Pep-01 to BMT-174900. The strain estimates from the structure-based and ligand-based protocols differed by less than 0.1 kcal/mol. It was the estimate of intermolecular binding enthalpy from the structure-based protocol that led to the much better ranking of BMT-174900 (see the points labeled 1 in the top and middle plots of Fig. 7).

Recall BMT-153099 from Fig. 1, which differs from BMT-174900 only at position 10, with a benzothiophene rather than the substituted indole. In the pure binding assay, the two compounds exhibited very similar activity. BMT-153099 was the only analog with a calculated intermolecular binding score (units of pK\(_d\)) higher than BMT-174900 and the calculated strain estimate was lower in both protocols. Both protocols incorrectly ranked BMT-153099 with respect to BMT-174900, with the structure-based protocol predicting a smaller gap than the ligand-based protocol. The docked pose of BMT-153099 was not significantly different than BMT-174900, with the change in score being driven by the \(\pi\)-cation interaction of the thiophene compared with the substituted indole.

BMT-139699 differed from BMT-153099 only at position 7, replacing the hydroxyl with a proton. Because the hydroxyl at position 7 is completely solvated, it was unsurprising that the estimated intermolecular binding energy differed only slightly (with BMT-153099 being 0.3 kcal/mol lower in intermolecular energy). Note, however, that the difference in experimentally measured activity corresponded to just under 2 kcal/mol. Here again, the estimated strain pointed in the correct direction, with significantly increased strain estimates for BMT-139699: 6.7 and 3.5 kcal/mol by the structure-based and ligand-based protocols, respectively.

Comparison of predictions for bound ligand poses

The structure-based protocol produced a highly accurate docking for Pep-57 and convincing poses for the remaining analogs (see Fig. 8). The parallel ligand-based protocol also predicted poses for all analogs using the eSim method [14] in order to derive an estimate for bound conformational energy. Figure 9A shows the optimal alignment of BMT-174900 to Pep-01 using ligand-based pose prediction. Gray dots show the parts of the molecular surfaces that are congruent, with notable differences only at positions 5, 10, 9 and 1. Red and blue sticks show congruence of hydrogen bond donors and acceptors, including directionality. Small spheres in the red to blue color spectrum indicate areas where the electrostatic fields of the molecules are congruent.

Fig. 9

Pose prediction accuracy for the ligand-based protocol: A optimal eSim alignment of BMT-174900 to Pep-01; B superimposition of pose prediction from ligand similarity (cyan) and docking (tan); and C relationship of pose prediction accuracy for the ligand-based protocol (violet) to the poorest possible result from the low-energy pose pool for each analog (green)

Figure 9B shows the comparison of the bound pose predictions of BMT-174900 from the structure-based protocol (tan) and the ligand-based protocol, with an RMSD of 0.9 Å. There are slight differences in the poses at positions 5 and 10, where docking identified quantitatively significant interactions with the protein, but where the ligand-based approach simply saw differences between Pep-01 and BMT-174900. Figure 9C shows the cumulative histogram of RMSD for the ligand-based pose predictions compared to the poorest expected RMSD for each analog. The RMSD values for the eSim predictions were derived by comparing the similarity-predicted poses against the top-scoring pose family from docking for each analog. The pessimistic RMSD values were derived by considering the lowest 10 kcal/mol conformers from each analog’s pool and identifying the most deviant conformer compared with the top-scoring pose family from docking.

Over 80% of the cases showed conformer matches to docking of 1.0 Å RMSD or less in the ligand-based protocol, with 98% being under 1.5 Å RMSD. This was not simply because the torsionally restrained pools contained no conformers that deviated from the likely docked poses. The pools contained a diversity of conformations for each analog, typically containing examples deviating 1.5 to − 2.5Å from the docked configurations. The close relationship between the strain estimates for the structure-based and ligand-based protocols stemmed from the quantitative similarity in their predicted bound poses for the analogs.

Note, however, that this was a structure-enabled project, which influenced the design of analogs. In this restrospective analysis, without protein structure, the substitution on the Trp indole at position 10 would probably not have been explored in using a systematic “conservative” strategy. Combinatorial exploration of such diverse sidechain variants would yield an extremely large space of analogs to prioritize. It is conceivable that a position-by-position sequential optimization, essentially an iterative line search strategy, could be used in a “blind” exploration. Such a strategy assumes that the effects of positional variations will be largely additive.

Computational cost

Large macrocycles present special challenges, particularly regarding the computational complexity of conformational search. The fgen_deep search approach requires roughly ten-fold more time than the standard thorough ForceGen search protocol for the compounds studied here. Roughly speaking, using the thorough ForceGen search protocol for all conformational searches, roughly 1000 compounds per day can be run on a 100-node cluster of 36-core nodes, with ten-fold fewer using the deep search protocol. Deeper conformational search produced stronger correlations between estimated enthalpies and experimentally determined activities, but for ranking larger sets of candidate analogs, the faster protocol would be useful for eliminating poor candidates. Given the availability of cloud-based high-performance computing, with schemes that trade perfect availability against cost, the trade-offs between calculation speed and accuracy are complex.

View original article

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN

Share Bookmark

0 0 0 0 0 0 0

More from this channel

Complex peptide macrocycle optimization: combining NMR restraints with conformational analysis to guide structure-based and ligand-based design

Comments (0)