
Knowledge-based potentials in YASARA


Knowledge-based potentials

The most successful methods in structural bioinformatics have usually been those that make extensive use of the available knowledge, instead of trying to start from first principles. When predicting a protein structure or analyzing the quality of a homology model, it is an enormous help to peek at the thousands of known structures deposited in the PDB to first get an idea of what a real protein looks like. Then it becomes much easier to judge the correctness of the model.

The standard way of 'getting an idea' is an extensive statistical analysis of known protein structures, trying to extract common structural features and preferences from the 3D coordinates. Qualitative insights like 'hydrophobic side-chains like to be in contact' can be converted to quantitative energies thanks to Boltzmann's formula, which states that a certain configuration occurs with a frequency that is proportional to exp(-E/kT), where E is the energy, T the temperature and k the Boltzmann constant ('exp' is the exponential function). So we only need to extract the frequency (e.g. of two methyl groups at a distance of 5 Å) from the PDB, and obtain the corresponding energy by turning Boltzmann's formula around: E ~ -kT*ln(frequency), where 'ln' is the natural logarithm and the energy is defined up to an additive constant.
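As a minimal sketch of this inversion (not YASARA's actual implementation), the Python snippet below converts binned counts into energies. The function name, the pseudocount, the optional reference distribution and the example counts are all assumptions made for illustration; in particular, published potentials normalize against a carefully chosen reference state, which is simplified here to a uniform distribution.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def inverse_boltzmann(counts, reference=None, temperature=300.0, pseudocount=1.0):
    """Turn observed bin counts into energies E = -kT * ln(p_obs / p_ref).

    'reference' holds the expected counts per bin; if omitted, a uniform
    distribution is assumed (an illustrative simplification).
    """
    n_bins = len(counts)
    if reference is None:
        reference = [1.0] * n_bins
    kT = K_B * temperature
    # The pseudocount keeps empty bins from producing log(0).
    total_obs = sum(counts) + pseudocount * n_bins
    total_ref = sum(reference)
    energies = []
    for observed, expected in zip(counts, reference):
        p_obs = (observed + pseudocount) / total_obs
        p_ref = expected / total_ref
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Hypothetical counts of methyl-methyl pairs in three distance bins.
example_counts = [10, 40, 10]
for count, energy in zip(example_counts, inverse_boltzmann(example_counts)):
    print(f"count={count:3d}  energy={energy:+.3f} kcal/mol")
```

Bins that are populated more often than the reference receive negative (favorable) energies, and rarely populated bins receive positive ones.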

The resulting energy functions are called 'knowledge-based potentials' and are widely used today, after pioneering work in the 1990s on one-dimensional distance-dependent potentials in ProSA[1] and three-dimensional direction-dependent potentials in WHAT IF[2]. YASARA Structure builds on these cornerstones and provides a number of innovations[3]:

  • The statistical analysis is not limited to proteins. Atom types and knowledge-based potentials have been derived for all other molecules in PDB files, so that energies can also be calculated for DNA/RNA, metal ions and ligands, the latter being especially important for pharmaceutical research.
  • Knowledge-based potentials can be visualized easily: Example (A) on the right shows the 1D distance-dependent potential for two methyl groups, one from Leu 18 in red, and one from Val 15 in yellow. To aid visualization, the vertical energy axis is mirrored, so the yellow top corresponds to the energy minimum. The red arrow marks the current distance of 4.72 Å. The arrow adapts in real time to atom movements, for example during a simulation.
  • Example (B) shows the 3D orientation-dependent potential of a carboxyl group carbon around the arginine side-chain; blue indicates the unfavorable high-energy regions.
  • Atomic contact analysis is not the only application of knowledge-based potentials. The distribution and interdependence of dihedral angles can be analyzed equally well[4]. Example (C) shows the potential of the backbone dihedral angle φ for threonine. The yellow φ arrow points from -180 to +180 degrees.
  • Two dihedral angles can be combined into a single 2D potential: example (D) shows the φ/ψ potential of threonine. Again, the yellow tops are the low-energy regions, corresponding to the preferred areas in the Ramachandran plot. The orange arrow indicates the ψ axis.
  • Finally, example (E) shows a 3D potential: the combined φ/ψ/χ1 potential of threonine, which captures the interdependence between the backbone and side-chain conformation. The long red arrow indicates the χ1 axis.
  • Knowledge-based potentials like the ones shown above have been incorporated into two new force fields, exclusively available in YASARA Structure. The contact potentials make it possible to calculate highly informative knowledge-based energies, while the dihedral angle potentials are differentiable and thus also permit force calculations, resulting in the most accurate force fields for structure prediction and refinement that YASARA has to offer.
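The point that differentiable dihedral potentials also permit force calculations can be illustrated with a small sketch. It is not how YASARA implements its force fields, which the text does not describe: the function names, the periodic linear interpolation between bin centers, the finite-difference derivative and the bin energies are all assumptions chosen for brevity.

```python
import math

def dihedral_energy(phi, bin_energies):
    """Energy at dihedral angle phi (degrees) by periodic linear
    interpolation between bin centers spanning -180 to +180 degrees."""
    n = len(bin_energies)
    width = 360.0 / n
    # Bin i is centered at -180 + (i + 0.5) * width.
    x = (phi + 180.0) / width - 0.5
    i = math.floor(x)
    t = x - i
    e0 = bin_energies[i % n]        # modulo wraps around at +/-180 degrees
    e1 = bin_energies[(i + 1) % n]
    return (1.0 - t) * e0 + t * e1

def dihedral_torque(phi, bin_energies, h=1e-3):
    """Generalized force -dE/dphi (energy per degree), estimated with a
    central finite difference."""
    e_plus = dihedral_energy(phi + h, bin_energies)
    e_minus = dihedral_energy(phi - h, bin_energies)
    return -(e_plus - e_minus) / (2.0 * h)

# Hypothetical energies for four 90-degree bins of a backbone dihedral.
hypothetical_bins = [0.0, 1.0, 0.0, 1.0]
for angle in (-135.0, -90.0, -45.0):
    print(angle,
          dihedral_energy(angle, hypothetical_bins),
          dihedral_torque(angle, hypothetical_bins))
```

Linear interpolation yields a piecewise constant derivative with kinks at the bin centers; a smoother interpolation scheme would be the natural choice wherever continuous forces are required.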

References
[1] Recognition of Errors in Three-Dimensional Structures of Proteins
Sippl MJ (1993) Proteins 17, 355-362
[2] Quality control of protein models: Directional atomic contact analysis
Vriend G, Sander C (1993) J. Appl. Cryst. 26, 47-60
[3] Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8
Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, Tyka M, Baker D, Karplus K (2009) Proteins 77 Suppl 9, 114-122
[4] Improvements and Extensions in the Conformational Database Potential for the Refinement of NMR and X-ray Structures of Proteins and Nucleic Acids
Kuszewski J, Gronenborn AM, Clore GM (1997) Journal of Magnetic Resonance 125, 171-177