
Structure validation in YASARA


Structure validation examples

When working with experimental protein structures or predicted models, the first question is usually about their quality. Is the structure roughly correct? If so, are there doubtful regions that must be treated with care, especially when making predictions to guide further experimental research?

The usual approach is therefore to validate the structure. The most convincing results are achieved if the validation is based on a comparison with a gold standard of trusted reference structures. Since the reference structures may share little or no sequence similarity with the structure to validate, they cannot be compared directly. The solution is to base the validation on general aspects of protein structure, encoded in knowledge-based potentials.

A final problem still needs to be solved: the knowledge-based energies depend on the size and shape of the protein, and also on its amino acid composition, so one cannot simply associate certain energies with 'good' or 'bad'. The obvious fix is to normalize the energies, remove the dependencies mentioned above, and obtain estimates for the expected average energy and its standard deviation from the gold standard reference structures. When validating a certain structure, one can then easily calculate how many standard deviations it is away from the average, thereby obtaining a 'Z-score'. For example, a structure with a Z-score of -4 is four standard deviations below average and can be considered bad. Z-scores form the basis of most structure validation tools, from one of the earliest, PROCHECK[1], to today's most extensive validation tool, WHAT_CHECK[2], which is also part of YASARA in the Twinset.
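The Z-score calculation described above can be sketched in a few lines of Python. This is only an illustration, not YASARA's or WHAT_CHECK's implementation: the function names and the reference values are hypothetical, the energies are assumed to be already normalized (size, shape and composition dependencies removed), and the sign is flipped on the assumption that a lower energy is better, so that a negative Z-score means 'worse than the reference set'.

```python
from statistics import mean, stdev


def z_score(value, reference_values):
    """Return how many standard deviations `value` lies from the reference mean."""
    mu = mean(reference_values)
    sigma = stdev(reference_values)  # sample standard deviation of the reference set
    if sigma == 0:
        raise ValueError("reference values must not all be identical")
    return (value - mu) / sigma


def quality_z_score(energy, reference_energies):
    """Z-score with the sign convention used in the text: negative means bad.

    Assumption (not from the source): lower normalized energy is better,
    so the raw Z-score is negated.
    """
    return -z_score(energy, reference_energies)


# Hypothetical normalized energies of trusted reference structures.
reference = [-10.0, -12.0, -8.0, -11.0, -9.0]

# A hypothetical model whose energy is four standard deviations above the mean.
model_energy = mean(reference) + 4 * stdev(reference)
print(round(quality_z_score(model_energy, reference), 2))
```

A model whose energy equals the reference mean gets a Z-score of zero; the further its energy rises above the mean, the more negative the score becomes.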

In YASARA Structure, validation takes a twist: it is entirely based on Z-scores calculated from molecular dynamics force field energies. As it turns out, this approach has only advantages:

  • Since some YASARA force fields have knowledge-based components, all the classic knowledge-based checks are available, such as the normality of dihedral angles (Ramachandran plot) and of 1D and 3D packing. When visualizing check results, YASARA maps the range [perfect.. ..bad] to the color gradient [blue.. ..magenta.. ..red.. ..orange.. ..yellow]. The example on the right compares a low-resolution NMR structure (1ACP) with a high-resolution X-ray structure (1CRN).
  • New checks are available for features that cannot be directly extracted from known protein structures, e.g. the normality of electrostatic and van der Waals interactions.
  • Checks are not limited to proteins, but also work for other molecules like ligands. The last example on the right shows a nicotinamide adenine dinucleotide (NAD) cofactor in PDB file 1A5Z. The atoms with the lowest 1D packing Z-score are in the amide group bound to the pyridine ring of the nicotinamide. Not surprisingly, it turns out that this amide group is incorrectly placed and needs to be rotated by 180°, as described in more detail here. The ability to validate ligands and their interactions with proteins is of crucial importance in pharmaceutical research.
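The color mapping mentioned in the first point can be pictured as a piecewise-linear interpolation between color stops. The sketch below is an assumption-laden illustration, not YASARA's actual code: the function name `z_to_color`, the RGB values chosen for the five stops, and the Z-score limits for 'perfect' (0) and 'bad' (-5) are all hypothetical.

```python
# Color stops from 'perfect' to 'bad' as (R, G, B) in 0..255.
# The exact RGB values are assumptions chosen for illustration.
STOPS = [
    (0, 0, 255),    # blue
    (255, 0, 255),  # magenta
    (255, 0, 0),    # red
    (255, 165, 0),  # orange
    (255, 255, 0),  # yellow
]


def z_to_color(z, z_perfect=0.0, z_bad=-5.0):
    """Map a Z-score to an RGB color along the gradient in STOPS.

    z_perfect and z_bad are hypothetical limits; scores outside the
    range are clamped to the nearest end of the gradient.
    """
    # Fraction of the way from 'perfect' (0.0) to 'bad' (1.0), clamped.
    t = (z - z_perfect) / (z_bad - z_perfect)
    t = min(1.0, max(0.0, t))
    # Locate the gradient segment and the position inside it.
    pos = t * (len(STOPS) - 1)
    i = min(int(pos), len(STOPS) - 2)
    frac = pos - i
    lo, hi = STOPS[i], STOPS[i + 1]
    # Linear interpolation of each color channel.
    return tuple(round(a + (b - a) * frac) for a, b in zip(lo, hi))


for z in (0.0, -1.25, -2.5, -5.0):
    print(z, z_to_color(z))
```

With these assumed limits, atoms at or above the 'perfect' score are drawn blue and atoms at or below the 'bad' score are drawn yellow, with the intermediate colors in between.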


REFERENCES

[1] PROCHECK: a program to check the stereochemical quality of protein structures
Laskowski RA, MacArthur MW, Moss DS, Smith DK and Thornton JM (1993) J. Appl. Cryst. 26, 283-291
[2] Errors in protein structures
Hooft RWW, Vriend G, Sander C and Abola EE (1996) Nature 381, 272