Structure validation in YASARA

When working with experimental protein
structures or predicted models, the first question is usually about their
quality. Is the structure roughly correct? If so, are there doubtful
regions that must be treated with care, especially when making
predictions to guide further experimental research? The usual approach
is therefore to validate the
structure. The most convincing results are achieved if the validation
is based on a comparison with a gold standard of trusted reference
structures. Not surprisingly, the reference structures may share little
or no sequence similarity with the structure to validate, so how can
they be compared? The solution is to base the validation on general
aspects of protein structure, encoded in knowledge-based potentials.
A final problem still needs to be solved: the
knowledge-based energies depend on the size and shape of the protein,
and also on its amino acid composition, so a given energy value cannot
simply be labeled 'good' or 'bad'. The obvious fix is to normalize
the energies, remove the dependencies mentioned above, and obtain
estimates for the expected average energy and its standard deviation
from the gold standard reference structures. When validating a certain
structure, one can then easily calculate how many standard deviations
it is away from the average, thereby obtaining a 'Z-score'. For example,
a structure with a Z-score of -4 is four standard deviations below
average and can be considered bad. Z-scores form the basis of most
structure validation tools, from one of the earliest[1] to today's
most extensive validation tool: WHAT_CHECK[2], which is also part of
YASARA in the Twinset. In YASARA Structure, validation takes a twist:
it is entirely based on Z-scores calculated from molecular dynamics
force field energies. As it turns out, this approach has only
advantages: