 

Automatic molecule typing in YASARA

Ring typing examples

Compared to the very high-resolution data found, for example, in the CSD ('Cambridge Structural Database'), the accuracy of small molecules in the PDB is usually much lower. Bond lengths and bond angles in particular often deviate considerably from their true values. Still, the PDB contains a large number of protein-ligand complexes from which crucial data for drug design could be derived, if the ligand structures could be identified and corrected.

YASARA features an advanced molecule typer that converts a point cloud of ligand atoms, extracted for example from a PDB file (CONECT records can be used if present), into an accurate representation of the molecule including bond orders and hydrogen atoms, optionally optimized by semi-empirical quantum mechanics.
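To make the input side concrete, the following is a minimal Python sketch of how a ligand "point cloud" and its optional CONECT bonds could be pulled out of PDB-format text. It is not YASARA's implementation: the names `Atom`, `parse_ligand`, `hetatm_line` and `conect_line` are hypothetical, and the only thing relied upon is the fixed-column layout of HETATM and CONECT records in the PDB format.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    element: str
    xyz: tuple


def parse_ligand(pdb_text, residue_name):
    """Return (atoms, bonds) for all HETATM atoms with the given residue name.

    atoms maps serial number -> Atom; bonds is a set of frozenset({a, b})
    taken from CONECT records. No bond orders or hydrogens are inferred here.
    """
    atoms = {}
    bonds = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HETATM" and line[17:20].strip() == residue_name:
            serial = int(line[6:11])
            atoms[serial] = Atom(
                serial=serial,
                name=line[12:16].strip(),
                element=line[76:78].strip(),
                xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
            )
        elif record == "CONECT":
            origin = int(line[6:11])
            # Up to four bonded serial numbers, five columns each.
            for start in range(11, 31, 5):
                field = line[start:start + 5].strip()
                if field:
                    bonds.add(frozenset((origin, int(field))))
    # Keep only bonds whose two ends both belong to the selected ligand.
    bonds = {b for b in bonds if all(serial in atoms for serial in b)}
    return atoms, bonds


def hetatm_line(serial, name, res, x, y, z, element):
    """Hypothetical helper that formats one HETATM record (chain A, residue 1)."""
    return (f"HETATM{serial:5d} {name:<4s} {res:>3s} A{1:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


def conect_line(origin, *partners):
    """Hypothetical helper that formats one CONECT record."""
    return "CONECT" + f"{origin:5d}" + "".join(f"{p:5d}" for p in partners)


# Illustrative input: an acetate ion (ACT) next to one water molecule (HOH).
sample = "\n".join([
    hetatm_line(1, "C", "ACT", 0.000, 0.000, 0.000, "C"),
    hetatm_line(2, "O", "ACT", 1.250, 0.000, 0.000, "O"),
    hetatm_line(3, "OXT", "ACT", -0.600, 1.100, 0.000, "O"),
    hetatm_line(4, "CH3", "ACT", -0.750, -1.300, 0.000, "C"),
    hetatm_line(5, "O", "HOH", 5.000, 5.000, 5.000, "O"),
    conect_line(1, 2, 3, 4),
])

atoms, bonds = parse_ligand(sample, "ACT")
print(len(atoms), "atoms,", len(bonds), "bonds")
```

The result contains only connectivity and coordinates; deciding which of the two C-O bonds is double, or whether both should share a fractional order, is exactly the part the typer has to solve.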

The stages of the 'AutoSMILES' algorithm can be summarized as follows:

  • Assignment of hybridization states and initial bond orders using an algorithm developed in collaboration with the OpenBabel team.
  • Typing of ring systems by an algorithm that combines graph theory with chemical knowledge about valence ambiguities and preferred tautomers.
  • Assignment of fractional bond orders that provide a better picture of underlying symmetries and facilitate the automatic assignment of force field parameters for molecular dynamics simulations. The following color mapping is used in the examples on the right:
    Bond order   Color
    1            grey
    1.25         blue
    1.33         magenta
    1.50         red
    1.66         orange
    1.75         bright orange
    2            yellow

  • Introduction of pH dependency of bond orders and protonation states using a library of SMILES strings.
  • Optional conversion to a Kekulé form that avoids fractional bond orders.
  • Optional semi-empirical optimization using the AM1 or PM3 QM methods (requires YASARA Dynamics+).
  • Fully automatic force field parameter assignment to run simulations at the touch of a button.
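As an illustration of why fractional bond orders expose symmetry, the sketch below averages integer bond orders over equivalent resonance forms and looks up the display color listed above. This is a simplified assumption for illustration only, not the AutoSMILES algorithm: the function names `fractional_bond_orders` and `color_for` are hypothetical, all resonance forms are weighted equally, and the matching tolerance is an arbitrary choice.

```python
from fractions import Fraction

# Color mapping as listed above (bond order -> color).
BOND_ORDER_COLORS = {
    1.00: "grey",
    1.25: "blue",
    1.33: "magenta",
    1.50: "red",
    1.66: "orange",
    1.75: "bright orange",
    2.00: "yellow",
}


def fractional_bond_orders(resonance_forms):
    """Average integer bond orders over equally weighted resonance forms.

    Each form is a dict mapping a bond (pair of atom labels) to its integer
    order; all forms must describe the same set of bonds.
    """
    count = len(resonance_forms)
    return {
        bond: Fraction(sum(form[bond] for form in resonance_forms), count)
        for bond in resonance_forms[0]
    }


def color_for(order, tolerance=0.02):
    """Return the color of the listed bond order closest to `order`."""
    value = float(order)
    nearest = min(BOND_ORDER_COLORS, key=lambda ref: abs(ref - value))
    if abs(nearest - value) > tolerance:
        raise ValueError(f"no color listed for bond order {value:.2f}")
    return BOND_ORDER_COLORS[nearest]


# Carboxylate: two equivalent forms, the double bond sits on either oxygen.
carboxylate = fractional_bond_orders([
    {("C", "O1"): 2, ("C", "O2"): 1},
    {("C", "O1"): 1, ("C", "O2"): 2},
])

# Guanidinium: three equivalent forms, one C=N double bond in each.
guanidinium = fractional_bond_orders([
    {("C", "N1"): 2, ("C", "N2"): 1, ("C", "N3"): 1},
    {("C", "N1"): 1, ("C", "N2"): 2, ("C", "N3"): 1},
    {("C", "N1"): 1, ("C", "N2"): 1, ("C", "N3"): 2},
])

for bond, order in {**carboxylate, **guanidinium}.items():
    print(bond, float(order), color_for(order))
```

In this simplified picture both carboxylate C-O bonds receive the same order of 1.5 and all three guanidinium C-N bonds receive 4/3, so chemically equivalent bonds are treated identically, which is the property that makes force field parameter assignment easier.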


All the examples on the right side have been typed in the absence of hydrogen atoms and disregarding bond length information.