
Hydrogen bonding networks in YASARA

Hydrogen bonding examples

Most protein structures solved by X-ray crystallography have a drawback that becomes apparent as soon as the structure is used for molecular simulations and related applications: the electron density traces the shape of the molecules, but usually does not allow hydrogen atoms to be identified or the heavier elements C, N and O to be told apart. Consequently, ambiguities arise wherever groups of atoms can be rotated without affecting the overall shape.

Typical examples in proteins are the side-chains of asparagine and glutamine, whose terminal amide group can be rotated by 180° with almost no impact on the electron density. The same applies to the imidazole ring of histidine, which can additionally adopt three different pH-dependent protonation patterns. This gives rise to six different states that can hardly be distinguished based on the electron density alone. Aspartates and glutamates can adopt three different states (negatively charged, or neutral with the hydrogen on either of the two terminal side-chain oxygens). The neutral states matter mostly for buried residues with strongly shifted pKa values.
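The state counts above can be made concrete with a short enumeration. This is only an illustrative sketch, not YASARA code: the function name `candidate_states` and the state labels are hypothetical, and HD1/HE2 are simply the usual PDB names for the two imidazole hydrogens.

```python
from itertools import product

# Discrete alternatives per residue type, following the counts in the text.
# All labels are illustrative placeholders, not YASARA identifiers.
FLIPS = ("as_deposited", "flipped_180")
HIS_PROTONATION = ("HD1_only", "HE2_only", "HD1_and_HE2")  # two neutral forms, one charged
ACID_STATES = ("charged", "neutral_H_on_first_O", "neutral_H_on_second_O")

def candidate_states(residue_type):
    """Return the discrete states an optimizer would have to consider."""
    if residue_type in ("ASN", "GLN"):
        return [(flip,) for flip in FLIPS]
    if residue_type == "HIS":
        # Every ring orientation can be combined with every protonation pattern.
        return list(product(FLIPS, HIS_PROTONATION))
    if residue_type in ("ASP", "GLU"):
        return [(state,) for state in ACID_STATES]
    return [("fixed",)]

for name in ("ASN", "HIS", "ASP"):
    print(name, len(candidate_states(name)))
```

Because the number of combined states is the product of the per-residue counts, even a handful of neighboring ambiguous residues produces a search space that is awkward to inspect by hand.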

If a molecular simulation is run with incorrectly oriented or protonated side-chains, the protein stability can be reduced significantly; in the worst case, the protein may even fall apart. The only way to resolve the issue is to infer the correct orientations and protonation patterns from the chemical environment, most importantly from the hydrogen bonding possibilities. Since several of the critical side-chains are often found in close contact, a choice made for one side-chain immediately influences the others, giving rise to a hydrogen bonding network that must be optimized in one shot. This approach was pioneered by WHAT IF in 1996[1], and YASARA Structure expands the original concepts with a number of additional features:

Consideration of bumps: One side-chain amide hydrogen of Asn 193 in PDB file 2BNU (second image on the right) bumps strongly into the side-chain of Lys 152 (1.29 Å distance), and another hydrogen of Asn 192 is very close to its own backbone carbon (2.11 Å). Flipping both side-chains resolves the issue, showing that bumps can provide important hints[2]. A classic case is the nearby Asn 189: its side-chain oxygen is unfavorably close (3 Å) to a backbone oxygen, and both carry a negative partial charge. After flipping the side-chain, a perfect hydrogen bond can be formed. The water molecule in between easily adapts to the new environment (third image on the right).

pH-dependent analysis of ligands: The fourth image shows two inhibitors with residue name CHQ in PDB file 1W1T. YASARA's molecular typing capabilities let it recognize the imidazole rings and conclude that the state of the imidazole in CHQ 1514 is uniquely determined by an internal hydrogen bond. The other imidazole in CHQ 1513 is more difficult: from the built-in pH model, YASARA knows that the standard pKa of the imidazole ring is 6.95. The hydrogen bonding scoring function predicts that the influence of the neighboring carboxyls of Asp 215 and Glu 144 is more than enough to shift this pKa above 7 (the crystallization conditions), leading to a positively charged imidazole in which both nitrogens are protonated and donate hydrogen bonds to the nearby carboxyl groups. The alternative, a neutral Glu 144 donating a hydrogen bond to the imidazole, is ruled out because the pKa of a carboxyl group is much lower, at around 4.0, which makes its protonation much more costly.
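The argument about protonation cost can be illustrated with the textbook Henderson-Hasselbalch relation. The sketch below is not YASARA's scoring function, which additionally accounts for the hydrogen bonding environment; it only compares the two unshifted pKa values quoted above at pH 7, and the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group carrying the proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def protonation_cost_kcal(pka, ph, temperature=298.15):
    """Free energy needed to force a proton onto the group at the given pH."""
    return R_KCAL * temperature * math.log(10.0) * (ph - pka)

PH = 7.0
for label, pka in (("imidazole", 6.95), ("carboxyl", 4.0)):
    fraction = fraction_protonated(pka, PH)
    cost = protonation_cost_kcal(pka, PH)
    print(f"{label}: fraction protonated {fraction:.3f}, cost {cost:.2f} kcal/mol")
```

Under this simple model an isolated imidazole is protonated almost half of the time at pH 7, so a modest environmental shift is enough to tip it into the charged state, whereas protonating a carboxyl group three pH units above its pKa costs roughly 4 kcal/mol at room temperature.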

Identification of ambiguous ligand electron density: The last image on the right shows a nicotinamide-adenine-dinucleotide cofactor in PDB file 1A5Z. YASARA analyzes the molecule and concludes that the orientation of the amide group bound to the nicotinamide ring cannot be determined from the electron density and should thus be optimized. The hydrogen bonding network analysis then immediately recognizes the error in the NAD structure and rotates the amide group by 180° (green arrows), so that two perfect hydrogen bonds can be formed with the backbone of Thr 246.

Jumping ligand protons: The nicotinamide-adenine-dinucleotide cofactor from the previous example contains an internal pyrophosphate group. If the user decided to optimize the hydrogen bonding network at very low pH, YASARA would have to add one or two protons to the pyrophosphate. Putting aside the question of whether the protein can still fold at low pH, YASARA scores all permutations of protonation states to find the best one: four states for placing one proton, and 12 states for placing two protons on the four available oxygens.
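The quoted counts of 4 and 12 equal the number of ordered placements of one or two protons on four sites, with at most one proton per oxygen. The short sketch below reproduces that arithmetic. How YASARA enumerates the states internally is not stated here, so the mapping to ordered placements is an assumption, and the oxygen labels are placeholders.

```python
from itertools import combinations, permutations

# Placeholder labels for the four protonatable pyrophosphate oxygens.
OXYGENS = ("O_a1", "O_a2", "O_b1", "O_b2")

def ordered_placements(sites, n_protons):
    """Placements in which the protons are told apart (permutations)."""
    return list(permutations(sites, n_protons))

def distinct_patterns(sites, n_protons):
    """Placements in which the protons are interchangeable (combinations)."""
    return list(combinations(sites, n_protons))

for n in (1, 2):
    print(n, "proton(s):",
          len(ordered_placements(OXYGENS, n)), "ordered placements,",
          len(distinct_patterns(OXYGENS, n)), "distinct patterns")
```

If the two protons are treated as interchangeable, six distinct occupancy patterns remain for the doubly protonated case; either way, the number of candidates is small enough to score exhaustively.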

Solvation effects: While hydrogen bonding network optimizers often tend to maximize the number of hydrogen bonds, YASARA instead minimizes the number of energetically unfavorable structural features. The differences can be subtle and often involve solvation effects, where the first approach does not yield the correct answer.

High performance: YASARA uses the same graph-theory algorithm as for protein side-chain prediction, combining dead-end elimination with graph reduction to biconnected components[3], so that the hydrogen bonding network can be solved within a fraction of a second. Including the setup time, the optimization of a typical protein together with its water molecules takes about 1-3 seconds.
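To give a feel for why dead-end elimination helps, here is a minimal sketch of the classic elimination criterion applied to a toy network of three residues. It is not YASARA's implementation: the graph reduction to biconnected components is omitted, all energies are hypothetical numbers, and the remaining states are simply searched by brute force.

```python
from itertools import product

def pair(pair_e, i, a, j, b):
    """Look up a pair energy regardless of index order."""
    return pair_e[(i, j)][a][b] if i < j else pair_e[(j, i)][b][a]

def total_energy(assignment, self_e, pair_e):
    """Self energies plus all pairwise energies of one full assignment."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy

def dead_end_elimination(self_e, pair_e):
    """Drop state r of residue i if some state t always does strictly better."""
    n = len(self_e)
    alive = [set(range(len(states))) for states in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # Best case for r: most favorable partner state everywhere.
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i)
                for t in alive[i] - {r}:
                    # Worst case for t: least favorable partner state everywhere.
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if best_r > worst_t:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def brute_force(self_e, pair_e, allowed):
    """Exhaustively search the allowed states for the lowest energy."""
    best = min(product(*[sorted(a) for a in allowed]),
               key=lambda assignment: total_energy(assignment, self_e, pair_e))
    return best, total_energy(best, self_e, pair_e)

# Hypothetical toy energies (arbitrary units) for three interacting residues.
self_e = [[0.0, 0.0], [0.0, 1.0, 4.0], [0.0, 0.0]]
pair_e = {
    (0, 1): [[-2.0, 0.5, 0.0], [3.0, 0.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[-1.0, 2.0], [0.0, -0.5], [0.0, 0.0]],
}

all_states = [set(range(len(states))) for states in self_e]
pruned = dead_end_elimination(self_e, pair_e)
print("states kept per residue:", [sorted(a) for a in pruned])
print("pruned search:", brute_force(self_e, pair_e, pruned))
print("full search:  ", brute_force(self_e, pair_e, all_states))
```

Because a state is only removed when another state of the same residue is strictly better in every possible context, the pruned search is guaranteed to find the same minimum as the full search, just over fewer combinations.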

[1] Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures
Hooft RW, Sander C, Vriend G (1996), Proteins 26, 363-376
[2] Asparagine and glutamine: using hydrogen atom contacts in the choice of sidechain amide orientation
Word et al. (1999), J. Mol. Biol. 285, 1735-1747
[3] Assignment of protonation states in proteins and ligands: combining pKa prediction with hydrogen bonding network optimization
Krieger E, Dunbrack RL Jr, Hooft RW, Krieger B (2012), Methods Mol. Biol. 819, 405-421