Building and visualizing gigastructures with YASARA

Building and displaying mesoscale all-atom models of biomolecular structures with millions or billions of atoms, like virus particles or cells, remains a challenge due to the sheer size of the data, the required levels of automated building, and the visualization limits of today's graphics hardware.

Essential for the efficient interactive visualization of gigastructures is the use of multiple levels of detail (LODs), where distant molecules are drawn with a heavily reduced polygon count. YASARA employs a grid-based algorithm[1] to create such LODs for all common molecular graphics styles (including balls & sticks, ribbons and cartoons), without requiring monochrome molecules to hide LOD transitions. As a result, you can interactively visualize giant models like the presynaptic bouton with 3.6 billion atoms shown in Figures 1 and 2. YASARA's graphics engine is powered by the Vulkan graphics API on Windows and Linux, and by the Metal API on macOS. To display the largest structures, the graphics card must provide more than 4 GB of memory. On older hardware that predates Vulkan, YASARA falls back to classic OpenGL rendering, which works for everything except the gigastructures shown here.
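To make the LOD idea concrete, here is a minimal Python sketch of how a renderer might pick a detail level from the viewing distance. It is not YASARA's grid-based algorithm (which creates the LOD geometry itself). The function name `pick_lod`, the 50-unit full-detail range and the cap of four reduced levels are all illustrative assumptions.

```python
import math

def pick_lod(distance, full_detail_range=50.0, max_level=4):
    """Return 0 (full detail) for nearby molecules, higher levels farther away.

    Assumption for illustration: each level roughly halves the polygon
    budget, so the level grows with log2 of the distance relative to the
    range in which full detail is still drawn.
    """
    if distance <= full_detail_range:
        return 0
    level = int(math.log2(distance / full_detail_range)) + 1
    return min(level, max_level)

def polygon_budget(base_polygons, level):
    """Polygon count for a molecule at the given LOD level (halved per level)."""
    return max(1, base_polygons >> level)
```

With these assumed settings, a molecule that needs 4096 polygons up close would be drawn with 256 polygons once it reaches the most distant level.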

For mesoscale systems, it is impractical to store all atoms explicitly, because their enormous number would quickly exhaust the available memory. YASARA uses two approaches to compress the data: Assembly is done with coarse-grained "pet molecules", which reduces the atom count by a factor of 50, and all-atom visualization uses GPU instancing, which reduces the memory requirements by a factor of 40 to 1000. Instancing implies that one first needs to create building blocks and then join instances (i.e. identical copies) of these blocks to construct the final model, just like Lego bricks (Figure 3).
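The memory argument behind instancing can be sketched in a few lines of Python. The byte sizes below are assumptions chosen for illustration (16 bytes per atom, 48 bytes for a 3x4 transform per instance), not YASARA's actual memory layout, and the helper names are hypothetical.

```python
BYTES_PER_ATOM = 16      # assumption: x, y, z plus a packed type/color, 4 bytes each
BYTES_PER_INSTANCE = 48  # assumption: one 3x4 float32 transform matrix

def explicit_bytes(atoms_per_block, instances):
    """Memory needed if every copy of the block stores its own atoms."""
    return atoms_per_block * instances * BYTES_PER_ATOM

def instanced_bytes(atoms_per_block, instances):
    """Memory needed if the block is stored once plus one transform per copy."""
    return atoms_per_block * BYTES_PER_ATOM + instances * BYTES_PER_INSTANCE

def place_instance(block_atoms, rotation, translation):
    """Apply one instance's rigid transform to the shared block coordinates.

    On the GPU this happens per instance at draw time; the transformed
    coordinates are never stored.
    """
    return [
        tuple(
            sum(rotation[i][j] * atom[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for atom in block_atoms
    ]

# Example: a 10,000-atom block instanced 1,000 times.
ratio = explicit_bytes(10_000, 1_000) / instanced_bytes(10_000, 1_000)
```

Under these assumptions the saving grows with the number of instances per block and levels off near `atoms_per_block * BYTES_PER_ATOM / BYTES_PER_INSTANCE`, which is why large, frequently repeated blocks benefit most.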

A tricky part of building mesoscale models is the construction of phospholipid membranes with embedded proteins. Due to the membranes' arbitrary shape, there is normally no exact way to construct them with instances of identical building blocks. Fortunately, approximate solutions work well in practice. You define the shape of the membrane with a mesh of mostly equilateral triangles, which can be constructed manually using 3D modeling software like Blender, or algorithmically with built-in YASARA commands that create spheres or planes and distort them to yield a more natural appearance. Neighboring triangles are joined to form rhombi, which are filled with instances of rhombic membrane blocks, with and without transmembrane proteins (Figure 3).
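The step of joining neighboring triangles into rhombi can be illustrated with a small greedy pairing over shared edges. This is a sketch of the general idea only: the function name `pair_triangles` is hypothetical, and the greedy strategy is an assumption that does not necessarily find the largest possible number of pairs. Triangles left without a partner correspond to the holes that a triangular block (Figure 3, E) would close.

```python
from collections import defaultdict

def pair_triangles(triangles):
    """Greedily join triangles that share an edge into rhombi.

    triangles: list of (i, j, k) vertex-index tuples.
    Returns (rhombi, leftovers), where rhombi is a list of
    (triangle_index, triangle_index) pairs and leftovers lists the
    indices of triangles that found no free neighbor.
    """
    # Map every undirected edge to the triangles that contain it.
    edge_to_tris = defaultdict(list)
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_tris[frozenset((u, v))].append(t)

    used = set()
    rhombi = []
    for tris in edge_to_tris.values():
        free = [t for t in tris if t not in used]
        if len(free) >= 2:
            rhombi.append((free[0], free[1]))
            used.update(free[:2])

    leftovers = [t for t in range(len(triangles)) if t not in used]
    return rhombi, leftovers
```

For a strip of three triangles, the first two are joined into one rhombus and the third remains as a leftover to be filled with a triangular block.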

As soon as the membranes have been built, pet DNA/RNA is created directly from the FASTA sequence file. For single-stranded nucleic acids, a secondary structure assignment in dot-bracket notation can be provided. The remaining space is filled by creating a neighbor search grid and placing the pet proteins at random locations with random orientations, rejecting those that bump into existing pet atoms or lie outside their compartment, as defined by the membrane polygon mesh used above. Molecular dynamics simulations of the coarse-grained model can be run if needed. In the end, the model is expanded back to all-atom detail.
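The random placement with rejection can be sketched as follows. This is a simplified Python illustration under several assumptions: each pet protein is reduced to a single sphere of equal radius (so orientation is irrelevant), the compartment is an axis-aligned box rather than a membrane mesh, and the function name and parameters are hypothetical.

```python
import random
from collections import defaultdict

def place_spheres(count, radius, box, occupied=(), max_tries=10000, seed=0):
    """Place up to `count` non-overlapping spheres inside a box.

    box: (x, y, z) side lengths; occupied: centers of already placed spheres.
    A neighbor search grid with cells as wide as the contact distance keeps
    each clash test limited to the 27 surrounding cells.
    """
    rng = random.Random(seed)
    cell = 2.0 * radius
    grid = defaultdict(list)

    def key(p):
        return tuple(int(c // cell) for c in p)

    def clashes(p):
        kx, ky, kz = key(p)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for q in grid.get((kx + dx, ky + dy, kz + dz), ()):
                        if sum((a - b) ** 2 for a, b in zip(p, q)) < cell ** 2:
                            return True
        return False

    for p in occupied:
        grid[key(p)].append(p)

    placed = []
    tries = 0
    while len(placed) < count and tries < max_tries:
        tries += 1
        # Sampling inside the box stands in for the compartment test.
        p = tuple(rng.uniform(radius, side - radius) for side in box)
        if clashes(p):
            continue  # reject candidates that bump into existing spheres
        grid[key(p)].append(p)
        placed.append(p)
    return placed
```

The `max_tries` cap keeps the loop from running forever in a crowded compartment; the function then simply returns fewer spheres than requested.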

Additional information:

How YASARA creates coarse-grained models to assemble and simulate giant biomolecular scenes.

All the details are described in our open access article on the topic[1].

The PetWorld database with a growing collection of mega- and gigastructures.

Detailed building instructions, and information on how to get YASARA for free in return for your contribution, can be found in the user manual of any YASARA stage (including the free YASARA View) under Recipes > Build a gigastructure.

References

[1] Assembly of biomolecular gigastructures and visualization with the Vulkan graphics API
Ozvoldik K, Stockner T, Rammner B, and Krieger E (2021). Journal of Chemical Information and Modeling 61, 5293-5303

Figure 1: Video of a model of the presynaptic bouton with 3.6 billion atoms, visualized interactively on a GeForce RTX 2080 card using YASARA's Vulkan real-time graphics engine. If your browser fails to play the video, a YouTube version is also available, although at reduced quality due to technical issues at YouTube.

Figure 2: Closeup view of the model from Figure 1. This is a simple screenshot from interactive visualization on a GeForce RTX 2080 with YASARA, not an offline rendering.

Figure 3: The five building blocks needed to construct a model of the SARS-CoV-2 envelope, all-atom based models on top, and the corresponding coarse-grained pet molecules at the bottom (scaled up for comparability). A: Spike protein block, B: E-protein block, C: M-protein block, D: empty membrane block with the rhombic simulation cell used for equilibration, E: triangular block to close leftover holes.