
Supercomputers@Home

"No matter how smart the algorithms - they are always too slow to do what they are supposed to do - overnight on a desktop PC" - that's a well-known experience in bioinformatics. Luckily, it turns out that most of the time a simple scale-up by one or two orders of magnitude is enough to get the job running (probably our brains can think at most two orders of magnitude ahead of current computing power...). 10-100 times faster than a desktop PC: that is already a supercomputer (and almost enough to enter the world's Top 500).

If the task parallelizes very well (which is true for most areas covered by YASARA, but certainly not for all applications of interest), then there is one simple way to obtain maximum performance for each dollar invested: buying a cluster of cheap AMD Athlon or Intel Pentium PCs, linked with 100 Mbit network cards (Fig.1). The current trend to avoid high-end workstations and supercomputers in well-parallelizable fields of application is reflected in steadily growing clusters of standard PCs, usually based on the free operating system Linux: Beowulf (NASA, 1994, 16 PCs), PaRe (Technical University of Braunschweig, 1998, 18 PCs), Mosix (University of Jerusalem, 100 PCs), CLOWN (University of Paderborn, 512 Intel and Alpha PCs) or Sandia Cplant (1,600 PCs).
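How much a cluster helps depends on how much of the task really runs in parallel. As a rough illustration, the following sketch applies Amdahl's law, a standard textbook estimate that is not taken from this page. The function name and the example numbers are assumptions chosen for illustration only.

```python
def amdahl_speedup(parallel_fraction: float, n_machines: int) -> float:
    """Estimate the speedup of a task on n_machines using Amdahl's law.

    parallel_fraction is the share of the single-PC run time that can be
    spread over the machines; the rest is assumed to stay serial.
    Network and scheduling overhead are ignored (an assumption).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_machines)


if __name__ == "__main__":
    # Hypothetical example: 26 PCs, 95% of the work parallelizable.
    print(round(amdahl_speedup(0.95, 26), 1))
```

Under these assumptions, a task that is 95% parallelizable gains a factor of roughly 11.6 on 26 PCs, while a perfectly parallel task approaches the full factor of 26. This is why the cluster approach pays off mainly in well-parallelizable fields.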

But the best performance/price ratio (i.e. infinite) can still be achieved by not buying anything at all - just abuse what is already present. Here at the CMBI, we turned our 26 Linux/Windows NT PCs into one big distributed computing cluster, using our newly developed software package Models@Home (Fig.2-4).

In principle, the program is a screen saver that runs our jobs as soon as the computer is idle. Its very general design makes it possible to run whichever program one needs, as long as it is available for one of the two currently supported operating systems: Linux or Windows. The summed clock frequency of this cluster is currently around 15 GHz. It is aimed at coarse-grained applications that can easily be split up into multiple jobs and do not require much data exchange. If you want to build your own cluster, Models@Home is available for free download.

Fortunately, most applications in structural bioinformatics are very coarse-grained and thus ideally suited for our cluster: model building, threading, docking, database updates etc.
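To illustrate what "coarse-grained" means in practice, here is a minimal Python sketch of splitting a task into independent jobs that exchange no data while running. It is not the Models@Home implementation. The names `split_into_jobs`, `run_job` and `run_all` are hypothetical, the per-item work is a placeholder, and a local thread pool merely stands in for the PCs of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_jobs(items, n_jobs):
    """Split items into at most n_jobs contiguous chunks of near-equal size."""
    n_jobs = max(1, min(n_jobs, len(items)))
    base, extra = divmod(len(items), n_jobs)
    jobs, start = [], 0
    for i in range(n_jobs):
        # The first `extra` jobs take one additional item each.
        size = base + (1 if i < extra else 0)
        jobs.append(items[start:start + size])
        start += size
    return jobs


def run_job(job):
    """Placeholder for the real work (e.g. building one model per entry).

    Each job only needs its own input, so jobs never talk to each other.
    """
    return [len(name) for name in job]


def run_all(items, n_workers):
    """Run all jobs on a pool of workers and merge the results in order."""
    jobs = split_into_jobs(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_job, jobs))
    return [value for chunk in results for value in chunk]


if __name__ == "__main__":
    # Hypothetical input: a list of structure identifiers to process.
    print(run_all(["1PVI", "1BER", "1C9B", "1YSA"], n_workers=2))
```

The only communication happens at the start (handing out a job) and at the end (collecting the result), which is why such workloads tolerate slow network links and machines that become available at unpredictable times.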

Fig.1: A 128 GFLOP peak performance YASARA cluster (to be built one day..)
Fig.2: The 60 GFLOP (single precision peak performance) modeling cluster at the CMBI. Since this photo has been taken,
Fig.3: Eight of our 26 PCs. They are used for teaching about 10% of the time, so about 90% of their time remains available for Models@Home.
Fig.4: Just another view of the left-most four computers in Fig.3.