Misfolded proteins have been implicated in numerous diseases. Folding@home
is biological research based upon the science of molecular dynamics, in which
molecular chemistry and mathematics are combined in computer-based models to predict how protein
molecules might fold (or misfold) in space over time. This information is used to guide scientific and medical research.
When I first heard about this, I recalled Isaac Asimov's science-fiction magnum opus, colloquially known as The Foundation Trilogy, which introduced a fictional branch of science called psychohistory in which statistics, history and sociology are combined in computer-based models to
guide humanity's future in order to avoid a potential dark age. How did Asimov
conceive of such a thing? (answer: Asimov is to sci-fi as Bach is to classical music)
Years ago I became infected with an Asimov-inspired optimism about humanity's future and have since felt the need to promote Asimov's vision. While Folding@home
will not cure my "infection of optimism", I am convinced Dr.
Isaac Asimov (who received a Ph.D. in Biochemistry from Columbia in 1948 then was employed as a Professor of
Biochemistry at the Boston
University School of Medicine for 10 years until his publishing workload became too large) would have been fascinated by
something like this.
I was considering a financial charitable donation to Folding@home when it occurred to me
that my money would be better spent by making a knowledgeable charitable donation to all of
Increasing my Folding@home computations (which will advance medical discoveries to increase both the length and quality of
human life). I was already folding on a half-dozen computers so all I needed to do was purchase video graphics cards, which
would increase my computational throughput a thousandfold (three orders of magnitude).
Convincing others (like you) to follow my example. My solitary folding efforts will have little
effect on humanity's future but together we can make a real difference. (read on)
Dr. Asimov: I am dedicating this website to you and your publishing. You have greatly influenced my life.
Misfolded proteins have been associated with numerous diseases and age-related
illnesses. However, proteins are so much larger and more complicated than smaller molecules that it is not practical to
begin a chemical experiment without first giving researchers hints about where to look and what
to look for. Since the behavior of atoms-in-molecules (computational chemistry) as well as atoms-between-molecules (molecular dynamics) can be modeled, it makes more sense to begin with a computer analysis.
Permitted configurations can then be passed on to experimental researchers.
From a kitchen point of view, chicken eggs are a mix of water, fat (yolk), and protein (albumen). Cooking an egg causes the
semi-clear protein to unfold into long strings which now can intertwine into a tangled network which will stiffen then
scatter light (appear white). No chemical change has occurred but taste, volume and color have been altered.
Click here to read a short "protein article" by Isaac Asimov published in 1993
shortly after his death.
Using the most powerful single core processor (CPU) available today, simulating the folding possibilities of one large protein
molecule for one millisecond of chemical time might require one million days (about 2,738
years) of computational time. However, if the problem is sliced up then assigned to 100,000 personal computers over the internet,
the computational time drops to ten days. Convincing friends, relatives, and employers to do the same
could reduce the computational time requirement to one day or less.
Required simulation time: 1 billion days (2.7 million years), 1 million days, 1 thousand days
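The slicing arithmetic described above can be sketched in a few lines. This is only a rough illustration: the function name is made up for this page, and it assumes the work divides evenly with no network or scheduling overhead, which real distributed folding will not achieve.

```python
# Hypothetical helper (not part of any Folding@home software): estimates the
# wall-clock days needed when a job is split evenly across many machines.
# Assumes perfect scaling, which is an idealization.

def wall_clock_days(single_cpu_days, machines):
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return single_cpu_days / machines

# Figure quoted in the text: one millisecond of chemical time on one CPU.
SINGLE_CPU_DAYS = 1_000_000

for machines in (1, 100_000, 1_000_000):
    days = wall_clock_days(SINGLE_CPU_DAYS, machines)
    print(f"{machines:>9,} machines -> {days:,.0f} day(s)")
```

Going from 100,000 to 1,000,000 machines is the "friends, relatives, and employers" step: ten times the machines gives one tenth of the waiting time.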
Additional information for science + technology nerds
Special-purpose research computers like IBM's BlueGene
employ 10 to 20 thousand processors (CPUs) joined by many kilometers of optical fiber to solve problems. IBM's
Roadrunner is a similar technology employing both CPUs and special non-graphic GPUs that IBM refers to as Cell processors.
Assuming that each GPU has 1,000 streaming processors, this leaves us with the equivalent of 60 million processors.
This means that the original million-day protein simulation problem could theoretically be
completed in (1,000,000 / 60,000,000) about 0.017 days (or 24 minutes). But since there are many more
protein molecules than DNA molecules, humanity could be at this for decades. Adding your computers to Folding@home will permanently advance humanity's progress in protein
research and medicine.
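For anyone who wants to check the processor arithmetic, here is a small sketch. The count of 60,000 GPUs is an assumption inferred from the figures in the text (60 million processors at 1,000 streaming processors each), and the function names are invented for illustration. It also assumes ideal scaling.

```python
MINUTES_PER_DAY = 24 * 60

def equivalent_processors(gpus, streams_per_gpu):
    # Treats every streaming processor as one simple processor.
    return gpus * streams_per_gpu

def ideal_minutes(single_cpu_days, processors):
    # Idealized: the job splits evenly with no overhead at all.
    return single_cpu_days / processors * MINUTES_PER_DAY

# 60,000 GPUs is an inferred assumption, not a figure stated in the text.
procs = equivalent_processors(60_000, 1_000)
print(f"{procs:,} equivalent processors")
print(f"{ideal_minutes(1_000_000, procs):.0f} minutes")
```

One sixtieth of a day works out to 24 minutes.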
When the Human Genome Project (to study
human DNA) was being planned, it was thought that the task might require 100 years. However, technological change in the areas
of computers, robotic sequencers, and the internet after the world-wide-web appeared in 1991 (to coordinate the activities of
a large number of universities where each one was assigned a small piece of the problem) allowed the Human Genome Project to
publish results after only 15 years (roughly 6.7 times faster than first thought).
Distributed computing projects like Folding@home and BOINC have only been possible since 1995:
the world-wide-web (proposed in 1989 to
solve a document sharing problem among scientists at CERN in Geneva;
then implemented in 1991) began to make the internet both popular and ubiquitous.
CISC was replaced with RISC which further evolved to superscalar RISC then multicore
Vector processing became ubiquitous (primarily) in the form of
Processor technology was traditionally defined like this:
Google released a neat math library in 2015 called TensorFlow
technological speed up
while it is possible to do floating point (FP) math on integer-only CPUs, adding specialized logic to
support FP and transcendental math can decrease FP processing time by one order of magnitude (x10) or more.
similarly, while it is possible to do vector processing (VP) on a scalar machine, adding specialized logic
can decrease VP processing time by 2 to 3 orders of magnitude (x100 to x1000).
GPUs (graphics processing units) take vector processing to a whole new level. Why? A $200.00
graphics card can now equip your system with 1500-2000 streaming processors and 2-4 GB of additional high speed
memory. The author of the 2013 book "CUDA Programming" provides evidence that any modern
high-powered PC equipped with one or more graphics cards (if your motherboard supports it) can outperform any
supercomputer listed 12 years ago on www.top500.org
Many companies manufactured graphics cards (I recall seeing them available as purchase options in the IBM-PC
back in 1981) but I will only mention two companies here
introduces the Tesla line in 2007; these pure-math video cards have no video connector so cannot be
connected to a monitor
CUDA is released in 2007
The circle of life?
specialized mainframe computers from companies like IBM and Cray are built to host many thousands of
"non-video video cards" (originally targeted for PCs and work stations). IBM's
Roadrunner is one example.
Both the PlayStation 4 as well as the XBOX One employ an 8-core APU (Accelerated
Processing Unit) made by AMD code-named
Jaguar. What is an APU? It is a multi-core CPU with an embedded Graphics Chip Engine. Placing both systems
on the same silicon die eliminates the signal delay associated with sending signals over an external bus.
I've been in the computer industry for decades but noticed that computers only began to get real
interesting again with the releases of CUDA (2007) and OpenCL (2009)
Distributed computing projects like Folding@home and BOINC
have only been practical since 2005 when the CPUs in personal computers began to out-perform mini-computers and enterprise
servers. Partly because...
AMD added 64-bit support to their x86 processor technology, calling it x86-64 (Linux
distros refer to this as x86_64 or amd64)
Intel followed suit calling their 64-bit extension technology EM64T
DDR2 memory became popular (this dynamic memory is capable of
Intel added DDR2 support to their Pentium 4 processor line
AMD added DDR2 support to their Athlon 64 processor line
DDR3 memory became popular (this dynamic memory is capable of
Since then, the following technological improvements have made computers both faster and less expensive:
Intel's abandonment of NetBurst, which meant a return to shorter
instruction pipelines starting with Core2
comment: AMD never went to longer pipelines; a long pipeline is only efficient when running a
static CPU benchmark for marketing purposes, not when running code in the real world where i/o events interrupt the primary
foreground task (science in our case)
multi-core (each core is a fully functional CPU) chips from all
continued development of optional graphics cards where CPUs would off-load much work to a graphics co-processor system
(each card appeared as hundreds to thousands of streaming processors)
ATI Radeon graphics cards (ATI was acquired by AMD in 2006)
NVIDIA GeForce graphics cards
development of high performance "graphics" memory technology (e.g. GDDR3, GDDR4, GDDR5)
to bypass processing stalls caused when streaming processors are too fast for ordinary memory.
Note that GDDR5 is used as main memory in the PlayStation 4
(PS4). While standalone PCs were built to host an optional graphics card, it seems that Sony has flipped things so
that their graphics system is hosting an 8-core CPU. These hybrids go by the name APU.
shifting analysis from host CPU cores (usually 2-4) to thousands of streaming processors
HP preferred Itanium2 (jointly developed by HP
and Intel) so announced their intention to gracefully shut down Alpha
Alpha technology (which included CSI) was immediately sold to Intel
approximately 300 Alpha engineers were transferred to Intel between 2002 and 2004
CSI morphed into QPI (some industry watchers say that Intel ignored CSI until the announcement by AMD to go with a
new industry-supported technology known as HyperTransport)
The remainder of the industry went with a non-proprietary technology called HyperTransport which has been described as a multi-point Ethernet for use within a computer system.
As is true in any "demand vs. supply" scenario, most consumers didn't need the additional computing power which meant that
chip manufacturers had to drop their prices just to keep the computing marketplace moving. This was good news for people
setting up "folding farms". Something similar is happening today with computer systems since John Q. Public is shifting
from "towers and desktops" to "laptops and pads". This is causing the price of towers and graphics cards to plummet ever
lower. You just can't beat the price-performance ratio of a Core-i7 motherboard hosting an NVIDIA graphics card.
Shifting from brute-force "Chemical Equilibrium" algorithms to techniques involving Bayesian statistics and Markov
Models will enable some exponential speedups.
Computational Chemistry Student Questions:
Using information from the periodic
table of the elements you can see that the molecular
mass of water (H2O) is ~18 which is lighter than many gases so why is water in a liquid state at room
temperature while other slightly heavier molecules take the form of a gas?
Ethanol (a liquid) has one more atom of Oxygen than Ethane (a gas). How can this small difference change the state?
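To see the numbers behind these questions, here is a small sketch that adds up molecular masses from rounded standard atomic masses. The table layout and function name are my own choices for illustration, and carbon dioxide is included only as an example of a heavier molecule that is a gas at room temperature.

```python
# Rounded standard atomic masses in unified atomic mass units (u).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}

# Atom counts and state at room temperature for a few small molecules.
MOLECULES = {
    "water (H2O)":          ({"H": 2, "O": 1}, "liquid"),
    "ethane (C2H6)":        ({"C": 2, "H": 6}, "gas"),
    "carbon dioxide (CO2)": ({"C": 1, "O": 2}, "gas"),
    "ethanol (C2H5OH)":     ({"C": 2, "H": 6, "O": 1}, "liquid"),
}

def molecular_mass(atoms):
    # Sum of (atomic mass x number of atoms) over every element.
    return sum(ATOMIC_MASS[element] * count for element, count in atoms.items())

for name, (atoms, state) in MOLECULES.items():
    print(f"{name:<22} {molecular_mass(atoms):7.3f} u   {state}")
```

Mass alone clearly does not decide the state: water is the lightest molecule in this list yet it is a liquid.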
Diagram: an H2O molecule loosely bound to four others by Van der Waals forces
In the case of an H2O (water) molecule, two hydrogen atoms are covalently bound to one oxygen atom, but the oxygen atom also carries two lone pairs of electrons
which repel the bonding pairs and bend the water molecule into a V shape (according to
VSEPR Theory). Oxygen pulls the shared electrons more strongly than hydrogen does, so the oxygen end of the bend carries a partial negative charge while each hydrogen carries a partial positive charge.
This allows a weak connection between the oxygen atom of one molecule and a hydrogen atom of a neighboring H2O molecule (water molecules weakly
sticking to each other form a liquid). These weak connections are hydrogen bonds, which are often grouped with the intermolecular attractions known as Van der Waals forces
Here are the molecular schematic diagrams of Ethane (symmetrical) and Ethanol (asymmetrical). Notice the
Oxygen-Hydrogen kink dangling to the right of Ethanol? That kink is not much different than a similar one associated with
water. That is the location where a Van der Waals force weakly connects with an adjacent ethanol molecule (not shown). So
it should be no surprise that ethane at STP (Standard Temperature and Pressure) is a gas while Ethanol is a liquid.
    H   H              H   H     H
    |   |              |   |    /
H - C - C - H      H - C - C - O
    |   |              |   |
    H   H              H   H
Van der Waals did all his computations using pencil and paper long before the first computer was
invented; and this was only possible because the molecules in question were small and few.
Chemistry Caveat: The Molecular Table above was only meant to get you thinking. Now inspect this LARGER
periodic table of the elements where the color
of the atomic number indicates whether the natural state is solid or gaseous:
all elements in column 1 (except hydrogen) are naturally solid
all elements in column 8 (helium to radon) are naturally gaseous
half the elements in row 2, starting with Lithium (atomic number 3) and ending with Carbon (atomic number 6), as well as
three quarters of row 3, starting with Sodium (atomic number 11) and ending with Sulphur (atomic number 16), are naturally solid
I will leave it to you to determine why
Proteins come in many shapes and sizes. Here is a very short list:
technology problems, discussions, news, science, etc.
FAH Targeted Diseases
This "folding knowledge" will be used to develop new drugs for treating diseases such as:
ALS ("Amyotrophic Lateral Sclerosis" a.k.a. "Lou Gehrig's Disease")
Plaques, which contain misfolded peptides called amyloid beta, are formed in the brain many years before the signs of
this disease are observed. Together, these plaques and neurofibrillary tangles form the pathological hallmarks of the
Cancer & p53
P53 is the suicide gene involved in apoptosis (programmed cell death - something
necessary in order for your immune system to kill cancer cells)
CJD (Creutzfeldt-Jakob Disease)
the human variation of mad cow disease
Huntington's disease is caused by a trinucleotide repeat expansion in the Huntingtin (Htt) gene and is one of
several polyglutamine (or PolyQ) diseases. This expansion produces an altered form of the Htt protein, mutant Huntingtin (mHtt),
which results in neuronal cell death in select areas of the brain. Huntington's disease is a terminal illness.
Normal bone growth is a yin-yang balance between osteoclasts and osteoblasts.
Osteogenesis Imperfecta occurs when bone grows without sufficient or healthy collagen
The mechanism by which the brain cells in Parkinson's are lost may consist of an abnormal accumulation of the protein
alpha-synuclein bound to ubiquitin in the damaged cells.
Ribosome & antibiotics
A ribosome is a protein producing organelle found inside each cell
A single core Pentium-class CPU can provide one streaming (vector) processor under
marketing names like MMX and SSE
(most Core i5 and Core i7 CPUs offer four cores so can support four streaming processors)
A single add-on graphics card can provide several thousand vector processors
Computer technology section moved here
caveat: GPU folding on CentOS-7 failed 2021-12-xx so jump here to see a better way
I found two junk PCs in my basement with 64-bit CPUs that were running 32-bit operating systems (Windows-XP and Windows-Vista).
Unfortunately for me, neither was eligible for Microsoft's free upgrade to Windows-10, and I had no intention of buying a new
64-bit OS just for this. So I swapped out Windows with CentOS-7 and was able to get each one folding with very little difficulty.
Here are some tips for people who are not Linux gurus:
CentOS-7.7 and CentOS-8.0 were released days apart in September 2019 (perhaps due to the
invisible hand of IBM?)
Software from the top of the download page preferentially offers CentOS-8 which is too large (> 4.7 GB) to write to a
single-layer writable DVD (but I have had some success with Dual Layer media). They think you will copy these images to a
USB stick but the BIOS in older PCs many times will not support booting from a stick
Using smaller downloads:
1) choose file "minimal" which is always smaller than 4.7 GB
2) burn, boot, install Linux using recipe: "Server"
3) reboot; now update via the internet like so: yum update
4) reboot; optionally enable GUI: yum group install "Server with GUI"
   may also need to type:
   systemctl isolate graphical.target
   systemctl set-default graphical.target
5) add development tools: yum group install "Development Tools"
Transfer to bootable media (choose one of the following)
copy the ISO image to a DVD-writer
use rufus to format a USB stick (capacity must be >= 5 GB) then copy the ISO
image to the USB stick
caveat: PCs have transitioned from BIOS to UEFI. Older BIOS-based systems do not support booting
from a USB stick (strange because you can connect a USB-based DVD-drive then boot from that)
Boot-install CentOS-7 on the 64-bit CPU
pick and install a Linux recipe that supports a GUI (I usually choose Server with GUI and always
include development tools in case I need to build a driver)
If prompted to choose between 'gnome' and 'kde', newbies should choose gnome
My machines hosted NVIDIA graphics cards (GTX-560 and GTX-960 respectively) so these systems required the correct NVIDIA
drivers in order to do GPU-based folding. Why? The generic drivers only support video but we also need OpenCL and CUDA
If you are not logged in as root then you must prefix every command with sudo (Super
Now jump to Linux common
Folding with Rocky Linux (2022)
Both of my CentOS-7 machines stopped folding 2021-12-xx. Apparently this is due to several changes by the FAH
First off, their new downloadable cores require a newer version of library file glibc which is not available on
CentOS-7 so you need to upgrade to CentOS-8 (or change to something else)
Secondly, changes to their GPU core now require a minimum of OpenCL-1.2. This means that my GTX-560 is no longer
useful as a streaming processor. I noticed another blurb about double-precision FP math which definitely rules out my
For the GTX-960 I replaced CentOS-7 with CentOS-8 then followed the CentOS-7 instructions (just above here). However, the
Nvidia driver from elrepo did not contain any OpenCL support so I was forced to install the Linux driver provided by Nvidia
(this is guru stuff because you first need to disable the Nouveau driver).
On my second system I replaced the GTX-560 with a GTX-1650
I do not know why these CentOS-8 machines seemed sluggish, so I replaced CentOS-8 with Rocky Linux 8.5
Updating to a NVIDIA driver on CentOS-8 (or Rocky Linux-8 or RHEL-8) for a GTX-960 card
after the initial Linux install, type "sudo yum update" to bring the platform up to the latest level. If a new
kernel was installed then you must reboot
DO NOT use elrepo to update the Nvidia driver (on 2022-01-16 it was missing support for OpenCL and CUDA)
download the desired driver file from NVIDIA into the root account
for GTX960: NVIDIA-Linux-x86_64-470.86.run
for GTX1650: NVIDIA-Linux-x86_64-525.89.run
Disable the Nouveau driver
create file /etc/modprobe.d/blacklist-nouveau.conf containing these two lines:
blacklist nouveau
options nouveau modeset=0
create a new ramdisk for use during system boot: dracut --force
Install the NVIDIA driver
yum group install "Development Tools"
chmod +x NVIDIA-Linux-x86_64-470.86.run
kernel updates via "yum update" (CentOS-7) or "dnf update" (CentOS-8) always require a reboot. Ninety
percent of the time you will need to reinstall the Nvidia driver after a kernel update (just repeat step 3).
If your console is blank then type this three-key-combo CTRL-ALT-F3 (control alternate F3) then log in as
caveat: As of this writing, your CentOS-7 system most likely depends upon some version of Python2. Python3 can be
easily added to the system but do not remove or disable Python2 because this will break certain system utilities like
yum or firewall-cmd to name two of many.
starting the client with the --configure switch will generate an XML configuration file
starting the client with the --config switch will let you test an XML configuration file
starting the client with the --help switch will display more help than you ever dreamed
Caveat: just installing the FAH-Client will cause it to be installed as a service which then starts CPU
folding (which is probably not what you want). If you want to enable GPU-based folding then you will need to stop the
client, modify the config file, test the config file, then restart the client. Here are some commands to help out.
sudo /etc/init.d/FAHClient stop
sudo systemctl stop fahclient
note: GPU folding requires OpenCL-1.2 so if you see errors like
'cannot find OpenCL' then you might need to rebuild your NVIDIA
driver (almost always required after any yum/dnf update that
changes the kernel)
end interactive test
start the service
sudo /etc/init.d/FAHClient start &
top -d 0.5
tail -40 /var/lib/fahclient/logs.txt
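If scrolling through the log by eye gets tedious, a few lines of Python can pick out the OpenCL complaints mentioned above. This is only a sketch: the function name is made up, the sample lines are invented for the demonstration (they are not real FAHClient output), and the match is a plain text search for the 'cannot find OpenCL' style of message.

```python
def find_opencl_errors(log_text):
    """Return the log lines that mention OpenCL together with an error hint."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if "opencl" in lowered and ("error" in lowered or "cannot find" in lowered):
            hits.append(line.strip())
    return hits

# Invented sample lines for illustration only (not real client output).
sample = """\
12:00:01:Starting client
12:00:02:ERROR: cannot find OpenCL
12:00:03:Connecting to server
"""

for hit in find_opencl_errors(sample):
    print(hit)
```

To use it on a real system you would read the log file shown in the tail command above and pass its text to the function.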
Stopped services may only be deleted from DOS like so:
sc query neil369
sc delete neil369
BOINC (Berkeley Open Infrastructure for Network Computing)
BOINC (Berkeley Open Infrastructure for Network Computing) is a science framework
in which you can support one, or more, projects of choice.
If you are unable to pick a single cause then pick several because the BOINC manager will switch between science clients
every hour (this interval is adjustable). In my case I actively support POEM, Rosetta, and Docking.
http://boinc.bakerlab.org/rosetta/ is the home of Rosetta@home
which operates through the BOINC framework. Their graphics screen-saver is one very effective way to help visualize "what
molecular dynamics is all about". All science teachers must show this to their students.
I'm sure everyone already knows that a computer "rendering beautiful graphical displays" is doing less science. That
said, humans are visual creatures and graphical displays have their place in our society. Except for some public
locations, all clients should be running in non-graphical mode so that more system resources are diverted to protein
Five questions for Rosetta@home: How Rosetta@home helps cure cancer, AIDS, Alzheimer's, and more
some people may prefer to use the generic BOINC client from Berkeley then install the WCG plugin from that application;
you will still need to create your WCG account at the WCG site
You only need to do this if you want to cycle your BOINC client between multiple projects of which WCG is just one
If you only want to run the WCG project (which also switches between IBM sponsored science projects) then it probably
makes more sense to use the WCG-specific client
World Community Grid (WCG, https://en.wikipedia.org/wiki/World_community_grid) is
an effort to create the world's largest public computing grid to tackle scientific research projects that benefit humanity.
Launched 2004-11-16, it is funded and operated by IBM with client software currently available for Windows, Linux,
Mac-OS-X and FreeBSD operating systems. IBM encourages its employees and customers to take part.
Personal Comment: I wonder why HP (Hewlett-Packard) has not followed IBM's lead. Up until now I
always thought of IBM as the template of uber-capitalism but it seems that the title of "king of profit by the elimination of
seemingly superfluous expenses" goes to HP. Don't they realize that IBM's effort in this area is paid for out of IBM's advertising
budget? Just as IBM's 1990s foray into chess playing systems (e.g. Deep Blue) led to increased sales as well as share prices,
one day IBM will be able to say "IBM contributed to treatments for human diseases including cancer". IBM's actions in this area
reinforce the public's association of IBM with information processing.
The Encyclopedia of DNA Elements (ENCODE) Consortium is an
international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human
genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in
which a gene is active.
The Encyclopedia of DNA Elements (ENCODE) is a public research consortium launched by the US National Human Genome Research Institute (NHGRI) in September 2003.
The goal is to find all functional elements in the human genome, one of the most critical projects by NHGRI after it completed
the successful Human Genome
Project. All data generated in the course of the project will be released rapidly into public databases.
On 5 September 2012, initial results of the project were released in a coordinated set of 30 papers published in the
journals Nature (6 publications), Genome Biology (18 papers) and Genome Research
(6 papers). These publications combine to show that approximately 20% of noncoding DNA in the human genome is functional while an additional 60% is transcribed with no
known function. Much of this functional non-coding DNA is involved in the regulation of the
expression of coding genes. Furthermore the expression of each coding gene is controlled by multiple regulatory sites
located both near and distant from the gene. These results demonstrate that gene regulation is far more complex than
http://www.technologyreview.com/view/510571/the-million-core-problem/ The Million-Core Problem - Stanford
researchers break a supercomputing barrier.
quote: A team of Stanford researchers have broken a record in supercomputing, using a million cores to model a complex
fluid dynamics problem. The computer is a newly installed Sequoia IBM Bluegene/Q system at the Lawrence Livermore National
Laboratories. Sequoia has 1,572,864 processors, reports Andrew Myers of Stanford Engineering, and 1.6 petabytes of memory.