Folding@home is biological research based upon the science of molecular dynamics, where molecular chemistry and mathematics are combined in computer-based models to predict how protein molecules might fold (or misfold) in three spatial dimensions.
When I first heard about this, I recalled Isaac Asimov's sci-fi magnum opus, colloquially known as The Foundation Trilogy, which introduced a fictional branch of science called psychohistory, where statistics, history and sociology are combined in computer-based models to predict humanity's future.
Something similar already exists in the real world: the computer-based data methodology used by Cambridge Analytica to convince a fraction of American Facebook users to vote differently (or stay home) during the 2016 Presidential Election. AggregateIQ did the same thing to influence British voters during the 2016 Brexit referendum.
Years ago I became infected with an inspired optimism about humanity's future and have since felt the need to contribute to it. While Folding@home will not cure my "infection of optimism", I am convinced Dr. Asimov (who received a Ph.D. in Biochemistry from Columbia in 1948, then was employed as a Professor of Biochemistry at the Boston University School of Medicine until 1958, when his publishing workload became too large) would have been fascinated by something like this.
Dr. Asimov, I'm computing these protein folding sequences in memory of you,
and your work.
I was considering a financial charitable donation to Folding@home when it occurred to me that my money would be better spent in two ways.
First, by making a knowledgeable charitable donation to all of humanity by increasing my own Folding@home computations (which will advance medical discoveries along with associated pharmaceutical treatments, thus lengthening human life). I was already folding on a half-dozen computers anyway, so all I needed to do was purchase used video cards on eBay.
Second, by convincing others (like you) to follow my example. My solitary folding efforts will have little effect on humanity's future. Together we can make a real difference.
Misfolded proteins have been associated with numerous diseases and age-related illnesses. However, proteins are so much larger and more complicated than smaller molecules that it is not practical to begin a chemical experiment without first providing hints to researchers about where to look and what to look for. Since the behavior of atoms-in-molecules (Quantum Chemistry) as well as atoms-between-molecules (Molecular Dynamics) can be modeled, it makes more sense to begin with a computer analysis. Permitted configurations can then be passed on to experimental researchers.
Cooking an egg causes the clear protein (albumen) to unfold into long strings which can then intertwine into a tangled network that stiffens and scatters light (appears white). No chemical change has occurred but taste, volume and color have been altered.
Click here to read a short "protein article" by Isaac Asimov published in 1993 shortly after his death.
Using the most powerful single core processor (CPU) available today, simulating
the folding possibilities of one large protein molecule for one
millisecond of chemical time might require one million days
(2737 years) of computational time. However (and this is where you come in), if
the problem is sliced up then assigned to 100,000 personal computers over the internet,
the computational time would drop to ten days. Convincing
friends, relatives, and employers to do the same will reduce the computational
time requirement further.
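To make that arithmetic explicit, here is a minimal back-of-the-envelope sketch (Python; the one-million-CPU-day figure is just the round number quoted above, and perfect division of labour is assumed, something Folding@home only approximates by handing out independent work units):

# Back-of-the-envelope scaling for an embarrassingly parallel workload.
TOTAL_CPU_DAYS = 1_000_000   # roughly 2737 years on a single core

for machines in (1, 1_000, 100_000, 1_000_000):
    days = TOTAL_CPU_DAYS / machines
    print(f"{machines:>9,} machines -> {days:,.2f} days ({days / 365.25:,.1f} years)")

With 100,000 machines the answer works out to the ten days quoted above; with one million machines it drops to a single day.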
[Table: chemical time in nature (1 s, 1 ms, 1 µs) versus the computational time required when the work is spread across 1 million computers]
Additional information for techies,
hackers and science buffs
Special-purpose research computers like IBM's BlueGene employ 10,000 to 20,000 processors (CPUs) joined by many kilometers of optical fiber to solve problems. IBM's Roadrunner is a similar technology employing both CPUs and special non-graphic, GPU-like processors called Cell.
As of April 2018, the Folding@home
project consists of
86,773 active CPU
platforms (some hosting quad-core CPUs, some hosting GPUs) which is equivalent
to 8,745,675 processors
capable of 84,037 TeraFLOPS (84 PetaFLOPS). This means that the
million-day protein simulation problem could theoretically
be completed in 1,000,000 / 8,745,675 = 0.11 days. But since there are many more protein molecules than DNA molecules, humanity could be at this for decades. Adding your computers to this effort will permanently advance humanity's progress in protein research.
These numbers were previously much higher before a large fraction of society shifted from PCs to tablets, pads and phones. Around the same time, lower-end PCs cut costs by providing cheaper embedded graphics chips while dropping video expansion slots. Does this mean that only higher-powered PCs and gaming rigs will be contributing to distributed computing projects? Perhaps.
When the Human Genome Project (to study human DNA) was being planned, it was thought that the task might require 100 years. However, technological change in the areas of computers, robotic sequencers, and use of the internet to coordinate the activities of a large number of universities (each assigned a small piece of the problem) allowed the Human Genome Project to publish results after only 15 years: roughly a 660% increase in speed.
Distributed computing projects like Folding@home and BOINC have only been possible since
1995 when the world-wide-web
(which was first proposed in 1989 to solve a document sharing problem amongst
scientists at CERN in Geneva) began to make
the internet both popular and ubiquitous.
Processor technology was traditionally defined like this:
See Flynn's Taxonomy for definitions like SIMD (but remember that "Data" represents "Data stream"). Caveat: this list purposely omits things like SMP (symmetric multiprocessing) and VAX Clusters.
Then CISC and RISC vendors began to add vector processing instructions to their processor chips, which blurred everything:
vector processing (also known as matrix processing) usually involves data organized along only one or two dimensions
anything organized along more than two dimensions is usually referred to as a tensor
while it is possible to do floating point math on integer-only hardware, floating point hardware can speed up floating point math by an order of magnitude or more. Likewise, you do not need special hardware to compute vectors or tensors, but certain applications (climate models, artificial intelligence, triple-A video games, etc.) demand it
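To illustrate that terminology, here is a tiny Python/NumPy sketch (the shapes are arbitrary and chosen only for illustration):

import numpy as np

vector = np.zeros(8)           # 1 dimension: a plain vector
matrix = np.zeros((8, 8))      # 2 dimensions: a matrix
tensor = np.zeros((8, 8, 8))   # 3 or more dimensions: usually called a tensor

print(vector.ndim, matrix.ndim, tensor.ndim)   # prints: 1 2 3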
Minicomputer / Workstation
DEC adds optional vector processing capabilities to the VAX 6000 Model 400 (an option called VAXvector)
But GPUs (graphics processing units) take vector processing to a whole new level. Why? A $200.00 graphics card can now equip your system with 1500-2000 streaming processors and 2-4 GB of additional high-speed memory.
In the 2013 book "CUDA Programming", the author provides evidence that any modern high-powered PC equipped with one or more graphics cards (if your motherboard supports more than one) can outperform any supercomputer listed on www.top500.org 12 years earlier.
AMD will manufacture an 8-core APU in 2013 which will be targeted
at Sony's PS4 (PlayStation 4) and Microsoft's XBOX-One (a.k.a. XBOX-720).
I've been in the computer hardware-software business for a long while now and can confirm that computers have only started to get really interesting again this side of 2007 with the releases of CUDA, OpenCL, etc.
Distributed computing projects like Folding@home and BOINC have only been practical since 2005 when the CPUs in personal computers began to outperform minicomputers and enterprise servers. This was partly because...
AMD added 64-bit support to their x86 processor technology, calling it AMD64
Intel followed suit, calling their 64-bit extension technology EM64T (later renamed Intel 64)
Since then, the following list of technological improvements has only made
computers both faster and cheaper:
multi-core chips (where each core is a fully functional CPU) from all manufacturers
shifting analysis from each CPU core into multiple (hundreds to thousands)
streaming processors found in high-end graphics cards
ATI (now AMD) Radeon graphics cards
NVIDIA GeForce graphics cards
development of high-performance "graphics" memory technology (e.g. GDDR5) to bypass processing stalls caused when processors outrun main memory. Note that GDDR5 will represent main memory in the not-yet-released Sony PS4
Intel's abandonment of NetBurst, which meant a return to shorter instruction pipelines starting with Core2. Comment: AMD never went to longer pipelines; a long pipeline is only efficient when running a static CPU benchmark for marketing purposes, not when running code in the real world where I/O events interrupt the primary foreground task (science, in our case)
HP preferred Itanium2
(jointly developed by HP and Intel) so announced their intention to
gracefully shut down Alpha (it would take more than a year to boot OpenVMS
on Itanium2 and another year for big-system qualification tests)
Alpha technology (which included CSI) was immediately sold to Intel
approximately 300 Alpha engineers were transferred to Intel between
2002 and 2004
CSI morphed into QPI (some industry watchers say that Intel ignored CSI until AMD announced they would go with the industry-supported technology known as HyperTransport)
The remainder of the industry went with a non-proprietary technology
which has been described as a multipoint Ethernet for use within a computer
As is true in any "demand vs. supply" scenario, most consumers didn't need the additional computing power, which meant that chip manufacturers had to drop their prices just to keep the computing marketplace moving. This was good news for people setting up "folding farms". Something similar is happening today with computer systems since John Q. Public is shifting from "towers and desktops" to "laptops and pads". This is causing the price of towers and graphics cards to plummet ever lower. You just can't beat the price-performance ratio of a Core-i7 motherboard hosting an NVIDIA graphics card.
(prediction: laptops and pads will never ever be able
to fold as well as a tower; towers will always be around in some form; low form-factor
desktops might become extinct)
Shifting from brute-force "Chemical Equilibrium" algorithms to techniques based upon statistics and Markov models will enable some exponential speedups.
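As a rough illustration of why the statistical approach is cheaper, here is a toy Markov-chain sketch in Python (the three conformational states and the transition probabilities are invented for illustration only; they are not real Folding@home data). Instead of simulating every atomic wiggle, a Markov model summarizes many short simulations as hop probabilities between a handful of states, then propagates state populations with inexpensive matrix arithmetic:

import numpy as np

# Toy model: 0 = unfolded, 1 = partially folded, 2 = folded.
# Each row holds the probabilities of hopping to each state during one time step.
T = np.array([[0.90, 0.09, 0.01],    # from unfolded
              [0.05, 0.90, 0.05],    # from partially folded
              [0.00, 0.02, 0.98]])   # from folded

population = np.array([1.0, 0.0, 0.0])   # start with everything unfolded

for step in range(200):                  # 200 cheap matrix-vector products
    population = population @ T

print("approximate equilibrium populations:", population.round(3))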
Figure: Liquid water. This diagram depicts an H2O molecule loosely connected to four others.
After perusing the periodic table of the elements for a moment you will soon realize that the molecular mass of water (H2O) is ~18 while the molecular mass of oxygen (O2) is ~32, carbon dioxide (CO2) is ~44 and ozone (O3) is ~48. So why is H2O in a liquid state at room temperature while
other slightly heavier molecules take the form of a gas?
Ethanol (a liquid) has one more atom of Oxygen than Ethane (a gas).
How can this small difference change the state?
Molecule | Molecular Mass | State at Room Temperature
H2O (water) | ~18 | liquid
O2 (oxygen) | ~32 | gas
CO2 (carbon dioxide) | ~44 | gas
O3 (ozone) | ~48 | gas
C2H6 (ethane) | ~30 | gas
C2H5OH (ethanol) | ~46 | liquid
In the case of an H2O molecule, two hydrogen atoms are electrically bound to one oxygen atom, and the two lone pairs of electrons on the oxygen repel the bonding pairs, which causes the water molecule to bend into a wide V shape. At the mid-point of the bend, a partial negative charge on the oxygen atom is exposed to the world, which allows a weak connection to a hydrogen atom of a neighboring H2O molecule (water molecules weakly sticking to each other form a liquid). These weak connections are often loosely referred to as Van der Waals forces; in the specific case of water they are strong enough to earn their own name, hydrogen bonds.
Here are the molecular schematic diagrams of Ethane (symmetrical) and Ethanol (asymmetrical). Notice that Oxygen-Hydrogen kink dangling to the right of Ethanol? That kink is not much different than a similar one associated with water. That is where a Van der Waals force weakly connects with an adjacent ethanol molecule (not shown).

Ethane:              Ethanol:

    H   H                H   H
    |   |                |   |
H - C - C - H        H - C - C - O - H
    |   |                |   |
    H   H                H   H
Van der Waals did all his computations with pencil and paper long before the
computer was invented and this was only possible because the molecules in question
were small and few.
Chemistry Caveat: The
Molecular Table above was only meant to get you thinking. Now inspect this LARGER
table of the elements where the color of the atomic number indicates whether
solid or gaseous:
all elements in column 1 (except hydrogen) are naturally solid
all elements in column 8 (helium to radon) are naturally gaseous
half the elements in row 2, starting with Lithium (atomic number 3) and ending with Carbon (atomic number 6), as well as three-quarters of row 3, starting with Sodium (atomic number 11) and ending with Sulphur (atomic number 16), are naturally solid
I will leave it to you to determine why.
Proteins come in many shapes and sizes. This "folding knowledge" will be used to develop new drugs for treating diseases. Here is a very short list:
ALS ("Amyotrophic Lateral Sclerosis" a.k.a. "Lou Gehrig's Disease")
Alzheimer's Disease
Plaques, which contain misfolded peptides called amyloid beta, are formed in the brain many years before the signs of this disease are observed. Together, these plaques and neurofibrillary tangles form the pathological hallmarks of the disease
Cancer & p53
p53 is the suicide gene involved in apoptosis (programmed cell death, something necessary in order for your immune system to kill cancer cells)
CJD (Creutzfeldt-Jakob Disease)
the human variation of mad cow disease
Huntington's disease is caused by a trinucleotide repeat expansion in
the Huntingtin (Htt) gene and is one of several polyglutamine (or
PolyQ) diseases. This expansion produces an altered form of the Htt protein,
mutant Huntingtin (mHtt), which results in neuronal cell death in
select areas of the brain. Huntington's disease is a terminal illness.
Normal bone growth is a yin-yang balance between osteoclasts and osteoblasts. Osteogenesis Imperfecta occurs when bone grows without sufficient or healthy collagen
The mechanism by which the brain cells in Parkinson's are lost may consist
of an abnormal accumulation of the protein alpha-synuclein bound to ubiquitin
in the damaged cells.
Ribosome & antibiotics
A ribosome is a protein producing organelle
found inside each cell
Executive Summary: a single-core Pentium-class CPU can provide one streaming (vector) processor under marketing names like MMX and SSE. A single graphics card can provide hundreds.
Modern computers can do 3d graphics two different ways:
in software using a general purpose CPU (central processing unit)
like Intel's Pentium or AMD's Athlon
in specialized hardware using a special purpose GPU (graphics processing
unit) like those found in:
NVIDIA graphics cards
AMD graphics cards (AMD acquired ATI in 2006)
seventh-generation gaming consoles
Sony's PS3 (PlayStation 3) system which can achieve speeds
of 100 GigaFLOPS per console
Microsoft's XBOX-360 game console
eighth generation gaming consoles
Both the PS4 and the XBOX One employ an 8-core APU made by AMD (based upon AMD's Jaguar cores).
Question: What is an APU?
Answer: an Accelerated Processing Unit is a multi-core CPU with an embedded graphics engine. Placing both systems on the
same silicon die eliminates the signal delay associated with
sending electrons off-chip (they would need to be transmitted
on one side then received on the other)
Scalar vs. Vector
CPUs (central processing units) are scalar processors which execute instructions one at a time, each instruction usually operating on a single piece of data.
RISC processors can exploit certain kinds of instruction-level parallelism.
In some cases they can execute instructions out-of-order.
Modern processors (CISC and RISC) also support SIMD (single instruction
- multiple data) technology for certain applications involving DSP (digital
signal processing) or multi-media.
In the Intel world, SIMD technology goes by names like MMX/SSE/SSE2, etc.
GPUs (graphics processing units) are vector processors which easily execute parallel operations
AMD/ATI cards typically support anywhere between 800 and 2,000 streaming processors (typically labeled "unified shaders")
NVIDIA cards typically support fewer streaming processors but seem
to be able to utilize them more efficiently
Since graphics cards have their own large memory systems, they should
be thought of as a private computer system within your computer.
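Here is a minimal Python/NumPy sketch of the scalar-versus-vector idea (the array size is arbitrary). A scalar processor conceptually handles one element per instruction, while SIMD/streaming hardware applies one operation to many elements at once:

import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Scalar style: one multiply at a time (what a plain CPU loop does conceptually).
c_scalar = np.empty_like(a)
for i in range(a.size):
    c_scalar[i] = a[i] * b[i]

# Vector (SIMD/streaming) style: one operation applied to every element at once;
# NumPy hands this to optimized native code.
c_vector = a * b

assert np.allclose(c_scalar, c_vector)   # identical results, wildly different speed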
Historical chart showing the rapid evolution of video card technology (2006-2012):
Radeon X1950 Pro: 36 pixel shaders (vertex : pixel ratio of 8 : 36)
Radeon X1950 XTX: 48 pixel shaders (vertex : pixel ratio of 8 : 48)
a newer Radeon: 800 unified (programmable) shaders; 800/36 = a 22-fold increase in only two years (growth currently exceeding Moore's law)
a newer Radeon (synchronized with the delayed release of Windows 7): 1600 unified (programmable) shaders; 1600/800 = a 2.0-fold increase in one year (growth currently exceeding Moore's law)
GeForce GTX 560: the most powerful folder in my collection. Notice the lower (compared to AMD/ATI) number of shaders; performance is due to architectural differences
Note: AMD acquired ATI in 2006 but continued to
use the ATI name into 2009
Using your NVIDIA graphics card to do protein-folding science
My Personal Experience Doing GPU-based Science:
I now run a mixture of systems employing graphics cards from both AMD and NVIDIA:
The HD-6670 from AMD
The GTX-560 from NVIDIA
I was forced to buy these cards when AMD removed OpenCL support
from their Windows-XP device driver in the Spring of 2012
The price of GTX-560 is approximately twice that of the
HD-6670 but appears to be
doing 9-10 times more science.
It appears that the best NVIDIA bang-for-the-buck comes from a card with a model prefix of GTX and a model number ending in 60.
In 2016 many machines were unable to get work units for GTX-560 on 32-bit
versions of Windows-XP (Huh? I thought the GPU did all the work). Here is what
Stanford published on 2016-07-03:
FAH tends to push the limits of science and that means that some things
can no longer be done with Windows-XP or with 32-bit
CPUs. At some point all new projects will
require 64-bit and all new projects will require Windows7 or above.
The studies of "easy" proteins have been or soon will be completed. I can't
predict when that will happen and I doubt anybody else can.
So it probably makes little sense to continue working with 32-bit OSs. If your
hardware is 64-bit capable you might wish to shift to a 64-bit version of Linux
As of 2016 I now recommend the GTX-960 (or any NVIDIA card
ending in 60)
Using your AMD graphics card to do protein-folding science
Note: AMD acquired ATI in 2006 but continued to use the ATI name into 2009
AMD related problems
Time and technology never stand still and this applies to graphics cards.
You can imagine the difficulty researchers experience while attempting to
keep up with the continual introduction of new products from hardware manufacturers.
So for the past half-decade the computer industry has been working on heterogeneous computing technologies (OpenCL, CUDA, etc.) for doing science on graphics cards. Stanford's folding software requires OpenCL (Open Computing Language), not to be confused with OpenGL (Open Graphics Library).
Announcement: Stanford to drop GPU2 cards made by AMD/ATI
March-2012 :: Stanford University announced
their intention to drop AMD/ATI GPU2 cards in September, 2012
I found two junker 64-bit PCs (with decent graphics cards) in my basement
For some reason I still do not understand, two PCs in my basement employed 64-bit
CPUs but were running 32-bit operating systems: Windows-XP and Windows-Vista.
Unfortunately for me, neither were eligible for the free upgrade to Windows-10
and I had no intention of buying a new 64-bit OS just to fold. So I replaced these
old Windows instances with CentOS-7.3
and was able to get each one folding with very little difficulty. Here are some
tips for people who are not Linux gurus:
2019 caveat: CentOS-7.7 and CentOS-8.0 were
released days apart in September 2019 (does this have anything to do
with IBM being the new owner?)
The link above will download CentOS-8.0 which is not ready for
general use. On top of that, it is 7-GB in size which means you will
only be able to install it via a double-layer DVD or USB
Be sure to only download CentOS-7.7 until CentOS-8.1 is finally
released sometime in 2020.
Transfer to bootable media:
copy the ISO image to a DVD writer -or-
use rufus to format a USB stick (capacity must be >= 5 GB) then copy the ISO image to the USB stick. Caveat: older PCs do not support booting from a USB stick (weird, because they can boot from a USB-connected DVD)
Boot/install CentOS-7 on the 64-bit CPU
pick and install a Linux recipe that supports a GUI (I usually
choose Server with GUI)
My machines hosted NVIDIA graphics cards (a GTX-560 and a GTX-960 respectively) so these systems required the correct NVIDIA drivers in order to do GPU-based folding. Why? You will need CUDA and/or OpenCL software, which is not stored in the graphics card's firmware.
HARD WAY: If you are a Linux guru and don't mind trying to disable the Nouveau driver first, then install the 64-bit Linux driver provided by NVIDIA here:
(be advised that you will need to do this every time you update your Linux kernel)
If you are not logged in as root then you must prepend every command with sudo (Super User DO), but this also means you must be part of the wheel group. If that does not work for you then you will need to resort to su root.
caveat: As of this writing, your Linux system most likely
depends upon some version of Python2. Python3 can be added to the system IF YOU ARE
CAREFUL but do not remove or disable Python2 because this will break
certain system utilities like yum or firewall-cmd
to name two of many.
Anyway, after an hour of work I've got two HP Presario PCs doing GPU-based folding
Folding@work :: Backroom Servers (no graphics card)
We've got two maintenance-spare HP servers (one DL385-G7 with 12 cores; one DL385p-gen8 with 24 cores) doing nothing other than acting as hot-standby machines. Both were already running CentOS-7, so I installed an SMP version of Folding@home in order to keep the cores busy. Here is the config file I used:
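Something like this (treat it as a sketch rather than gospel, assuming the v7 FAHClient which, on Linux, typically reads /etc/fahclient/config.xml; swap in your own user name, team number, passkey, and core count):

<config>
  <!-- identity: placeholder values -->
  <user v='your_name_here'/>
  <team v='0'/>
  <passkey v='your_passkey_here'/>

  <!-- run flat out (see the note below about backing off to fewer cores) -->
  <power v='full'/>

  <!-- one SMP slot; cpus limits how many cores a single work unit may span -->
  <slot id='0' type='CPU'>
    <cpus v='12'/>
  </slot>
</config>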
with power=full it might be better to only use
half, or three-quarters, of the available CPU cores
Folding@work :: problems with the SMP client
I've got two machines where the SMP client runs once then stops. These machines
have two 12-core AMD processors and 132 GB of RAM.
When the clients fail I see life-line errors (whatever those are) in the logs, indicating that the software was having difficulty synchronizing these cores as one single team of horses. I am not sure if this is related to "an incompatibility between CentOS-7.4 and the HP hardware" or something else (I have been told that situations like this indicate that AMD CPUs do not work as well as Intel CPUs). Anyway, the following non-SMP config allows me to use those machines for protein folding. I left 4 cores unused in case Linux needs them for some reason.
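Again, treat this as a sketch (assuming the v7 FAHClient; identity values are placeholders). Instead of one SMP slot spanning many cores, a row of independent single-core CPU slots keeps each work unit on its own core:

<config>
  <user v='your_name_here'/>
  <team v='0'/>
  <passkey v='your_passkey_here'/>
  <power v='full'/>

  <!-- 20 of 24 cores used as independent single-core slots (4 left for Linux).
       Only the first and last slots are shown; slots 1 through 18 repeat the pattern. -->
  <slot id='0' type='CPU'>
    <cpus v='1'/>
  </slot>
  <slot id='19' type='CPU'>
    <cpus v='1'/>
  </slot>
</config>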
Stopped Windows services may only be deleted from a command prompt (DOS box) like so:
sc query neil369
sc delete neil369
BOINC (Berkeley Open Infrastructure for Network Computing)
BOINC (Berkeley Open Infrastructure
for Network Computing) is a science framework in which you can support one or more projects of your choice.
If you are unable to pick a single cause then pick several because the
BOINC manager will switch between science clients every hour (this interval
is adjustable). In my case I actively support POEM, Rosetta, and Docking.
The Baker Laboratory at the University of Washington is the home of Rosetta@home
which operates through the BOINC framework. Their graphics screen-saver is one
very effective way to help visualize "what molecular dynamics is all about".
All science teachers must show this to their students.
I'm sure everyone already knows that a computer "rendering beautiful
graphical displays" is doing less science. That said, humans are visual
creatures and graphical displays have their place in our society. Except
for some public locations, all clients should be running in non-graphical
mode so that more system resources are diverted to protein analysis.
Five questions for Rosetta@home: How Rosetta@home helps cure cancer, AIDS,
Alzheimer's, and more
Some people may prefer to use the generic BOINC client from Berkeley and then add the WCG project from within that application; you will still need to create your WCG account at the WCG site
You only need to do this if you want to cycle your BOINC client between
multiple projects of which WCG is just one
If you only want to run the WCG project (which also switches between
IBM sponsored science projects) then it probably makes more sense to use
the WCG-specific client
World Community Grid (WCG) is an effort to create the world's largest public computing grid to tackle scientific research projects that benefit humanity. Launched 2004-11-16, it is funded and operated by IBM, with client software currently available for Windows, Linux, Mac-OS-X and FreeBSD operating systems. IBM encourages its employees and customers to participate.
Personal Comment: I wonder why HP (Hewlett-Packard) has not
followed IBM's lead. Up until now I always thought of IBM as the template of
uber-capitalism but it seems that the title of "king of profit by the elimination
of seemingly superfluous expenses" goes to HP. Don't they realize that IBM's
effort in this area is done under IBM's advertising budgets? Just like IBM's
1990s foray into chess playing systems (e.g. Deep Blue) led to increased sales
as well as share prices, one day IBM will be able to say "IBM contributed to
treatments for human diseases including cancer". IBM's actions in this area reinforce the public's association of IBM with information processing.
The Encyclopedia of DNA
Elements (ENCODE) Consortium is an international collaboration of research groups
funded by the National Human Genome Research Institute (NHGRI).
The goal of ENCODE is to build a comprehensive parts list of functional elements
in the human genome, including elements that act at the protein and RNA levels,
and regulatory elements that control cells and circumstances in which a gene is active.
The Encyclopedia of DNA Elements (ENCODE) is a public research
consortium launched by the US
National Human Genome Research Institute (NHGRI) in September 2003. The
goal is to find all functional elements in the human genome, one of the most
critical projects by NHGRI after it completed the successful
Human Genome Project. All data generated in the course of the project will
be released rapidly into public databases.
On 5 September 2012, initial results of the project were released in a coordinated
set of 30 papers published in the journals Nature (6 publications),
Genome Biology (18 papers) and Genome Research
(6 papers). These publications combine to show that approximately 20% of
noncoding DNA in the human genome is functional while an additional 60%
is transcribed with no known function. Much of this functional non-coding DNA
is involved in the
regulation of the
expression of coding genes. Furthermore the expression of each coding gene
is controlled by multiple regulatory sites located both near and distant from
the gene. These results demonstrate that gene regulation is far more complex
than previously believed.
The Million-Core Problem - Stanford researchers break a supercomputing barrier.
quote: A team of Stanford researchers have broken a record in supercomputing,
using a million cores to model a complex fluid dynamics problem. The computer
is a newly installed Sequoia IBM BlueGene/Q system at the Lawrence Livermore
National Laboratories. Sequoia has 1,572,864 processors, reports Andrew
Myers of Stanford Engineering, and 1.6 petabytes of memory.