Folding@Home + BOINC - Tips and Advocacy

Hacker Perspective: Folding@home Tips and Advocacy

In 1597, English philosopher Francis Bacon said 'knowledge is power'. The Folding@home project is proof that 'power (both electrical + computer) produces knowledge'. This is even more true with the rise of Artificial Intelligence projects like Google's AlphaFold.

This Wikipedia article may be more informative than the material below.

Misfolded proteins have been implicated in numerous diseases. Folding@home is biological research based upon the science of Molecular Dynamics where Molecular Chemistry and mathematics are combined in computer-based models to predict how protein molecules might fold, or misfold, in space over time. This information is then used to guide scientific and medical research.

When I first heard about this, I recalled the science-fiction magnum opus by Isaac Asimov colloquially known as The Foundation Trilogy which introduced the fictional science of psychohistory (where statistics, history and sociology were combined in computer-based models) to guide humanity's future in order to minimize a potential dark age. How did Asimov conceive of such a thing?

During my time as a student, I became infected with an Asimov-inspired optimism about humanity's future and have since felt the need to promote Asimov's vision. While Folding@home will not cure my "infection of optimism", I am convinced Dr. Isaac Asimov would have been fascinated by something like this. He earned a PhD in Biochemistry from Columbia in 1948, then worked as a Professor of Biochemistry at the Boston University School of Medicine for 10 years until his personal publishing workload became too large. He also published a few popular articles on proteins before his death in 1992.

I was considering a financial charitable donation to Folding@home when it occurred to me that my money would be better spent by making a knowledgeable charitable donation to all of humanity by:

Increasing my Folding@home computations (which will advance medical discoveries to increase both the length and quality of human life). I was already folding on a half-dozen computers so all I needed to do was purchase video graphics cards which would increase my computational throughput by a thousand-fold (three orders of magnitude).
Convincing others (like you) to follow my example. My solitary folding efforts will have little effect on humanity's future but together we can make a real difference.
Showing you how to get the client working on Linux

Dr. Asimov: I am dedicating this website to you and your work. You have greatly influenced my life.

Quick-Navigation Menu

my Computational Statistics with help from friends in China
Protein Folding Overview - doing science with computers.
Folding-at-home links.
Folding via graphics cards
Folding with Nvidia on Linux
- Folding with EL7 (2019: CentOS)
- Folding with EL8 (2022: Rocky, AlmaLinux OS, etc.)
- Common to all Enterprise Linux installs
Windows Scripting and Programming
BOINC (Berkeley Open Infrastructure for Network Computing)
- World Community Grid (Sponsored by IBM)
Biology Science Links
Recommended Biology Books
Isaac Asimov (includes his 15-book recommended reading order)

Protein Folding Overview

Science Problem

Misfolded proteins have been associated with numerous diseases and age-related illnesses. However, proteins are much larger and so much more complicated than smaller molecules that it is not possible to begin a chemical experiment without first providing hints to researchers about where to look and what to look for. Since the behavior of atoms-in-molecules (Computational Chemistry) as well as atoms-between-molecules (Molecular Dynamics) can be modeled, it makes more sense to begin with a computer analysis. Once that has been completed, permitted configurations can then be passed on to experimental researchers.

Real-world observation

From a kitchen point of view, chicken eggs are a mix of water, fat (yolk), and protein (albumen). Cooking an egg causes the semi-clear protein to unfold into long strings which now can intertwine into a tangled network which will stiffen then scatter light (now appears white). No chemical change has occurred, but taste, volume and color have been altered.

Click here to read a short "protein article" by Isaac Asimov published in 1993 shortly after his death.


Computer Solutions

Single CPU Systems

Using the most powerful single core processor (Pentium-4), simulating the folding possibilities of one large protein molecule for one millisecond of chemical time might require one million days (2737 years) of computational time. However, if the problem is sliced up then assigned to 100,000 personal computers over the internet, the computational time drops to ten days. Convincing friends, relatives, and employers to do the same could reduce the computational time requirement to one day or less.

chemical time in nature	simulation time on one computer	on 100,000 computers	on 1 million computers
1 second	1 billion days (2.7 million years)	27 years	2.7 years
1 ms	1 million days (2,737 years)	10 days	1 day
1 µs	1 thousand days (2.73 years)	14.4 minutes	1.44 minutes
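
The table above follows from one starting figure (one computer needs one million days per millisecond of chemical time) plus the assumption that the work divides evenly across machines. Here is a minimal Python sketch of that arithmetic; the function names are my own, and real projects never scale this perfectly.

```python
# Assumption from the text: one computer needs 1,000,000 days to simulate
# 1 ms of chemical time, which is 1,000 days per microsecond.
DAYS_PER_US_ON_ONE_COMPUTER = 1000


def simulation_days(chemical_us, computers=1):
    """Wall-clock days to simulate `chemical_us` microseconds of chemical time,
    assuming the work splits evenly across `computers` machines."""
    return chemical_us * DAYS_PER_US_ON_ONE_COMPUTER / computers


def describe(days):
    """Format a duration in days using the largest convenient unit."""
    if days >= 365.25:
        return f"{days / 365.25:,.1f} years"
    if days >= 1:
        return f"{days:,.0f} days"
    return f"{days * 24 * 60:.2f} minutes"


if __name__ == "__main__":
    rows = (("1 second", 1_000_000), ("1 ms", 1_000), ("1 µs", 1))
    for label, chemical_us in rows:
        cells = [describe(simulation_days(chemical_us, n))
                 for n in (1, 100_000, 1_000_000)]
        print(label, *cells, sep=" | ")
```

Changing the starting constant shows how strongly the whole table depends on that single estimate.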

Additional information for science + technology nerds


Special-purpose research computers like IBM's Blue Gene employ 10 to 20 thousand processors (CPUs) joined by many kilometers of optical fiber to solve problems. IBM's Roadrunner is a similar technology employing both "CPUs" and "special non-graphic GPUs that IBM refers to as cell processors"
1. The basic terms:
  - Early CPUs were built around integer processing with floating point operations being simulated in software.
  - Later CPUs offered built-in support for floating point operations
  - The combined throughput (fetch floating point data from memory, manipulate it, then write it back) is known as a FLOP (FLoating point OPeration) with the total spec being published as FLOPS (FLoating point OPerations per Second)
2. As of June 2023, the Folding@home project consists of ~ 40,000 active platforms (some hosting 14,000 GPUs) which yield 26,135 TeraFLOPS (26 PetaFLOPS).
3. Equivalents:
  - A Pentium-4 was rated at 12 GigaFLOPS
    (26 x 10^15) / (12 x 10^9) = 2,167,000 Pentium-4 processors
  - A Core i7 was rated at 100 GigaFLOPS (about 8 times higher than the Pentium-4).
    (26 x 10^15) / (100 x 10^9) = 260,000 Core i7 processors
4. This means that the original million-day protein simulation problem could theoretically be completed in (1,000,000 / 2,167,000) 0.46 days (or 11 hours). But since there are many more protein molecules than DNA molecules, humanity could be at this for decades. Adding your computers to Folding@home will permanently advance humanity's progress in protein research and medicine.
When the Human Genome Project (to study human DNA) was being planned, it was thought that the task might require 100 years. However, technological change in computers, robotic sequencers, and the internet after the world-wide-web appeared in 1991 (which helped coordinate a large number of universities, each assigned a small piece of the problem) allowed the Human Genome Project to publish results after only 15 years (roughly 6.7 times faster than expected).
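
The processor equivalents in the numbered list above are simple divisions, sketched here in Python. The FLOPS ratings are the ones quoted in the text; the function names are hypothetical, and the result is only a theoretical ceiling because it ignores network and scheduling overhead.

```python
# Figures quoted in the text (treat them as rough ratings, not benchmarks).
FAH_FLOPS = 26e15            # ~26 PetaFLOPS for the whole project (June 2023)
PENTIUM4_FLOPS = 12e9        # 12 GigaFLOPS
CORE_I7_FLOPS = 100e9        # 100 GigaFLOPS
SINGLE_CPU_DAYS = 1_000_000  # the "million-day" simulation on one Pentium-4


def equivalent_processors(total_flops, per_processor_flops):
    """How many processors of a given rating match the total throughput."""
    return total_flops / per_processor_flops


def days_on_network(single_cpu_days, processors):
    """Ideal run time when the job is split evenly over `processors`."""
    return single_cpu_days / processors


if __name__ == "__main__":
    p4 = equivalent_processors(FAH_FLOPS, PENTIUM4_FLOPS)
    i7 = equivalent_processors(FAH_FLOPS, CORE_I7_FLOPS)
    days = days_on_network(SINGLE_CPU_DAYS, p4)
    print(f"Pentium-4 equivalents: {p4:,.0f}")
    print(f"Core i7 equivalents:   {i7:,.0f}")
    print(f"Ideal run time: {days:.2f} days ({days * 24:.1f} hours)")
```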

Distributed computing projects like Folding@home and BOINC have only been possible since 1995:

the world-wide-web (proposed in 1989 to solve a document sharing problem among scientists at CERN in Geneva; then implemented in 1991) began to make the internet both popular and ubiquitous.
CISC was replaced with RISC which further evolved to superscalar RISC then multicore.

Vector processing became ubiquitous (primarily) in the form of video cards.

Processor technology was traditionally defined this way:

Scalar (one data stream per instruction. e.g. CISC CPU)
Superscalar (1-6 non-blocking scalar instructions simultaneously in a pipeline. e.g. RISC CPU)
See: Flynn's Taxonomy for definitions like SISD (single instruction single data) and SIMD (single instruction multiple data) but remember that Data represents "Data stream"
See: Duncan's taxonomy for a more modern twist
Caveat: these lists purposely omit things like SMP (symmetric multiprocessing) and VAX Clusters

Then CISC and RISC vendors began adding vector processing instructions to their CPU chips which blurred everything:

Vector Processing (multiple data streams per instruction)
- Terminology from math, science and computer science:
  - scalar: any measurement described by one data point (e.g. 30 km/hour)
  - vector: any measurement described by two data points (e.g. 30 km/hour, North)
    - A collection of vectors is usually referred to as a matrix (although a 2-dimensional data structure created in a computer is also known as a matrix; this includes a single spreadsheet as well as a set of database records where the columns represent field names; note that these examples can contain scalars, vectors, and tensors)
  - tensor: any item involving three, or more, data points.
- Vector processing is a generic name for any kind of multi data point math (vector or tensor) performed on a computer.
  - Tensor programming is not new in computing. Climate modelling began with weather-prediction trials on ENIAC (a scalar machine) in 1950
  - Google released a neat math library in 2015 called TensorFlow
- Technological speed up:
  - While it is possible to do floating point (FP) math on integer-only CPUs, adding specialized logic to support FP and transcendental math can decrease FP processing time by one order of magnitude (x10) or more.
  - Similarly, while it is possible to do vector processing (VP) on a scalar machine, adding specialized logic can decrease VP processing time by 2 to 3 orders of magnitude (x100 to x1000).
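
To make the scalar versus vector distinction concrete, here is a toy Python sketch that only counts "instructions". It is an illustration under my own assumptions (the function names and the 4-lane default are hypothetical). Pure Python gains no real speed this way; actual vector hardware performs each chunk in a single step.

```python
def scalar_add(a, b):
    """Add two lists one element at a time: one 'instruction' per pair."""
    result, steps = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        steps += 1
    return result, steps


def vector_add(a, b, lanes=4):
    """Add two lists in chunks of `lanes` values: one 'instruction' per chunk.
    This mimics how a vector unit applies one instruction to several data
    streams at once."""
    result, steps = [], 0
    for i in range(0, len(a), lanes):
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return result, steps


if __name__ == "__main__":
    a, b = list(range(16)), [10] * 16
    print("scalar steps:", scalar_add(a, b)[1])
    print("vector steps:", vector_add(a, b, lanes=4)[1])
```

Both functions return the same sums; only the step count differs, which is the whole point of adding vector logic to a processor.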

Development over the decades:

Mainframe Computers
Minicomputer / Workstation
1. 1989: DEC (Digital Equipment Corporation) adds vector processing to their Rigel uVAX chip
2. 1989: DEC adds optional vector processing to VAX-6000 model 400 minicomputer
  - http://manx-docs.org/collections/mds-199909/cd1/vax/60vaaom1.pdf (VAX 6000 Series - Vector Processor Owner’s Manual)
  - http://www.1000bit.it/ad/bro/digital/djt/dtj_v02-02_1990.pdf (Digital Technical Journal - Vol-2-Num-2 - Spring 1990)
  - comment: VAX-6000 was the Chevy of the computer industry at this time.
3. 1994: VIS 1 (Visual Instruction Set) was introduced into UltraSPARC processors by Sun Microsystems
  - comment: UltraSPARC was a 64-bit implementation of 32-bit SPARC.
4. 1996: MDMX (MIPS Digital Media eXtension) is released by MIPS.
5. 1997: MVI (Motion Video Instructions) was implemented on the DEC Alpha 21164. MVI appears again in Alpha 21264 and Alpha 21364.
  - http://www.alphalinux.org/docs/MVI-full.html
  - archive: https://web.archive.org/web/20140909020709/http://www.alphalinux.org/docs/MVI-full.html
  - comment: Alpha was a 64-bit RISC successor to VAX.

Microcomputer / Desktop

1997: MMX was implemented on P55C (a.k.a. Pentium 1) from Intel and introduced 57 MMX-specific instructions.
1998: 3DNow! was implemented on AMD K6-2.
1999: AltiVec (also called "VMX" by IBM and "Velocity Engine" by Apple) was implemented on PowerPC G4 from Motorola.
1999: SSE (Streaming SIMD Extensions) was implemented on Pentium 3 "Katmai" from Intel.
1. This technology adds eight new 128-bit registers.
2. SSE was Intel's reply to AMD's 3DNow!
3. SSE replaces MMX (both are SIMD, but SSE uses its own floating-point registers)
2001: SSE2 was implemented on Pentium 4 from Intel
2004: SSE3 was implemented on Pentium 4 "Prescott" from Intel
2006: SSE4 was implemented on Intel Core and AMD K10
2008: AVX (Advanced Vector Extensions) proposed by Intel + AMD but not seen until 2011.
1. many components extended to 256-bits.
2012: AVX2 (more components extended to 256-bits)
2015: AVX-512 (512-bit extensions) first proposed in 2013 but not seen until 2015
1. many components extended to 512-bits.

Technology	Width	Year
MMX	64 bits	1997
SSE	128 bits	1999
AVX	256 bits	2008
AVX-512	512 bits	2015
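
The widths in this table translate directly into how many values one instruction can touch. The short Python sketch below shows that division; the dictionary and function names are my own. Note that MMX works on integers only, so its 32-bit lanes hold integers rather than floating point values.

```python
# Register widths in bits, taken from the table above.
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX": 256, "AVX-512": 512}


def lanes(register_bits, element_bits):
    """Number of elements of a given size that fit in one vector register."""
    return register_bits // element_bits


if __name__ == "__main__":
    for name, bits in REGISTER_BITS.items():
        print(f"{name:8} {bits:4} bits: "
              f"{lanes(bits, 32):2} x 32-bit values, "
              f"{lanes(bits, 64):2} x 64-bit values")
```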

Add-on graphics cards
- GPU (graphics processing unit) takes vector processing to a whole new level. Why? A $200.00 graphics card can now equip your system with 1500-2000 streaming processors and 2-4 GB of additional high-speed memory. The 2013 book "CUDA Programming" presents evidence that any modern high-powered PC equipped with one or more graphics cards (if your motherboard supports it) can outperform any supercomputer listed 12 years earlier on www.top500.org
- Many companies manufactured graphics cards (I recall seeing them available as purchase options in the IBM-PC back in 1981) but I will only mention two companies here:
  - ATI Technologies (founded in 1985)
    - introduces GPU chipsets in the early 1990s that can do video processing without the need for a CPU.
    - introduces the Radeon line in 2000 specifically targeted at DirectX 7.0 3D acceleration.
    - acquired by AMD in 2006.
  - Nvidia (founded in 1993)
    - introduces the GeForce line in 1999.
    - introduces the Tesla line in 2007; these pure-math video cards have no video connector so cannot be connected to a video monitor.
    - CUDA is released in 2007.
The circle of life?
- specialized mainframe computers from companies like IBM and Cray are built to host many thousands of "non-video video cards" (originally targeted for PCs and workstations). IBM's Roadrunner is one example.

To learn more:

https://en.wikipedia.org/wiki/Graphics_processing_unit (GPU)
https://en.wikipedia.org/wiki/GPGPU (General Purpose computing on Graphics Processing Units)
Math + Science:
- https://en.wikipedia.org/wiki/CUDA
- https://en.wikipedia.org/wiki/OpenCL
- folding at home (protein analysis)
- http://chortle.ccsu.edu/VectorLessons/vectorIndex.html (Vector Math Tutorial)
- https://en.wikipedia.org/wiki/Tensor_processing_unit (a Google TPU is built using ASICs)
- NVIDIA science cards (graphics cards without a video connector) break Moore's Law every year
- eBooks
  - Programming on Parallel Machines (V1.4 2014)
    - see chapter 5 for CUDA examples in the C programming language.
  - Is Parallel Programming Hard, And, If So, What Can You Do About It? (2017)
  - https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing
Graphics:
- https://en.wikipedia.org/wiki/DirectX
- https://en.wikipedia.org/wiki/Direct3D (When the history of computing is written, Microsoft will be better known for DirectX and Direct3D than Windows)
- https://en.wikipedia.org/wiki/OpenGL
Gaming
- Both the PlayStation 4 as well as the XBOX One employ an 8-core APU (Accelerated Processing Unit) made by AMD code-named Jaguar. What is an APU? It is a multi-core CPU with an embedded Graphics Chip Engine. Placing both systems on the same silicon die eliminates the signal delay associated with sending signals over an external bus.
- How do Video Game Graphics Work? (2024) https://www.youtube.com/watch?v=C8YtdC8mxTU

I've been in the computer industry for decades but noticed that computers only began to get really interesting again with the releases of CUDA (2007) and OpenCL (2009)

Distributed computing projects like Folding@home and BOINC have only been practical since 2005 when the CPUs in personal computers began to out-perform mini-computers and enterprise servers. Partly because...
1. AMD added 64-bit support to their x86 processor technology calling it x86-64. (Linux distros refer to this as x86_64 or amd64; the older i686 label means 32-bit x86)
2. Intel followed suit calling their 64-bit extension technology EM64T
3. DDR2 memory became popular (like the original DDR it transfers data on both clock edges, but its bus runs at twice the internal memory clock)
  1. Intel added DDR2 support to their Pentium 4 processor line (2002)
  2. AMD added DDR2 support to their Athlon 64 processor line (2006)
4. DDR3 memory became popular (still double-data-rate, but its bus runs at four times the internal memory clock)
Since then, the following list of technological improvements has made computers both faster and less expensive:
1. Intel's abandonment of NetBurst which meant a return to shorter instruction pipelines starting with Core2
  comment: AMD never went to longer pipelines; a long pipeline is only efficient when running a static CPU benchmark for marketing purposes - not running code in real-world where i/o events interrupt the primary foreground task (science in our case)
2. multi-core (each core is a fully functional CPU) chips from all manufacturers.
3. continued development of optional graphic cards where CPUs would off-load much work to a graphics co-processor system (each card appeared as many hundreds to several thousand streaming processors)
  1. ATI Radeon graphics cards (ATI was acquired by AMD in 2006)
  2. NVIDIA GeForce graphics cards
  3. development of high performance "graphics" memory technology (e.g. GDDR3 , GDDR4 , GDDR5) to bypass processing stalls caused when streaming processors are too fast.
  4. Note that GDDR5 is used as main memory in the PlayStation 4 (PS4). While standalone PCs were built to host an optional graphics card, it seems that Sony has flipped things so that their graphics system is hosting an 8-core CPU. These hybrids go by the name APU (Accelerated Processing Unit)
4. shifting analysis from host CPU cores (usually 2-4) to thousands of streaming processors
5. Intel replacing 20-year old FSB technology with a proprietary new approach called QuickPath Interconnect (QPI) which is now found in Core-i3, Core-i5, Core i7 and Xeon
  Historical note:
  1. DEC created the 64-bit Alpha processor which was first announced in 1992 (21064 was first, 21164, 21264, 21364, came later)
  2. Compaq bought DEC in 1998.
  3. The DEC division of Compaq created CSI (Common System Interface) for use in their EV8 Alpha processor which was never released.
  4. HP merged with Compaq in 2002.
  5. HP preferred Itanium2 (jointly developed by HP and Intel) so announced their intention to gracefully shut down Alpha.
  6. Alpha technology (which included CSI) was immediately sold to Intel.
  7. Approximately 300 Alpha engineers were transferred to Intel between 2002 and 2004.
  8. CSI morphed into QPI (some industry watchers say that Intel ignored CSI until AMD announced it would go with a new industry-supported technology known as HyperTransport).
6. The remainder of the industry went with a non-proprietary technology called HyperTransport which has been described as a multi-point Ethernet for use within a computer system.
As is true in any "demand vs. supply" scenario, most consumers didn't need additional computing power which meant that chip manufacturers had to drop their prices just to keep the computing marketplace moving. This was good news for people setting up "folding farms". Something similar is happening today with computer systems since John-q-public is shifting from "towers and desktops" to "laptops and pads". This is causing the price of towers and graphics cards to plummet ever lower. You just can't beat the price-performance ratio of a Core-i7 motherboard hosting an NVIDIA graphics card.

Shifting from brute-force "Chemical Equilibrium" algorithms to techniques involving Bayesian statistics and Markov Models will enable some exponential speedups.

Computational Chemistry
Student Questions:

Using information from the periodic table of the elements you can see that the molecular mass of water (H₂O) is ~18 which is lighter than many gases so why is water in a liquid state at room temperature while other slightly heavier molecules take the form of a gas?
Ethanol (a liquid) has one more atom of Oxygen than Ethane (a gas). How can this small difference change the state?

Liquid Water
This diagram depicts an
H₂O molecule loosely
bound to four others by
Van der Waals forces.

Substance	Molecule	Atomic Masses	Molecular Mass	State at Room Temperature
Water	H₂O	(1x2) + 16	18	liquid
Carbon Monoxide	CO	12 + 16	28	gas
Molecular Oxygen	O₂	(16x2)	32	gas
Carbon Dioxide	CO₂	12 + (16x2)	44	gas
Ozone	O₃	(16x3)	48	gas

Methane	CH₄	12 + (1x4)	16	gas
Ethane	C₂H₆	(12x2) + (1x6)	30	gas
Ethanol	C₂H₆O	(12x2) + (1x6) + 16	46	liquid
Propane	C₃H₈	(12x3) + (1x8)	44	gas
Butane	C₄H₁₀	(12x4) + (1x10)	58	gas
Pentane	C₅H₁₂	(12x5) + (1x12)	72	liquid
Hexane	C₆H₁₄	(12x6) + (1x14)	86	liquid
Heptane	C₇H₁₆	(12x7) + (1x16)	100	liquid
Octane	C₈H₁₈	(12x8) + (1x18)	114	liquid
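
The "Molecular Mass" column can be reproduced with a few lines of Python. This is a minimal sketch that uses the same rounded atomic masses as the table; the function name is my own, formulas must be typed with plain ASCII digits (H2O rather than H₂O), and parentheses in formulas are not handled.

```python
import re

# Rounded atomic masses, the same values used in the table above.
ATOMIC_MASS = {"H": 1, "C": 12, "O": 16}


def molecular_mass(formula):
    """Sum atomic masses for a simple formula such as 'C2H6O'."""
    total = 0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    for name, formula in (("Water", "H2O"), ("Ethane", "C2H6"),
                          ("Ethanol", "C2H6O"), ("Octane", "C8H18")):
        print(f"{name:8} {formula:6} {molecular_mass(formula)}")
```

Notice that the mass alone says nothing about the state at room temperature, which is exactly the puzzle posed by the student questions.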

Short answers:

In the case of an H₂O (water) molecule, two hydrogen atoms are covalently bound to one oxygen atom, but the two unshared electron pairs on the oxygen atom repel those bonds, which bends the water molecule into a V shape (according to VSEPR Theory). The oxygen end of the bend carries a slight negative electrical charge while each hydrogen carries a slight positive charge, which allows a weak connection between the oxygen atom of one molecule and a hydrogen atom of a neighboring H₂O molecule (water molecules weakly sticking to each other form a liquid). These weak connections are hydrogen bonds, which are often grouped with Van der Waals forces
Here are the molecular schematic diagrams of Ethane (symmetrical) and Ethanol (asymmetrical). Notice the Oxygen-Hydrogen kink dangling to the right of Ethanol. That kink is not much different than a similar one associated with water. That is the location where a Van der Waals force weakly connects with an adjacent ethanol molecule (not shown). So it should be no surprise that Ethane at STP (Standard Temperature and Pressure) is a gas while Ethanol is a liquid.
```
 Ethane        Ethanol
(symmetrical) (asymmetrical)

  H H           H H   H
  | |           | |  /
H-C-C-H       H-C-C-O
  | |           | |
  H H           H H
```

Van der Waals did all his computations using pencil and paper long before the first computer was invented; and this was only possible because the molecules in question were small and few.

Chemistry Caveat: The Molecular Table above was only meant to get you thinking. Now inspect this LARGER periodic table of the elements where the color of the atomic number indicates whether the natural state is solid or gaseous:

all elements in column 1 (except hydrogen) are naturally solid.
all elements in column 8 (helium to radon) are naturally gaseous.
half the elements in row 2 starting with Lithium (atomic number 3) and ending with Carbon (atomic number 6), as well as three quarters of row 3 starting with Sodium (atomic number 11) and ending with Sulfur (atomic number 16), are naturally solid. I will leave it to you to determine why.

Molecular Dynamics
Proteins come in many shapes and sizes. Here is a very short list:

Protein	Mass	Function	Notes
Chlorophyll a	893	facilitates photosynthesis in plants
Heme A	852	common ligand for many hemeproteins including hemoglobin and myoglobin
Alpha-amylase	56,031	salivary enzyme to digest starch	pdbId=1SMD
hemoglobin	64,458	red blood cell protein
DNA polymerase	varies from 50k to 200k	enzyme responsible for DNA replication

These molecules are so large that modeling their interactions can only be done accurately with a computer.

FAH Links

https://foldingathome.org	cover page
https://foldingathome.org/home/	home page
https://foldingathome.org/start-folding/	download page
https://foldingforum.org	technology problems, discussions, news, science, etc.

FAH Targeted Diseases

This "folding knowledge" will be used to develop new drugs for treating diseases such as:

ALS ("Amyotrophic Lateral Sclerosis" a.k.a. "Lou Gehrig's Disease")
Alzheimer's Disease
- Plaques, which contain misfolded peptides called amyloid beta, are formed in the brain many years before the signs of this disease are observed. Together, these plaques and neurofibrillary tangles form the pathological hallmarks of the disease.
Cancer & P53
- P53 is the suicide gene involved in apoptosis (programmed cell death, which is necessary in order for your immune system to kill cancer cells)
CJD (Creutzfeldt-Jakob Disease)
- the human variation of mad cow disease
Huntington's Disease
- Huntington's disease is caused by a trinucleotide repeat expansion in the Huntingtin (Htt) gene and is one of several polyglutamine (or PolyQ) diseases. This expansion produces an altered form of the Htt protein, mutant Huntingtin (mHtt), which results in neuronal cell death in select areas of the brain. Huntington's disease is a terminal illness.
Osteogenesis Imperfecta
- Normal bone growth is a yin-yang balance between osteoclasts and osteoblasts. Osteogenesis Imperfecta occurs when bone grows without sufficient or healthy collagen (protein)
Parkinson's Disease
- The mechanism by which the brain cells in Parkinson's are lost may consist of an abnormal accumulation of the protein alpha-synuclein bound to ubiquitin in the damaged cells.
Ribosome & antibiotics
- A ribosome is a protein producing organelle found inside each cell.

Reference Links: Folding@home - FAQ Diseases

More Information About Proteins and Protein-Folding Science

Online Documents

Wikipedia
- https://en.wikipedia.org/wiki/Folding@home
- https://en.wikipedia.org/wiki/Protein_folding
AlphaFold by DeepMind (a Google Alphabet Company)
- https://en.wikipedia.org/wiki/AlphaFold
- https://www.deepmind.com/research/highlighted-research/alphafold/timeline-of-a-breakthrough
  - While not part of the folding@home project, AlphaFold involves protein folding.
  - Their project took 5 years to design then implement.
  - They mapped all 200 million known proteins then freely published it as a gift to humanity.
  - Assuming that one human researcher (a PhD) could do one protein in 5 years, then AlphaFold saved 1 billion man-years of time (200 million proteins x 5 years).
  - https://alphafold.ebi.ac.uk/ <<<--- access their online protein database
- Science (2020-11-30) https://www.science.org/content/article/game-has-changed-ai-triumphs-solving-protein-structures
- Nature (2020-11-30) https://www.nature.com/articles/d41586-020-03348-4
- Nobel Prize in Chemistry (2024-10-09) https://www.nobelprize.org/prizes/chemistry/2024/press-release/ (Protein Folding)
- Nobel Prize in Physics (2024-10-08) https://www.nobelprize.org/prizes/physics/2024/press-release/ (Neural Network path to modern A.I.)
AlphaFold 2 (2021-07-19)
- https://www.zdnet.com/article/deepminds-alphafold-2-reveal-what-we-learned-and-didnt-learn/
AlphaFold 3 (2024-05-08)
- https://www.nature.com/articles/d41586-024-01385-x
- https://arstechnica.com/science/2024/05/deepmind-adds-a-diffusion-engine-to-latest-protein-folding-software/
Cornell University
- https://arxiv.org/abs/2303.08993 Folding@home: achievements from over twenty years of citizen science herald the exascale era. (a.k.a. 10^18 FLOPS)
How AI Cracked the Protein Folding Code and Won a Nobel Prize (22-min video)
Veritasium: The Most Useful Thing AI Has Done (Protein folding: from Rosetta@home to DeepMind)
https://www.youtube.com/watch?v=P_fHJIYENdI
With an overview of CASP (Critical Assessment of Structure Prediction) from CASP1 (1994) to CASP13 (2018) and beyond
We Solved the Protein Folding Problem… Now What? (StarTalk + Neil deGrasse Tyson)
https://www.youtube.com/watch?v=hJ4LRswmZEs

My Computational Statistics (with some help from China)

BOINC stats:
- https://www.boincstats.com/stats/-1/user/detail/206514146972/projectList
  Name: Neil Steven Rieck
  Projects: Rosetta@Home, Docking@Home, POEM@Home, SETI@home
FAH official stats:
- https://stats.foldingathome.org/team/3213 (team: China Folding@Home Power) Top 100 Ranked Team
- https://stats.foldingathome.org/team/10987 (team: Canada) Top 1K Ranked Team
- https://stats.foldingathome.org/donor/75502 (user: neil_rieck) Top 1K Ranked Donor world rank: 442 as of 2025-07-11
  - 10,053,000,000 points: 2025-07-11 (yes, over 10 billion - done almost entirely with Nvidia graphics cards as compute engines)
  - 149,400 work units: 2025-07-11
- https://folding.lar.systems

FAH third-party stats:

Many like-minded people in China are helping with protein folding (this is great news). Some of their client processes are folding under my account name at a very respectable 8 million points per day. Whoever you are, many thanks for helping humanity advance biological knowledge. Isaac Asimov would approve.

from	to	user	team	third party statistics	rank
2007-12-22	2022-01-28	neil_rieck	Default	Final Account Tally ( Points: 418,381,428 WU: 110,984 ) https://folding.extremeoverclocking.com/user_summary.php?s=&u=306663 Note: no new stats because I changed teams which generates a new 'u'	1005
2022-01-28	present	neil_rieck	Canada	https://folding.extremeoverclocking.com/user_summary.php?s=&u=1291088	12
2023-03-04	present	neil_rieck	China Folding@Home Power	https://folding.extremeoverclocking.com/user_summary.php?s=&u=1316605	13
			Canada	https://folding.extremeoverclocking.com/team_summary.php?s=&t=10987	91
			China Folding@Home Power	https://folding.extremeoverclocking.com/team_summary.php?s=&t=3213	12

Folding via GPUs (graphics cards)

A single core Pentium-class CPU provides one scalar processor. It also provides one streaming (vector) processor under marketing names like MMX (64-bit) , SSE (128-bit), AVX (256-bit) and AVX-512 (512-bit)
Most Core i5 and Core i7 CPUs provide four scalar cores so offer at least four streaming (vector) processors.
A single add-on graphics card can provide hundreds to thousands of vector processors (the Nvidia RTX-3090 provides 10,496)

Computer technology section moved here

Folding with Linux (2019 - 2025)

click: Linux tips + real world problems and solutions

Folding with EL7 (2019)

caveat: GPU folding on CentOS-7 failed December-2021 and is no longer possible so: jump to 2022

I found two junk PCs in my basement with 64-bit CPUs that were running 32-bit operating systems (Windows-XP and Windows-Vista). Unfortunately for me, neither was eligible for Microsoft's free upgrade to Windows-10, and I had no intention of buying a new 64-bit OS just for this hobby. So I replaced Windows with CentOS-7 and was able to get each one folding with very little effort. What follows are some tips for people who are not Linux gurus:

Download a DVD ISO image of CentOS-7 via this link: https://www.centos.org/download/
- CentOS-7.7 and CentOS-8.0 were released days apart in September 2019 (perhaps due to the invisible hand of IBM?)
- Software from the top of the download page preferentially offers CentOS-8 which is too large (> 4.7 GB) to write to a single-layer writable DVD, but I have had some success with Dual Layer optical media. Linux distros assume you will copy these images to a USB stick, but the BIOS in many older machines does not support booting from a thumb drive.
Transfer to bootable media (choose one of the following)
- copy the ISO image to a DVD-writer
  -OR-
- use rufus to format a USB stick then copy the ISO image to the USB
  caveat: newer PCs have transitioned from BIOS to UEFI. Older BIOS-based systems do not support booting from a USB stick (strange because you can connect a USB-based DVD then boot from that)

Boot-install CentOS-7 on the 64-bit CPU

Using a larger downloaded image:

1) burn, boot, install Linux using recipe: "Server with GUI"
2) reboot; now update via the internet like so: yum update
3) reboot;
4) add development tools: yum group install "Development Tools"

Using smaller downloads (then adding GUI after the fact):

1) burn, boot, install Linux using recipe: "Server"
2) reboot; now update via the internet like so: yum update
3) reboot; optionally enable GUI: yum group install "Server with GUI"
   may also need to type:
      systemctl isolate     graphical.target
      systemctl set-default graphical.target
4) add development tools: yum group install "Development Tools"

My machines hosted NVIDIA graphics cards (GTX-560 and GTX-960 respectively) so these systems required the correct NVIDIA drivers in order to do GPU-based folding. Why? The generic drivers only support video but folding science requires OpenCL and CUDA

HARD WAY: If you are a Linux guru and know how to first disable the Nouveau driver, then install the 64-bit Linux driver provided by NVIDIA here: http://www.nvidia.com/Download/index.aspx

EASY WAY: First install the nvidia-detect module found at elrepo (Enterprise Linux REPOsitory) here ( http://elrepo.org/tiki/tiki-index.php ) and documented here ( https://elrepo.org/tiki/nvidia-detect ) then use the utility to install another elrepo package. The steps look similar to this if you are logged in as root:

step-1 (install one/two repos via yum)

activity	Linux command	Notes	Description
view available repos	yum list \*-release\*		each backslash escapes an asterisk
install elrepo	yum install elrepo-release	required	Enterprise Linux REPO
install epel	yum install epel-release	optional	Extra Packages for Enterprise Linux

step-2 (install nvidia-detect from elrepo)

activity	Linux command
install nvidia-detect	yum install nvidia-detect
test nvidia-detect	nvidia-detect -v
install nvidia driver	yum install $(nvidia-detect)
reboot	reboot

If you are not logged in as root then you must prefix every command with sudo (Super User DO).
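The two tables above can be condensed into one sequence. The sketch below only prints the commands (a dry run) and adds the sudo prefix when the given user id is not root; the function name `print_nvidia_steps` is hypothetical, and the commands themselves are the ones from the tables.

```shell
#!/bin/sh
# print_nvidia_steps UID -> prints the elrepo driver steps for that numeric user id.
# Nothing is installed here; the output is meant to be read, then typed or piped to sh.
print_nvidia_steps() {
    if [ "$1" -eq 0 ]; then
        prefix=""
    else
        prefix="sudo "
    fi
    # Single quotes keep $(nvidia-detect) literal so it is printed, not executed.
    for cmd in \
        'yum install elrepo-release' \
        'yum install nvidia-detect' \
        'yum install $(nvidia-detect)' \
        'reboot'
    do
        printf '%s%s\n' "$prefix" "$cmd"
    done
}

# Show the steps for whoever is running this script.
print_nvidia_steps "$(id -u)"
```

Printing rather than executing is a deliberate choice: the last step reboots the machine, so it is safer to review the list first.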
Now jump to the section "Common to all Linux installs" below.

Folding with EL8 (2022 - 2025)

My CentOS-7 systems stopped folding in 2021-12-xx (December 2021). Apparently this was due to several changes by the FAH research team.

First off, new FAH downloadable cores require an updated version of Linux library glibc which is not available on CentOS-7, so you need to upgrade to CentOS-8, or any other EL8 (see note #5 below)
Secondly, changes to the FAH GPU cores now require a minimum of OpenCL-1.2. This means that my GTX-560 is no longer useful as a streaming processor. I noticed another blurb about double-precision FP math which definitely rules out my GTX-560 so I replaced it with a GTX-1650
On both systems I replaced CentOS-7 with CentOS-8 (via a fresh install) then followed the CentOS-7 instructions just above. However, the Nvidia driver from elrepo-release (EL8) no longer contains any support for OpenCL or CUDA, so I was forced to install the Linux driver provided by Nvidia
- you will first need to disable the Nouveau video driver which requires some Linux guru voodoo (see below)
For some reason, these CentOS-8 machines seemed sluggish at the time, so I jumped to Rocky Linux 8.5 which fixed my problem. Since then I have come to believe that AlmaLinux would be a better choice.
Caveat: IBM purchased Red Hat in 2019 for the sum of US$34 billion. In 2020 the resultant Blue Hat announced their intent to discontinue CentOS-8 minor point updates after 2021-12-31 because too many companies (like Facebook and Twitter) were using the free version rather than paying for RHEL (gotta speed up the ROI on that $34 billion investment). Even though Red Hat begins each OS iteration using open-source software, on 2023-06-21 they announced that they were going to restrict access (close source) their modifications. This will affect all EL (Enterprise Linux) flavors of Linux including: AlmaLinux, EuroLinux, Oracle Linux, Rocky Linux, etc. so you might consider moving to any target supported by this migration tool (ELevate - leapp). Note that academics working at Fermilab just outside of Chicago, or CERN (home of the LHC) in Geneva, recommend moving to AlmaLinux and I agree. My advice is to always employ a Linux that can be downloaded from a university mirror. Here is one example of 10,000: https://mirror.csclub.uwaterloo.ca/
Mirror observations (2024):
- active offerings exist only for AlmaLinux (this appears to be the only EL available)
- all CentOS folders (point release and stream) are empty
- no mirror offerings for Oracle Linux, EuroLinux, Rocky Linux
Real world concern: you never know when problems (political, commercial, technological) will block you from getting to a single site. But you can always modify "/etc/yum.conf" to point directly to a nearby university mirror (if files exist on the university mirror; most mirrors now only contain AlmaLinux offerings)
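Returning to the glibc requirement in the first note above: a quick way to see where a machine stands is to compare its glibc version against a threshold. This is a minimal sketch under stated assumptions. CentOS-7 ships glibc 2.17 and EL8 ships glibc 2.28, but the exact version the FAH cores need is not stated here, so `REQUIRED_GLIBC` is a hypothetical value, as are the helper names.

```shell
#!/bin/sh
# Hypothetical threshold: 2.28 is the glibc level of EL8, not a documented FAH minimum.
REQUIRED_GLIBC=2.28

# version_ge A B -> succeeds when A >= B; both must look like MAJOR.MINOR
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    if [ "$a_major" -ne "$b_major" ]; then
        [ "$a_major" -gt "$b_major" ]
    else
        [ "$a_minor" -ge "$b_minor" ]
    fi
}

# installed_glibc -> prints e.g. "2.17" on glibc-based systems, nothing elsewhere
installed_glibc() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{ print $2 }'
}

# Example:
#   version_ge "$(installed_glibc)" "$REQUIRED_GLIBC" && echo "glibc is new enough"
```

A CentOS-7 machine (2.17) fails this comparison, which matches the experience described above.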

Updating to a NVIDIA published driver
- after the initial Linux install, type "sudo yum update" to bring the platform up to the latest level. If a new kernel was installed then you must reboot before you continue.
- DO NOT use elrepo to update the Nvidia driver (on 2022-01-16 it was missing support for OpenCL and CUDA)
- read these reference notes:
  - http://www.nvidia.com/object/unix.html
  - https://linuxconfig.org/install-the-latest-nvidia-linux-driver
  - https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-centos-8
  - https://support.huawei.com/enterprise/en/doc/EDOC1100165479/93fe5683/how-to-disable-the-nouveau-driver-for-different-linux-systems (many thanks to my computer buddies in China for this last tip)
- steps:
  1. Download only the desired driver file from NVIDIA into the root account's home directory
    1. for GTX960 I used: NVIDIA-Linux-x86_64-470.86.run (will probably be renamed before you read this)
    2. for GTX1650 I used: NVIDIA-Linux-x86_64-525.89.run (will probably be renamed before you read this)
  2. Now disable the Nouveau driver
    - create file /etc/modprobe.d/blacklist-nouveau.conf containing these two lines:
      blacklist nouveau
      options nouveau modeset=0
    - create a new ramdisk for use during system boot: dracut --force
    - reboot
    - caveat: at this point your monitor is no longer capable of displaying small text in character-cell mode but you don't care because you've already downloaded the necessary files from Nvidia corporation. Right?
  3. Install the NVIDIA driver
    - yum group install "Development Tools"
    - chmod +x NVIDIA-Linux-x86_64-470.86.run
    - ./NVIDIA-Linux-x86_64-470.86.run
      - caveat: the previous command might fail for the following reasons:
        - not an executable file (did you use chmod?)
        - no build tools (gcc compiler, etc.; did you install "Development Tools"?)
        - the nouveau driver is still running
    - reboot
- caveats:
  - kernel updates via "yum update" (CentOS-7) or "dnf update" (CentOS-8) always require a reboot. Ninety percent of the time you will need to reinstall the Nvidia driver after a kernel update (just repeat step 3 above).
  - If your console is blank then type the three-key combination CTRL-ALT-F3 (control alternate F3) then log in as root on the third virtual console.
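Since "nouveau driver is running" is the failure that is hardest to spot, here is a small sketch that produces the two-line blacklist file from step 2 and checks `lsmod`-style output for the module. Both function names are hypothetical; the file contents are exactly the two lines given above.

```shell
#!/bin/sh
# nouveau_blacklist_conf -> prints the contents for /etc/modprobe.d/blacklist-nouveau.conf
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# nouveau_listed -> reads lsmod-style text on stdin; succeeds if a module
# named exactly "nouveau" appears in the first column
nouveau_listed() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Example (as root):
#   nouveau_blacklist_conf > /etc/modprobe.d/blacklist-nouveau.conf
#   dracut --force && reboot
# After the reboot:
#   lsmod | nouveau_listed && echo "nouveau is still loaded; do not run the installer yet"
```

Matching on the first column avoids false hits on other modules that merely list nouveau as a user.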

Common to all Linux installs

Linux software can be downloaded from here:
- https://download.foldingathome.org/ (page of supported V7 + V8 software)
- https://download.foldingathome.org/releases/ (okay to browse)
- https://download.foldingathome.org/releases/v7/public/fahclient/centos-6.7-64bit/release/
- https://foldingathome.org/v7-client/
- https://github.com/FoldingAtHome/fah-client-bastet/blob/master/BUILDING-RPM.md (people wishing a challenge may attempt to build their own V8 client RPM)

Installing Folding-at-home on Linux

Some of the GUI-based software may not work properly (on CentOS-7 or earlier) because Python3 is a prerequisite
- Python3 can be easily added to EL7 systems like CentOS-7 but do not remove or disable Python2 because that action will break certain system utilities like yum or firewall-cmd to name two of many.
```
sudo yum install epel-release
sudo yum install python36
```
- Python-3.6.8 is the default Python in EL8 distributions like RHEL-8, AlmaLinux-8, etc.
So I recommend:
- only install the command-line client as described here: https://foldingathome.org/support/faq/installation-guides/linux/manual-installation-advanced/
- perform a manual configure as described here: https://foldingathome.org/support/faq/installation-guides/linux/command-line-options/
  - starting the client with the --configure switch will generate an XML configuration file
  - starting the client with the --config switch will let you test an XML configuration file
  - starting the client with the --help switch will display more help than you ever dreamed possible

Caveat: just installing the FAH-Client will cause it to be installed as a service then start CPU folding (which is probably what you do not want). If you want to enable GPU-based folding (and/or disable CPU folding) then you will need to stop the client, modify the config file, test the config file, then restart the client. Here are some commands to help out.

task	command
stop service	sudo systemctl stop fahclient (Enterprise Linux) -or- sudo /etc/init.d/FAHClient stop (other Linux/UNIX)
navigate here	cd /etc/fahclient
edit config file (nano does not require VI or VIM skills)	nano config.xml
  <config>
    <power v='full'/>
    <user v='neil_rieck'/>
    <team v='10987'/>
    <gpu v='true'/>
    <smp v='true'/>
    <slot id='0' type='CPU'/>
    <slot id='1' type='GPU'/>
  </config>
  notes:
  1) type can be one of: 'cpu', 'smp', 'gpu'
  2) never enable 'smp' unless you have >= 4 cores
  3) with 'gpu' enabled I see no point in folding with 'cpu' or 'smp'
navigate here	cd /var/lib/fahclient
(optional) see other switches	/usr/bin/FAHClient --help
interactive testing	/usr/bin/FAHClient --config /etc/fahclient/config.xml -v
  note: GPU folding requires OpenCL-1.2, so if you see errors like 'cannot find OpenCL' then you might need to rebuild your NVIDIA driver (almost always required after any Linux update that replaces the kernel)
end interactive test	hit: <ctrl-c>
start the service	sudo systemctl start fahclient -or- sudo /etc/init.d/FAHClient start &
monitor step #1	top -d 0.5
monitor step #2	cat /var/lib/fahclient/logs.txt
monitor step #3	tail -40 /var/lib/fahclient/logs.txt
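If you maintain several folding machines, typing the config by hand gets tedious. The sketch below writes the same config shown in the table with the user name and team number as parameters. The function name `make_fah_config` is hypothetical, and the element layout is simply copied from my config above rather than from the official schema.

```shell
#!/bin/sh
# make_fah_config USER TEAM -> prints a config.xml body on stdout.
# Assumption: same elements and slot layout as the example in the table above.
make_fah_config() {
    cat <<EOF
<config>
  <power v='full'/>
  <user v='$1'/>
  <team v='$2'/>
  <gpu v='true'/>
  <smp v='true'/>
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>
</config>
EOF
}

# Example: write to a scratch file first, then test it interactively
#   make_fah_config neil_rieck 10987 > /tmp/config.xml
#   /usr/bin/FAHClient --config /tmp/config.xml -v
```

Writing to a scratch file and testing it with the --config switch keeps a typo from taking down the running service.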

Microsoft Windows Scripting and Programming

MS-DOS/MSDOS Batch Files: Batch File Tutorial and Reference
MS-DOS @wikipedia
Batch file @wikipedia
Microsoft Windows XP - Batch files

Experimental Stuff for Windows Hackers

DOS commands for creating, and starting, a Windows Service to execute a DOS script.

sc create neil369 binpath= "cmd /k start c:\folding-0\neil987.bat" type= own type= interact
sc start neil369

Once created, you can stop/start/modify a service graphically from this Windows location:

Start >> Programs >> Administrative Tools >> Services

Stopped services may only be deleted from DOS like so:

sc query neil369
sc delete neil369

BOINC (Berkeley Open Infrastructure for Network Computing)

BOINC (Berkeley Open Infrastructure for Network Computing) is a science framework in which you can support one, or more, projects of choice.

If you are unable to pick a single cause then pick several because the BOINC manager will switch between science clients every hour (this interval is adjustable). In my case I actively support POEM, Rosetta, and Docking.
- http://boincstats.com/en/stats/-1/user/detail/10937/projectList
The current BOINC client can be programmed to use one, some, or all cores of a multi-core machine. The BOINC client can also utilize (or not) the streaming processors on your Graphics Card.

Protein / Biology / Medicine Projects

General

https://boinc.berkeley.edu/wiki/Installing_BOINC#Linux

POEM@home (via BOINC)

https://en.wikipedia.org/wiki/POEM@Home
https://www.youtube.com/watch?v=EchPDQRRYq4 (promotional German-language video with English subtitles)

Rosetta@home (via BOINC)

http://boinc.bakerlab.org/rosetta/ is the home of Rosetta@home which operates through the BOINC framework. Their graphics screen-saver is one very effective way to help visualize "what molecular dynamics is all about". All science teachers must show this to their students.
- I'm sure everyone already knows that a computer "rendering beautiful graphical displays" is doing less science. That said, humans are visual creatures and graphical displays have their place in our society. Except for some public locations, all clients should be running in non-graphical mode so that more system resources are diverted to protein analysis.
Five questions for Rosetta@home: How Rosetta@home helps cure cancer, AIDS, Alzheimer's, and more
- http://boinc.bakerlab.org/rosetta/rah_education/
https://en.wikipedia.org/wiki/Rosetta@home
https://www.youtube.com/watch?v=GzATbET3g54 (official 7 minute video at YouTube.com)

World Community Grid (via BOINC)

http://www.worldcommunitygrid.org - sponsored by IBM
http://www.wcgrid.org/join - use this link to access the WCG-specific BOINC client.
Notes:
1. Some people may prefer to use the generic BOINC client from Berkeley then install the WCG plugin from that application; you will still need to create your WCG account at the WCG site
2. You only need to do this if you want to cycle your BOINC client between multiple projects of which WCG is just one
3. If you only want to run the WCG project (which also switches between IBM sponsored science projects) then it probably makes more sense to use the WCG-specific client
https://en.wikipedia.org/wiki/World_community_grid (WCG) is an effort to create the world's largest public computing grid to tackle scientific research projects that benefit humanity. Launched 2004-11-16, it is funded and operated by IBM with client software currently available for Windows, Linux, Mac-OS-X and FreeBSD operating systems. They encourage their employees and customers to do the same.
http://www.worldcommunitygrid.org/research/proteome/overview.do - Human Proteome Folding
http://www.worldcommunitygrid.org/research/hpf2/overview.do - Human Proteome Folding - Phase 2

Personal Comment: I wonder why HP (Hewlett-Packard) has not followed IBM's lead. Up until now I always thought of IBM as the template of uber-capitalism but it seems that the title of "king of profit by the elimination of seemingly superfluous expenses" goes to HP. Don't they realize that IBM's effort in this area is paid for out of IBM's advertising budget? Just as IBM's 1990s foray into chess-playing systems (e.g. Deep Blue) led to increased sales as well as higher share prices, one day IBM will be able to say "IBM contributed to treatments for human diseases including cancer". IBM's actions in this area reinforce the public's association of IBM with information processing.

Biology Science Links

Howard Hughes Medical Institute
- http://www.hhmi.org/
Research Collaboratory for Structural Bioinformatics (RCSB)
- http://www.rcsb.org/pdb/101/structural_view_of_biology.do - Structural View of Biology
http://www.biomachina.org/research/
- Custom-Built Supercomputer Brings Protein Folding Into View
  http://biomachina.org/publications_web/SHAW10_news_of_the_week.pdf
Introduction to Protein Folding and Molecular Simulation - Tokyo University of Science - Tadashi Ando
- http://issofty17.is.noda.tus.ac.jp/doc/protein_simulation_English_Sep_2006.ppt

Protein Data Bank Links

http://www.pdb.org/pdb/101/motm.do?momID=31 p53 Tumor Suppressor
http://www.pdb.org/pdb/101/motm.do?momID=78 Luciferase
http://www.pdb.org/pdb/101/motm.do?momID=74 Alpha-amylase (Salivary Digestive Enzyme)
http://www.pdb.org/pdb/101/motm.do?momID=41 Hemoglobin
http://www.pdb.org/pdb/101/motm.do?momID=11 Rubisco
http://www.pdb.org/pdb/101/motm.do?momID=3 DNA Polymerase

ENCODE

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

http://genome.ucsc.edu/ENCODE/
https://en.wikipedia.org/wiki/ENCODE
The Encyclopedia of DNA Elements (ENCODE) is a public research consortium launched by the US National Human Genome Research Institute (NHGRI) in September 2003. The goal is to find all functional elements in the human genome, one of the most critical projects by NHGRI after it completed the successful Human Genome Project. All data generated in the course of the project will be released rapidly into public databases.
On 5 September 2012, initial results of the project were released in a coordinated set of 30 papers published in the journals Nature (6 publications), Genome Biology (18 papers) and Genome Research (6 papers). These publications combine to show that approximately 20% of noncoding DNA in the human genome is functional while an additional 60% is transcribed with no known function. Much of this functional non-coding DNA is involved in the regulation of the expression of coding genes. Furthermore, the expression of each coding gene is controlled by multiple regulatory sites located both near and distant from the gene. These results demonstrate that gene regulation is far more complex than previously believed.
http://www.nature.com/encode/
- http://www.nature.com/nature/journal/v489/n7414/full/489052a.html - ENCODE Explained
- http://www.nature.com/nature/journal/v489/n7414/full/489049a.html - The Making of ENCODE
- http://www.nature.com/news/encode-the-human-encyclopaedia-1.11312 - What’s next for ENCODE?
Bits of Mystery DNA, Far From ‘Junk,’ Play Crucial Role (what was once called "junk DNA" is now referred to as "dark genetic material")
http://www.nytimes.com/2012/09/06/science/far-from-junk-dna-dark-matter-proves-crucial-to-health.html
http://silvertonconsulting.com/blog/2012/09/07/big-sciencebig-data-encode-project-decodes-junk-dna

Local Links

Genes as Technology 1 - comparing genes to computers (information storage)
Genes as Technology 2 - comparing genes to computers (error detection and correction)
Guaranteed Human Life Extension - not a joke or scam, but it will cost you $6.00 per month.

(noteworthy) Remote Links

https://www.youtube.com/watch?v=lNh0Km4bv18&NR=1 - An introduction to protein structure and function.
https://www.khanacademy.org/science/biology/evolution-and-natural-selection/v/dna - An introduction to DNA
- https://www.youtube.com/watch?v=w8VOfmG985U - DNA Lesson - Khan Academy
https://www.khanacademy.org/science/biology/cell-division/v/chromosomes--chromatids--chromatin--etc - Chromosomes, Chromatids, Chromatin, etc.
2013
- http://www.technologyreview.com/view/510571/the-million-core-problem/ The Million-Core Problem - Stanford researchers break a supercomputing barrier.
  quote: A team of Stanford researchers have broken a record in supercomputing, using a million cores to model a complex fluid dynamics problem. The computer is a newly installed Sequoia IBM Bluegene/Q system at the Lawrence Livermore National Laboratories. Sequoia has 1,572,864 processors, reports Andrew Myers of Stanford Engineering, and 1.6 petabytes of memory.
- http://www.wired.com/wiredenterprise/2013/01/million-core-supercomputer/ Researchers Set Record With Million-Core Calculation
2020
- https://www.deepmind.com/research/highlighted-research/alphafold/timeline-of-a-breakthrough
2024
- How AI Cracked the Protein Folding Code and Won a Nobel Prize (22-min video)
2025
- Veritasium: The Most Useful Thing AI Has Done (Protein folding from Rosetta@home to DeepMind)
  https://www.youtube.com/watch?v=P_fHJIYENdI
  With an overview of CASP (Critical Assessment of Structure Prediction) from CASP1 (1994) to CASP13 (2018) and beyond

Recommended Biology Books (I own them all)

The Eighth Day of Creation (1979/1993/1996/2004) Horace Freeland Judson - highly recommended (25th anniversary edition)
- starts with DNA; ends with RNA to Amino Acid mapping in ribosomes.
The Code of Codes (1992/2000) Daniel Kevles and Leroy Hood
- subtitled: Scientific and Social Issues in the Human Genome Project
Epigenetics (2011) Richard C Francis
- how our environment enables/disables/modulates DNA expression.
The Epigenetics Revolution (2012) Nessa Carey
- how our environment enables/disables/modulates DNA expression.
Life at the Speed of Light (2013) J. Craig Venter
- synthetic biology: from computers to DNA.

Neil Rieck
Waterloo, Ontario, Canada.