Hacker Perspective: Genes as Technology (Error Correction)

This document expresses some thoughts which have been rolling around my mind since 1980, when I rekindled my original interest in biology. At that time, I was a computer technologist who had recently decided that a career in software might offer more stability than one in hardware. That same year I stumbled upon "The Eighth Day of Creation" in a Toronto book store and something clicked in my head. This webpage will be successful if I can light a similar spark in at least one other person.

Neil Rieck - 1998.01.23

Edit History
Date       What
1998-01-23 original web page. Be sure to visit the first page at: genes as technology #1
2003-08-27 added my side of some email correspondence
2004-10-31 added links to https://en.wikipedia.org (an open-source encyclopedia)

Simple Overview of Error Detection in Computers

parity

  • originally known as vertical parity check (VPC) because of the association with 9-track magnetic tape technology
    • holding the tape horizontally, the parity-bit was at the top of a vertical stack of 8-bits
    • you could paint a clear chemical (called magna-see) on the tape. Once the chemical dried, you would apply "scotch tape" to the magnetic tape, carefully peel it off, and stick it to a white piece of paper where the bits could be seen with the naked eye.
  • 8-data bits are written to hardware (e.g. tape or memory) accompanied by a single parity bit generated by the hardware adapter
  • odd parity:
    • if there are an even number of ones then hardware will also write a set (1) parity bit so that the written character will contain an odd total of set bits
    • if there are an odd number of ones then hardware will also write a clear (0) parity bit so that the written character will contain an odd total of set bits
  • even parity:
    • if there are an even number of set data bits then hardware will also write a clear (0) parity bit so that the written character will contain an even total of set bits
    • if there are an odd number of set data bits then hardware will also write a set (1) parity bit so that the written character will contain an even total of set bits
  • when read back into the computer, the parity of the 8 data bits is recomputed and compared with the stored parity bit to determine whether there has been any corruption (bit flips).
  • problems:
    • detects only an odd number of flipped bits (in practice, single-bit errors) in each 8-bit byte. Double-bit errors pass through unnoticed.
    +--- bit names (a.k.a. track names)
    | +- data bits (for one written byte)
    0 0
    1 1
    2 0
    3 1
    4 0
    5 1
    6 0
    7 1
    P 1 (in this example P=1 for odd parity)   
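
The parity rules above can be sketched in a few lines of Python. This is only an illustration, not how a tape adapter actually does it: the names `parity_bit` and `check` are my own, and the sample byte is the one from the diagram above.

```python
def parity_bit(data_bits, odd=True):
    """Return the parity bit for a list of data bits (each 0 or 1)."""
    ones = sum(data_bits)
    if odd:
        # odd parity: data bits plus parity bit must hold an odd number of ones
        return 0 if ones % 2 == 1 else 1
    # even parity: data bits plus parity bit must hold an even number of ones
    return ones % 2

def check(data_bits, stored_parity, odd=True):
    """Recompute parity on read-back and compare with the stored bit."""
    return parity_bit(data_bits, odd) == stored_parity

byte = [0, 1, 0, 1, 0, 1, 0, 1]      # bits 0..7 from the diagram
p = parity_bit(byte, odd=True)       # four ones, so odd parity writes P=1

single = byte.copy()
single[3] ^= 1                       # one flipped bit
double = single.copy()
double[6] ^= 1                       # a second flipped bit

print(p, check(single, p), check(double, p))
```

The single flip fails the check, while the double flip passes it, which is exactly the weakness listed under "problems".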

block check character (BCC)

  • originally known as longitudinal parity check (LPC), also called longitudinal redundancy check (LRC), because of the association with 9-track magnetic tape technology
  • a block of characters (usually 256, 512 or 1024 bytes long) is followed by a machine generated character which is an XOR (exclusive-OR) of the whole block
  • during a read operation, software can attempt to repair data bits using both VPC + LRC as pointers into a data matrix (e.g. parity errors in column 49 and row 3 point to a single bit that could be repaired by software)
     
    0 01...0x
    1 11...1x
    2 00...0x
    3 10...1x
    4 01...0x
    5 11...1x
    6 00...1x
    7 10...1x
    P 11...0x
      !!   !+- Block Check Character (x = 0 or 1)
      !!   +-- Data character #1023
      !+------ Data character #1
      +------- Data character #0
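
Here is a rough Python sketch of the row/column repair idea described above. It makes some assumptions of its own: the function names are hypothetical, the block is only 64 bytes long, odd parity is used, and the check character covers the 8 data tracks only (the parity track is ignored).

```python
from functools import reduce

def bcc(block):
    """Block check character: XOR of every byte in the block."""
    return reduce(lambda a, b: a ^ b, block, 0)

def odd_parity(byte):
    """Vertical (per-character) odd parity bit for one byte."""
    return 0 if bin(byte).count("1") % 2 else 1

def locate_single_bit_error(block, parities, stored_bcc):
    """Return (column, row) of a single flipped data bit, or None."""
    bad_cols = [i for i, b in enumerate(block) if odd_parity(b) != parities[i]]
    diff = bcc(block) ^ stored_bcc          # set bits mark the bad rows
    if len(bad_cols) != 1 or bin(diff).count("1") != 1:
        return None                         # no error, or too many to pinpoint
    return bad_cols[0], diff.bit_length() - 1

block = bytes(range(64))                    # made-up sample data
parities = [odd_parity(b) for b in block]   # written alongside each character
stored = bcc(block)                         # written after the block

damaged = bytearray(block)
damaged[49] ^= 1 << 3                       # flip bit 3 of character 49

col, row = locate_single_bit_error(damaged, parities, stored)
damaged[col] ^= 1 << row                    # software repair: flip it back
print(col, row, bytes(damaged) == block)
```

The vertical parity error names the column (character 49) and the block check mismatch names the row (track 3), so the two together point at one bit.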

CRC-16 (Cyclic Redundancy Check 16)

  • works like block check characters except bits are shifted and rotated before the XOR operation
  • this method is so sensitive that parity bits are not needed, thus reducing the number of bits to be transmitted and/or stored
  • a related family of codes known as ECC (Error Correcting Code), which can repair errors as well as detect them, is very popular with hard disks and CD-ROMs
        BCC Shift Register Logic
        ========================
        notes: CRC-16 polynomial = x^16 + x^15 + x^2 + 1
             : x = XOR
         
        +------+-----------------+-----+ data feedback line (feedback operation before each shift)
        |      |                 |     ^
        v      v                 v     |
        +->FE->x->DCBA987654321->x->0->x<-- data input (via shift, LSB first)
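
The shift register above can be imitated in software. The following is a minimal sketch, assuming a register that starts at zero and data shifted in LSB first (the variant commonly catalogued as CRC-16/ARC); the function name `crc16` is mine. The constant 0xA001 is the polynomial x^16 + x^15 + x^2 + 1 written for a right-shifting register: it sets bits F, D and 0, which are the three places the feedback line enters in the diagram.

```python
def crc16(data, crc=0):
    """Bit-by-bit CRC-16, polynomial x^16 + x^15 + x^2 + 1, LSB first."""
    for byte in data:
        crc ^= byte                  # XOR the next 8 data bits into the low end
        for _ in range(8):
            feedback = crc & 1       # bit about to fall off the right-hand end
            crc >>= 1                # shift the register right by one
            if feedback:
                crc ^= 0xA001        # apply feedback at bits F, D and 0
    return crc

message = b"genes as technology"
good = crc16(message)

corrupt = bytearray(message)
corrupt[5] ^= 0b00010100             # two flipped bits in the same byte

print(hex(good), crc16(corrupt) != good)
```

Unlike a single parity bit, the CRC changes when two bits in the same byte are flipped.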

Simple Overview of Error Detection in Biology

Suffice it to say that enzymes exist which step along DNA looking for incomplete base pairs (caused by ionizing radiation, cosmic particles, environmental toxins, etc.) and can repair the damage. However, if DNA damage occurs during cell division, the damage is usually copied blindly, which may cause one of the following events:

  1. one daughter cell may die:
    • because the change was lethal
     
  2. one daughter cell may mutate for better:
    • because the change was accidentally beneficial
    • this is one basis for evolution, but the mutation must occur in a germ cell in order to be passed on to the gene pool. However, germ cells have the most error detection/protection against DNA damage, not to mention that they're deep within the body.
       
  3. one daughter cell may mutate for worse:
    • because the change was not beneficial, but the cell now tries to carry on in a different way, which may lead to:
      • a genetic disease like Huntington's Chorea
      • cancer
        • some types of dermoid tumors contain teeth, hair, fingernails and even whole fingers. Since every cell contains the whole genetic sequence for the whole body, either these cells think they're somewhere else in the body (due to a breakdown in inter-cellular communication) or they are just running amuck
        • most somatic cells employ a "telomere length" mechanism to control their speed of growth and limit the total number of times they may replicate (50 to 150 is typical). The shorter the telomere, the older the cell and the slower it should grow. However, some tumors use the enzyme telomerase to restore telomere length after cell replication. Provided there are no further fatal mutations and there are enough nutrients, these "anomalous life forms" are effectively immortal until they kill their hosts.
  4. one daughter cell may keep living
    • because the change occurred in an area of so-called junk DNA which has been abandoned long ago by evolution
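
For readers who think in code, the "enzymes stepping along DNA" idea from the top of this section can be caricatured as follows. This is a toy model built on my own assumptions: a damaged base is written as '-', the partner strand is assumed to be intact, and the function names are invented. Real repair enzymes are far more subtle.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_damage(strand, partner):
    """Return the positions where the two strands fail to form a base pair."""
    return [i for i, (a, b) in enumerate(zip(strand, partner))
            if PAIR.get(a) != b]

def repair(strand, partner):
    """Rebuild each missing base ('-') from the base on the partner strand."""
    return "".join(PAIR[b] if a == "-" else a
                   for a, b in zip(strand, partner))

partner = "TACGGATC"                 # assumed undamaged
damaged = "AT-CCT-G"                 # two bases knocked out

bad = find_damage(damaged, partner)
fixed = repair(damaged, partner)
print(bad, fixed, find_damage(fixed, partner))
```

The second strand plays the same role as redundant check bits: it carries enough information to rebuild the damaged positions.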

More to follow...

Recent Email Correspondence

(I've only published my side of the dialog)

Sent: 2003-08-27

First off, the ribosome "is" the CPU (but perhaps microprocessor would be more accurate) as far as protein synthesis is concerned. As far as I can tell, only certain portions of DNA are enabled at any one time (when they are unwound) and then transcription enzymes read segments of the enabled DNA, copying them into messenger RNA (mRNA) segments. The ribosomes read mRNA and then translate each three-base sequence (codon) into a single amino acid. At this point, one must wonder what is going on here since amino acids are the fundamental building blocks of proteins. Enzymes are proteins so they "might" be mediating the whole program (possibly enabling a subroutine on some other DNA sequence not yet unwound; possibly sending a signal to wind up the DNA sequence just transcribed; but who knows because this is just conjecture on my part? It's just the way that I might have done it if I were designing the thing from scratch). Everyone only thinks of muscle tissue when protein is mentioned, but it is the basis for everything from digestive enzymes to neurotransmitters to some long chain hormones (not steroids but maybe longer chain stuff like insulin etc.), so you can see how certain hormones might just express portions of DNA which then might trigger some kind of reaction.
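
To make the "ribosome as CPU" analogy concrete, here is a small Python sketch of transcription followed by translation. It is a simplification under stated assumptions: the codon table holds only a handful of entries from the standard genetic code, the DNA sequence is made up, transcription is reduced to swapping T for U on the coding strand, and the function names are my own.

```python
# A few entries from the standard genetic code (the full table has 64 codons)
CODONS = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "AAA": "Lys", "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def transcribe(dna_coding_strand):
    """DNA to mRNA, simplified: same sequence with U in place of T."""
    return dna_coding_strand.replace("T", "U")

def translate(mrna):
    """Read codons from the first AUG until a stop codon, like a ribosome."""
    start = mrna.find("AUG")
    if start < 0:
        return []                    # no start codon, nothing to build
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODONS.get(mrna[i:i + 3], "???")   # ??? = not in this small table
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

dna = "CCATGTTTGGCAAATGGTAACC"       # made-up example sequence
mrna = transcribe(dna)
print(mrna, translate(mrna))
```

In programming terms, the codon table is the instruction set, the start codon is the entry point, and the stop codon is the return instruction.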

As I understand it, the biology community thinks of the whole genome as a set of books (like encyclopedias). The chromosomes are the books and the genes are the chapters. I don't know if genes are one single code sequence or a collection of similar subroutines but I'd bet on the latter idea.

One interesting idea comes from something known as a dermoid tumor. When these tumors are opened, doctors sometimes find: whole teeth, hair, fingernails, whole fingers, etc. Now we know that healthy cells are always communicating their existence to their neighbors while tumor cells just do their own thing. Healthy cells exchange messages like "we are liver cells", which probably keeps the "liver cell" program reinforced while all other programs are disabled. In the case of dermoid tumors, something must be happening that causes the wrong program to become enabled and so a tooth starts growing where it shouldn't.

Sent: 2003-08-28

You mentioned the Human Genome Project and you are right about the "bits" part. Most people don't know that a new informal project, called the Human Proteome Project, will attempt to sequence all known proteins in terms of amino acids (as well as their physical structure in three dimensions). Once you know which proteins have which sequence, you can go back to the Human Genome database to annotate it (e.g. this DNA sequence produces that protein structure). This is very similar to what you would do when hacking a binary program (like Windows): you work backwards to first produce assembly language, then annotate further until you have the original source code instructions (e.g. C/C++).

You don't need to be a genius to recognize that the biological sciences lurched forward at about the same time that computer systems dropped in price while becoming much more powerful. The same thing happened in the space sciences: apparently the amount of information coming from the Hubble Space Telescope is an embarrassment of new knowledge. Before Hubble, if you had given a lecture on dark energy or dark matter (aside from missing matter) you would have been laughed out of the profession.

The "Folding@Home" project (as well as similar projects, some of them based upon BOINC) is a new twist on parallel computing: individual PCs are doing molecular analysis for new drugs and diseases.

Neil Rieck
Waterloo, Ontario, Canada.