Hacker Perspective: Genes as Technology (Information Coding)

This document expresses some thoughts which have been rolling around my mind since 1980, when I rekindled my original interest in biology. At that time, I was a computer technologist who had recently decided that a career in software design might offer more stability than one in hardware maintenance. That same year I stumbled upon "The Eighth Day of Creation" in a Toronto bookstore and something clicked in my head. This web page will be successful if I can light a similar spark in at least one other person. Enjoy!

Neil Rieck (1998-01-23)

Computer Information Technology

  • Notation
    • Probably because humans have 10 fingers, our most common numbering system is base-10, which employs ten symbols (0 to 9). When we need a value beyond 9, we carry a one into the next position to the left, then reuse the same symbols.
    • Computers represent information in base-2 notation (also known as binary digit or bit notation) where a single bit can represent two binary states:
      • 0 usually represents OFF (either "no current" or "no voltage")
      • 1 usually represents ON (either "some specified current" or "some specified voltage")
    • Base-2 means there are only two symbols. When we need a value called 2, we set the current bit to zero then carry a one to the next location (i.e. binary "10" means decimal "2").
  • Storage
    • When stored in dynamic memory, one transistor (not counting addressing, refresh, or buffer circuits) is required to store one bit (two states: on and off).
    • When stored in static memory, six transistors (not counting addressing or buffer circuits) are required to store one bit (two states: on and off).
    • When stored in EEPROM memory, one floating gate (not counting addressing, programming, or buffer circuits) is required to store one bit (two states: on and off).
    • In Harvard processor architectures, instructions are stored in one memory while data is stored in another.
    • In Von Neumann processor architectures, instructions and data are mixed in the same single memory.
    • Three binary bits (each capable of 2 states, on and off) can represent 8 decimal numbers (2x2x2=8) with values of 0 to 7.
      data bits        equivalent decimal value
      000 (all off)    0
      001              1
      010              2
      011              3
      100              4
      101              5
      110              6
      111 (all on)     7
    • When stored outside of the computer, binary data can be represented by:
      • a hole (or not) on paper tape
      • a hole (or not) on punched cards
      • a magnetic North or South on magnetic tape or disk
      • a magnetic flux reversal on magnetic tape or disk (phase encoding)
      • a pit (or not) on a manufactured CD-ROM or DVD
      • etc.
  • Data Grouping
    • 08 bits are grouped to form a byte (also known as a binary term)
    • 16 bits are grouped to form a word (2 bytes)
    • 32 bits are grouped to form a long word (2 words or 4 bytes)
    • 64 bits are grouped to form a quad word (4 words or 8 bytes)
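    • 80 bits are grouped to form extended-precision floating point numbers (as used by x87 hardware); the common IEEE 754 single and double precision formats use 32 and 64 bits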
  • Instruction Grouping
    • This depends on the processor. Some simple appliance CPUs (e.g. in microwave ovens) only require 4 bits to represent a single instruction like MOVE, ADD, or STORE.
    • Most 8-bit, 16-bit, 32-bit, and 64-bit processors support instructions as small as 8 bits.
    • Because the first memory systems were expensive, it made more sense to do as much as possible with each instruction. These systems were classified as CISC (Complex Instruction Set Computers). As memory became cheaper, engineers decided they could do more things in parallel and out of order only if the instruction set was simplified. So a new technology known as RISC (Reduced Instruction Set Computers) was developed, which can only really be described as "Relegate Important Stuff to the Compiler".
    • Newer CPUs support VLIW (Very Long Instruction Word) technology, which is referred to by some as "Variable Length Instruction Words".
    • Some instruction addressing modes will increase the length of basic instructions by including:
      • data (immediate mode)
      • address (absolute mode)
      • pointer (indirect mode)
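The bit-counting rule from the Notation and Storage notes above (three bits give 2x2x2 = 8 values, and binary "10" means decimal 2) can be sketched in a few lines of Python. This is only an illustration; the function name `bit_patterns` is my own invention and not part of any library.

```python
from itertools import product


def bit_patterns(width):
    """Return every pattern of `width` bits, paired with its decimal value."""
    patterns = []
    for bits in product("01", repeat=width):
        text = "".join(bits)
        # int(text, 2) interprets the text as a base-2 number
        patterns.append((text, int(text, 2)))
    return patterns


if __name__ == "__main__":
    # Reproduces the "data bits / equivalent decimal value" table
    for text, value in bit_patterns(3):
        print(text, value)
```

Each extra bit doubles the number of patterns, so `bit_patterns(8)` would list the 256 values that fit in one byte.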

    Summary: This technology is based upon the simplicity of bits and bytes which many people take for granted after seeing binary demonstrations using everything from ping pong balls to light bulbs. In order for this simple representation to work, very complicated circuitry is required to provide memory, storage, and instruction processing.

    Points:

    • storing a bit inside either 1 transistor (dynamic memory) or 6 transistors (static memory) seems simple but overlooks the complex technology behind the semiconductor industry. For example, will a binary 1 be represented as 5 volts, 15 volts, or a stored charge? How will the transistors be connected to each other? How will they be connected to the outside world?
    • retrieving data from memory sounds simple enough, but grouping thousands to millions of transistors into addressing circuits in order to return only the desired data is easier said than done.
    • executing instructions retrieved from memory also sounds simple enough, but grouping millions to billions of transistors into instruction processing circuits is very difficult.

Genetic Information Technology

  • Notation
    • All biology on Earth represents genetic information in a base-4 notation known as DNA (DeoxyriboNucleic Acid).
  • Storage
    • This chemical data format looks like a twisted ladder where the rails are composed of phosphate and sugar (deoxyribose) while the rungs are made up of complementary base pairs. This twisted appearance is why DNA is called the double helix.
    • Complementary bases:
      • purine (larger base molecule; a double ring, nitrogen containing base)
        • Adenine
        • Guanine
      • pyrimidine (smaller base molecule; a single ring, nitrogen containing base)
        • Cytosine
        • Thymine
      • because of space restrictions between the rails of the ladder, a purine is always joined to a pyrimidine on each rung
      • Adenine on one rail always connects to Thymine on the opposite rail
      • Cytosine on one rail always connects to Guanine on the other rail
  • Data Grouping
    • Unknown; but data seems to be embedded with instructions inside the DNA. In this respect, DNA seems to be similar to the Von Neumann processor architecture mentioned above where code and data reside in the same object.
  • Instruction Grouping
    • Three consecutive bases are known as a codon
    • Since each base position could (in theory) hold any one of four different bases, one codon could (in theory) represent 64 different values. (4x4x4=64)
    • a variable group of codons (depends on the information) represents a gene
  • Instruction Processing
    • This depends on the processor:
      • Protein Synthesis:
        • In a eukaryote (a cell with a nucleus), nearly all DNA is found inside the nucleus (mitochondria carry a small genome of their own). During protein synthesis, transcription enzymes copy small segments of the DNA to produce a molecule called mRNA (a.k.a. Messenger RiboNucleic Acid).
        • During DNA transcription, whenever the transcription enzymes encounter Adenine, Guanine, or Cytosine in the DNA source, the same base is put into the destination RNA. However, when Thymine is encountered in DNA, the base Uracil is written into the destination RNA molecule. (Strictly speaking, the enzymes read the complementary template rail, so this rule describes how the mRNA compares with the coding rail.)
        • When DNA-RNA transcription is complete, the mRNA molecule is transported from the cell's nucleus into the cell's main body to be processed by an organelle known as a ribosome.
        • When a 3 base codon is read, it specifies to the ribosome which amino acid to use next (amino acids are the pearls which define the protein necklace). Using the following "Genetic Code Table" (which is not used by mitochondria), we can see that the codon sequence of UGG specifies the symbol trp which represents the amino acid tryptophan (see the "Amino Acid Symbol Table" further down). We can also see that codons UAA, UAG and UGA all specify the punctuation symbol stop. The ribosome employs a molecule called Transfer RNA (a.k.a. tRNA) to deliver amino acids to the site of protein assembly.
    Genetic Code vs. Mitochondrial Genetic Code

    Genetic Code (Codon Translation)

    1st   2nd                                             3rd
          ---U---         ---C---   ---A---    ---G---
    U     phe             ser       tyr        cys        U
    U     phe             ser       tyr        cys        C
    U     leu             ser       stop [2]   stop [3]   A
    U     leu             ser       stop [1]   trp        G
    C     leu             pro       his        arg        U
    C     leu             pro       his        arg        C
    C     leu             pro       gln        arg        A
    C     leu             pro       gln        arg        G
    A     ile             thr       asn        ser        U
    A     ile             thr       asn        ser        C
    A     ile             thr       lys        arg        A
    A     met/start [4]   thr       lys        arg        G
    G     val             ala       asp        gly        U
    G     val             ala       asp        gly        C
    G     val             ala       glu        gly        A
    G     val             ala       glu        gly        G

    Superscripts (shown as [n] in the table):
    1. Listed as "nonsense code 1 (amber)" in book 1 below
    2. Listed as "nonsense code 2 (ochre)" in book 1 below
    3. Listed as "nonsense code 3" in book 1 below
    4. Listed as "start" in book 2 below

    References:

    1. page 489, "The Eighth Day of Creation" by Horace Judson (1980 edition, Touchstone paperback; a much newer edition is available)
    2. page 61, "Genethics - The Ethics of Engineering Life" by David Suzuki (1988 hardcover edition)

    Mitochondrial Genetic Code
    * = differs from the Genetic Code table above

    1st   2nd                                             3rd
          ---U---         ---C---   ---A---    ---G---
    U     phe             ser       tyr        cys        U
    U     phe             ser       tyr        cys        C
    U     leu             ser       stop       trp *      A
    U     leu             ser       stop       trp        G
    C     leu             pro       his        arg        U
    C     leu             pro       his        arg        C
    C     leu             pro       gln        arg        A
    C     leu             pro       gln        arg        G
    A     ile             thr       asn        ser        U
    A     ile             thr       asn        ser        C
    A     met *           thr       lys        stop *     A
    A     met             thr       lys        stop *     G
    G     val             ala       asp        gly        U
    G     val             ala       asp        gly        C
    G     val             ala       glu        gly        A
    G     val             ala       glu        gly        G

    References:
    1. page 69, "Unraveling DNA - the most important molecule of life" by Maxim D. Frank-Kamenetskii (1997 paperback edition)

    Amino Acid Symbol Table (for codon tables above)

    Symbol Amino Acid (or Function)
    ala alanine
    asn asparagine
    asp aspartic acid
    arg arginine
    cys cysteine
    gln glutamine
    gly glycine
    glu glutamic acid
    his histidine
    ile isoleucine
    leu leucine
    lys lysine
    met methionine (and/or punctuation = start)
    phe phenylalanine
    pro proline
    ser serine
    thr threonine
    trp tryptophan
    tyr tyrosine
    val valine
    stop punctuation = stop (stop protein synthesis)
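
The transcription and codon lookup steps described under "Protein Synthesis" can be sketched as follows. This is a simplified illustration under several assumptions: the names `transcribe`, `translate`, `STANDARD_CODE` and `MITOCHONDRIAL_CODE` are my own, reading starts at the first base instead of searching for a start codon, and the input is taken to be the DNA coding rail. The symbol list is copied from the codon tables above.

```python
BASES = "UCAG"

# Symbols in table order: first base, then second, then third, each running
# U, C, A, G (so the list begins UUU, UUC, UUA, UUG, UCU, ...).
SYMBOLS = (
    "phe phe leu leu ser ser ser ser tyr tyr stop stop cys cys stop trp "
    "leu leu leu leu pro pro pro pro his his gln gln arg arg arg arg "
    "ile ile ile met thr thr thr thr asn asn lys lys ser ser arg arg "
    "val val val val ala ala ala ala asp asp glu glu gly gly gly gly"
).split()

STANDARD_CODE = {
    first + second + third: SYMBOLS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# The mitochondrial table above differs from the standard one in four codons.
MITOCHONDRIAL_CODE = dict(STANDARD_CODE, UGA="trp", AUA="met", AGA="stop", AGG="stop")


def transcribe(dna_coding_rail):
    """DNA -> mRNA as described in the text: same bases, with T written as U."""
    return dna_coding_rail.upper().replace("T", "U")


def translate(mrna, code=STANDARD_CODE):
    """Read codons three bases at a time until a stop codon or the end."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        symbol = code[mrna[start:start + 3]]
        if symbol == "stop":
            break
        protein.append(symbol)
    return protein


if __name__ == "__main__":
    mrna = transcribe("ATGTGGTGA")
    print(mrna)
    print(translate(mrna))
    print(translate(mrna, MITOCHONDRIAL_CODE))
```

The same message yields a different protein under the two tables because UGA means stop in one and trp in the other.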

    Summary: Even though this technology is based upon the simplicity of base 4 math, instruction storage (DNA), instruction fetching (DNA to RNA transcription), and instruction execution (RNA to protein synthesis in the ribosome) make this information technology much more complicated than it would first seem.
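
The "base 4 math" and the pairing rule from the Storage notes (A with T, C with G) can be sketched in Python as well. The names `opposite_rail` and `all_codons` are hypothetical, and the sketch pairs bases rung by rung without modelling the fact that the two rails of real DNA run in opposite directions.

```python
from itertools import product

DNA_BASES = "ACGT"

# Pairing rule: Adenine with Thymine, Cytosine with Guanine.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_rail(rail):
    """Return the base facing each base of `rail`, rung by rung."""
    return "".join(PAIR[base] for base in rail)


def all_codons(bases=DNA_BASES):
    """Every three-base combination: 4 x 4 x 4 = 64."""
    return ["".join(triple) for triple in product(bases, repeat=3)]


if __name__ == "__main__":
    print(opposite_rail("GATTACA"))
    print(len(all_codons()))
```

Because each position holds one of four symbols, a single base carries as much information as two bits, and one codon as much as six bits (2x2x2x2x2x2 = 64).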

    Points:

    • notice that the coded information in the DNA doesn't build up protein from scratch elements, it specifies amino acids which are already very complicated molecular structures.
    • enzymes are catalytic proteins that assist all processes, including food digestion, growth, repair, transcription, and replication. These enzymes must be manufactured ahead of time before other processes can begin. This raises a which-came-first question: enzymes (which are protein) or protein synthesis? Is it possible that some simple proteins, like enzymes, can be built by some other process so this whole thing can bootstrap itself? Is it possible that some proteins can be built by reading the DNA directly?
    • Plants and many bacteria can manufacture all 20 amino acids required for protein synthesis. Humans can manufacture only 11 of them, which means that the other 9 (the "essential" amino acids) must come from our diet.
    • The CPU-like machine called the ribosome had to be manufactured somehow. So how and when? In my opinion, grouping millions to billions of transistors into instruction processing circuits might be child's play compared to building one of these.
    • Ask anyone today "who cracked the genetic code?" and you will hear the names James Watson and Francis Crick. While it is true that these are two of the three names to receive a Nobel Prize for work in this area, they discovered the double-helix structure of DNA, which showed how information could be encoded in nucleic acids. Marshall Nirenberg and Heinrich J. Matthaei are the two scientists credited with cracking the Genetic Code.
       

Comparative Technology

Hey, if comparative anatomy is allowed then why not this?

  • Computers
    • a bit represents the known state of a transistor or signal
    • a group of bits represents either an instruction for the CPU, or data
    • bits might be simple, CPUs are not
    • instructions usually run forward for a time. However, some instructions tell the computer to conditionally branch (or jump) forward or backward to memory addresses which contain alternative segments of the program. This is what gives them their decision making properties.
      • Examples:
        1. IF condition THEN jump there
        2. IF condition 1 OR condition 2 THEN jump there
        3. IF condition 1 AND condition 2 THEN jump there
  • Genes
    • one base pair of DNA represents 1/3 of a codon
    • a codon represents the desired amino acid instruction for a ribosome protein assembler
    • base pairs might be simple, ribosomes are not.
    • could it be that the presence of a certain amount of manufactured enzyme acts like a conditional test (GOTO) which then causes a different part of the DNA to be enabled and then transcribed? This would be the basis of a mechanism where different routines and subroutines are conditionally enabled as the cell lives (program executes)
      • to build on this idea further, it probably is true that certain hormones can enable or disable the expression of genes. It now looks like certain elements in our environment (including food) may enhance or suppress these hormones. In some instances it may be possible that certain substances may act upon the DNA directly (environmentally induced cancer?)
    • could it be that in the case of cancer, a lung cell makes a conditional branch error and starts executing a routine that belongs to a quickly dividing and functionally immortal epithelial cell? There are rare (dermoid) tumors, a kind of teratoma, where teeth, hair, and even fully formed fingers are found inside a tumor in the body.
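
The "IF condition THEN jump there" comparison above can be played with in a toy Python sketch. This only illustrates the analogy and is not a model of real gene regulation; the function `genes_to_transcribe`, the gene and enzyme names, and the thresholds are all hypothetical.

```python
def genes_to_transcribe(enzyme_levels, rules):
    """Return the genes whose enabling condition is currently met.

    enzyme_levels: dict of enzyme name -> amount present (arbitrary units)
    rules: list of (gene, enzyme, threshold) tuples, each read as
           "IF enzyme amount >= threshold THEN transcribe gene"
    """
    enabled = []
    for gene, enzyme, threshold in rules:
        # An enzyme that is absent from the dict counts as an amount of 0
        if enzyme_levels.get(enzyme, 0) >= threshold:
            enabled.append(gene)
    return enabled


# Hypothetical names and numbers, chosen only to exercise the function.
RULES = [("gene_B", "enzyme_A", 10), ("gene_C", "enzyme_B", 5)]

if __name__ == "__main__":
    print(genes_to_transcribe({"enzyme_A": 12}, RULES))
    print(genes_to_transcribe({"enzyme_A": 3, "enzyme_B": 5}, RULES))
```

If the product of gene_B were itself enzyme_B, running the function repeatedly would chain one "routine" into the next, which is the kind of conditional flow the question above speculates about.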

Hmm... I wonder...

  • Man is a base 2 programmer (and we still have to reboot Windows 95 systems a couple of times a day)
  • The creator of this realm appears to be a base 4 programmer.
  • While there does seem to be value in bacteria (nitrogen fixers etc.), I don't know of any value attached to viruses (Measles, Smallpox, Influenza) other than evolutionary pressure.

End Notes:

  1. When I refer to Genetic IT (Information Technology), I am not referring to that branch of computer technology known as Genetic programming or Genetic algorithms. What I am referring to is Genetic Biology, which I consider to be the ultimate technology.
  2. When I refer to ribosomes as CPUs I am not claiming that ribosomes can add and subtract like silicon CPUs. I only claim that there seem to be a lot of similarities between computing technology and protein synthesis. Now if we could find the spot in DNA where brain morphology is defined, then we would really have to give this genetic technology idea some further consideration.
  3. Some of this stuff is continued here: genes as technology #2
  4. In 2004 I discovered a very cool online resource called https://en.wikipedia.org which I have begun to reference in various spots on this page.

Neil Rieck
Waterloo, Ontario, Canada.