How Many Genes? A Short History of Off-Target Estimates

The late Lewis Thomas referred to medicine as the youngest science. In essence, medicine is a combination of sciences and techniques, an ensemble that changes faster than the key concepts of more mature fields like physics and chemistry. In biology, while most of our knowledge about the types of cell organelles or the number of human bones won’t change radically in the future, the same cannot be said of genetics.

I was reminded of that while looking at a front-page article in the New York Times from 1978, “Doctors Isolate a Gene, Allowing Birth Defect-Detection”. The article mentions scientists’ estimate of 3 to 4 million genes per cell. About 13 years earlier, the “genetic code” had been cracked. But in discovering the triplet codes used by DNA and messenger RNA in transcription and translation, respectively, biologists had only figured out how the cell knew which amino acids to use and how to bond them in the correct order while assembling proteins. They had no idea how long each set of instructions (a gene) was or how many genes coded for those large, versatile macromolecules known as proteins, which make up about 50% of the cell’s dry weight.


The first wild guesses of the 1960s about the number of genes in humans were based on the number of nucleotide bases and were on the order of 6 to 7 million. After researchers started to isolate genes (around the time of that NY Times article), and by the time the Human Genome Project was launched in 1990, a project that eventually sequenced all of our DNA, the estimate had come down to 100,000. Over a decade later, a refined analysis led to a revised total of no more than one fifth of that. In fact, the lowest estimate has now come down to about 17,000.

To see another reason why the gene total was exaggerated, we have to look at introns. Interestingly, in 1977, a year before that article was printed, introns were discovered independently by a pair of future Nobelists, Phillip Sharp and Richard Roberts. It turned out that introns, the stretches inside a gene that are spliced out before translation, did not show up in the finished messenger RNA. Together with the DNA lying between genes, such non-coding sequences account for about 98% of our DNA; only a small fraction actually codes for proteins. Some scientists even made the blunder of dubbing it “junk DNA”. But nothing could be further from our current approximation of the truth. Here’s what non-coding DNA does:

(1) It plays a key role in regulating the cell, controlling which genes are turned “on” or “off” at any given time. In essence, non-coding DNA takes on a role previously assumed to belong only to protein regulators.

(2) It’s also involved in “alternative splicing.” This means combining the coding areas of a gene (exons), which lie between the non-coding zones, in different ways, allowing more than one protein to be made from a single gene. Since there isn’t a separate gene behind every version of a protein, fewer coding genes are required.
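To make the arithmetic behind alternative splicing concrete, here is a minimal sketch in Python. It is a toy model, not a description of how a real cell chooses its splice variants: the gene, the exon names (E1 to E4) and the rule that only the middle exons may be skipped are all assumptions made for illustration. In a real cell, splicing is tightly regulated and only some combinations actually occur.

```python
from itertools import combinations


def splice_variants(exons, constitutive):
    """List every transcript obtainable by skipping optional exons.

    Exon order is preserved, and exons named in `constitutive`
    are always retained (a simplifying assumption).
    """
    optional = [e for e in exons if e not in constitutive]
    variants = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            variants.append(tuple(e for e in exons if e not in skipped))
    return variants


# Hypothetical gene with four exons; E1 and E4 are always kept.
gene = ["E1", "E2", "E3", "E4"]
variants = splice_variants(gene, constitutive={"E1", "E4"})

for v in variants:
    print("-".join(v))
print(len(variants), "possible transcripts from one gene")
```

Even in this tiny example, one gene with two skippable exons yields four different transcripts, which is why a genome can encode many more proteins than it has genes.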

Because of the way the topic of genes is still covered in most high schools, the public still has some serious misconceptions about them. Most human traits do not follow simple Mendelian inheritance. Only a handful of characteristics can be explained by dominant and recessive alleles. Even a trait as relatively superficial as height is controlled by many genes. Equally important, the expression of these genes is strongly influenced by diet. The inheritance of eye color is also complicated. Eye colors have been divided into nine categories, and a pair of genes on chromosome 15 plays a major role in determining the color. However, eye color is also influenced by variation in at least 10 other genes, plus complicated interactions between these genes (reviewed in Sturm and Larsson 2009, with more recent results in Liu et al. 2010 and Pospiech et al. 2011).

Returning to the NY Times article, the anemias referred to were thalassemias. According to a New England Journal of Medicine article published on that same day, the gene identified led to the detection of a type of thalassemia from amniotic fluid. Thalassemias are molecular diseases: either a pair of genes for two of hemoglobin’s four protein chains is missing (alpha form) or one or two genes are altered (beta form). These diseases occur most often among people of Italian, Greek, Middle Eastern, Southern Asian, and African descent, populations from regions where malaria is endemic. The concentration of the disorders is connected to the fact that thalassemia genes offer some protection against the malaria parasite. Currently, if parents are predisposed to the disease, they are given genetic counselling. For women who are already pregnant, tests done on amniotic fluid or tissue reveal whether the baby has a form of thalassemia and its potential severity.

The gene therapy and correction of the disorder that were envisioned in the 1980s have yet to materialize. Nature’s genetic secrets run deeper than we imagined. And as Eric Lander, formerly of the Human Genome Project, said, “Going from the germ theory of disease to antibiotics that saved people’s lives took 60 years. We might beat that. But anybody who thought in the year 2000 that we’d see cures in 2010 was smoking something.”

Or believing what he read in the media.


Stanford Tech: Understanding Genetics 

BioMed Central

MIT Technology Review

New England Journal of Medicine

α+ -Thalassemia and Protection from Malaria

National Institutes of Health

A Burglar Mystery

A home invader was masked, gloved and meticulous enough to vacuum and scrub the victim’s apartment so as not to leave any hair or skin cells for forensic analysis. The vacuum bag was not left behind. He had used the toilet to urinate but then brushed the bowl for five minutes and flushed two or three more times.
Urine usually contains little or no DNA, but it was possible that the burglar was unaware of the fact. And few men realize how much splashing occurs when one urinates while standing. The side of the counter next to the bowl was stained with a few black spots. Laboratory analysis revealed that the spots were not mold but contained urea and dark brown products derived from homogentisic acid. “Well, something is definitely known about the crook,” said the forensic expert at the lab.

What had they stumbled upon?

The burglar suffered from a metabolic disorder known as alcaptonuria. (The scenario is fictional.)

In people without the condition, an enzyme converts homogentisic acid, a derivative of the amino acids phenylalanine and tyrosine, into maleylacetoacetate, which is eventually broken down into colorless ions.
But in people who suffer from the condition, homogentisic acid accumulates and is excreted in the urine. When the urine stands for a couple of hours, the homogentisic acid is oxidized and turns into a dark, melanin-like substance. The long-term effects of the disorder include spinal and joint damage.

Archibald Garrod, the physician who also shed light on gout, realized that the condition was not due to bacteria but to some metabolic disorder. The condition, also known as black diaper syndrome, was observed even in newborns, who have not yet been colonized by bacteria. Alcaptonuria is rare in the general population (1 in 100,000 to 1 in 250,000) but more common in the Dominican Republic and Slovakia and, as Garrod noticed, among children of first cousins. In 1995, Spanish scientists found a gene in a fungus that codes for homogentisate dioxygenase, the enzyme that breaks down homogentisic acid. In patients with alcaptonuria, a single gene carries one of several possible mutations that lead to the production of a dysfunctional enzyme. It is one of those molecular diseases in which two copies of the defective gene must be inherited for the disease to manifest itself.
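The two-copy rule can be worked out with a small Punnett-square calculation. The Python sketch below is only an illustration under simple assumptions: one gene, two versions of it, and parents who are both unaffected carriers. The labels "A" (working copy) and "a" (defective copy) are hypothetical names chosen for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the probability of each child genotype.

    Each parent is a two-letter string such as "Aa"; the child
    receives one letter from each parent with equal probability.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Both parents carry one working copy ("A") and one defective copy ("a").
carrier = "Aa"
probs = offspring_genotypes(carrier, carrier)

for genotype, p in probs.items():
    print(genotype, p)
```

Under these assumptions, only the "aa" child, one in four on average, has the disorder; half the children are unaffected carriers like their parents. First cousins are more likely than unrelated partners to carry the same rare defective copy, which fits the pattern Garrod saw in their children.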


Alcaptonuria diagnosed in a 4-month-old baby girl: a case report. Cases Journal 2008, 1:308

Natural History of Alcaptonuria. N Engl J Med 2002, 347:2111-2121
