How Many Genes? A Short History of Off-Target Estimates

The late Lewis Thomas referred to medicine as the youngest science. In essence, medicine is a combination of sciences and techniques, an ensemble that mutates faster than the key concepts of more mature fields like physics and chemistry. In biology, most of our knowledge about the types of cell organelles or the number of human bones won’t change radically in the future, but the same cannot be said about genetics.

I was reminded of that while looking at a front-page article from the New York Times in 1978, “Doctors Isolate a Gene, Allowing Birth Defect-Detection.” The article mentions a scientist’s estimate of 3 to 4 million genes per cell. About 13 years earlier, the “genetic code” had been cracked. But in discovering the triplet codes used by DNA and messenger RNA in transcription and translation, respectively, biologists had only figured out how the cell knew which amino acids to use and how to link them in the correct order while assembling proteins. They had no idea how long each set of instructions (a gene) was, or how many genes coded for those large, versatile macromolecules known as proteins, which make up about 50% of the cell’s dry weight.


The first wild guesses of the 1960s about the number of human genes were based on the number of nucleotide bases, and they were on the order of 6 to 7 million. Once scientists started to isolate genes (around the time of that NY Times article), the estimate began to fall, and by the time the Human Genome Project, which eventually sequenced essentially all of our DNA, was launched in 1990, it had come down to 100,000. Over a decade later, a refined analysis led to a revised total of no more than one fifth of that, or about 20,000. In fact, the lowest estimate has now come down to about 17,000.

To see another reason why the gene total was exaggerated, we have to look at introns. Interestingly, in 1977, a year before this article was printed, introns had been discovered independently by a pair of future Nobelists, Phillip Sharp and Richard Roberts. It turned out that these introns, stretches of DNA within genes, are cut out of the messenger RNA before it is translated, so they do not show up in the final message. More broadly, only a small fraction of our DNA, roughly 2% of its bases, actually codes for proteins. Some scientists even made the blunder of dubbing the remaining 98% “junk DNA.” But nothing could be further from our current approximation of the truth. Now called non-coding DNA, here’s what it does:

(1) It plays a key role in regulating the cell’s activity, controlling which genes are turned “on” or “off” at any given time. In essence, this DNA takes on a role previously assumed to belong only to protein regulators.

(2) It also makes possible “alternative splicing,” in which different combinations of a gene’s coding segments (exons), which lie between the non-coding introns, are joined together, allowing more than one protein to be made from a single gene. Since not every version of a protein needs a gene of its own, alternative splicing reduces the number of protein-coding genes required.

Because of the way genes are still taught in most high schools, the public still has some serious misconceptions about them. Most human traits are not controlled by simple Mendelian inheritance; only a handful of characteristics can be explained by dominant and recessive alleles. Even a relatively superficial trait such as height is controlled by many genes. Equally important, the expression of these genes is strongly influenced by diet. The inheritance of eye color is also complicated. Eye colors have been divided into nine categories, and a pair of genes on chromosome 15 plays a major role in determining them. However, eye color is also influenced by variations in at least 10 other genes, plus complicated interactions among these genes (reviewed in Sturm and Larsson 2009, with more recent results in Liu et al. 2010 and Pospiech et al. 2011).

Returning to the NY Times article, the anemias it referred to were thalassemias. According to a New England Journal of Medicine article published the same day, identifying the gene made it possible to detect a type of thalassemia from amniotic fluid. Thalassemias are molecular diseases: in the alpha form, genes for the alpha chains (two of hemoglobin’s four protein chains) are missing, while in the beta form, one or both genes for the beta chains are altered. These diseases occur most often among people of Italian, Greek, Middle Eastern, South Asian, and African descent, in regions where malaria is or has been endemic. The disorders are concentrated there because thalassemia genes offer some protection against the malaria parasite. Currently, prospective parents who carry a thalassemia gene are given genetic counselling. For women who are already pregnant, tests on amniotic fluid or placental tissue reveal whether the baby has a form of thalassemia and how severe it is likely to be.

The gene therapy and correction of the disorder that were envisioned in the 1980s have yet to materialize. Nature’s genetic secrets run deeper than we imagined. And as Eric Lander, formerly of the Human Genome Project, said, “Going from the germ theory of disease to antibiotics that saved people’s lives took 60 years. We might beat that. But anybody who thought in the year 2000 that we’d see cures in 2010 was smoking something.”

Or believing what he read in the media.


Stanford Tech: Understanding Genetics 

BioMed Central

MIT Technology Review

New England Journal of Medicine

α⁺-Thalassemia and Protection from Malaria

National Institutes of Health

