The poetry of genetics: or reading a genetic sequence — a literary model for cellular mechanisms

by Johannes Borgstein


The Human Genome Project makes the subtle promise that, once all the human chromosomes are mapped, we will be in a position to determine the genetic make-up of each individual and, as a natural consequence, to “correct” many of the genetic errors encountered (while carefully avoiding any allusion to the possibilities of misuse).
However, the human genome, as the infinite variety and expression of characteristics demonstrates, is vastly more complex than the sequence of codons would imply. The codons can be read in different sequences depending on where the reading starts, which sequences are read, and which ones are suppressed, like a book in which several stories are intermingled. To follow only one story, words or passages must be skipped in some places, whereas in others continuous sequences are read.
We may conveniently make an analogy with a sequence of letters, rather than of words, which are followed in variable order, with variable starting sequence. A complex code is thus required to interpret it.
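The dependence on where the reading starts can be made concrete with a minimal sketch. It assumes only that codons are consecutive triplets of bases; the sequence `seq` and the helper name `codons` are invented here for illustration and do not come from any real gene.

```python
def codons(sequence, start):
    """Split a sequence into complete 3-letter codons, beginning at offset `start`."""
    return [sequence[i:i + 3] for i in range(start, len(sequence) - 2, 3)]


# Hypothetical sequence, chosen only to illustrate the idea.
seq = "ATGGCCATTGTAATGGGCCGC"

# The same letters yield three different series of codons,
# depending on whether reading starts at position 0, 1, or 2.
for frame in range(3):
    print(frame, codons(seq, frame))
```

Shifting the starting point by a single letter changes every codon that follows, which is one simple sense in which the same sequence can carry more than one reading.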
Most classic literary works, furthermore, may be read at multiple levels; generally speaking, the better the book, the more levels may be read in it. A Shakespeare play, for example, may be interpreted as a simple story, suitable for children; a complex story, interpreted by adults; a collection of aphorisms and sayings; or a source of life’s wisdom. Similarly, by analogy, there are multiple levels to the human genome, whose expression varies in response to environmental factors, so as to weave a complex fabric of life at a number of levels and layers which make it extremely complex to interpret.

How the genetic sequences may be read

Through a simple model or analogy, we can explore how a series of genetic sequences may be read in the cell. It is likely that, in reality, it is far more complex at all levels, with a larger number of intertwined “messages”, and that further higher levels of complexity exist in the expression within the cell, leaving aside for now all the possible extracellular effects of the proteins formed. Nevertheless, the analogy gives us some idea of what we are dealing with, and how difficult an interference or “correction” would be at any of these levels.

Let us take the following sequence of letters:

Ikeeptoseeaworldsixhonestservingmen(theytaughtmeallIknew)inagrainofsandtheirnamesarewhatandaHeaveninawildflowerwhyandholdinfinitywheninthepalmofyourhandandhowandwhereandwhoeternityinanhour (level 1)

Level 1—letter sequence in Latin script (genetic sequence)
Level 2—language (English)
Level 3—separate words
Level 4—indication of sequence in which mixed messages should be read
Level 5—separate poems (or proteins?)
Level 6—meaning: elementary
Level 7—complex, abstract concepts

With the knowledge that the sequence is written in the English language (level 2), I may begin, with some difficulty, to make out the words:

I keep to see a world six honest serving men (they taught me all I knew) in a grain of sand their names are what and a Heaven in a wild flower why and hold infinity when in the palm of your hand and how and where and who eternity in an hour (level 3)
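The step from level 1 to level 3 can be sketched as a search for word boundaries, given a vocabulary of the language. This is only an illustration of the analogy, not a claim about how a cell works: the function name `segment`, the small vocabulary, and the choice to try longer words first are all assumptions made for the example.

```python
from functools import lru_cache


def segment(text, vocabulary):
    """Return one split of `text` into vocabulary words, or None if none exists."""
    longest = max(map(len, vocabulary))

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end of the text means everything before it was split cleanly.
        if start == len(text):
            return ()
        # Try the longest candidate word first, then progressively shorter ones.
        for end in range(min(len(text), start + longest), start, -1):
            word = text[start:end]
            if word in vocabulary:
                rest = solve(end)
                if rest is not None:
                    return (word,) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)


# Hypothetical mini-vocabulary covering only the opening of the sequence.
vocabulary = frozenset({"i", "keep", "to", "see", "a", "world"})
print(segment("ikeeptoseeaworld", vocabulary))
```

Without the vocabulary (level 2) the letters cannot be divided at all, and with a larger vocabulary more than one division may become possible, which is where the difficulty mentioned above comes from.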

I then need some knowledge of literature and poetry to be able to separate the phrases, which belong together and are to be read sequentially:

I keep [to see a world] six honest serving men (they taught me all I knew) [in a grain of sand] their names are what [and a Heaven in a wild flower] why and [hold infinity] when [in the palm of your hand] and how and where and who [eternity in an hour] (level 4)

Until, finally, the two quatrains are set down separately:

To see a world in a grain of sand
and a Heaven in a wild flower
hold infinity in the palm of your hand
and eternity in an hour. (William Blake)1

I keep six honest serving men
(they taught me all I knew)
their names are What and Why and When
and How and Where and Who. (Rudyard Kipling)2
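The separation of the two intermingled messages can be sketched as follows, assuming that each fragment already carries a marker saying which message it belongs to (the "indication of sequence" of level 4). The labels "K" and "B", the function name `separate`, and the assignment of each fragment, including each shared "and", are assumptions made for this illustration.

```python
def separate(tagged_fragments):
    """Group fragments by label, keeping their original order within each label."""
    messages = {}
    for label, text in tagged_fragments:
        messages.setdefault(label, []).append(text)
    return {label: " ".join(parts) for label, parts in messages.items()}


# The mixed sequence of level 3, cut into fragments and labelled by hand.
# "K" and "B" are hypothetical markers for the two poems.
mixed = [
    ("K", "I keep"),
    ("B", "to see a world"),
    ("K", "six honest serving men (they taught me all I knew)"),
    ("B", "in a grain of sand"),
    ("K", "their names are what"),
    ("B", "and a Heaven in a wild flower"),
    ("K", "why and"),
    ("B", "hold infinity"),
    ("K", "when"),
    ("B", "in the palm of your hand"),
    ("K", "and how and where and who"),
    ("B", "eternity in an hour"),
]

poems = separate(mixed)
print(poems["B"])
print(poems["K"])
```

The sketch is trivial once the labels exist; the hard part, in the poem as in the genome, is knowing where the fragments begin and end and which message each belongs to.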

It is then largely a matter of maturity, education, and environment that determines what these poems mean to me, and how I capture the different levels and use or transmit the implied concepts.

Thus, at least seven levels (panel) may be distinguished in this very simple model of a DNA sequence. The first level is the interpretation of the individual sequence of Latin letters or bases. (One could conceive, perhaps, of one lower level in which the signs need to be interpreted as letters.) The second level requires us to be conversant with the language in which the letters are written, so that the third level permits identification of whole words out of the continuous sequence of letters. From this sequence, in the fourth level, we attempt to make out the phrases that under certain circumstances belong together, but which have been intermingled (some knowledge of the authors involved is probably necessary, and the genetic code must carry instructions as to which sequences should be read and which ones are suppressed). The fifth level of interpretation is to select the separate poems or protein instructions, which then go through a number of subsequent steps, just as a poem may be read on various levels. These are the sixth and seventh levels: the purely visual imagery that a child might capture of sand and flowers and the rhythm of the language, and the adult interpretation of the complex abstract ideas, sensations, and emotions that the poem induces. They make the poems different for everybody, though with adequate emotional similarities for us to identify with the poet and with our fellow reader.
The actual DNA contains a large number of intermingled messages that control not only protein synthesis but also the expression or suppression of other messages.
With our present knowledge, we are only just beginning to interpret the letter sequence. To extrapolate from our model to human genetic engineering (as is too readily assumed and, at times, probably even practised) has further implications.
To insert a viral-linked sequence of genetic material into the correct section of the right chromosome—as has been suggested and attempted for “correction” of genetic defects encountered—is tantamount to throwing a dart at a small distant target, blindfolded. Moreover, it raises questions such as: how can we be sure the sequence will be accommodated into the right place? How do we know it will be expressed correctly? How can we be certain it will not have undesirable side-effects? And how can we be sure the viral “carrier” does not affect the sequence or have other side-effects?

A virus will merge into the genetic sequence at a predestined site (for the virus), which is unlikely to coincide with our chosen site. It will be a matter of chance that it is expressed at all, and even if not expressed, it may interfere with the expression or suppression of other sequences with unpredictable results. Then, we should enquire what the function of the virus is in the first place, and what its other sequences are able to affect.

In theory, astonishing results may be obtained, but there are too many uncertainties, too many unanswered questions and variables, and probably hazardous consequences that are inadequately considered. To trust to chance is perhaps too simplistic, and even then it may work against us with unforeseen complications (and how can we foresee all the possible complications of a process so little understood?).

The expression of the genetic code may thus be viewed as a language with almost limitless possibilities of expression within the framework of a fixed alphabet (four bases and a zero, making five possible symbols?) and a structured grammar. Were it otherwise, physical expression would repeat and duplicate itself rather than giving rise to circumstances in which, despite overpopulation, no two people in the world are alike; nor two leaves of a tree, for that matter.
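How quickly a fixed alphabet yields "almost limitless" possibilities can be shown with simple arithmetic: an alphabet of k symbols allows k to the power n distinct sequences of length n. The sketch below assumes an alphabet of four symbols; the lengths are arbitrary illustrative values, and the function name `count_sequences` is invented for the example.

```python
def count_sequences(alphabet_size, length):
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


# Four symbols; lengths chosen only for illustration.
for length in (3, 10, 100):
    print(length, count_sequences(4, length))
```

With four symbols there are 64 possible triplets, and the count grows exponentially with length. The figure only counts raw letter sequences; it says nothing about which of them are grammatical or meaningful, which is the point of the levels described above.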

Genetic expression is modulated, like a language, by the environment (a language develops only in a social context). The surrounding cells somehow determine the expression and differentiation initially, followed by the addition of neural and more centralised humoral mechanisms as the organism grows in complexity, and, finally, by external environmental factors (think only of the calluses on the hands of a gymnast, where a purely mechanical stimulus induces thickened skin layers). Some environmental stimuli induce a whole series of “programmed” changes, as occurs in the developing embryo, whereas others may induce only minor modulations. All these factors contribute to a unique physical expression, even among identical twins, despite a variable resemblance at some levels. Although the leaves are all different, they are similar enough for us to identify the tree they came from. One of the striking conditions of living systems is that nothing is ever exactly the same; nothing can be static or in equilibrium. To state that evolution is the result of random mutations is akin to assuming that random typing by a monkey will produce the complete works of Shakespeare if we wait long enough: an overly simplistic concept that takes no account of grammar and language, let alone of meaning at its different levels. The poetry of genetics runs a lot deeper than we suspect; perhaps deeper than we can suspect.


Borgstein J. The poetry of genetics: or reading a genetic sequence--a literary model for cellular mechanisms. Lancet 1998 May 2;351(9112):1353-4.