Junk DNA: A Journey Through the Dark Matter of the Genome (33 page)

Figure 16.5
A sequence in the untranslated region at the end of a messenger RNA attracts an enzyme (shown by the scissors) that binds at a specific site and then cuts the molecule a little further along. Lots of A bases are added to the cut end of the messenger RNA molecule, even though these were not coded for in the original DNA sequence.

There is a critical motif in the untranslated region at the end of the messenger RNA. This is shown by the triangle in Figure 16.5 and is called the polyadenylation signal (the A base is adenosine, so adding lots of A bases is called polyadenylation). This is a sequence of six bases (AAUAAA) within the junk of the untranslated region. It acts as a signal for a messenger RNA-processing enzyme. The enzyme recognises the six-base motif, and cuts the messenger RNA a little distance away, usually ten to 30 bases further downstream. Once the messenger RNA has been cut in this way, another enzyme can add the multiple A bases.[e]
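The recognise, cut and extend sequence described above can be sketched in a few lines of Python. This is only a toy illustration: the function names, the example sequence, and the `gap` and `tail_length` values are assumptions made for the sketch, not measurements from any real gene.

```python
POLYA_SIGNAL = "AAUAAA"  # the six-base motif described in the text


def find_polya_signals(utr):
    """Return the 0-based start of every AAUAAA motif, overlapping ones included."""
    positions = []
    start = utr.find(POLYA_SIGNAL)
    while start != -1:
        positions.append(start)
        start = utr.find(POLYA_SIGNAL, start + 1)
    return positions


def cleave_and_polyadenylate(utr, signal_start, gap=20, tail_length=12):
    """Cut `gap` bases after the end of the motif, then append a run of A bases.

    `gap` and `tail_length` are illustrative assumptions; the text says only
    that the cut usually falls ten to 30 bases downstream of the motif.
    """
    motif_end = signal_start + len(POLYA_SIGNAL)
    cut_site = min(motif_end + gap, len(utr))  # never cut past the end
    return utr[:cut_site] + "A" * tail_length


# Invented example sequence, not taken from any real gene.
utr = "GG" + POLYA_SIGNAL + "C" * 40
signals = find_polya_signals(utr)
processed = cleave_and_polyadenylate(utr, signals[0], gap=20, tail_length=5)
print(signals)
print(processed)
```

The appended A bases come from the code, not from the input string, which mirrors the point that they are not coded for in the original sequence.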

This six-base motif often occurs many times in the same untranslated region. It’s not particularly clear how a cell ‘chooses’ which motif to use at any one time. It is probably influenced by other factors in the cell. But because there are multiple motifs that can be used, there may be multiple messenger RNAs that code for exactly the same protein, but which contain different lengths of the untranslated region before the multiple As. These different-length messenger RNAs will have different stabilities and so produce different amounts of protein from each other. This creates additional opportunity for fine-tuning the amount of protein that is produced.[21]
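To see how several signals in one untranslated region give messenger RNAs of different lengths, here is a small Python sketch. The sequence is invented, and the fixed `gap` and `tail_length` are assumptions chosen only to keep the example simple. The sketch says nothing about stability, which depends on factors not modelled here.

```python
import re


def utr_isoforms(utr, gap=20, tail_length=10):
    """Return one processed untranslated region per AAUAAA motif found.

    The lookahead pattern lets re.finditer report overlapping motifs too.
    `gap` and `tail_length` are illustrative assumptions.
    """
    isoforms = []
    for match in re.finditer("(?=AAUAAA)", utr):
        cut_site = min(match.start() + 6 + gap, len(utr))
        isoforms.append(utr[:cut_site] + "A" * tail_length)
    return isoforms


# Invented sequence with two signals, 36 bases apart.
utr = "CC" + "AAUAAA" + "G" * 30 + "AAUAAA" + "G" * 30
for isoform in utr_isoforms(utr):
    print(len(isoform), isoform)
```

Each signal yields a different cut site, so the same protein-coding message can end up attached to a shorter or longer untranslated tail.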

There’s a very unusual genetic condition in humans called IPEX syndrome.[f] It’s a fatal autoimmune disease in which the body attacks and destroys its own tissues. Cells lining the intestine are attacked, resulting in severe diarrhoea in young infants and a failure to thrive. The glands that produce hormones can also be attacked, leading to conditions that include type 1 diabetes, where patients can’t produce insulin. The thyroid gland may also be targeted, resulting in underactivity.[22]

Rare cases of IPEX syndrome are caused by a mutation in the polyadenylation signal. Instead of the normal AAUAAA sequence, there is a single base change. As a consequence, the six-base sequence becomes AAUGAA and no longer acts as a target for the cutting enzyme.[23]

The gene where this change occurs codes for a protein that switches on other genes.[g] This protein is required to control a particular type of immune cell.[h] In some genes the change in a single six-base motif might not be that serious a problem, because the cell would use other, nearby, normal six-base sequences in the same untranslated region. This might disrupt fine-tuning a little, but we wouldn’t expect to see anything as severe as IPEX syndrome. The problem arises in IPEX because the untranslated region of this gene contains hardly any other suitable six-base motifs to act as signals for polyadenylation. The mutation in the untranslated region means that the messenger RNA isn’t cut properly, A bases aren’t added and the messenger RNA is very unstable. Because of this, the cells produce hardly any of this protein. Essentially, the effects of the mutation in this junk motif are as bad as if the protein-coding region itself had been disrupted.
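The difference between a gene with back-up signals and one without can be shown with a short Python sketch. Both sequences below are invented for illustration and are not the real untranslated region of the gene involved in IPEX syndrome. The helper names are also assumptions.

```python
SIGNAL = "AAUAAA"  # normal polyadenylation signal


def count_signals(utr):
    """Count AAUAAA motifs in the sequence, overlapping ones included."""
    count = 0
    start = utr.find(SIGNAL)
    while start != -1:
        count += 1
        start = utr.find(SIGNAL, start + 1)
    return count


def substitute(seq, position, new_base):
    """Return a copy of `seq` with a single base changed at `position`."""
    return seq[:position] + new_base + seq[position + 1:]


# Invented sequences: one with a lone signal, one with a second signal downstream.
lone = "GGCC" + SIGNAL + "CCGG"
backed_up = lone + SIGNAL + "CCGG"

# Change the fourth base of the first motif from A to G, giving AAUGAA.
lone_mutant = substitute(lone, 7, "G")
backed_up_mutant = substitute(backed_up, 7, "G")

print(count_signals(lone), count_signals(lone_mutant))
print(count_signals(backed_up), count_signals(backed_up_mutant))
```

With a lone signal, the single base change leaves nothing for the cutting enzyme to recognise. With a second signal nearby, one usable motif survives.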

It’s only fairly recently, as sequencing technologies have become cheaper, that researchers have really started analysing the untranslated regions of messenger RNA molecules to identify mutations that cause rare instances of serious diseases. We can be pretty confident that over the next few years we will see many more examples of this. One of the reasons we can be bullish about this prediction is that researchers may have already identified another such example.

Amyotrophic lateral sclerosis (ALS), also known as motor neuron disease or Lou Gehrig’s disease, is a devastating disorder. Neurons in the brain and spinal cord which control muscle movement die off progressively. Sufferers become increasingly wasted and paralysed, unable to talk, swallow or breathe properly.[24] The cosmologist Stephen Hawking suffers from ALS, although his case is rather atypical. He was first diagnosed at the age of 21, whereas most people with ALS develop their first symptoms in middle age. Professor Hawking has survived for over 50 years with the condition, but sadly most patients die within five years of diagnosis, although this period may be increasing with better medical intervention.

There is much that we still don’t understand about ALS. Less than 10 per cent of cases run in families. In the other 90 per cent there may be variations in DNA that predispose someone to the condition if they encounter environmental triggers (which we can’t yet identify). Some patients may also have a mutation that is sufficient on its own to cause the condition, even without a family history of the disorder. This mutation may have arisen in the eggs or sperm of their parents, for example.[25]

One of the genes involved in ALS is believed to be responsible for 4 per cent of cases that run in families, and 1 per cent of cases that occur without a family history.[i],[26],[27],[28] In all the original cases involving this gene, the mutations were in the protein-coding regions. Researchers have now identified four different variants in the untranslated region at the end of this gene. These were found in patients with ALS who didn’t have any other known mutations. Although these could just be harmless variations, the distribution of the protein and its expression levels were abnormal in the cells from these patients. These findings are at least suggestive that the changes in the untranslated region led to abnormalities in the processing and translation of the protein itself, leading to disease.[29]

Footnotes

a
Osteogenesis imperfecta type 5.

b
These are often referred to in the literature as UTRs, for untranslated regions. The one at the beginning of the messenger RNA is called the 5′UTR and the one at the end of the messenger RNA is called the 3′UTR.

c
The gene is called IFITM5.

d
This protein is called Muscleblind-like protein 1, or MBNL1.

e
This is known as a non-templated change because there is no underlying DNA template for these A bases in the genome.

f
IPEX stands for Immunodysregulation, Polyendocrinopathy, Enteropathy, X-linked.

g
FOXP3, a transcription factor.

h
Regulatory T cells.

i
The gene is called FUS – Fused in Sarcoma.

17. Why LEGO is Better Than Airfix

Most children, and quite a few adults, enjoy making models. There are various ways of doing this but let’s just look at the extremes. One of the most popular formats in the UK for over 30 years was the Airfix kit. Small plastic parts specific to an aircraft, ship, tank or just about anything else you can think of (Bengal Lancer, anyone?) were supplied with detailed instructions. The user glued the parts together, painted them, applied transfers and admired the finished article for years after.

At the other extreme is that universal Danish toy of which I am so fond, LEGO. Although there are lots of specialist LEGO kits now, the concept remains the same as ever. A relatively limited number of components that can be joined together in any combination the user wants. And the model can always be split back down into its original bricks and reused to create something else.

Simple organisms like bacteria tend more to the Airfix way of life. Their genes are fairly set, coding for just one protein. The more complex an organism becomes, the more the genome begins to resemble LEGO, with a much greater degree of flexibility in how the components are used. And when we think how extraordinary we humans are, it seems reasonable to say, in a nod to a certain movie, that at the genomic level, ‘everything is awesome’.[a]

An extreme version of this phenomenon is the splicing that our cells can use to create multiple related proteins from one gene, as in Figure 2.5 (page 18). This ability to use the components of a gene in multiple different ways creates enormous flexibility and added opportunities for an organism. We can get some idea of the amount of variability that is possible by looking at some of the numbers involved. Human genes contain an average of eight amino acid-coding regions, each separated by an intervening stretch of junk DNA.[b] At least 70 per cent of human genes have been shown to create at least two proteins.[1] This is achieved by joining up different amino acid-coding stretches. Using our DEPARTING example (as in Figure 2.5), this allows us to produce the protein DART but also the protein TIN. The ability to create different proteins this way is known as alternative splicing.
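The DEPARTING example can be written as a toy Python sketch of alternative splicing. The way the word is divided into segments here is an assumption made for the sketch, since the block boundaries used in Figure 2.5 are not reproduced on this page. The `splice` function is likewise a hypothetical helper.

```python
def splice(gene, segments):
    """Join the chosen (start, end) slices of `gene`, keeping their original order.

    Segments must run left to right without overlapping, reflecting the idea
    that splicing joins stretches in the order they appear in the gene.
    """
    previous_end = 0
    pieces = []
    for start, end in segments:
        if not (previous_end <= start < end <= len(gene)):
            raise ValueError("segments must be in order and must not overlap")
        pieces.append(gene[start:end])
        previous_end = end
    return "".join(pieces)


gene = "DEPARTING"
print(splice(gene, [(0, 1), (3, 6)]))  # D + ART
print(splice(gene, [(5, 8)]))          # TIN
```

The same stretch of letters gives different 'proteins' depending only on which segments are kept.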

The regions that code for amino acids are short compared with the intervening junk regions. The stretches that code for amino acids have an average length of about 140 base pairs, but they can be surrounded by junk regions that are several thousand base pairs in length.[2] About 90 per cent of the base pairs in a gene are from the intervening sequences, not from the amino acid-coding ones. If we think of this in terms of the English language, this immediately shows us some of the problems the cell faces.

Imagine you meet someone and are extraordinarily smitten by them. You have heard that they love poetry, and you want to sweep them off their feet, but you always skipped literature class in school. A friend gives you a piece of paper with a killer first line from a poem on it. But for some reason your vaguely sociopathic friend has split up the words of the first line among a load of gibberish, and you only have a couple of seconds to find the poetry, say it out loud and win someone’s heart (or at least their attention). Can you do it? Take a very quick look at Figure 17.1 and find out.

Figure 17.1
A glorious line of poetry is in here somewhere. Have a quick look: can you find the words that will win someone’s heart?

This is what our cells do all the time, every moment of every day of our lives. Machinery in the cell analyses a long stretch of apparent gibberish and almost instantaneously finds the hidden words and joins them together. You can take a look at Figure 17.2 to see if you managed to compete with the non-sentient proteins that keep you alive.
