Junk DNA: A Journey Through the Dark Matter of the Genome

Figure 8.1
Representation of how two single-stranded long non-coding RNA molecules with different base sequences can form the same shape as each other. The shapes are determined by pairing of the A and U or C and G bases, which are represented by the differently shaded/patterned boxes. The representation is an over-simplification. In reality, the long non-coding RNAs may have multiple regions that can form complex structures. They will also be three-dimensional, rather than the flat shape shown here.

Logs or chips?

Because of the complications that arise if we try to identify long non-coding RNAs from the human genome sequence, most researchers lean towards the more pragmatic approach of identifying long non-coding RNAs by detecting the molecules themselves in cells. But there is a considerable degree of conflict in the scientific community about how to interpret the results. Hardcore junk aficionados might claim that if a sequence is expressed as a long non-coding RNA molecule then that molecule is being expressed for a reason. Other scientists are much more sceptical, positing that the expression of the long non-coding RNAs is essentially what we call a bystander event. This means that the long non-coding RNAs are expressed, but just as a by-product of switching on a ‘proper’ gene.

To understand what’s meant by a bystander event, let’s imagine we are cutting up tree branches with a chainsaw. The major aim of our activity is to create logs that we can use to build a cabin or to provide fuel for a stove. We aren’t trying to create woodchips or sawdust, but this happens anyway as a result of the chainsaw function. It’s not worth our while trying to avoid creating the woodchips. They don’t really interfere with our main aim, and if we do find a way to avoid generating them, it might be at the expense of efficient production of the logs. Just occasionally, we may even find that we have a use for the woodchip by-product, using it to mulch a flowerpot, or provide bedding for our pet snake.

In a similar model, the junk sceptics postulate that expression of long non-coding RNA simply reflects a loosening of repression when genes in a particular region are expressed. In this model, the production of long non-coding RNAs is simply an inevitable consequence of an important process, but essentially harmless and insignificant. The believers counter that that fails to address certain aspects of long non-coding RNA expression. For example, different types of long non-coding RNAs are expressed if we examine samples from different brain regions.[11] Enthusiasts for long non-coding RNAs claim this supports their model for the importance of these molecules, because why else would different brain regions switch on different long non-coding RNAs? The sceptics claim that the different long non-coding RNAs are detected simply because various brain regions switch on different classical protein-coding genes. In our chainsaw analogy, this is equivalent to getting different woodchips depending on whether we are sawing up oak branches or pine.

It’s early days but current data suggest that the extremists on both sides should probably relax a little because the reality is likely to lie somewhere between their two positions. The only way we can really test the hypothesis that long non-coding RNAs have functions in the cell is to test each one, in the correct cell type. Although perfectly sensible as an approach, this isn’t as straightforward as it sounds. Partly this is down to sheer numbers. If we detect hundreds or thousands of different long non-coding RNAs in a cell or tissue, we have to make a decision about which one we want to test. But to do that, we already need to have developed a hypothesis about what that specific long non-coding RNA might do in the cell. Without that hypothesis, we won’t know what effects we should be looking for if we interfere with the expression or function of that molecule.

Another complication is that many of the long non-coding RNAs are found in the same region as classical protein-coding genes. Sometimes they may be in exactly the same position, but encoded on the opposite strand, just as we saw for Xist and Tsix in Chapter 7. Others may be found within the stretches of junk that lie between two amino acid-coding regions in a single gene, which we first encountered in Friedreich’s ataxia in Chapter 2 (see page 18). There are lots of ways in which the long non-coding RNAs may be co-located in the same region as protein-coding genes and this creates substantial experimental difficulties if trying to investigate function.

Usually the functions of genes are tested by mutating them. There are all sorts of mutations that can be introduced but the most commonly used will either switch the gene off or will lead to it being expressed at a higher level than normal. But because so many of the long non-coding RNAs overlap with protein-coding genes, it’s hard to mutate one without mutating the other at the same time. We then face the problem of knowing whether the effects we see are due to the change in the long non-coding RNA or in the protein-coding gene.

A frivolous analogous example may help to visualise this problem. A PhD student was investigating how frogs hear. He had developed an experimental system where he surgically removed certain parts of a frog and then monitored if it could hear a loud noise, in this case a gunshot. One day he rushed into his supervisor’s office, yelling that he had worked out how frogs hear. ‘They hear with their legs!’ he told his bemused supervisor. When she asked how he could be so sure he said, ‘It’s simple. Normally if I fire the gun, the frog hears it and jumps in fright. But when I remove the frog’s legs it doesn’t jump anymore when I fire the gun, so it must hear through its legs.’[a]

Theoretically, of course, it’s also possible that some of the unexpected effects sometimes encountered when we mutate protein-coding genes have been due to unrecognised changes in co-located long non-coding RNAs which we hadn’t even realised were present at the time the experiment was carried out.

Because of this potential collateral damage to protein-coding genes, many researchers are focusing their efforts on a subset of long non-coding RNAs which don’t overlap these regions. There’s plenty of choice, as there are at least 3,500 long non-coding RNAs in this category. There is a tendency in the literature to refer to these more distant long non-coding RNAs as a special class, and they have been given a separate name.[b][12] But it’s worth remembering that if we do this, we are classifying these molecules by what they are not, i.e. they aren’t co-located with protein-coding genes. This could mean that we lump together large numbers of long non-coding RNAs in one class when really they may turn out to be functionally quite distinct from each other.

The rush to create categories and nomenclature has been, and continues to be, a real problem in the whole field of genome analysis because it tends to lock us in to definitions before we really have enough biological understanding to create relevant categories. Imagine if you had never seen a movie, and then you were treated to a week of films. Let’s imagine you see Top Hat; Singin’ in the Rain; The Good, the Bad and the Ugly; High Noon; The Sound of Music; The Magnificent Seven; Cabaret; True Grit; Unforgiven and West Side Story. If asked to categorise movies, you would say they come in two flavours: musicals and westerns. That’s fine, but what happens in the following week if you are shown Bridget Jones’s Diary and Gravity? Or Paint Your Wagon, Seven Brides for Seven Brothers and Calamity Jane, all of which are song-and-dance films involving cowboys? You’ll be stuck trying to shoehorn movies into genre definitions you developed before you understood the cinematic landscape. For a similar reason, we’ll try to avoid too many definitions of individual classes of long non-coding RNAs and just focus on what we really know experimentally.

The importance of a good start in life

Appropriate control of gene expression is required throughout life, but it’s critically important in very early development, because even the slightest shift in events during the first few cell divisions can have dramatic effects. This is particularly true in the zygote, the single cell formed from the fusion of an egg and a sperm. The zygote, and the first few cells generated by division from this progenitor, are known as totipotent. They are able to create all the cells of the embryo and placenta. Researchers would love to work with these cells, but they are tiny in number. Instead, most research is carried out in embryonic stem cells, also known as ES cells. These were originally derived from embryos, many years ago, but we don’t need to access embryos any more to get them, as they can be grown in cell culture. ES cells are from a slightly later stage in development and aren’t quite as unconstrained as the zygote. They are known as pluripotent, as they have the potential to form any cell type in the body, but not placental cells.

In the correct, carefully controlled culture conditions, ES cells divide to generate yet more pluripotent stem cells. But relatively minor changes to the culture conditions lead to a loss of pluripotency. The ES cells begin to differentiate into more specialised cell types. One of the most dramatic changes is when ES cells differentiate into heart cells, which beat spontaneously and in synchrony in a Petri dish. But essentially the ES cells can move down many different development routes, depending on the ways that they are treated.

Researchers manipulated ES cells in culture by knocking down the expression of nearly 150 of the long non-coding RNAs that are located far from any known protein-coding genes. They knocked down the expression of just one long non-coding RNA in each experiment. They found that in dozens of cases, knockdown of just one long non-coding RNA was enough to change the ES cells from being pluripotent to starting to differentiate into other cells. The authors also analysed which genes were expressed before and after they knocked down the long non-coding RNAs. They found that over 90 per cent of the long non-coding RNAs controlled expression of protein-coding genes either directly or indirectly. In many cases, the expression of hundreds of protein-coding genes was affected. These were nearly always genes that were far away on the genome, not the ones that were closest to the long non-coding RNAs that they had knocked down.

The scientists also performed the reciprocal experiment. They treated ES cells with a chemical that is known to cause them to differentiate and then analysed the expression of the specific long non-coding RNA class in which they were interested. They found that expression of about 75 per cent of the long non-coding RNAs dropped as the cells moved from being pluripotent to being committed to a development pathway. The two sets of data are consistent with the idea that the levels of expression of certain long non-coding RNAs act as gatekeepers to maintain ES cells in a pluripotent state.[13] This created confidence that these non-protein-coding RNAs do have a function in the cell, at least during early development.

Some long non-coding RNAs may also affect later developmental stages. We met the HOX genes in Chapter 4. These are the genes that are important for correct patterning of body parts. They’re the ones where mutations in fruit flies can lead to bizarre effects such as legs on the head. HOX genes are found in clusters in the genome, and these regions are extraordinarily rich in long non-coding RNAs. This is in contrast to their lack of ancient viral repeats. Scientists were keen to investigate if the long non-coding RNAs influenced the activity of the HOX genes in the same place in the genome. To test this, researchers used a technique to decrease the expression of a specific long non-coding RNA from the HOX gene region in chick embryos. When they did this, limb development went wrong. The bones towards the ends of the limbs were abnormally short.[14] Similarly, knocking out expression of another long non-coding RNA from this genome region in mice resulted in animals with malformations of the bones of the spine and wrists.[15] Both sets of data are consistent with the long non-coding RNAs being important regulators of HOX gene expression, and consequently of limb development.

Long RNAs and cancer

Cancer can in some ways be thought of as the flip side of development. One of the problems in cancer is that mature cells may change and revert to having some of the characteristics of less specialised cells, with a higher capacity to divide uncontrollably. Given that long non-coding RNAs are important in pluripotency and in development, it’s perhaps not surprising that some have now been implicated in cancer.

One large study analysed the expression of long non-coding RNAs in over 1,300 individual tumours from four different cancer types (prostate, ovarian, a type of brain tumour called glioblastoma and a specific form of lung cancer). There were about 100 long non-coding RNAs where high levels of expression were most commonly found in patients who died quickly from the disease. Nine of these long non-coding RNAs showed this association no matter the class of cancer that was assessed, which suggests they may be useful as more general markers for predicting survival chances in a patient.[16]

For three of the cancer types (prostate cancer was the exception), the same study reported that they could detect long non-coding RNAs that differentiated one sub-class of tumour from another. Although we refer to ovarian cancer, for example, there are different types of ovarian cancer depending on the cell types involved, and this affects the natural history of the tumour in a patient. This in turn can have implications for the disease prognosis and the treatment that a patient should receive. Analysing the expression of specific long non-coding RNAs in a tumour sample may help clinicians in the future to select the most appropriate therapies for an individual patient.
