Junk DNA: A Journey Through the Dark Matter of the Genome
Authors: Nessa Carey
Also by Nessa Carey
The Epigenetics Revolution
JUNK
DNA
A Journey Through the
Dark Matter of the Genome
NESSA CAREY
Published in the UK in 2015 by
Icon Books Ltd, Omnibus Business Centre,
39–41 North Road, London N7 9DP
www.iconbooks.com
Sold in the UK, Europe and Asia
by Faber & Faber Ltd, Bloomsbury House,
74–77 Great Russell Street,
London WC1B 3DA or their agents
Distributed in the UK, Europe and Asia
by TBS Ltd, TBS Distribution Centre, Colchester Road,
Frating Green, Colchester CO7 7DW
Distributed in Australia and New Zealand
by Allen & Unwin Pty Ltd,
PO Box 8500, 83 Alexander Street,
Crows Nest, NSW 2065
Distributed in South Africa by
Jonathan Ball, Office B4, The District,
41 Sir Lowry Road, Woodstock 7925
Distributed in India by Penguin Books India,
7th Floor, Infinity Tower – C, DLF Cyber City,
Gurgaon 122002, Haryana
ISBN: 978-184831-826-7
Text copyright © 2015 Nessa Carey
The author has asserted her moral rights.
No part of this book may be reproduced in any form, or by any means, without prior permission in writing from the publisher.
Typeset in Janson Text by Marie Doherty
Printed and bound in the UK
by Clays Ltd, St Ives plc
For Abi Reynolds, who is always by my side
And for Sheldon – good to see you again
Contents
Acknowledgements
Notes on Nomenclature
An Introduction to Genomic Dark Matter
2. When Dark Matter Turns Very Dark Indeed
3. Where Did All the Genes Go?
4. Outstaying an Invitation
5. Everything Shrinks When We Get Old
6. Two is the Perfect Number
7. Painting with Junk
8. Playing the Long Game
9. Adding Colour to the Dark Matter
10. Why Parents Love Junk
11. Junk with a Mission
12. Switching It On, Turning It Up
13. No Man’s Land
14. Project ENCODE – Big Science Comes to Junk DNA
15. Headless Queens, Strange Cats and Portly Mice
16. Lost in Untranslation
17. Why LEGO is Better Than Airfix
18. Mini Can Be Mighty
19. The Drugs Do Work (Sometimes)
20. Some Light in the Darkness
Appendix: Human Diseases in which Junk DNA Has Been Implicated
Acknowledgements
I am lucky that for my second book I continue to have the support of a great agent, Andrew Lownie, and of lovely publishers. At Icon Books I’d particularly like to thank Duncan Heath, Andrew Furlow and Robert Sharman, but not forgetting their former colleagues Simon Flynn and Henry Lord. At Columbia University Press I’m very grateful to Patrick Fitzgerald, Bridget Flannery-McCoy and Derek Warker.
As always, entertainment and enlightenment have been obtained from some unusual quarters. Conor Carey, Finn Carey and Gabriel Carey all played a role in this, and outside the genetic clan I’d also like to thank Iona Thomas-Wright. Endless support and lots of biscuits have been provided by my ever-patient, delightful mother-in-law, Lisa Doran.
I’ve had a blast delivering lots of science talks to non-specialist audiences since my first book was published. The various organisations that have invited me to speak are too many to namecheck but they know who they are and I’ve enjoyed the privilege immensely. It’s been very inspiring. Thank you all.
And finally Abi. Who is mercifully forgiving of the fact that, despite my promises, I still haven’t had that ballroom dancing lesson yet.
Notes on Nomenclature
There’s a bit of a linguistic difficulty in writing a book on junk DNA, because it is a constantly shifting term. This is partly because new data change our perception all the time. Consequently, as soon as a piece of junk DNA is shown to have a function, some scientists will say (logically enough) that it’s not junk. But that approach runs the risk of losing perspective on how radically our understanding of the genome has changed in recent years.
Rather than spend time trying to knit a sweater with this ball of fog, I have adopted the most hard-line approach. Anything that doesn’t code for protein will be described as junk, as it originally was in the old days (second half of the twentieth century). Purists will scream, and that’s OK. Ask three different scientists what they mean by the term ‘junk’, and we would probably get four different answers. So there’s merit in starting with something straightforward.
I also start by using the term ‘gene’ to refer to a stretch of DNA that codes for a protein. This definition will evolve through the course of the book.
After my first book The Epigenetics Revolution was published, I realised the readership was quite binary with respect to gene names. Some people love knowing which gene is being discussed, but for other readers it disrupts the flow horribly. So this time I have only used specific gene names in the text where absolutely necessary. But if you want to know them, they are in the footnotes, and the citations for the original references are at the back of the book.
An Introduction to Genomic Dark Matter
Imagine a written script for a play, or film, or television programme. It is perfectly possible for someone to read a script just as they would a book. But the script becomes so much more powerful when it is used to produce something. It becomes more than just a string of words on a page when it is spoken aloud, or better yet, acted.
DNA is rather similar. It is the most extraordinary script. Using a tiny alphabet of just four letters it carries the code for organisms from bacteria to elephants, and from brewer’s yeast to blue whales. But DNA in a test tube is pretty boring. It does nothing. DNA becomes far more exciting when a cell or an organism uses it to stage a production. The DNA is used as the code for creating proteins and these proteins are vital for breathing, feeding, getting rid of waste, reproducing and all the other activities that characterise living organisms.
Proteins are so important that in the twentieth century scientists used them to define what they meant by a gene. A gene was described as a sequence of DNA that codes for a protein.
Let’s think about the most famous scriptwriter in history, William Shakespeare. It can take a while for us to tune in to Shakespeare’s writings because of the way the English language has changed in the centuries since his death. But even so, we are always confident that the bard only wrote the words he needed his actors to speak.
Shakespeare did not, for example, write the following:
vjeqriugfrhbvruewhqoerahcxnqowhvgbutyunyhewqicxhjafvurytnpemxoqp[etjhnuvrwwwebcxewmoipzowqmroseuiednrcvtycuxmqpzjmoimxdcnibyrwvytebanyhcuxqimokzqoxkmdcifwrvjhentbubygdecftywerftxunihzxqwemiuqwjiqpodqeotherpowhdymrxnamehnfeicvbrgytrchguthhhhhhhgcwouldupaizmjdpqsmellmjzufernnvgbyunasechuxhrtgcnionytuiongdjsioniodefnionihyhoniosdreniokikiniourvjcxoiqweopapqsweetwxmocviknoitrbiobeierrrrrrruorytnihgfiwoswakxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
Instead, he just wrote the words which are underlined:
vjeqriugfrhbvruewhqoer_a_hcxnqowhvgbutyunyhewqicxhjafvurytnpemxoqp[etjhnuvrwwwebcxewmoipzowqm_rose_uiednrcvtycuxmqpzjmoimxdcni_by_rwvyteb_any_hcuxqimokzqoxkmdcifwrvjhentbubygdecftywerftxunihzxqwemiuqwjiqpodqe_other_powhdymrx_name_hnfeicvbrgytrchguthhhhhhhgc_would_upaizmjdpq_smell_mjzufernnvgbyun_as_echuxhrtgcnionytuiongdjsioniodefnionihyhoniosdreniokikiniourvjcxoiqweopapq_sweet_wxmocviknoitrbiobeierrrrrrruorytnihgfiwoswakxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
That is, ‘A rose by any other name would smell as sweet’.
But if we look at our DNA script it is not sensible and compact, like Shakespeare’s line. Instead, each protein-coding region is like a single word adrift in a sea of gibberish.
For years, scientists had no explanation for why so much of our DNA doesn’t code for proteins. These non-coding parts were dismissed with the term ‘junk DNA’. But gradually this position has begun to look less tenable, for a whole host of reasons.
Perhaps the most fundamental reason for the shift in emphasis is the sheer volume of junk DNA that our cells contain. One of the biggest shocks when the human genome sequence was completed in 2001 was the discovery that over 98 per cent of the DNA in a human cell is junk. It doesn’t code for any proteins. The Shakespeare analogy used above is in fact a simplification. In genome terms, the ratio of gibberish to text is about four times as high as shown. There are over 50 letters of junk for every one letter of sense.
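For readers who like to check the arithmetic, the step from a percentage of junk to a count of junk letters per letter of sense can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and it treats the genome as nothing more than a proportion of coding and non-coding letters. At exactly 98 per cent junk the ratio is 49 to 1, and it passes 50 to 1 once the junk share rises above roughly 98.04 per cent, which sits comfortably with 'over 98 per cent'.

```python
def junk_to_sense_ratio(junk_percent: float) -> float:
    """Letters of non-coding 'junk' for every one letter of protein-coding 'sense'."""
    if not 0 <= junk_percent < 100:
        raise ValueError("junk_percent must be at least 0 and below 100")
    return junk_percent / (100 - junk_percent)


def junk_percent_for_ratio(ratio: float) -> float:
    """Junk share (in per cent) at which junk outnumbers sense by `ratio` to 1."""
    if ratio < 0:
        raise ValueError("ratio must not be negative")
    return 100 * ratio / (ratio + 1)


if __name__ == "__main__":
    # Exactly 98 per cent junk leaves 2 per cent sense: 98 / 2 = 49 to 1.
    print(junk_to_sense_ratio(98))
    # The junk share at which the ratio reaches 50 to 1 (about 98.04 per cent).
    print(round(junk_percent_for_ratio(50), 2))
```

The two functions are inverses of one another, so either figure in the text can be recovered from the other.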
There are other ways of envisaging this. Let’s imagine we visit a car factory, perhaps for something high-end like a Ferrari. We would be pretty surprised if for every two people who were building a shiny red sports car, there were another 98 who were sitting around doing nothing. This would be ridiculous, so why would it be reasonable in our genomes? While it’s a very fair point that it’s the imperfections in organisms that are often the strongest evidence for descent from common ancestors – we humans really don’t need an appendix – this seems like taking imperfection rather too far.
A much more likely scenario in our car factory would be that for every two people assembling a car, there are 98 others doing all the things that keep a business moving. Raising finance, keeping accounts, publicising the product, processing the pensions, cleaning the toilets, selling the cars etc. This is probably a much better model for the role of junk in our genome. We can think of proteins as the final end points required for life, but they will never be properly produced and coordinated without the junk. Two people can build a car, but they can’t maintain a company selling it, and certainly can’t turn it into a powerful and financially successful brand. Similarly, there’s no point having 98 people mopping the floors and staffing the showrooms if there’s nothing to sell. The whole organisation only works when all the components are in place. And so it is with our genomes.
The other shock from the sequencing of the human genome was the realisation that the extraordinary complexities of human anatomy, physiology, intelligence and behaviour cannot be explained by referring to the classical model of genes. In terms of numbers of genes that code for proteins, humans contain pretty much the same quantity (around 20,000) as simple microscopic worms. Even more remarkably, most of the genes in the worms have directly equivalent genes in humans.
As researchers deepened their analyses of what differentiates humans from other organisms at the DNA level, it became apparent that genes could not provide the explanation. In fact, only one genetic factor generally scaled with complexity. The only genomic features that increased in number as animals became more complicated were the regions of junk DNA. The more sophisticated an organism, the higher the percentage of junk DNA it contains. Only now are scientists really exploring the controversial idea that junk DNA may hold the key to evolutionary complexity.
In some ways, the question raised by these data is pretty obvious. If junk DNA is so important, what is it actually doing? What is its role in a cell, if it isn’t coding for proteins? It’s becoming apparent that junk DNA actually has a multiplicity of different functions, perhaps unsurprisingly given how much of it there is.
Some of it forms specific structures in the chromosomes, the enormous molecules into which our DNA is packaged. This junk prevents our DNA from unravelling and becoming damaged. As we age, these regions decrease in size, finally declining below a critical minimum. After that, our genetic material becomes susceptible to potentially catastrophic rearrangements that can lead to cell death or cancers. Other structural regions of junk DNA act as anchor points when chromosomes are shared equally between different daughter cells during cell division. (The term ‘daughter cell’ means any cell created by division of a parental cell. It doesn’t imply that the cell is female.) Yet others act as insulation regions, restricting gene expression to specific regions of chromosomes.