To do: read ENCODE papers

ENCODE will change the way we analyse genomes. Comparing long non-coding RNAs with transcription factor binding sites will require more CPU time. Anything else? I don’t know; I am only writing this because Ingo asked me to. It’ll take time to study the 30+ papers, sift through the data and discuss it with colleagues. Only then can something like that understanding we hear so much about happen, and I am sure it will, in journal clubs around the globe over the coming weeks. But smaller things might already be interesting.

A common variant described in a recent GWAS on genetic generalized epilepsies lies within what now appears to be a regulatory region of SCN1A and not of the adjacent sodium channel gene SCN9A. ENCODE describes an insulator between the SNP and the transcription start site of SCN9A, as well as new genes in this region. These are all interesting tidbits but don’t necessarily produce valuable hypotheses for future experimental study. At least we know that the region is active in human umbilical vein endothelial cells (HUVEC) and HEK293 cells. And now we sit here wishing we knew how this would look in cell lines that matter to neurologists. Michael Eisen describes his disappointment with the data more poignantly.

No more junk DNA? Michael Eisen is not the biggest fan of that term either. I always found it adequate, as a significant proportion of the human genome is composed of dysfunctional transposons. As far as I can tell, ENCODE has not changed that view, despite describing 80% of the genome as biologically active. Sean Eddy has nicely summarized this and describes a thought experiment of performing ENCODE on a random genome: undoubtedly, much of it would be described as biochemically active.

That 80% activity made big news. But I have not seen a major achievement in the biological sciences appear in the mainstream media in a way that would satisfy the authors. Scientific papers are a very condensed form of communication. Retelling one with 10% of the words of the original publication, while also adding context, is not possible without compromises that are unsatisfactory from at least one perspective. You can criticize Ewan Birney for over-selling the results, but his own blog post provides a modest explanation of what 80% functional means and how to break it down into definitions you might accept. Together with the open data release policies, the downloadable virtual machine for rerunning all analyses and the open access publications, the ENCODE consortium leads the way in publishing for other researchers. Everything else: we’ll see.

Roland Krause

Roland is a bioinformatician at the Luxembourg Centre for Systems Biomedicine. He received his undergraduate degree in biotechnological engineering and a PhD in biochemistry from the University of Heidelberg. His postdoc was in computational biology at the MPI for Molecular Genetics, Berlin, shared with the computer science and math department of the Free University Berlin.