Variant annotation. In both clinical practice and in research projects, we are often faced with the question of whether a given variant is benign or pathogenic. In silico prediction tools are designed to help with this decision-making process. However, there are so many of them that it is often hard to assess which tool works best. In a 2014 publication in Nature Genetics, the CADD score was introduced as a comprehensive tool that aims to take the results of many known prediction tools into account. Follow me on a journey that takes us across hyperplanes, support vector machines, and every possible variant in the human genome. Continue reading
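To give a flavor of what "hyperplanes" means here: CADD trains a support vector machine to separate two classes of variants by a hyperplane in feature space. The toy sketch below uses a simple perceptron as a stand-in for the SVM, and the two features (conservation, allele frequency) and all data points are made up purely for illustration — this is not CADD itself.

```python
# Toy illustration of a linear classifier separating "pathogenic" from
# "benign" variants with a hyperplane. CADD uses a support vector machine
# trained on millions of variants; this perceptron stand-in and the two
# invented features (conservation score, minor allele frequency) are
# purely illustrative.

def train_perceptron(data, labels, epochs=100, lr=0.1):
    """Find weights w and bias b so that sign(w.x + b) matches the labels."""
    w = [0.0] * len(data[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:  # misclassified: nudge the hyperplane toward x
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# invented feature vectors: (conservation score, minor allele frequency)
variants = [(0.9, 0.001), (0.8, 0.002), (0.1, 0.30), (0.2, 0.25)]
labels = [1, 1, -1, -1]  # +1 = "pathogenic", -1 = "benign"

w, b = train_perceptron(variants, labels)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
print(score((0.85, 0.001)) > 0)  # conserved and rare: predicted pathogenic
```

An SVM differs from this sketch in that it picks the hyperplane with the maximum margin to both classes, but the geometric intuition — a single score from the signed distance to a separating plane — is the same.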
Fall break. Before our blog goes on a two-week hiatus, I wanted to share some ideas with you on the team it takes to get a clinical exome analyzed – my impression is that you need at least five different people to translate genomics into patient care. Continue reading
De novo. Three months ago, I performed a trio exome de novo analysis in a patient-parent trio. From my iPad, in a hotel room in Paris. When I got home a few days later, I was excited to tell my students that the analysis had worked. They looked at me slightly confused: “What’s the big deal? We already had the analysis complete a week or so ago.” Last year at this time, I was proud that our lab had established a fully functional de novo analysis pipeline. Suddenly, it’s not a big deal anymore. What happened? Let me tell you about Varbank. Continue reading
Do you still draw your pedigrees by hand? Or generate them using some website, take a screenshot of it (with Photoshop) and paste it into a Powerpoint file that you convert to PDF, send it by mail to a colleague, who then tries to extract the information into a text file representing the pedigree structure in a computer readable format? Continue reading
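The computer-readable pedigree structure mentioned above is usually something like the classic six-column LINKAGE/PED layout: family ID, individual ID, father ID, mother ID, sex, and affection status, with "0" marking an unknown parent. A minimal parsing sketch, with invented IDs:

```python
# Sketch of parsing a pedigree in the six-column LINKAGE/PED layout:
#   family_id individual_id father_id mother_id sex affected
# Sex: 1 = male, 2 = female. Affected: 1 = unaffected, 2 = affected.
# "0" marks a founder / unknown parent. The IDs below are invented.

PED_TEXT = """\
FAM1 child  father mother 1 2
FAM1 father 0      0      1 1
FAM1 mother 0      0      2 1
"""

def parse_ped(text):
    """Return a dict mapping individual ID to a pedigree record."""
    pedigree = {}
    for line in text.strip().splitlines():
        fam, iid, father, mother, sex, affected = line.split()
        pedigree[iid] = {
            "family": fam,
            "father": None if father == "0" else father,
            "mother": None if mother == "0" else mother,
            "sex": "male" if sex == "1" else "female",
            "affected": affected == "2",
        }
    return pedigree

ped = parse_ped(PED_TEXT)
print(ped["child"]["father"])    # father
print(ped["child"]["affected"])  # True
```

A plain-text format like this survives e-mail, version control, and two decades of linkage software — which is exactly what a screenshot pasted into a PowerPoint does not.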
Program completed. On Sunday, we finished our EuroEPINOMICS next generation sequencing (NGS) bioinformatics meeting. After working through the command line, running scripts, and staring at black screens with white cursors, we completed our four-day course by looking at the more user-friendly, web-based tools that the NGS world has to offer, including Galaxy, Varbank, and Ingenuity. I think it was the general consensus among the participants that this was the bioinformatics meeting that we needed in order to understand the data that we generate and deal with on a day-to-day basis. These were my favorite sound bites of our meeting. Continue reading
Lessons. Today was the first day of our bioinformatics workshop in Leuven, Belgium. We started out with some basic command line programming and eventually moved on to working with R Studio. What is this all about? It’s about getting some basic understanding of what your computer does and how your computer handles files. It’s about good data and bad data and losing the fear of the command line. We collected responses from the participants about today’s take-home messages. Continue reading
Join the genome hacking league. We are preparing a EuroEPINOMICS bioinformatics workshop in Leuven and I really, really encourage you to join us, as there are a handful of places left. This will be the workshop that I always wanted to attend, but never got a chance to take part in. And yes, there is a final exam, but there is a chance that you might pass it. If you’re worried, skip ahead two paragraphs.
Sequence databases are not the only repositories that see exponential growth. The internet helps companies collect information on an unprecedented scale, which has spurred the development of new software solutions. “Big data” is the term that stuck and breathed new life into data analysis. Widespread coverage ensued, including a series of blog posts published by the New York Times. Data produced by sequencing is big: current hard drives are too slow for raw data acquisition in modern sequencers, and we have to ship the disks because we lack the bandwidth to transmit the data via the internet. But we process the data only once, and a couple of years from now it can be reproduced with ease.
Large-scale data collection is once again hailed as the next big thing and spiced with calls for a revolution in science. In 2008, Wired even announced the end of theory. The last time I checked, though, experimental scientists still make good use of hypotheses and targeted experiments under the scientific method. A TEDMED12 presentation by Atul Butte, bioinformatician at Stanford, is symptomatic in its revolutionary language and caused concern with Florian Markowetz, bioinformatician at the Cancer Center in Cambridge, UK (and a Facebook friend of mine). Florian complains and explains that the quantitative changes in the data do not lead to a new quality of science, and calls for better theories and model development. He’s right, although the issue of data acquisition and source material would have deserved more attention (what can you expect from a mathematician).
We don’t know what to expect from, say, exome sequencing for a particular disease, and the only way to find out is to do the experiment, look at the data, come up with guesstimates, and confirm your findings in the next round. Current data gathering and analysis projects in the life sciences won’t be classified as big data by the next sweep of scientists anyway. They are mere community technology exploration projects using ad hoc solutions.