Program completed. On Sunday, we finished our EuroEPINOMICS next-generation sequencing (NGS) bioinformatics meeting. After working through the command line, running scripts, and staring at black screens with white cursors, we completed our four-day course by looking at the more user-friendly, web-based tools that the NGS world has to offer, including Galaxy, Varbank, and Ingenuity. I think the general consensus among the participants was that this was the bioinformatics meeting we needed in order to understand the data that we generate and deal with on a day-to-day basis. These were my favorite sound bites of our meeting.
“Why didn’t we do this two years ago?” Several participants commented that our meeting was very good, but that we should have held it a long time ago. So, why did it take us so long? Almost two years ago, we met in Luxembourg for a meeting that we declared the “meeting of the 1000 exomes”. Since then, we have realized that generating exomes is not the main issue anymore. Analysis is the bottleneck. Things were a little different back then, and we didn’t really expect that basic exome skills would be required by pretty much every researcher in the field at some point. It is a little sad that this joint bioinformatics course could only take place in the final year of our funding period, but I believe that we have set a good precedent for the future. With or without future funding through large grants, we now understand the need for such courses and will repeat this one in one or two years’ time.
“Maybe samtools is not in your $PATH.” That’s a super nerdy comment that didn’t really sound all that nerdy anymore in the end. Samtools is a command-line software tool for the analysis of NGS data. At some point during our course, some participants couldn’t start this program in the Unix shell because the directory containing it was not listed in $PATH, the environment variable that tells the shell where to look for programs. I would like to use this statement to demonstrate the before-and-after effect: prior to our meeting, a significant fraction of participants would have struggled to understand this problem. However, when we encountered the $PATH issue with samtools on our third day, fixing it was pretty much a routine operation for everybody. It’s like learning a new language. At some point you don’t notice your progress anymore if you just keep speaking it.
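For readers who have not run into this yet, here is a minimal sketch of what "not in your $PATH" means, written in Python rather than in the shell. It is only an illustration and not something we used in the course; the helper names `locate_tool` and `explain_missing` are made up for this post.

```python
import os
import shutil


def locate_tool(name, search_path=None):
    """Return the full path of an executable, or None if it is not found.

    shutil.which looks through the directories in `search_path`
    (the current $PATH when search_path is None), much like the shell does.
    """
    return shutil.which(name, path=search_path)


def explain_missing(name, search_path=None):
    """Build a short, human-readable diagnosis (hypothetical helper)."""
    found = locate_tool(name, search_path)
    if found:
        return f"{name} found at {found}"
    # Fall back to the real $PATH when no explicit search path was given.
    path_value = search_path if search_path is not None else os.environ.get("PATH", "")
    dirs = [d for d in path_value.split(os.pathsep) if d]
    return f"{name} is not in any of the {len(dirs)} directories listed in $PATH"


if __name__ == "__main__":
    print(explain_missing("samtools"))
```

If the program is installed but not found, the usual fix is to add its directory to $PATH in the shell, or to call the program with its full path.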
“He is a typical bioinformatician – I asked him NOT to come.” Please don’t get this statement wrong. We really appreciated everybody with solid bioinformatics experience in our course. However, the overall scope of our course was translational. We didn’t want to have experts teaching experts, but rather to train researchers to deal with NGS data using bioinformatics tools. This statement also points out that bioinformatics sometimes takes on a life of its own and that not all of it is relevant to exome analysis.
“Is this an exome of a patient with epilepsy? No, I just manipulated the FASTQ files for the tutorial.” Our Galaxy tutorial led us through an entire analysis of a mini exome, from FASTQ files to the final annotated output. For this, our tutor Geert van der Weyer let us work on data that was as real as possible: he introduced an epilepsy-related mutation directly into the raw NGS data, the FASTQ files, by manipulating them. This was probably one of the most didactic moves of the tutorial and provided us with a realistic target to aim for.
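To give an idea of what editing a FASTQ file involves, here is a small sketch in Python. It is not the method our tutor used, which I don't know in detail. It only shows the record-level step: each read occupies four lines (header, sequence, separator, quality string), and a substitution changes one letter of the sequence while leaving the quality string alone. The function names and the example read are invented for illustration, and the sketch assumes the sequence is not wrapped over several lines.

```python
def parse_fastq(text):
    """Split FASTQ text into (header, sequence, separator, quality) tuples.

    Assumes the common four-lines-per-read layout without line wrapping.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def mutate_fastq_record(record, position, new_base):
    """Return a copy of one record with the base at `position` (0-based) replaced."""
    header, sequence, separator, quality = record
    if not 0 <= position < len(sequence):
        raise ValueError("position lies outside the read")
    mutated = sequence[:position] + new_base + sequence[position + 1:]
    # The quality string is kept as-is, so it still matches the read length.
    return (header, mutated, separator, quality)


if __name__ == "__main__":
    # A made-up read, not real patient data.
    example = "@read1\nACGTACGT\n+\nIIIIIIII\n"
    record = parse_fastq(example)[0]
    print("\n".join(mutate_fastq_record(record, 3, "A")))
```

In a real dataset the hard part is knowing which reads cover the genomic position of interest, which requires aligning the reads first. The sketch leaves that step out entirely.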
“…and, as many people have asked for this, you can open it with Excel.” What do bioinformaticians do when their colleagues leave the room? They open up Excel. Just kidding. Nevertheless, Excel jokes were abundant this weekend, because NGS data analysis quickly exposes the limitations of the program when datasets are large or do not fit a clear x/y table format. We had a few tutorials on manipulating and visualizing data tables with different formats in Unix. Still, Excel is also somehow symbolic of our native limitations in grasping data. Eventually, you want to have some parts of the data conveniently displayed in a table format. We should be aware, however, that these are deliberate steps of data reduction and might not reflect what the data want to tell us.
Let me thank all the participants, as well as the ESF and the VIB for supporting this meeting. I would also like to thank the organizers of this course, including Arvid Suls, Kamel Jabbari, Roland Krause, Patrick May, Holger Thiele, and Silke Appenzeller.