Predicting the effects of mutations

From left to right: Sean Mooney, Rachel Karchin, Andre Frank, Shamil Sunyaev, Emidio Capriotti.

How well can we predict the effects of mutations that change the protein sequence? Framed by ISMB, the largest bioinformatics conference, the SNP special interest group met on Saturday, 14 July 2012, moderated by Emidio Capriotti and Yana Bromberg, to discuss the current state of the art. Here’s a summary, with links to a set of tools to try if you study variants.

Zemin Zhang from Genentech opened the session with the sobering note that even the industry leader in cancer genomics relies on community tools. Gustavo Parisi, a structural biologist (like most speakers) from Buenos Aires (and the first scientist from Argentina I have heard speak), argued for looking at conformers when evaluating mutational effects, and Maricel Kann showed her results from mapping mutations to conserved positions in protein domains, implemented in the web server DMDM.
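To make the idea of "conserved positions" concrete, here is a minimal sketch of one common way to score conservation: the Shannon entropy of each column in a multiple sequence alignment. This is not how DMDM works internally (the talk summary above does not say), and the function names, the toy alignment and the 0.5-bit threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters ('-') are ignored. Returns None for an all-gap column,
    because conservation is undefined there.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(alignment, max_entropy=0.5):
    """Return 0-based column indices whose entropy is at most max_entropy.

    The threshold is an arbitrary illustrative value, not a published cutoff.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        entropy = column_entropy([seq[i] for seq in alignment])
        if entropy is not None and entropy <= max_entropy:
            conserved.append(i)
    return conserved


# Hypothetical toy alignment of four short sequences.
toy_alignment = [
    "MKVLA",
    "MKILA",
    "MRVLG",
    "MKVLS",
]

# A hypothetical variant at alignment column 3 (0-based).
variant_column = 3
hits_conserved_site = variant_column in conserved_positions(toy_alignment)
```

A variant that lands on a low-entropy column is a candidate for closer inspection; real tools combine such scores with domain annotation, structure and much larger alignments.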

A recurring theme was the combination of soft data, such as the effects on a biological process, with hard data (multiple sequence alignments and three-dimensional structures), which are more tractable. To this end, Lei Xie presented an approach to model hypoxia in Drosophila.

Frank Schacherer presented MutationTaster and the Human Gene Mutation Database, a commercial database of SNPs, which is manually curated and looks quite useful. A free version is available but is three years behind the current state of curation. In the following keynote, Olivier Lichtarge presented his stunningly simple Evolutionary Trace method to relate phenotype to changes of genotype, which proved competitive in comparison.

The highlight talk after the well-deserved lunch break, by David Haussler, went from hardware via cancer genetics to complex models. It was great to see a leader in machine learning propose that we need more understanding of the data rather than more complex models.

Janita Thusberg from Sean Mooney's lab put the focus on pharmacogenetics, which was picked up by Russ Altman, who presented applications around the pharmacogenomics resource PharmGKB, which has the long-term goal of providing treatment advice according to the patient's genotype. Today, only a handful of SNPs are clearly actionable, in the sense that doctors can rely on the pharmacogenomic advice for treatment, but those few cases might be life savers.

SNAP is currently the best tool for assessment of functional variants, and Yana Bromberg focused her talk on the problem of separating neutral and severe SNPs. The assessment of variants as observed in genome sequences appears to differ strongly from the expectation. The hypothesis that we all carry plenty of seemingly deleterious mutations is supported by her data.

How to compare methods effectively? Some of the competing tools focus on specific proteins, and usually the training and test data have to be constructed from the same body of data. To compare the different methods objectively, the CAGI contest poses challenges by asking researchers to release experimental data in such a way that they can be used for prediction but not for fine-tuning the parameters of a method to a data set. As the experimental context is given, biological knowledge can still be applied. CAGI was presented by Steven Brenner.
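The blind-assessment idea can be illustrated with a small sketch: predictions are fixed before the experimental labels are revealed, and every method is then scored against the same hidden truth. The metric used here, the Matthews correlation coefficient, is my own choice for the example and not necessarily what the CAGI assessors use; the labels and the two "methods" are made-up toy data.

```python
import math


def confusion_counts(truth, predicted):
    """Count true/false positives and negatives for two boolean sequences."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predictions must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def matthews_corrcoef(truth, predicted):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(truth, predicted)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical hidden experimental labels: True = variant is deleterious.
hidden_truth = [True, True, False, False, True, False]

# Hypothetical predictions, submitted before the labels were revealed.
submissions = {
    "method_a": [True, True, False, False, False, False],
    "method_b": [True, False, True, False, True, False],
}

# Every submission is scored once against the same held-out labels.
scores = {
    name: matthews_corrcoef(hidden_truth, prediction)
    for name, prediction in submissions.items()
}
```

Because the labels are unavailable while a method is being built, a good score cannot come from tuning parameters to this particular data set, which is the point of the contest design described above.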

Conclusion. A panel discussion closed the day. The big frontier is still the lack of reliable experimental data. Combinations of SNPs will be an active research field, as will the question of the burden of rare and/or common variants. Everyone agreed that it will be interesting to look into the non-coding regions and assess the effects of changes in transcription factor binding sites, but reliable data there are even scarcer. Sean Mooney postulated a new trend of reverse genetics: moving from identified variants back to phenotypes, which might not be called.

Which tools should we use for EuroEPINOMICS? All of them? Will it be relevant to develop tools like KvSNP, which specialize in predictions for ion channels? The answer will require research, but simply running PolyPhen will not be sufficient. Metapredictors seemed to have no particular advantage in CAGI, but knowledge did: integrating structural aspects, network biology and functional knowledge allowed the winners to deliver better predictions. So should we.

ISMB 2013 will be held in Berlin. The Special Interest Group meetings in particular provide a very up-to-date view of their fields, also for non-bioinformaticians.

Roland Krause

Roland is a bioinformatician at the Luxembourg Centre for Systems Biomedicine. He received his undergraduate degree in biotechnological engineering and a PhD in biochemistry from the University of Heidelberg. His postdoc was in computational biology at the MPI for Molecular Genetics, Berlin, shared with the computer science and math department of the Free University Berlin.