The cat in the bag

And the hairball. What is the value of network analysis of genetic data except for being an undefined label for any work including the use of external data sources for the evaluation of hmm, some genetic data? Let’s be specific: what is the value of this recent high-profile paper in Nature Neuroscience describing the distribution of variants in a schizophrenia network?

Schizophrenia is not exactly the best characterized disorder genetically. Several genome-wide association studies (GWAS) have reported a number of genes, but these variants do not really explain the disorder the way that deleterious variants in SCN1A explain Dravet syndrome. The authors place the genes identified in various studies – primarily those affected by copy number variations – in a network analysis and look for enrichment using the NETBAG method, a previously published algorithm for a gene network in autism. Note added in proof: A large-scale GWAS in schizophrenia fails to discover any relevant hits.

Significance ensues. The authors identify significant clusters, and p-values are scattered throughout the paper like histone marks on a stretch of the genome. At some point in the paper, the algorithm actually displays a single significant cluster involved in axon guidance, and then some modifications and manipulations provide other clusters. But the significance of the clusters is irrelevant. The network contains biases that randomization cannot remove, simply because generative models are hard in biology and cannot even model the DNA of genes satisfyingly. Anyway, we are interested in the highest-ranking clusters for follow-up analysis. At least many of the genes are expressed in the prenatal brain, which is the most surprising finding of this paper. A complex figure of the identified network can be found in the paper. But how relevant to schizophrenia are genes like those of the actin-remodeling complex Arp2/3, which are likewise implicated in the autism network presented by the same group?
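To make the randomization point concrete, here is a minimal Python sketch of a degree-matched permutation test for how tightly a gene set is connected. It is not the NETBAG procedure: the function names, the toy network and the choice to match on node degree are all assumptions made for illustration. The null model only corrects for the one bias it encodes (degree), so any other bias in the network goes straight into the p-value.

```python
import random
from collections import Counter, defaultdict


def build_adjacency(edges):
    """Turn a list of undirected edges into a gene -> set-of-neighbours map."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def edges_within(adjacency, genes):
    """Count edges whose two endpoints are both in `genes`."""
    genes = set(genes)
    # Every undirected edge is seen from both ends, so halve the total.
    return sum(len(adjacency[g] & genes) for g in genes) // 2


def degree_matched_p_value(adjacency, candidate_genes, n_permutations=1000, seed=0):
    """Empirical p-value for the connectivity of `candidate_genes`.

    Each null gene set contains, for every degree class, as many randomly
    drawn genes as the candidate set has in that class.
    """
    rng = random.Random(seed)
    by_degree = defaultdict(list)
    for gene, neighbours in adjacency.items():
        by_degree[len(neighbours)].append(gene)

    candidates = list(candidate_genes)
    needed = Counter(len(adjacency[g]) for g in candidates)
    observed = edges_within(adjacency, candidates)

    hits = 0
    for _ in range(n_permutations):
        sample = []
        for degree, count in needed.items():
            sample.extend(rng.sample(by_degree[degree], count))
        if edges_within(adjacency, sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy network (entirely made up): a 4-gene clique plus a cube-shaped
# background in which every gene also has degree 3.
clique = ["A", "B", "C", "D"]
edges = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
edges += [(f"g{i}", f"g{i ^ (1 << bit)}") for i in range(8) for bit in range(3)]

adjacency = build_adjacency(edges)
observed, p_value = degree_matched_p_value(adjacency, clique)
print(observed, p_value)  # the clique has 6 internal edges
```

In this toy case the clique comes out as significant because the null only ever compares it with other degree-3 genes. In a real interaction network, well-studied genes also differ in annotation depth and literature coverage, and nothing in a test like this accounts for that.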

ARP2/3 in the STRING network. Many of the interactions known to be present in any human cell are missing from the simplified display in the NETBAG outputs.

A better hairball. Is the NETBAG+ method presented by the authors superior to other network-based enrichment methods? I can’t really say, as this would require hands-on work and an objective function to optimize before calling something superior, which we don’t really have. The value of such work is to provide an overview rather than an in-depth analysis. In this respect, very mundane issues make the presented work less than ideal: the NETBAG network is not available, and one cannot rerun the analysis with slightly different parameters to assess the stability of the findings – a major drawback if you want to use this to explore data.
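What such a stability check could look like, if the network were available, is sketched below in Python. Everything here is hypothetical: the greedy search is a generic stand-in rather than the NETBAG algorithm, and the edge weights, the `min_weight` cutoff and the gene names are invented for the example. The idea is simply to rerun the search with a nudged parameter and measure how much of the cluster survives.

```python
def greedy_cluster(weights, seed_gene, size, min_weight):
    """Grow a cluster from `seed_gene`, each time adding the gene with the
    largest summed edge weight to the current members.

    Edges lighter than `min_weight` are ignored; growth stops early when no
    remaining gene has a usable edge into the cluster.
    """
    genes = {gene for pair in weights for gene in pair}
    cluster = [seed_gene]
    while len(cluster) < size:
        best_gene, best_score = None, 0.0
        for gene in sorted(genes - set(cluster)):  # sorted for reproducibility
            links = (weights.get(frozenset((gene, member)), 0.0) for member in cluster)
            score = sum(w for w in links if w >= min_weight)
            if score > best_score:
                best_gene, best_score = gene, score
        if best_gene is None:
            break
        cluster.append(best_gene)
    return set(cluster)


def jaccard(a, b):
    """Overlap of two gene sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)


# Made-up edge weights between five placeholder genes.
weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("C", "D")): 0.35,
    frozenset(("D", "E")): 0.9,
    frozenset(("A", "E")): 0.3,
}

baseline = greedy_cluster(weights, "A", size=4, min_weight=0.2)
for cutoff in (0.2, 0.3, 0.4):
    perturbed = greedy_cluster(weights, "A", size=4, min_weight=cutoff)
    print(cutoff, sorted(perturbed), jaccard(baseline, perturbed))
```

In this toy example the core A, B, C is found at every cutoff, while D drops out once the cutoff passes its only supporting edge. Seeing which members are that fragile is exactly the information a reader would want before following up a cluster.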

Late additions. Network information should not be reserved for a post-mortem analysis but be included in the genetic analysis itself, as an extended candidate gene list with somewhat better statistics. I am doubtful that mixing a set of genes in complex disease from a diverse group of patients will ever yield something meaningful.

Roland Krause

Roland is a bioinformatician at the Luxembourg Centre for Systems Biomedicine. He received his undergraduate degree in biotechnological engineering and a PhD in biochemistry from the University of Heidelberg. His postdoc was in computational biology at the MPI for Molecular Genetics, Berlin, shared with the computer science and math department of the Free University Berlin.