There is no escape from big data: it haunts you on the web and in social media, enables self-driving cars and supposedly revolutionizes health care. Big Data in Healthcare is a meeting in Luxembourg today and tomorrow that brings together a colorful mix of people from all domains and should convince the last of us that this is not at all an issue for IT specialists alone.
Data management is boring and, quite likely, you are not happy with how you do it today. Research data collected in practice typically evokes complaints when it finally reaches the statisticians or bioinformaticians, who claim that it is not properly organized. That’s not because you made a mistake: it’s simply hard to do when you set up your small-scale study. Tools like Excel will give you practically infinite freedom with little guidance, and every study seems to be so different from the last. Basic data management isn’t really taught in most research environments unless you’re talking about clinical trials that require elaborate and specialized software systems. Luckily, there is a Coursera course starting on June 2nd, 2014 that teaches the basics of data management. While it is focussed on the tool the organizers provide, the contents of the course should allow you to build better data structures.
Time flies by. Last week, we had the final General Assembly of the EuroEPINOMICS project in Tuusula, Finland. All four projects of the EuroEPINOMICS consortium presented their current, ongoing work, and it’s good to hear that there are multiple publications in various stages coming up. Over the three years of the consortium, the diverse groups grew closer together. During this meeting many unpublished results were shown, including extensions of studies on genes such as HCN1, CHD2, GRIN2A, GRIN2B or RBFOX1 as well as more data on epigenetics in acquired epilepsy.
Do you still draw your pedigrees by hand? Or generate them using some website, take a screenshot (with Photoshop), paste it into a PowerPoint file that you convert to PDF, and send it by mail to a colleague, who then tries to extract the information into a text file representing the pedigree structure in a computer-readable format?
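To make "computer-readable" concrete, here is a minimal sketch in Python. It assumes the six-column PED layout used by linkage tools and PLINK (family, individual, father, mother, sex, phenotype); the post itself does not name a format. The family `FAM1`, the individual IDs and the helper names `parse_ped` and `children_of` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed PED coding: sex 1 = male, 2 = female; phenotype 1 = unaffected,
# 2 = affected; anything else is treated as unknown. "0" means "no parent".
SEX = {"1": "male", "2": "female"}
PHENOTYPE = {"1": False, "2": True}


@dataclass
class Individual:
    family_id: str
    individual_id: str
    father_id: Optional[str]
    mother_id: Optional[str]
    sex: str
    affected: Optional[bool]


def parse_ped(text: str) -> Dict[str, Individual]:
    """Parse whitespace-separated PED lines into a dict keyed by individual ID."""
    people: Dict[str, Individual] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fam, ind, father, mother, sex, pheno = line.split()[:6]
        people[ind] = Individual(
            family_id=fam,
            individual_id=ind,
            father_id=None if father == "0" else father,
            mother_id=None if mother == "0" else mother,
            sex=SEX.get(sex, "unknown"),
            affected=PHENOTYPE.get(pheno),
        )
    return people


def children_of(people: Dict[str, Individual], parent_id: str) -> List[str]:
    """Return the sorted IDs of everyone who lists parent_id as father or mother."""
    return sorted(
        p.individual_id
        for p in people.values()
        if parent_id in (p.father_id, p.mother_id)
    )


# Hypothetical two-generation family: an affected mother with two children.
EXAMPLE = """\
# family individual father mother sex phenotype
FAM1 I-1 0 0 1 1
FAM1 I-2 0 0 2 2
FAM1 II-1 I-1 I-2 2 2
FAM1 II-2 I-1 I-2 1 1
"""

people = parse_ped(EXAMPLE)
print(children_of(people, "I-2"))
```

Four lines of plain text carry everything a drawing program needs to render this family, and a colleague can read them without squinting at a screenshot.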
MOOC. People have been hailing Massive Open Online Courses (MOOCs) as the next big thing in higher education. Accordingly, the number of people complaining about their failures is now substantial. MOOCs are following the usual hype cycle and we could close the post here. Then again, I recently became a MOOC disciple and need to vent some praise for a course on the Coursera platform that people reading this blog should be aware of: Medical Neuroscience, presented by Leonard White (Duke).
Why are some brain disorders so common? Schizophrenia, autism and epilepsy each affect about 1% of the world’s population over their lifetimes. Why are the specific phenotypes associated with those conditions so frequent? More generally, why do particular phenotypes exist at all? What constrains or determines the types of phenotypes we observe, out of all the variations we could conceive of? Why does a system like the brain fail in particular ways when the genetic program is messed with? Here, I consider how the difference between “concrete” and “emergent” properties of the brain may provide an explanation, or at least a useful conceptual framework.
And the hairball. What is the value of network analysis of genetic data, except for being an undefined label for any work that uses external data sources to evaluate, hmm, some genetic data? Let’s be specific: what is the value of this recent high-profile paper in Nature Neuroscience describing the distribution of variants in a schizophrenia network?
The biggest European meeting on Science online – policy, outreach, tools – started this Sunday. SpotOn brings open source coders, librarians, scientists from a variety of fields, and publishers together in London.
You can follow the keynotes and sessions online, and evaluations and comments can be followed in real time on Twitter. #solo12 is the hashtag of the overall conference; the individual sessions have their own tags.
Everybody wins. The scientific publication process is not ideal for finding the best bioinformatics methodology for a given problem. Most predictions are not performed blind, as our data sets are so small that separating them into several disjoint sets for training and testing purposes is not possible or sensible. The structural biology community has started to tackle the problem by establishing a competition called Critical Assessment of protein Structure Prediction (CASP). For example, the solution of the 3D structure of a protein is announced but the data are withheld for a couple of months to give computational groups time to submit a prediction, which is then evaluated by an independent team. A concluding conference crowns the best prediction groups. In recent years, systems biology and sequence interpretation have produced sufficient data to make similar challenges possible.
ENCODE will change the way we analyse genomes. The comparison of long non-coding RNA and transcription factor binding sites will require more CPU time. Anything else? I don’t know, I am only writing this because Ingo asked me to. It’ll take time to study the 30+ papers, sift through the data and discuss it with colleagues. Only then can something like that understanding we hear so much about happen, and I am sure it will in journal clubs around the globe in the coming weeks. But smaller things might already be interesting.