reproducibility

100% Effective: the unrepeated studies

A few weeks back, Ben Goldacre wrote about the reproducibility crisis that science is suffering from. As a vivid example, he addresses the large-scale use of deworming medication in developing countries, which is based on a single but very extensive study from 2004.

When Goldacre described the outcome of a re-evaluation of the 2004 data, which was done in 2013, he listed all the problems found in that study – from missing data to wrong instructions provided by the analysis software package that was used back then. It is really no surprise that the new evaluation came to very different results about the effectiveness of deworming medication in schools.

I really appreciate that Goldacre does not take credit away from the authors of the 2004 study, acknowledging that they did difficult work in good faith. Instead, he points out how unusual it was that those scientists provided all their raw data for a re-evaluation. And this is indeed astonishing. Goldacre's comparison with the probe passing Pluto is well chosen:

Conducting a trial, and then refusing to let anyone see the data, is like claiming you’ve flown a spaceship to Pluto, but refusing to let anyone see the photos.

As a matter of fact, this happens frequently in science. As a chemist, I sometimes roll my eyes when I see hundreds of numbers in the supplementary information of a paper, listing every atomic coordinate obtained from the crystal structure of a molecule. But at least this tells me that I really get all the data.

When medicine is based on a single study, its effect might have occurred by chance. Credit: BloodyMary / pixelio.de


The other and even larger problem is indeed reproducibility. To be sure that a result is real and well-founded, it needs confirmation from different scientists. It is not unusual that scientists find a published protocol and try to build their work on it. When I go through an interesting paper, I find myself looking for gaps of missing information that might prevent me from reproducing the result in the first place. When I do a published synthesis and succeed on the first try, I am surprised. On the other hand, a failure might mean either that I am not skilled enough, or that some piece of information is missing from that paper.

Not giving away all the information can be essential for a scientist under the increasing pressure to “publish or perish”. Since it delays others in reproducing the work, it ensures that the scientist keeps an advantage. Authors have to fear that their manuscript is rejected by a peer reviewer who then reproduces the work in their own lab and publishes it first.

So, hoarding data is used as an insurance by the authors, or let’s say as a “copy protection”. As understandable as this might be, it is disastrous for science, as Goldacre clearly emphasizes. Irreproducible science is basically worthless, and in the worst case harmful. I agree that this has no influence on the fact that treating children against worms is an urgent and important issue. But it undermines the reliability of science in our society and promotes pseudo-scientific or religious beliefs that claim to be equally justified.

When you are likely to get schizophrenia, you might be creative – or not.

Are creative people likely to carry a genetic risk for mental diseases? A recent study published in Nature Neuroscience indeed concluded that the genetically carried “risk scores for schizophrenia and bipolar disorder predict creativity”. This study is based on a broad collection of genetic samples from the population of Iceland. Living in Iceland, I experienced the discussion about that “data mining”, and I also found a sample-donation kit in my mailbox.

My DNA donation kit, 2014.

I don’t want to address the fact that deCODE, the company collecting the DNA samples from about 100,000 Icelanders (which is a bit less than 1/3 of the population), used staff members from the national rescue team to go from door to door and ask people whether they wanted to donate their DNA to a private company for unclear reasons. But in my opinion, there are several things wrong in that partial study derived from the donations:

First, this study addresses about 3% of the population, comparing them with ca. 30% of the country’s total number. It seemingly only takes into account Icelanders working in a profession considered by the authors to be “creative”. The Icelandic population is quite isolated, which is exactly why it is so attractive for large-scale genetic analyses as done by deCODE. Other populations might not be comparable with the Icelandic one, since Iceland is a special case.

Second, the definition of creativity used by the authors is (and has to be) arbitrary:

Creative individuals were defined as those belonging to the national artistic societies of actors, dancers, musicians, visual artists and writers (n = 1,024 […]).

Are genius and madness genetically connected? Image: Vincent van Gogh


I fully agree that in a study about creativity, that term needs to be defined. The choice of considering members of the artistic societies also seems quite reasonable, but I see one major issue: Not every creative person works in a creative job. And vice versa, not every person working in a creative area is actually creative. This point is also very nicely addressed in the Hyperallergic blog, together with an interpretation of the observed correlation between the genetic risk and working in a creative environment:

If the distance between me, the least artistic person you are going to meet, and an actual artist is one mile, these variants appear to collectively explain 13 feet of the distance – David Cutler, geneticist at Emory University
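To put Cutler’s picture into a number: a mile is 5,280 feet, so 13 feet is roughly a quarter of a percent of the way. The few lines below are just a sketch of that arithmetic; the function name is made up for illustration and is not taken from the study or the blog.

```python
FEET_PER_MILE = 5280  # one statute mile in feet


def share_of_distance(feet_explained: float, miles_total: float = 1.0) -> float:
    """Fraction of the total distance that the given number of feet covers."""
    return feet_explained / (miles_total * FEET_PER_MILE)


# Cutler's analogy: 13 feet out of one mile
share = share_of_distance(13)
print(f"{share:.2%}")  # prints 0.25%
```

In other words, in this analogy the genetic variants cover about 0.25% of the gap between the least artistic person and an actual artist.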

Iceland in particular is famous for its vivid independent music scene, and Icelanders are famous for writing and making music – besides working in all kinds of professions. About 10% of all Icelanders are likely to publish a book in their lives, but the study considers only 194 writers. On the other hand, creativity might not be the only prerequisite for working in a creative profession. Artists, actors, musicians and dancers also need a high degree of discipline, passion and self-confidence to persist in those fields.

That being said, I think that this study is a perfect case of an overly positive interpretation of a result, suggesting that creativity is linked to schizophrenia and bipolar disorder. This result can only be valid for people who 1) are Icelanders, 2) participated in the DNA collection and 3) are members of a national artistic society. For all other creative people who primarily work as bus drivers, teachers, farmers, etc., this study does not draw any conclusion. My opinion.

Confessional science**

At the beginning of this month, a blog hosted by the Frankfurter Allgemeine Zeitung* reported on a scientific April Fools’ hoax: On the preprint platform arXiv.org, the author Ali Frolop published what was called “A Farewell to Falsifiability”, in which one of the main criteria for scientific theories is questioned. The publication date and the author name (“Ali Frolop” = “April Fool”) make clear that this is actually a joke. However, having read that quite amusing text, it appeared to me more a seriously meant satire than just a cheap joke.

Astrology is falsifiable, and there is nothing magic about this demarcation criteria.

The concept of falsifiability is, in my opinion, one of the most important aspects that separate science from religion. In brief, it means that a theory must allow for a test that could contradict it. Does the sun go down every day? (Living in Iceland with the summer coming closer, I would argue about that.) The Frolop paper also gives a good overview of other criteria for scientific theories, of course calling them into question: repeatability, simplicity and testable correctness.

So what is going on in science? The trigger for this ongoing discussion is string theory and the resulting multiverse theory. While string theory can be used to explain observations, it can neither be falsified nor predict observations, which are major disadvantages for a good theory. The same is true for universes other than ours, which are also not observable yet and allow any explanation to describe a maybe not-yet-known reality.

I agree with George Ellis and Joe Silk, who raise serious concerns about the reputation of science when the criteria for theories are weakened. For example, the theory that there is a god is also not falsifiable, nor is it sufficient to make predictions. That is exactly the purpose of religion. One danger that arises from mixing these aspects of science and religion is already there: Very often, defenders of a creationist god refer to the theory of evolution as “just a theory”. This is absolutely correct in the scientific sense, since evolution is testable, repeatable and simple. The hypothesis that a conscious, super-powerful being willingly created and altered life is none of these. In that respect, the Frolop paper might be less a hoax than a serious concern.

This battle for the heart and soul of physics is opening up at a time when scientific results – in topics from climate change to the theory of evolution – are being questioned by some politicians and religious fundamentalists.


* I have to apologize for referring so frequently to the German media. I also follow international news, but my native language is closer to me.

** My acknowledgment goes to Philipp Scharf, who showed me the article in the Planckton blog.

From crisis to crisis

In September this year, David Crotty wrote a blog post about two colliding crises, each connected with negative results. The first is described as a “reproducibility crisis”, based on the assumption that a majority of published experiments are in fact irreproducible. The second is referred to as a “negative results crisis”, describing that a large number of correct results remain unpublished due to their null-result character. Both crises are said to cause a considerable waste of time for scientists – either in performing published experiments that cannot succeed, or in repeating unsuccessful experiments that have not been published.

One attempt to overcome the problem of negative results was suggested by Annie Franco, Neil Malhotra and Gabor Simonovits, namely by “creating high-status publication outlets for these studies”. But I have to agree that this is easily said.

How willing are researchers to publicly display their failures? How much career credit should be granted for doing experiments that didn’t work?

Even though these problems are clearly not new (I dedicated this blog to negative results for a reason), I was surprised to see them actually described as “crises”. I do think that there is a problem of science losing the public’s trust, caused by the omnipresent publish-or-perish paradigm.

Show me your data sets!

Are authors of a scientific publication really less willing to share their raw data when their reported evidence is not too strong? This question was recently addressed in the field of psychology, unsurprisingly published in the open-access journal PLoS ONE. Jelte Wicherts, Marjan Bakker and Dylan Molenaar from the Psychology Department of the University of Amsterdam indeed came to that conclusion. Their study included 1149 results from 49 papers. It is interesting that for 28 of the 49 papers considered, the respective co-authors did not share their research data, even though they had agreed to do so beforehand.

Distribution of reporting errors per paper for papers from which data were shared and from which no data were shared. From DOI 10.1371/journal.pone.0026828


However, one might argue that the authors of this interesting “meta”-study walk on difficult terrain, as they are trying to draw a correlation about the accuracy of other scientists’ correlations. But I think their paper makes it clear enough that they were very much aware of that issue.

How to publish null?

In one of my past entries I made an exemplary and incomplete list of journals that are dedicated to negative outcomes of research. The observation that most of those journals suffer from a very low number of article submissions is maybe not surprising, but it is still puzzling. In my opinion, it is undisputed that unsuccessful experiments, unexpected observations and contradictory findings are crucial for progress in science.

However, there are plenty of reasons why scientists would not unveil their failures openly, and I would do the same. So the question is: What could a platform look like that helps scientists communicate about obstacles, questions and uncertainties? And why would scientists want to contribute?

An interesting example is the open access journal PLOS One, which explicitly publishes every article as long as it is scientifically sound. Due to its open access nature, the authors have to pay upon publication, instead of the reader. There are many other similar open access journals, but to my knowledge, PLOS One is the most successful one. I think PLOS One is indeed a shelter for findings that contradict commonly acknowledged theories, or for research areas that are not considered to be “sexy” by the scientific community. To my knowledge, it took the journal many difficult years to get established, and even now it is not known to very many scientists.

However, I think that for difficult projects like this, it was very helpful to cover a wide spectrum of sciences. Also, the combination of quick publishing and the connection with the audience is an asset that distinguishes a project like PLOS One from the typical journals. It is maybe the certainty for the authors of getting published in a serious venue (PLOS One is established), while the journal is broad enough to ensure that there are enough submissions.

Another interesting example is the review function of the scientific social network ResearchGate. On this platform, Dr. Kenneth Lee from the university of Hong Kong and also Dr. Mohendra Rao from the NIH published their efforts to reproduce STAP, both coming to the conclusion that the original work is not reproducible. Dr. Lee also tried to submit this review to Nature, where the original STAP work was published. However, the review was rejected for not-so-clear reasons. Later on, Nature retracted the original STAP publications nonetheless.

Lack of transparency and lack of reproducibility of experiments are an ongoing threat that undermines the reliability of science as a whole. I think that to seriously report on negative experimental outcomes, their reproducibility must be ensured. But in fact, it must also be ensured for the well-selling, positive outcomes. So, keeping an eye on transparent and reproducible experimental procedures is simply a sign of good scientific quality.

Considering this, a platform focusing on negative results should (1) be broad in scope, and (2) leave no doubt about the scientific craft. Further, I tend more and more to believe that a “classic” medium like a journal might not be the ideal platform for such results. A good publication type might be communications, supporting quick and responsive feedback. Another important criterion is that publishing those results must be rewarded. In a simple case, it should help improve the author’s h-index. Here, ResearchGate’s approach of inventing a new score might be useful, since its “RG score” is not solely coupled to the sheer number of publications and citations.

Yet another retracted Nature publication

As announced in the Nature Blog this week, the RIKEN Centre for Developmental Biology (CDB) in Kobe, Japan is going to be renamed and reduced in size. This is so far the latest development in what is maybe the science scandal of 2014, in which two publications in Nature about “stress-induced” growing of stem cells [1, 2] were retracted. The reason was the lack of reproducibility. Very tragically, this situation was accompanied by a suicide.

The number of retracted papers is impressively shown by RetractionWatch, and this is not limited to highly prestigious journals like Nature. The reasons for the publication of those irreproducible papers are manifold. In my opinion, the most likely case might be simple mistakes, as in the publication of Doo Ok Jang et al. in the Journal of the American Chemical Society, which was retracted five years after its publication.

These “false positive” results are in my opinion the most dangerous perils in science, since every scientist is eager to publish everything positive, (almost) no matter what. Once a hypothesis has been confirmed in an experiment, the chance is rather low that it will be double- or triple-checked.
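
Why double- or triple-checking matters can be illustrated with a small back-of-the-envelope simulation. This is only a sketch under assumptions that are mine and not taken from any of the studies mentioned: there is no real effect, every study uses a significance threshold of 0.05, and the studies are fully independent. Under those assumptions the p-value of each study is uniformly distributed between 0 and 1, so the chance that all of n studies come out “positive” by luck is 0.05 to the power of n. The function names are made up for illustration.

```python
import random

ALPHA = 0.05  # assumed significance threshold, not taken from any cited study


def chance_all_false_positive(n_studies: int, alpha: float = ALPHA) -> float:
    """Probability that n independent studies of a non-existent effect all look positive."""
    return alpha ** n_studies


def simulate(n_studies: int, trials: int = 100_000,
             alpha: float = ALPHA, seed: int = 1) -> float:
    """Estimate the same probability by drawing p-values under the null hypothesis."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # With no real effect, each study's p-value is uniform on [0, 1)
        if all(rng.random() < alpha for _ in range(n_studies)):
            hits += 1
    return hits / trials


for n in (1, 2, 3):
    print(n, chance_all_false_positive(n), simulate(n))
```

A single unrepeated study leaves a 1-in-20 chance of a pure fluke under these assumptions; one independent confirmation already pushes that down to 1 in 400. Real studies are rarely fully independent, so this is the optimistic case.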