jamiemcmorrin: Novelty, interest and replicability

So at last, your paper is written. It represents the culmination
of many years’ work. You think it is an important advance for the field. You write
it up. You carefully format it for your favoured journal. You grapple with the
journal’s portal, tracking down details of recommended reviewers, and then sit
back. You anticipate a delay of a few weeks before you get reviewer comments.
But, no. What’s this? A decision letter within a week: “Unfortunately we
receive many more papers than we can publish or indeed review and must make
difficult decisions on the basis of novelty and general interest as well as
technical correctness.” It’s the publishing equivalent of the grim reaper: a
reject without review.

It happens increasingly often, especially if you send work
to journals with high impact factors. I’ve been an editor and I know there are
difficult decisions to make. It can be kinder to an author to reject
immediately if you sense that the paper isn’t going to make it through the review
process. One thing you learn as an author is that there’s no point protesting
or moaning. You just try again with another journal. I’m confident our paper is
important and will get published, and there’s no reason for me to single this
journal out for complaint. But this experience has made me reflect more
generally on factors affecting publication, and I do think there are things
about the system that are problematic.

So, using this blog as my soapbox, there are two points I’d
like to make: a little one and a big one. Let’s get the little one out of the
way first. It’s simply this: if a journal commonly rejects papers without
review, then it shouldn’t be fussy about the format in which a paper is
submitted. It’s just silly for busy people to spend time getting the references
correctly punctuated, or converting their figures to a specific format, if
there’s a strong probability that their paper will be bounced. Let the
formatting issues be addressed after the first round of review.

The second point concerns the criteria of “novelty and
general interest”. My guess is that our paper was triaged on the novelty
criterion because it involved replication. We reported a study that involved measuring electrical brain responses to
sounds. We compared these responses in children with developmental language
impairments and typically-developing children. The rationale is explained in a blogpost
I wrote for the Wellcome Trust.

We’re not the first people to do this kind of research.
There have been a few previous studies, but it’s a fair summary to say the
literature is messy. I
reviewed part of it a few years back and I was shocked at how bad things
were. It was virtually impossible to draw any
general conclusions from 26 studies. Now these studies are really hard to do.
Just recruiting people is difficult and it can take months if not years to get
an adequate sample. Then there is the data analysis, which is not for the
innumerate or faint-hearted. So a huge amount of time and money had gone into
these studies, but we didn’t seem to be progressing very far. The reason was
simple: you couldn’t generalise because nobody ever attempted to replicate
previous research. The studies were focussed on the same big questions, but
they differed in important ways. So if they got different results, you couldn’t
tell why.

In response to this, part of my research strategy has been
to take those studies that look the strongest and attempt to replicate them. So
when we found strikingly similar results to a study by Shafer
et al. (2010), I was excited. The fact that two independent labs on different
sides of the world had obtained virtually the same result gave me confidence in
the findings. I was able to build on this result to do some novel analyses that
helped establish direction of causal influences, and felt that at last we were
getting somewhere. But my excitement was clearly not shared by the journal
editor, who no doubt felt our findings were not sufficiently novel. I wasn’t
particularly surprised by this decision, as this is the way things work. But is
the focus on novelty good for science?

The problem is that unless novel findings are replicated, we
don’t know which results are solid and reliable. We ought to know: we apply statistical methods with the sole goal of
establishing this. But in practice, statistics are seldom used appropriately.
People generate complex datasets and then explore different ways of analysing
data to find statistically significant results. In electrophysiological studies,
there are numerous alternative ways in which data can be analysed, by examining
different peaks in a waveform, different methods of identifying peaks,
different electrodes, different time windows, and so on. If you do this, it is
all too easy for “false positives” to be mistaken for genuine effects (Simmons,
Nelson, & Simonsohn, 2011). And the problem is compounded by the “file
drawer problem” whereby people
don’t publish null results. Such considerations led Ioannidis
(2005) to conclude that most published research findings are false.
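To give a feel for how quickly analytic flexibility inflates false positives, here is a minimal sketch in Python. It is an illustration, not an analysis from our paper: the function names are my own, and it makes the simplifying assumption that each alternative analysis (a different peak, electrode or time window) is an independent test of a true null hypothesis. Real analyses of the same dataset are correlated, so the inflation would be smaller than shown here, but it would not go away.

```python
import random


def familywise_error_rate(n_analyses: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_analyses


def simulate_flexible_analysis(
    n_analyses: int, alpha: float = 0.05, n_studies: int = 20000, seed: int = 1
) -> float:
    """Simulate studies with no real effect, each analysed n_analyses ways.

    Returns the fraction of studies in which at least one analysis
    came out 'significant'.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Assumption: under a true null, a valid p-value is uniform on [0, 1],
        # and the alternative analyses are independent of one another.
        p_values = [rng.random() for _ in range(n_analyses)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(familywise_error_rate(k), 3),
              round(simulate_flexible_analysis(k), 3))
```

Under these assumptions, a researcher who tries ten ways of analysing the data has roughly a 40% chance of finding something “significant” when there is nothing there, and with twenty ways the chance is around 64%.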

This is well-recognised in the field of genetics, where it
became apparent that most early studies linking genetic variants to phenotypes
were spurious (see Flint
et al.). The reaction, reflected in a recent
editorial in Behavior Genetics, has been to insist that authors
replicate findings of associations between genes and behaviour. So if you want
to say something novel, you have to demonstrate the effect in two independent
samples.

This is all well and good, but requiring that authors
replicate their results is unrealistic in a field where a study takes several
years to complete, or involves a rare disorder. You can, however, create an
expectation that researchers include a replication of prior work when designing
a study, and/or use existing research to generate a priori predictions about
expected effects.

It wouldn’t be good for science if journals only published
boring replications of things we already knew. Once a finding is established as
reliable, then there’s no point in repeating the study. But something that has
been demonstrated at least twice in independent samples (replicable) is far
more important to science than something that has never been shown before (novel),
because the latter is likely to be spurious. I see this as a massive challenge
for psychology and neuroscience.
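A rough calculation shows why a second independent result matters so much. The sketch below uses the standard positive-predictive-value formula, in the spirit of the argument in Ioannidis (2005). The numbers are purely illustrative assumptions on my part, not estimates for any real field: a 10% prior chance that a tested hypothesis is true, 50% statistical power, and a 5% significance threshold. It also assumes the studies are independent and free of bias, which is optimistic.

```python
def prob_finding_is_true(
    prior: float, power: float, alpha: float = 0.05, n_significant: int = 1
) -> float:
    """Probability a hypothesis is true given n independent significant results.

    prior: assumed chance the hypothesis is true before any data are seen.
    power: assumed chance a study detects the effect if it is real.
    alpha: significance threshold (chance of a false positive per study).
    """
    # Chance that all n studies are significant if the effect is real.
    true_positive = prior * power ** n_significant
    # Chance that all n studies are significant if there is no effect.
    false_positive = (1.0 - prior) * alpha ** n_significant
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    novel = prob_finding_is_true(prior=0.1, power=0.5, n_significant=1)
    replicated = prob_finding_is_true(prior=0.1, power=0.5, n_significant=2)
    print(f"novel finding:      {novel:.2f}")
    print(f"replicated finding: {replicated:.2f}")
```

With these made-up inputs, a single novel significant result has only about a 53% chance of being true, whereas a result found in two independent samples has about a 92% chance. Change the inputs and the figures change, but the gap between novel and replicated findings remains.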

In short, my view is that top journals should reverse their
priorities and treat replicability as more
important than novelty.

Unfortunately, most scientists don’t bother to attempt
replications because they know the work will be hard to publish. We will only
reverse that perception if journal editors begin to put emphasis on
replicability.

A few individuals are speaking out on this topic. I
recommend a blogpost by Brian
Knutson who argued, “Replication should be celebrated rather than
denigrated.” He suggested that we need a replicability index to complement the
H-index. If scientists were rewarded for doing studies that others can
replicate, we might see a very different rank ordering of research stars.

I leave the last word to Kent
Anderson: “Perhaps we’re measuring the wrong things … Perhaps we should
measure how many results have been replicated. Without that, we are pursuing a
cacophony of claims, not cultivating a world of harmonious truths.”

Simmons, J., Nelson, L., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359-1366. DOI: 10.1177/0956797611417632
