Data Is Beautiful @lemmy.ml Ragdoll X @lemmy.world 6 mo. ago

Distribution of more than 64,000 Z-values from PLOS ONE. Despite being a journal that welcomes null results, there's a huge hole of non-significant studies.

Created by Adrian Barnett: https://twitter.com/aidybarnett/status/1572006426167619585


The linked tweet in turn links to Adrian Barnett's blog post: https://medianwatch.netlify.app/post/z_values/

The two large spikes in Z-values sit just outside the statistical significance threshold of ±1.96 (one just below −1.96, the other just above +1.96), corresponding to a p-value of less than 0.05. The plot looks like a Normal distribution that’s caved in.
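For anyone who wants to check the ±1.96 / p = 0.05 correspondence themselves, here is a minimal sketch using only the Python standard library. It assumes a two-sided test against a standard Normal null; the function name `two_sided_p` is just something I made up for illustration.

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a Z-value, assuming a standard Normal null."""
    # Probability of a result at least this far from zero, in either direction.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# A few illustrative Z-values on either side of the threshold.
for z in (0.5, 1.0, 1.96, 2.5):
    print(f"z = {z:4.2f}  ->  p = {two_sided_p(z):.3f}")
```

A Z-value of 1.96 lands at roughly p = 0.05, so anything inside ±1.96 counts as "not significant" at the usual level.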

From my limited statistical understanding, the Z-value measures how many standard deviations an observed result lies from what the null hypothesis predicts. This page on statistical analysis says:

Often, you will run one of the pattern analysis tools, hoping that the z-score and p-value will indicate that you can reject the null hypothesis

To reject the null hypothesis, you must make a subjective judgment regarding the degree of risk you are willing to accept for being wrong (for falsely rejecting the null hypothesis). Consequently, before you run the spatial statistic, you select a confidence level.

So from that, I understand the takeaway from the Z value graph is that if researchers are truly willing to publish studies which don't reach a definitive conclusion, then the huge gap in the middle should be filled in. But it's not.
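To get a feel for how such a hole can form, here is a rough toy simulation. Everything in it is an assumption of mine, not something taken from the PLOS ONE data: the Z-values are drawn from a Normal distribution with standard deviation 2, and non-significant results are "published" only 20% of the time (`KEEP_PROB`).

```python
import random

THRESHOLD = 1.96   # usual two-sided significance cutoff
KEEP_PROB = 0.2    # assumed chance that a non-significant result gets published


def simulate(n: int = 20000, seed: int = 0):
    """Draw hypothetical Z-values, then apply a selective-publication filter."""
    rng = random.Random(seed)
    all_z = [rng.gauss(0.0, 2.0) for _ in range(n)]
    # Significant results are always kept; the rest only with KEEP_PROB.
    published = [
        z for z in all_z
        if abs(z) >= THRESHOLD or rng.random() < KEEP_PROB
    ]
    return all_z, published


def share_nonsignificant(zs) -> float:
    """Fraction of Z-values that fall inside the non-significant band."""
    return sum(abs(z) < THRESHOLD for z in zs) / len(zs)


all_z, published = simulate()
print(f"non-significant share, all studies: {share_nonsignificant(all_z):.2f}")
print(f"non-significant share, published:   {share_nonsignificant(published):.2f}")
```

With these made-up settings the middle of the distribution thins out sharply among the "published" values, which is the same qualitative shape as the graph. It says nothing about what the real filtering probability is.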

And the danger is that valuable data from studies straddling the arbitrary p=0.05 line is simply being discarded by researchers, before ever reaching the journal. Such data -- while not conclusive on its own -- could have been aggregated in a metastudy to prove or disprove the effectiveness of medicines and procedures that have non-obvious or long-term impacts. That is a loss to all of humanity.
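As a toy illustration of the aggregation point, here is a sketch of Stouffer's method, one standard way to combine Z-values. The five study results are hypothetical, and the sketch assumes the studies are independent, test the same hypothesis in the same direction, and deserve equal weight. A real meta-analysis would weight by sample size or variance and check for heterogeneity.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a Z-value, assuming a standard Normal null."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def stouffer_z(z_values) -> float:
    """Combine independent Z-values with equal weights (Stouffer's method)."""
    return sum(z_values) / sqrt(len(z_values))


# Hypothetical studies: each one falls short of the 1.96 threshold on its own.
study_z = [1.2, 1.2, 1.2, 1.2, 1.2]

combined = stouffer_z(study_z)
print(f"single study:  z = {study_z[0]:.2f}, p = {two_sided_p(study_z[0]):.3f}")
print(f"five combined: z = {combined:.2f}, p = {two_sided_p(combined):.4f}")
```

None of the five made-up studies is significant alone, but pooled they clear the threshold comfortably. If four of them had been left in a drawer, that signal would never have been recoverable.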

(Image credit: https://pro.arcgis.com/en/pro-app/3.1/tool-reference/spatial-statistics/what-is-a-z-score-what-is-a-p-value.htm)

A while ago, I read a book about how researchers inadvertently misuse statistical tests, along with how to understand what statistics can and cannot do, from the perspective of scientists who will have to work with datasets. It's not terribly long, and is accessible with no prerequisite of any statistical experience. https://nostarch.com/statsdonewrong

EDIT: the author of that book has published its entire text as a website: https://www.statisticsdonewrong.com
- When I was in academia, I always thought there should be a journal for publishing things that go wrong or do not work. I can only imagine there are some experiments that were repeated many times in human history because no one published that they did not work.
  
  My understanding -- again, just from that book; I've never worked in academia -- is that some journals now have a procedure for "registering" a study before it happens. That way, the study's procedure will have been pre-vetted and the journal commits to -- and the researchers promise to -- publish the data irrespective of any conclusive results. Not perfect, but could certainly help.
  
  I can't tell you how many times I had some exciting idea, dug around in the literature, and found someone 10, 20, even 30 years ago who'd published promising work along exactly the line I was thinking, only to see that they completely abandoned the project after one or two publications. I've come to see that pattern as "this didn't actually work, and the first paper was probably bullshit."
  
  It's really hard to write an interesting paper based on "this didn't work," unless you can follow up to the point where you can make a positive statement of why it didn't work, and at that point, you're going to write a paper based on the positive conclusion and demote the negative finding to some kind of control data. You have to have the luxury of time, resources, and interest to go after that positive statement, and that's usually incompatible with professional development.
- the danger is that valuable data from studies straddling the arbitrary p=0.05 line is simply being discarded by researchers
  
  Or maybe experimenters are opting to do further research themselves rather than publish ambiguous results. If you aren't doing MRI or costly field work, fine tuning your experimental design to get a conclusive result is a more attractive option than publishing a null result that could be significant, or a significant result that you fear might need retracting later.
  
  maybe experimenters are opting to do further research themselves rather than publish ambiguous results
  
  While this might seem reasonable at first, I feel it is at odds with the current state of modern science, where results are no longer the product of individuals like Newton or the Curies, but rather of whole teams and even organizations, working across universities and across/out of this world. The thought of hoarding a topic to oneself until it's ripe seems more akin to commercial or military pursuits than to academia.
  
  But that gut feeling aside, withholding data does have a cost, be it more pedestrians being hit by cars or bunk science taking longer to disprove. At some point, a prolonged delay or shelving data outright becomes unethical.
  
  fine tuning your experimental design to get a conclusive result is a more attractive option than publishing a null result that could be significant
  
  I'm not sure I agree. Science is constrained by human realities, like funding, timelines, and lifespans. If researchers are collecting grants for research, I think it's fair for the benefactors to expect the fruits of the investment -- in the form of published data -- even if it's not perfect and conclusive or even if the lead author dies before the follow-up research is approved. Allowing someone else to later pick up the baton is not weakness but humility.
  
  In some ways, I feel that "publish or perish" could actually be a workable framework if it had the right incentives. No, we don't want researchers torturing the data until there's a sexy conclusion. But we do want researchers to work together, either in parallel for a shared conclusion, or by building on existing work. Yes, we want repeat experiments to double check conclusions, because people make mistakes. No, we don't want ten research groups fighting against each other to be first to print, wasting nine redundant efforts.
  
  or a significant result that you fear might need retracting later.
  
  I'm not aware of papers getting retracted because their conclusion was later disproved, but rather because their procedure was unsound. Science is a process, homing in on the truth -- whatever it may be -- and accepting its results, or sometimes its lack of results.
-2 < Z < 2 :

I set up my experiment wrong.
