Trust, But Verify

Yufen Chun 2019-11-29 6 min read {Academic writing} [Research]

Introduction

The title comes from The Economist’s 2013 cover story, How Science Goes Wrong.

Reproducibility and replicability are hallmarks of good science. In 2019, the Committee on Reproducibility and Replicability in Science published its report on the subject, entitled Reproducibility and Replicability in Science. Regarding the importance of reproducibility and replicability, the report states:

Reproducibility and replicability are often cited as hallmarks of good science. Being able to reproduce the computational results of another researcher starting with the same data and replicate a previous study to test its results or inferences both facilitate the self-correcting nature of science. A newly reported discovery may prompt retesting and confirmation, examination of the limits of the original result, and reconsideration, affirmation, or extension of existing scientific theory.

Definitions

The meanings of “reproducibility” and “replicability” are not uniform across disciplines.

In her 2018 paper Terminologies for Reproducible Research, Lorena A. Barba identified three types of usage of the two words “reproducibility” and “replicability” across disciplines. The three types are:

A: The terms are used with no distinction between them.

B1: “Reproducibility” refers to instances in which the original researcher’s data and computer code are used to regenerate the results, while “replicability” refers to instances in which a researcher collects new data to arrive at the same scientific findings as a previous study.

B2: “Reproducibility” refers to independent researchers arriving at the same results using their own data and methods, while “replicability” refers to a different team arriving at the same results using the original author’s artefacts.

B1 and B2 are in direct opposition with respect to which term involves reusing the original authors’ digital artefacts of research (the “research compendium”) and which involves independently created digital artefacts.

Barba also collected data on the usage of these terms across a variety of disciplines; the distribution by discipline is reported in her paper.

The Committee on Reproducibility and Replicability in Science gave definitions for the two terms:

Reproducibility is obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis. This definition is synonymous with “computational reproducibility,” and the terms are used interchangeably in this report.

Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

Research misinterpretations

Two articles demonstrate the risk of bias in research findings.

In 2005, John P. A. Ioannidis published his paper Why Most Published Research Findings Are False. Its abstract reads:

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser pre-selection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Ioannidis also explained his paper in two videos.
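The argument turns on the pre-study odds that a probed relationship is real. As a rough illustration of how those odds combine with study power and the significance threshold, the short Python sketch below computes the positive predictive value (PPV) of a “significant” finding. The formula is my reading of the framework in Ioannidis’s paper, with his bias term omitted, so treat it as an illustrative sketch rather than a quotation.

```python
# Sketch of the positive predictive value (PPV) framework from
# Ioannidis (2005). The algebra follows my reading of the paper and
# omits the bias term; treat it as an assumption, not a quotation.
#
#   R     : ratio of true relationships to no relationships among those
#           probed in a field (the pre-study odds)
#   alpha : type I error rate (e.g. 0.05)
#   beta  : type II error rate (power = 1 - beta)

def ppv(R: float, alpha: float = 0.05, beta: float = 0.2) -> float:
    """Probability that a statistically significant claim is true:
    PPV = (1 - beta) * R / (R - beta * R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)

if __name__ == "__main__":
    # A well-powered study of a fairly plausible hypothesis (1:1 pre-study odds)
    print(f"R = 1.00, power = 0.8 -> PPV = {ppv(1.0):.2f}")   # about 0.94
    # An exploratory field where only 1 in 100 probed relationships is real
    print(f"R = 0.01, power = 0.8 -> PPV = {ppv(0.01):.2f}")  # about 0.14
```

Under this framework, a field that probes many unlikely relationships with modestly powered studies generates mostly false positives even before any bias is introduced, which is the crux of the paper’s claim.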

In response, Steven Goodman and Sander Greenland wrote Assessing the Unreliability of the Medical Literature: A Response to “Why Most Published Research Findings Are False”. Its abstract reads:

A recent article in this journal (Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124) argued that more than half of published research findings in the medical literature are false. In this commentary, we examine the structure of that argument, and show that it has three basic components:

  1. An assumption that the prior probability of most hypotheses explored in medical research is below 50%.
  2. Dichotomization of P-values at the 0.05 level and introduction of a “bias” factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design.
  3. Use of Bayes theorem to show that, in the face of weak evidence, hypotheses with low prior probabilities cannot have posterior probabilities over 50%.

Thus, the claim is based on a priori assumptions that most tested hypotheses are likely to be false, and then the inferential model used makes it impossible for evidence from any study to overcome this handicap. We focus largely on step (2), explaining how the combination of dichotomization and “bias” dilutes experimental evidence, and showing how this dilution leads inevitably to the stated conclusion. We also demonstrate a fallacy in another important component of the argument, that papers in “hot” fields are more likely to produce false findings.

We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims. But calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that “most research findings are false for most research designs and for most fields” must be considered as yet unproven.
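Step (3) of the summarised argument is a straightforward application of Bayes’ theorem, and a small worked example makes it concrete: if the prior probability of a hypothesis is low and the evidence is weak (diluted, in Goodman and Greenland’s terms), the posterior probability cannot cross 50%. The prior and likelihood-ratio values below are illustrative assumptions, not figures from either paper.

```python
# Worked Bayes-theorem example for step (3) above: with a low prior and
# weak (diluted) evidence, the posterior probability stays below 50%.
# The prior and likelihood-ratio values are illustrative assumptions.

def posterior(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability via Bayes' theorem in odds form:
    posterior_odds = prior_odds * likelihood_ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

if __name__ == "__main__":
    # Weak evidence: a bare "p < 0.05" result, assumed here to carry a
    # likelihood ratio of about 3, cannot lift a 10% prior past 50%.
    print(f"{posterior(0.10, 3):.2f}")    # about 0.25
    # Much stronger evidence can overcome the same low prior.
    print(f"{posterior(0.10, 20):.2f}")   # about 0.69
```

The dispute between the two papers is therefore not about this arithmetic but about the inputs: how low the priors really are, and how much the evidence from a typical study is actually diluted.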

Epilogue

An article entitled Repeating Experiments is Not Enough, published in Nature in January 2018, argued that routine replication might actually make matters worse: some papers are like “grand mansions of straw”. Verifying results requires disparate lines of evidence, a technique called triangulation.

The article draws a checklist from Triangulation in aetiological epidemiology by Debbie A. Lawlor, Kate Tilling and George Davey Smith: