Is the Scientific Method Broken? The Need to Take Our Own Advice

(Essay found in Nesselroade & Grimm, 2019, p. 355)

Using the 1960 volume of the Journal of Abnormal and Social Psychology, Cohen (1962) conducted an interesting study. Although the authors of the articles in that volume had not performed power analyses, Cohen computed the power of the statistical tests used in each study. According to Cohen's early guidelines, a small effect size is about 0.20, a medium effect size is around 0.50, and a large effect size is around 0.80. Assuming that the researchers wanted to detect a medium treatment effect, the average power of the tests in that volume was 0.46. This means that, on average, there was only a 46 percent chance of detecting a medium-sized effect! This realization prompts us to wonder how many other studies might have made it into the literature but were abandoned, perhaps prematurely, because they lacked the power to detect a treatment effect. Cohen's admonition to use power analyses became widely known among researchers (Cohen, 1977; Sedlmeier & Gigerenzer, 1989). Yet 24 years later, a study similar to Cohen's (1962), using the 1984 volume of the Journal of Abnormal Psychology, found that the average power for detecting a medium effect size had actually fallen to only 0.37 (Sedlmeier & Gigerenzer, 1989; Rossi, Rossi, & Cottrill, 1990)! There is little reason to believe the situation is much different today. Despite the urging of statisticians, power analyses have not become standard practice among researchers.
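To make a figure like 0.46 concrete, the sketch below approximates the power of a two-sided, two-sample comparison of means for a given standardized effect size d and per-group sample size n. It is an illustration under stated assumptions, not Cohen's procedure: it assumes equal group sizes and α = .05, and it uses a normal approximation rather than the exact noncentral t distribution. The function name `approx_power` and the sample sizes in the loop are hypothetical and are not taken from Cohen's (1962) review.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of means.

    Assumption: the test statistic is treated as normal with mean
    d * sqrt(n / 2) under the alternative, instead of following the exact
    noncentral t distribution, so small-sample t-test power is slightly
    overstated.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * (n_per_group / 2) ** 0.5  # expected test statistic under H1
    # Probability that the statistic lands beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Cohen's conventional small, medium, and large effect sizes
for d in (0.20, 0.50, 0.80):
    for n in (20, 30, 64):  # hypothetical per-group sample sizes
        print(f"d = {d:.2f}, n = {n:>2} per group: power ~ {approx_power(d, n):.2f}")
```

Under this approximation, 30 participants per group gives only about a 0.49 chance of detecting a medium effect, while roughly 64 per group raises it to about 0.80.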

If performing and reporting a power analysis were to become a standard step in the research process, it would give studies of a genuine treatment effect a better chance of finding supporting evidence, and it would lend more credibility to studies that fail to reject the null hypothesis. For instance, if a study were designed to detect a small difference and had adequate statistical power for the test, then a failure to reject the null hypothesis might be seen by other researchers as theoretically and practically important. For one thing, it might keep others from spending time and energy asking the same question; for another, it might stimulate the development of different ideas about how that part of the world works. Unfortunately, power analyses tend not to be performed, and null findings tend not to be published (see Box 8.1). Scientists can be a stubborn bunch, and the standards of the scientific process can be hard to change. However, each new generation of scientists brings a new opportunity to do things differently. Will a new generation of researchers choose to use the tools of power analysis?
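Setting a study up to detect a small difference with appropriate power amounts to a sample-size calculation: choose the smallest effect worth detecting, the significance level, and the desired power, then solve for the number of participants. The sketch below is a minimal illustration under assumptions not stated in the essay: a two-group design with equal group sizes, a two-sided test at α = .05, a target power of 0.80 (a common convention), and the normal approximation to the t-test. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-group sample size for a two-sided, two-sample test.

    Normal-approximation formula (an assumption, not an exact t-test result):
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2, rounded up.
    Exact t-test calculations give slightly larger numbers.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # normal quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in (("small", 0.20), ("medium", 0.50), ("large", 0.80)):
    print(f"{label:>6} effect (d = {d:.2f}): about {n_per_group(d)} per group for 80% power")
```

Under these assumptions, a small effect calls for roughly 393 participants per group, compared with about 63 for a medium effect and 25 for a large one.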

Find this and other essays regarding “Is the Scientific Method Broken?” in the Nesselroade & Grimm textbook.

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.

Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic Press.

Rossi, J. S., Rossi, S. R., & Cottrill, S. D. (1990). Statistical power in research in social and abnormal psychology. Journal of Consulting and Clinical Psychology, 58, 646–656.

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.