How Big Is “Big”? Interpreting Effect Sizes in L2 Research

TLDR

Effect size use has surged in L2 research, yet interpretations remain scarce and tend to rely on generic Cohen benchmarks that are not tailored to the field. This article aims to advance field-specific interpretation of d and r by compiling effect sizes from 346 primary studies and 91 meta-analyses comprising over 604,000 participants. The analysis shows that Cohen's benchmarks generally underestimate L2 effects, which leads the authors to propose a field-specific scale and eight considerations for evaluating magnitude and practical significance.

Abstract

The calculation and use of effect sizes, such as d for mean differences and r for correlations, has increased dramatically in second language (L2) research in the last decade. Interpretations of these effects, however, have been rare and, when present, have largely defaulted to Cohen's levels of small (d = .2, r = .1), medium (.5, .3), and large (.8, .5), which were never intended as prescriptions but rather as a general guide. As Cohen himself and many others have argued, effect sizes are best understood when interpreted within a particular discipline or domain. This article seeks to promote more informed and field-specific interpretations of d and r by presenting a description of L2 effects from 346 primary studies and 91 meta-analyses (N > 604,000). Results reveal that Cohen's benchmarks generally underestimate the effects obtained in L2 research. Based on our analysis, we propose a field-specific scale for interpreting effect sizes, and we outline eight key considerations for gauging relative magnitude and practical significance in primary and secondary studies, such as theoretical maturity in the domain, the degree of experimental manipulation, and the presence of publication bias.
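
As a rough illustration of the default practice the abstract describes, the following Python sketch computes d for a mean difference and labels d or r against Cohen's general-guide levels. The pooled-standard-deviation formula is the conventional one and is assumed here, since the abstract does not specify it; the function names `cohens_d` and `label` and the sample scores are hypothetical. The field-specific scale the article proposes is not given in this abstract, so only Cohen's levels are encoded.

```python
import math
from statistics import mean, variance

# Cohen's general-guide levels as quoted in the abstract (lower bounds).
COHEN_D = [(0.8, "large"), (0.5, "medium"), (0.2, "small")]
COHEN_R = [(0.5, "large"), (0.3, "medium"), (0.1, "small")]


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD (assumed formula)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def label(effect, benchmarks):
    """Return the benchmark label for the absolute size of an effect."""
    size = abs(effect)
    for threshold, name in benchmarks:
        if size >= threshold:
            return name
    return "below small"


# Hypothetical scores for two groups, for illustration only.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, label(d, COHEN_D))  # 0.5 medium
```

Swapping in a different list of thresholds is all that a field-specific interpretation would require in this sketch; the label function itself does not depend on Cohen's values.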
