Wednesday, May 11, 2011

Applied linguists just aren't serious about statistics

I'm not a statistician myself, and I've never published a quantitative study, so I'm not claiming any higher moral ground here. Still, it's sad to see just how lax our field is when it comes to statistical reporting. In the most recent issue of Language Learning, Plonsky and Gass examine 174 studies investigating the effects of learners interacting with others in the target language:
"Reliability estimates are one area that, although not perfect, have been reported very well (compared to other areas of SLA research; see Norris & Ortega, 2000; Plonsky, in press). Most other statistics have been reported either insufficiently or unevenly across the sample of studies. Although almost all of the studies reported using statistical tests (see Table 5), only 25% reported setting a pre- determined level of statistical significance, 2% reported the results of a power analysis, and only 3% (five studies, three by the same author, McDonough) reported checking the assumptions of their statistical tests. A somewhat larger portion of studies reported statistical significance as an exact p-value (44%) as opposed to greater or less than a particular p-value such as .05 (61%). However, these figures appear low, again, in light of the very high percentage of studies in the sample that employed statistical tests. Furthermore, reports in this area are not only inconsistent in the aggregate; 46 studies (26%) reported both exact and relative (i.e., < or > ) p-values. Means and standard deviations were presented in 64% and 52% of the studies, respectively. These figures are also somewhat low considering the frequency of studies employing mean-based statistical tests. Moreover, those data also indicate that 12% of the studies reporting means did so without reporting the standard deviations of those means. Along these same lines, we also see that the percentage of studies reporting t values and f values was only 26% and 32% (compared to 40% of studies reporting t tests and 39% reporting ANOVAs, ANCOVAs, and/or MAN[C]OVAs). Other statistics coded for were confidence intervals, reported in only five studies (3%), effect sizes (including d values and η2 for mean differences, phi coefficients for χ2, r2 and Cramer’s V for correlations; 18%), and whether an effect size (Cohen’s d) could be calculated from data in the report (41%). Finally and perhaps most surprisingly, 5% of the studies in the sample did not report sample size."
