Why do we make statistics so hard for our students?

(Warning: long and slightly wonkish)

If you’re like me, you’re continually frustrated by the fact that undergraduate students struggle to understand statistics. Actually, that’s putting it mildly: a large fraction of undergraduates simply refuse to understand statistics; mention a requirement for statistical data analysis in your course and you’ll get eye-rolling, groans, or (if it’s early enough in the semester) a rash of course-dropping.

This bothers me, because we can’t do inference in science without statistics*. Why are students so unreceptive to something so important? In unguarded moments, I’ve blamed it on the students themselves for having decided, a priori and in a self-fulfilling prophecy, that statistics is math, and they can’t do math. I’ve blamed it on high-school math teachers for making math dull. I’ve blamed it on high-school guidance counselors for telling students that if they don’t like math, they should become biology majors. I’ve blamed it on parents for allowing their kids to dislike math. I’ve even blamed it on the boogie**.

All these parties (except the boogie) are guilty. But I’ve come to understand that my list left out the most guilty party of all: us. By “us” I mean university faculty members who teach statistics – whether they’re in Departments of Mathematics, Departments of Statistics, or (gasp) Departments of Biology. We make statistics needlessly difficult for our students, and I don’t understand why.

The problem is captured in the image above – the formulas needed to calculate Welch’s t-test. They’re arithmetically a bit complicated, and they’re used in one particular situation: comparing two means when sample sizes and variances are unequal. If you want to compare three means, you need a different set of formulas; if you want to test for a non-zero slope, you need another set again; if you want to compare success rates in two binary trials, another set still; and so on. And each set of formulas works only given the correctness of its own particular set of assumptions about the data.
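For reference, here is the standard textbook form of those formulas. The notation is my own choice and may not match the image symbol for symbol: the sample means are written as x-bar, the sample standard deviations as s, and the sample sizes as n, with subscripts 1 and 2 for the two groups.

```latex
% Welch's t statistic for two samples with unequal variances and sizes
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}

% Approximate degrees of freedom (Welch-Satterthwaite)
\nu \approx
  \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^{2}}
       {\dfrac{\left( s_1^2 / n_1 \right)^{2}}{n_1 - 1}
        + \dfrac{\left( s_2^2 / n_2 \right)^{2}}{n_2 - 1}}
```

The first line gives the test statistic; the second tells you which row of the t table to use.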

Given this, can we blame students for thinking statistics is complicated? No, we can’t; but we can blame ourselves for letting them think that it is. They think so because we consistently underemphasize the single most important thing about statistics: that this complication is an illusion. In fact, every significance test works exactly the same way.

Every significance test works exactly the same way. We should teach this first, teach it often, and teach it loudly; but we don’t. Instead, we make a huge mistake: we whiz by it and begin teaching test after test, bombarding students with derivations of test statistics and distributions and paying more attention to differences among tests than to their crucial, underlying identity. No wonder students resent statistics.

What do I mean by “every significance test works exactly the same way”? All (NHST) statistical tests respond to one problem with two simple steps.

 The problem:

  • We see apparent pattern, but we aren’t sure if we should believe it’s real, because our data are noisy.

 The two steps:

  • Step 1. Measure the strength of pattern in our data.
  • Step 2. Ask ourselves, is this pattern strong enough to be believed?

Teaching the problem motivates the use of statistics in the first place (many math-taught courses, and nearly all biology-taught ones, do a good job of this). Teaching the two steps gives students the tools to test any hypothesis – understanding that it’s just a matter of choosing the right arithmetic for their particular data. This is where we seem to fall down.

Step 1, of course, is the test statistic. Our job is to find (or invent) a number that measures the strength of any given pattern. It’s not surprising that the details of computing such a number depend on the pattern we want to measure (difference in two means, slope of a line, whatever). But those details always involve the three things that we intuitively understand to be part of a pattern’s “strength” (illustrated below): the raw size of the apparent effect (in Welch’s t, the difference in the two sample means); the amount of noise in the data (in Welch’s t, the two sample standard deviations); and the amount of data in hand (in Welch’s t, the two sample sizes). You can see by inspection that these behave in Welch’s formulas just the way they should: t gets bigger if the means are farther apart, the samples are less noisy, and/or the sample sizes are larger. All the rest is uninteresting arithmetical detail.
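If you'd rather check that behaviour by running it than by inspection, here is a minimal Python sketch. The function name `welch_t` and the `control` and `treated` numbers are invented for illustration, not real data, and the three "what if" variants are deliberately artificial: each changes one ingredient and leaves the others alone as far as possible.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic: a difference in means scaled by noise and sample size."""
    effect = statistics.mean(a) - statistics.mean(b)   # raw size of the apparent effect
    var_a = statistics.variance(a)                     # noise in sample a (n - 1 denominator)
    var_b = statistics.variance(b)                     # noise in sample b
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))  # noise scaled by amount of data
    return effect / standard_error


# Made-up measurements, purely for illustration.
control = [4.1, 5.0, 4.6, 5.3, 4.8]
treated = [5.9, 6.4, 5.2, 6.8, 6.1, 5.7]

base = welch_t(treated, control)

# Means farther apart: shift every treated value up by 1.
farther = welch_t([x + 1.0 for x in treated], control)

# Noisier sample: double each treated value's distance from the treated mean.
m = statistics.mean(treated)
noisier = welch_t([m + 2.0 * (x - m) for x in treated], control)

# More data: repeat each sample three times (a crude stand-in for a bigger study).
more_data = welch_t(treated * 3, control * 3)

print(f"base={base:.2f} farther={farther:.2f} noisier={noisier:.2f} more_data={more_data:.2f}")
```

The printed values should come out with `farther` and `more_data` larger than `base`, and `noisier` smaller, which is the whole point: the arithmetic is just a way of combining effect size, noise, and amount of data into one number.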

Step 2 is the P-value. We have to obtain a P-value corresponding to our test statistic, which means knowing whether assumptions are met (so we can use a lookup table) or not (so we should use randomization or switch to a different test***). Every test uses a different table – but all the tables work the same way, so the differences are again just arithmetic. Interpreting the P-value once we have it is a snap, because it doesn’t matter what arithmetic we did along the way: the P-value for any test is the probability of a pattern as strong as ours (or stronger), in the absence of any true underlying effect. If this is low, we’d rather believe that our pattern arose from real biology than believe it arose from a staggering coincidence (Deborah Mayo explains the philosophy behind this here, or see her excellent blog).
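The randomization route mentioned above makes that definition of the P-value almost literal, so here is a minimal sketch of it in Python. Everything named here is my own illustration: `permutation_p_value`, `mean_difference`, the default of 5000 shuffles, the fixed seed, and the made-up data. It is a two-sided version, and the "+ 1" in the last line is one common convention for counting the observed arrangement itself, not the only defensible choice.

```python
import random
import statistics


def mean_difference(a, b):
    """Step 1: a simple test statistic, the difference in sample means."""
    return statistics.mean(a) - statistics.mean(b)


def permutation_p_value(a, b, statistic=mean_difference, n_shuffles=5000, seed=0):
    """Step 2: how often does shuffling the group labels give a pattern
    at least as strong as the one we observed?"""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    observed = abs(statistic(a, b))      # strength of the pattern in the real data
    pooled = list(a) + list(b)           # no true effect means the labels are arbitrary
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled = abs(statistic(pooled[:len(a)], pooled[len(a):]))
        if shuffled >= observed:
            hits += 1
    # Count the observed arrangement as one of the possible shuffles.
    return (hits + 1) / (n_shuffles + 1)


# Made-up measurements, purely for illustration.
control = [4.1, 5.0, 4.6, 5.3, 4.8]
treated = [5.9, 6.4, 5.2, 6.8, 6.1, 5.7]

p_different = permutation_p_value(treated, control)
p_identical = permutation_p_value(control, control)
print(p_different, p_identical)
```

Notice that `statistic` is just a parameter. Swap in a different measure of pattern strength (a slope, a difference in medians, whatever suits the data) and Step 2 doesn't change at all.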

Of course, there are lots of details in the differences among tests. These matter, but they matter in a second-order way: until we understand the underlying identity of how every test works, there’s no point worrying about the differences. And even then, the differences are not things we need to remember; they’re things we need to know to look up when needed. That’s why if I know how to do one statistical test – any one statistical test – I know how to do all of them.

Does this mean I’m advocating teaching “cookbook” statistics? Yes, but only if we use the metaphor carefully and not pejoratively. A cookbook is of little use to someone who knows nothing at all about cooking; but if you know a handful of basic principles, a cookbook guides you through thousands of cooking situations, for different ingredients and different goals. All cooks own cookbooks; few memorize them.

So if we’re teaching statistics all wrong, here’s how to do it right: organize everything around the underlying identity. Start with it, spend lots of time on it, and illustrate it with one test (any test) worked through with detailed attention not to the computations, but to how that test takes us through the two steps. Don’t try to cover the “8 tests every undergraduate should know”; there’s no such list. Offer a statistical problem: some real data and a pattern, and ask the students how they might design a test to address that problem. There won’t be one right way, and even if there were, it would be less important than the exercise of thinking through the steps of the underlying identity.

Finally: why do instructors make statistics about the differences, not the underlying identity? I said I don’t know, but I can speculate.

When statistics is taught by mathematicians, I can see the temptation. In mathematical terms, the differences between tests are the interesting part. This is where mathematicians show their chops, and it’s where they do the difficult and important job of inventing new recipes to cook reliable results from new ingredients in new situations. Users of statistics, though, would be happy to stipulate that mathematicians have been clever, and that we’re all grateful to them, so we can get onto the job of doing the statistics we need to do.

When statistics is taught by biologists, the mystery is deeper. I think (I hope!) those of us who teach statistics all understand the underlying identity of all tests, but that doesn’t seem to stop us from the parade-of-tests approach. One hypothesis: we may be responding to pressure (perceived or real) from Mathematics departments, who can disapprove of statistics being taught outside their units and are quick to claim insufficient mathematical rigour when it is. Focus on lots of mathematical detail gives a veneer of apparent rigour. I’m not sure that my hypothesis is correct, but I’ve certainly been part of discussions with Math departments that were consistent with it.

Whatever the reasons, we’re doing real damage to our students when we make statistics complicated. It isn’t. Remember, every statistical test works exactly the same way. Teach a student that today.

Note: for a rather different take on the cookbook-stats metaphor, see Joan Strassmann’s interesting post here. I think I agree with her only in part, so you should read her piece too.

Another related piece by Christie Bahlai is here: “Hey, let’s all just relax about statistics” – but with a broader message about NHST across fields.

Finally, here’s the story of two ecologists who learned to love statistics – and it’s lots of fun.

© Stephen Heard, October 6, 2015



*In this post I’m going to discuss frequentist inferential statistics, or traditional “null-hypothesis significance testing”. I’ll leave aside debates about whether Bayesian methods are superior and whether P-values get misapplied (see my defence of the P-value). I’m going to refrain from snorting derisively at claims that we don’t need inferential statistics at all.

**OK, not really, but slipping that in there lets me link to this. Similarly I’m tempted to blame it on the rain, to blame it on Cain, to blame it on the Bossa Nova, and to blame it on Rio. OK, I’ll stop now; but if you’ve got one I missed, why not drop a link in the Replies?

***I’d include transforming the data as “switch to a different test”, but if you’d rather draw a distinction there, that’s fine.
