Perhaps you’ve heard this expression:
Most people use statistics as a drunk uses a lamppost – more for support than for illumination.
I do not doubt that there is some truth in this clever saying, but I try to convince my students that statistics can shed light on important questions. I have given talks to high school students with the title “Statistics of Illumination,” in which I present several examples to make this point. In this post I will present one of the examples, and I will present other examples in later posts. Questions that I pose to students will appear in italics.
Consider the following table of counts, based on data from the University of California at Berkeley’s graduate admissions process in 1973:
|Men accepted|Men denied|Women accepted|Women denied|
|---|---|---|---|
|533|665|113|336|
_Why is it not reasonable to simply consider the counts 533 and 113 in order to compare admissions decisions of men and women?_
This question leads students to consider the importance of proportional reasoning. Because many more men than women applied to these programs, we need to calculate proportions (or percentages, or rates).
_Calculate the proportion of male applicants who were accepted. Also calculate the proportion of female applicants who were accepted._
These proportions can be calculated as:
- Men: 533/1198 ≈ .445 were accepted
- Women: 113/449 ≈ .252 were accepted
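For readers who want to check the arithmetic, here is a minimal Python sketch of the same calculation. The function name `acceptance_rate` is just an illustrative choice; the counts are the ones given above.

```python
def acceptance_rate(accepted, applicants):
    """Proportion of applicants who were accepted."""
    return accepted / applicants

# Accepted counts and applicant totals for the two programs combined
men_rate = acceptance_rate(533, 1198)
women_rate = acceptance_rate(113, 449)

print(f"Men:   {men_rate:.3f}")
print(f"Women: {women_rate:.3f}")
```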
_Comment on how these proportions compare. Does this difference appear to be substantial?_
The acceptance rate* for men is almost 20 percentage points higher than the acceptance rate for women (44.5% vs. 25.2%). This certainly seems like a big enough difference to indicate that something worth investigating further is going on here.
* Saying “acceptance rate” is much simpler language than saying “proportion who were accepted” or even “proportion of acceptance,” but I suggest taking the time to explain to students that the term “acceptance rate” refers to a proportion here.
Let’s proceed to dig a little deeper. The counts in the table above came from combining data from two programs that we’ll call A and F. The following tables show the counts for these two programs separately:
|Program|Men accepted|Men denied|Women accepted|Women denied|
|---|---|---|---|---|
|A|511|314|89|19|
|F|22|351|24|317|
|Total|533|665|113|336|
Before analyzing these data, first convince yourself that there’s no cheating here: The bottom row reveals that counts for programs A and F really do add up to the counts given earlier.
_Within each program, calculate the proportion of male applicants who were accepted and the proportion of female applicants who were accepted. Comment on how the proportions compare within each program._
This requires students to think a bit harder than the earlier calculation of proportions did, because they need to calculate for themselves the total number of applicants for each (program, sex) pair. These acceptance proportions can be calculated as:
- Program A, men: 511/(511+314) = 511/825 ≈ .619
- Program A, women: 89/(89+19) = 89/108 ≈ .824
- Program F, men: 22/(22+351) = 22/373 ≈ .059
- Program F, women: 24/(24+317) = 24/341 ≈ .070
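The same program-level calculation can be sketched in Python. This is only an illustration: the dictionary layout and variable names are my own, and the counts are the accepted and denied numbers used in the calculations above.

```python
# (accepted, denied) counts for each (program, sex) pair
counts = {
    ("A", "men"): (511, 314),
    ("A", "women"): (89, 19),
    ("F", "men"): (22, 351),
    ("F", "women"): (24, 317),
}

rates = {}
for (program, sex), (accepted, denied) in counts.items():
    applicants = accepted + denied  # total applicants for this pair
    rates[(program, sex)] = accepted / applicants
    print(f"Program {program}, {sex}: {accepted}/{applicants} "
          f"= {rates[(program, sex)]:.3f}")
```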
Now when we compare the acceptance rates between men and women, we see a very different picture than before: Women have a higher acceptance rate than men in both programs! The difference is slight in program F (7.0% vs. 5.9%) and considerable in program A (82.4% vs. 61.9%).
_Based on this more in-depth analysis, is there evidence of discrimination against women in the graduate admissions process?_
No. At the program level, where admissions decisions are made, there’s no evidence to suggest that men have a higher acceptance rate than women. If anything, the acceptance rates appear to favor women in both programs. Perhaps program A should have done more to recruit women applicants (only 108 of their 933 applicants were women), but they accepted a substantially higher proportion of women applicants than men.
Some students will comment on the weird thing that has happened here. If not, you can ask them what’s odd about their calculations. If that prompt does not work, go ahead and point out the oddity: Women have a higher acceptance rate than men in both programs, but men have a higher acceptance rate than women when you combine the two programs together.
_Explain, based on the data provided, how this oddity occurs._
This is the hard part. This question requires students to think through what’s happening here. Typically, the first response I hear is: More men than women applied. To which I respond: Yes, but that’s why we calculated proportions in the first place. I hasten to add: That’s not completely off-track, but there’s more to it. I often need to give a hint, so I ask students:
_Think about two ways in which programs A and F differ from each other, with regard to applicants’ sex and acceptance rates._
Many students still struggle to discern what’s going on at this point. But I resist telling them the explanation, because I think their struggle is worthwhile. I also encourage them to work with nearby students to figure this out together. Eventually students come to realize that:
- Most men applied to program A, and most women applied to program F.
- Program A had much higher acceptance rates than program F.
These two points, taken together, explain the oddity. We can summarize this explanation more succinctly in one sentence: Men applied mostly to the program that’s easy to get into, whereas women applied mostly to the program that’s very hard to get into. This explains how it happens that women have a higher acceptance rate than men in both programs but a lower acceptance rate than men when the programs are combined.
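One way to make this explanation concrete is to notice that each group's combined acceptance rate is a weighted average of its program-level rates, where the weights are the shares of that group's applicants who applied to each program. The sketch below illustrates this with the counts from the example; the data layout and variable names are my own assumptions.

```python
# (accepted, applicants) by sex and program
data = {
    "men":   {"A": (511, 825), "F": (22, 373)},
    "women": {"A": (89, 108),  "F": (24, 341)},
}

combined = {}
for sex, programs in data.items():
    total_applicants = sum(n for _, n in programs.values())
    combined[sex] = 0.0
    for program, (accepted, n) in programs.items():
        share = n / total_applicants   # fraction of this group applying to this program
        rate = accepted / n            # acceptance rate within this program
        combined[sex] += share * rate  # this program's contribution to the combined rate
        print(f"{sex}, program {program}: share {share:.3f}, rate {rate:.3f}")
    print(f"{sex} combined rate: {combined[sex]:.3f}")
```

About 69% of the men's weight falls on program A's high rate, whereas about 76% of the women's weight falls on program F's low rate, which is what pulls the combined rates in opposite directions.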
I believe that this example/activity illustrates statistical thinking, which is the first recommendation in the GAISE report. The math/arithmetic involved here is quite straightforward, but the thinking required to explain this phenomenon is fairly sophisticated. Moreover, this example/activity illustrates the new emphasis in the 2016 GAISE report on giving students experience with multivariable thinking. The observational units here are the applicants, and the three variables are sex, admission decision, and program. All three of these variables are related to each other, and understanding the oddity* requires understanding those relationships. You might refer to sex as the explanatory variable, admission decision as the response variable, and program as the confounding variable.
* This oddity is often known as Simpson’s paradox. When I started teaching 30 years ago, I joked with my students that it’s unclear whether Simpson’s paradox was named for Lisa or Bart. I would not have guessed that Lisa and Bart would still be appearing in new episodes 30 years later!
I have used this example on the first day of class in a statistical literacy course. In that setting I do not bother to introduce any terminology but instead focus on the statistical thinking involved. You could also use it when discussing the analysis of two-way tables, or really at any point in a course. The key idea to emphasize is that a relationship between two variables might be explained by considering how both could be related to a third variable. And that statistics can be illuminating!
P.S. For a fun but difficult follow-up challenge, ask students to create their own made-up example for which Simpson’s paradox occurs. For example, ask them to create an example with two softball players (call them Amy and Barb), where Amy has a higher proportion of successes (hits) than Barb in June and also in July, but Barb has a higher proportion of hits than Amy when June and July are combined. This sounds weird, perhaps impossible, but it could happen. To succeed in creating such an example, you need to think through the two conditions needed to make this paradox happen. Encourage students to think of how this example could be analogous to the admissions example, because if they just start making up some numbers and hope that the paradox will occur, they will be at it for a very long time!
The key here is that Amy has to get most of her attempts when it’s easy to get a hit, and Barb must get most of her attempts when it’s hard to get a hit. Here’s one way to make this happen:
- June: Amy gets 9 hits in 10 attempts (90%), Barb gets 80 hits in 100 attempts (80%)
- July: Amy gets 20 hits in 100 attempts (20%), Barb gets 1 hit in 10 attempts (10%)
Sure enough, Amy does better than Barb in both months. But when we combine the two months:
- June and July combined: Amy gets 29 hits in 110 attempts (26.4%), Barb gets 81 hits in 110 attempts (73.6%)
Why does Barb do (much) better than Amy overall despite doing worse in each month? Because she was lucky enough to get most of her attempts when it was easy to get a hit. (The pitching was really lousy in June!!)
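Students who make up their own numbers may want a quick way to check whether the paradox actually occurs. Here is a minimal sketch using the Amy and Barb numbers above; the helper names `rate` and `combined` are hypothetical, and the two checks simply restate the two conditions of the challenge.

```python
def rate(hits, attempts):
    """Proportion of attempts that were hits."""
    return hits / attempts

# (hits, attempts) for each month
amy = {"June": (9, 10), "July": (20, 100)}
barb = {"June": (80, 100), "July": (1, 10)}

def combined(player):
    """Total (hits, attempts) across all months."""
    hits = sum(h for h, _ in player.values())
    attempts = sum(a for _, a in player.values())
    return hits, attempts

# Condition 1: Amy has the higher proportion in every month
amy_wins_each_month = all(rate(*amy[m]) > rate(*barb[m]) for m in amy)
# Condition 2: Barb has the higher proportion when months are combined
barb_wins_combined = rate(*combined(barb)) > rate(*combined(amy))

print(amy_wins_each_month, barb_wins_combined)
```

Both checks come out true for these numbers. Swapping in a student's own counts shows at once whether their example satisfies both conditions.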
If you’d rather avoid the sports context, you could say that Amy and Barb are college roommates, with the oddity that Amy has a higher proportion of A grades than Barb in humanities courses and also in science courses, but Barb has a higher proportion of A grades than Amy when these two kinds of courses are combined.
Further reading: The Berkeley graduate admissions data are from a well-known example that has been presented in many textbooks. I’ve used only programs A and F in order to keep things simpler. The original article is here. An interesting follow-up is here. The Berkeley data, more examples, and more information about Simpson’s paradox are also presented in a Wikipedia entry here.