My exams have been open-book, open-notes for as long as I can remember. I tell students from the outset that they should focus on understanding rather than memorization, and I think they take comfort in knowing that they can look up formulas and definitions during exams.
As we approach the end of the fall term*, I have been telling students that some terms, symbols, and facts should have become so familiar that they need not refer to their notes, even though they never set out to memorize those terms, symbols, or facts.
* I am teaching the first in a two-course sequence for business majors. We study inference for a single mean and for a single proportion at the end of the course. The next course will begin with inference for comparing two groups.
For this blog post I decided to make a list of things that I want students to know without looking at their notes*. In the spirit of fairness, I am going to do this without looking at any of my course notes. In the spirit of fun, I encourage you to compile your own list before reading mine.
* This is very different from my list of thirteen important topics for students to learn. See post #52, here.
- The term observational unit
- The term variable
- The term categorical variable
- The term numerical variable
- The term explanatory variable
- The term response variable
These are the building blocks of designing a study, analyzing data, and drawing conclusions. I don’t want my students to memorize definitions for these terms, but I ask about these so often* that I hope they can answer my questions without looking anything up.
* See post #11, Repeat after me, here.
- The symbol n
If students need to look up in their notes that n is our symbol for sample size, then they’re missing out on a lot.
- The term population
- The term sample
- The term parameter
- The term statistic
Again, I don’t want my students to memorize definitions of these terms, and I certainly won’t ask them to define these on an exam. But I also don’t want students to have to stop and look up the definitions whenever they encounter these words. In the last week or two, I have often said something like “remember that statistical inference is about inferring something about a population parameter based on a sample statistic,” and I sure don’t want students to need to look up those four terms in a glossary to understand my point.
- The symbol p-hat
- The symbol x-bar
- The symbol π
- The symbol μ
I have told my students several times in the past few weeks that understanding what these symbols mean needs to be second nature for them. It’s hard enough to understand a statement such as E(X-bar) = μ without having to look up what each symbol means. I agree that we can and should express this result in words as well as symbols: If you select a very large number of random samples from a population, then the average of the sample averages will be very close to the population average. But that’s a lot of words, and it’s very handy to use symbols. I was very tempted to include the symbols σ and s on this list also.
- The term random sampling
- The term random assignment
- The term confounding variable
As I’ve written before*, I really want students to understand the difference between random sampling and random assignment. In particular, I’d like students to understand the different kinds of conclusions that follow from these different uses of randomness. Random sampling allows for findings about the sample to be generalized to the population, and random assignment opens the door to drawing cause-and-effect conclusions. Confounding variables in observational studies provide alternative explanations, which is why cause-and-effect conclusions are not warranted from such studies. I hope that students learn these ideas well enough that they think of them as they read about statistical studies in their everyday lives. Of course, I know that they will not refer to their notes from my class as they go about their everyday lives.
- How to calculate an average
I always tell my students that they do not need to memorize formulas. But I can expect them to know that calculating an average involves adding the values and dividing by the number of values, right? I’d also like students to know that the median is the ordered value in position (n+1)/2, but that looks like a formula, so I’ll leave that off this list.
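The two calculations just described can be sketched in a few lines of plain Python; the function names and sample values are my own, chosen for illustration:

```python
import math

def average(values):
    """Add the values and divide by the number of values."""
    return sum(values) / len(values)

def median(values):
    """The ordered value in position (n+1)/2, counting from 1;
    when n is even, average the two values straddling that position."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2          # 1-indexed position
    lo = ordered[math.floor(position) - 1]
    hi = ordered[math.ceil(position) - 1]
    return (lo + hi) / 2
```

For an odd n, the floor and ceiling of (n+1)/2 coincide, so the averaging step simply returns the single middle value.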
- The idea that standard deviation is a measure of variability
I do not expect my students to know the formula for calculating standard deviation, and I rarely ask them to calculate a standard deviation by hand. But I do want them to know, without referring to their notes, that a larger standard deviation indicates more variability.
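To illustrate the point with made-up numbers, using nothing beyond Python's standard library:

```python
from statistics import stdev

# Two made-up samples with the same mean (50) but different spreads.
tight = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

# The more spread-out sample has the larger standard deviation.
more_variable = stdev(spread_out) > stdev(tight)
```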
- How to calculate proportions (marginal, conditional, joint) from a two-way table of counts
My students have performed these calculations when analyzing categorical data and also for calculating conditional probabilities. I hope that they feel confident with such calculations without using their notes.
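As a sketch, here are the three kinds of proportions computed from a small, invented two-way table of counts:

```python
# Invented two-way table: rows are groups, columns are outcomes.
#              success   failure
# group A         30        20
# group B         15        35
table = {"A": {"success": 30, "failure": 20},
         "B": {"success": 15, "failure": 35}}

grand_total = sum(sum(row.values()) for row in table.values())   # 100

# Joint proportion: one cell divided by the grand total.
joint = table["A"]["success"] / grand_total                      # 0.30

# Marginal proportion: a row total divided by the grand total.
row_total_A = sum(table["A"].values())                           # 50
marginal = row_total_A / grand_total                             # 0.50

# Conditional proportion: a cell divided by its row total.
conditional = table["A"]["success"] / row_total_A                # 0.60
```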
- The idea that a difference between two percentages is not a percentage difference
I don’t care if students need to look up how to calculate a percentage difference, but I do want them to know that a difference in percentage points is not the same as a percent difference. I don’t mean that I want them to be able to state that fact, but I want them to recognize it when they encounter it. For example, I’d like students to realize that increasing your success rate from 10% to 15% is not a 5% improvement in the success rate*.
* I wrote an entire essay about this in post #28, A pervasive pet peeve, here.
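The 10%-to-15% example above can be spelled out in two lines of Python:

```python
old_rate, new_rate = 0.10, 0.15   # success rates from the example above

# Difference in percentage points: just subtract the two percentages.
point_difference = new_rate * 100 - old_rate * 100           # 5 percentage points

# Percent difference: the change relative to the starting value.
percent_difference = (new_rate - old_rate) / old_rate * 100  # a 50% improvement
```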
- How to interpret a z-score
- How to calculate a z-score
Perhaps I am violating my policy about not requiring students to learn formulas here. But notice that I listed the interpretation first. I want students to know, without looking it up, that a z-score reveals how many standard deviations a value is from the mean*. This interpretation tells you how to calculate the z-score: [(value – mean) / standard deviation]. Granted, I suspect that most students learn the formula rather than think it through from the interpretation, but I think this one is important enough to know without referring to notes, because the idea is so useful and comes up so often.
* See post #8, End of the alphabet, here, for more about z-scores.
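That interpretation-first ordering can be mirrored in a short Python sketch, where the docstring carries the interpretation and the body carries the formula that falls out of it:

```python
def z_score(value, mean, standard_deviation):
    """How many standard deviations the value lies from the mean."""
    return (value - mean) / standard_deviation
```

For instance, a value of 110 in a distribution with mean 100 and standard deviation 5 has a z-score of 2, and a value of 90 has a z-score of -2.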
- That probabilities cannot be less than zero or greater than one
- That the probability of an event is one minus the probability of its complement
These two do not require looking anything up, right? If I ask what’s wrong with the statements that Pr(E) = -0.6 or Pr(E) = 1.7, I sure hope that a student does not need to refer to any rules to answer the question. Similarly, if I say that the probability of rain tomorrow is 0.2 and then ask for the probability that it does not rain tomorrow, I’m counting on students to answer without using their notes.
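Both rules fit into a few lines of Python; the function name and the validity check are my own additions for illustration:

```python
def complement(probability):
    """Probability that an event does NOT occur: 1 minus its probability."""
    # Probabilities outside [0, 1], like -0.6 or 1.7, are simply invalid.
    if not 0 <= probability <= 1:
        raise ValueError("a probability must be between 0 and 1")
    return 1 - probability
```

With the rain example above, `complement(0.2)` returns 0.8.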
- How to interpret an expected value
This is one of the first items that came to mind when I decided to create this list. If I had been given a dime for every time I’ve reminded a student that expected value means long-run average, then I would have accrued a very large average number of dimes per year over my long teaching career.
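The long-run-average interpretation is easy to demonstrate by simulation; the game below is invented for illustration:

```python
import random

random.seed(1)  # for reproducibility

# Invented game: win $5 with probability 0.1, otherwise win nothing.
# The theoretical expected value is 5 * 0.1 + 0 * 0.9 = $0.50.
trials = 100_000
winnings = [5 if random.random() < 0.1 else 0 for _ in range(trials)]

# The long-run average of the winnings settles near the expected value.
long_run_average = sum(winnings) / trials
```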
- The term mutually exclusive
- The term independent events
The meaning of these terms in probability closely mirrors their meaning in everyday use, so I hope students can answer questions about these terms without consulting their notes. I am tempted to include the addition rule for mutually exclusive events and the multiplication rule for independent events on this list, but I’ll resist that temptation.
- The idea that about 95% of the data from a normal distribution fall within two standard deviations of the mean
I’m not asking that students know the 68% and 99.7% aspects of the empirical rule by heart, only the part about 95% falling within two standard deviations of the mean*. Several times in the past few weeks I have said something like: The value of the test statistic is 3.21 (or perhaps 1.21). Is that very far out in the tail of a normal curve? How do you know? At a minimum I’d like students to realize that a z-score of greater than 2 (in absolute value) is far enough in the tail to be worth noting.
* I am tempted to include knowing, without looking it up, that the more precise multiplier is 1.96, but I won’t go that far. I do reserve the right to say things like “you know what the value 1.96 means in our course, right?” to my students.
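Here is a quick simulation check of that 95% figure, using an arbitrary made-up normal distribution:

```python
import random

random.seed(1)
mean, sd = 100, 15   # any normal distribution works; these values are made up
draws = [random.gauss(mean, sd) for _ in range(100_000)]

# Fraction of draws falling within two standard deviations of the mean;
# it lands near 0.95 (the more precise figure is about 0.954).
within_two = sum(abs(x - mean) <= 2 * sd for x in draws) / len(draws)
```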
- The idea that averages vary less than individual values
- The idea that the variability in a sample statistic (proportion or mean) decreases as the sample size increases
Now I’m asking a lot. Being back to full-time teaching after a year off has led me to rethink many things, but I have not wavered on my conviction that sampling distributions comprise the most challenging topic for students*. I am trying to keep my expectations modest with these two items, starting with the basic idea that averages vary less than individual values. Even that is challenging for students, because the even more fundamental idea that averages vary from sample to sample is non-trivial to wrap one’s mind around.
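The first of these two ideas shows up clearly in a small simulation; the population mean and standard deviation below are invented for illustration:

```python
import random
from statistics import stdev

random.seed(1)
pop_mean, pop_sd = 50, 10
n = 25          # sample size behind each average
reps = 2_000

# Individual values drawn from the population...
individuals = [random.gauss(pop_mean, pop_sd) for _ in range(reps)]

# ...versus averages of samples of 25 values each.
sample_means = [
    sum(random.gauss(pop_mean, pop_sd) for _ in range(n)) / n
    for _ in range(reps)
]

# The averages vary far less than the individual values; in theory their
# standard deviation is pop_sd / sqrt(n) = 10 / 5 = 2.
averages_vary_less = stdev(sample_means) < stdev(individuals)
```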
- That a confidence interval estimates the value of a population parameter
- That a larger sample size produces a smaller margin of error and a narrower confidence interval
- That a confidence interval for a population mean is not a prediction interval for a single observation
I’m not expecting students to know any confidence interval formulas off the top of their heads. When it comes to confidence intervals, I only ask for these three things. I consider the last of these three to be the most important misconception that we should address about confidence intervals*.
* See post #15, How confident are you, part 2, here.
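The second of these three points, that a larger sample size produces a narrower interval, can be sketched with the standard margin-of-error formula for a proportion (using the 1.96 multiplier mentioned earlier):

```python
import math

def margin_of_error(p_hat, n):
    """95% margin of error for a sample proportion p_hat from n observations."""
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

# Quadrupling the sample size cuts the margin of error in half,
# because n sits under a square root.
shrinks = abs(margin_of_error(0.5, 400) - margin_of_error(0.5, 100) / 2) < 1e-12
```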
- That null and alternative hypotheses are about population parameters
- That a smaller p-value indicates stronger evidence against the null hypothesis
- That the null hypothesis is rejected when the p-value is smaller than the significance level
Similarly, these are three things I’d like students to know about hypothesis testing without consulting their notes. The first of these is part of my frequent reminder to students that part of statistics involves making inferences about a population parameter based on a sample statistic. I hope that relying on simulation-based inference* leads students to internalize the second of these points. I try not to over-emphasize making test decisions, as compared to assessing strength of evidence, but I do want students to know how to determine whether to reject a null hypothesis.
* See post #12, Simulation-based inference, part 1, here.
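As a sketch of how a simulation-based p-value leads to a test decision, with made-up numbers (60 successes in 100 trials, under a null hypothesis that the true proportion is 0.5):

```python
import random

random.seed(1)
observed, n, null_p, reps = 60, 100, 0.5, 10_000

# Simulate many samples assuming the null hypothesis is true.
simulated_counts = [
    sum(random.random() < null_p for _ in range(n))
    for _ in range(reps)
]

# The p-value is the proportion of simulated results at least as extreme
# as the observed one: a smaller p-value means the observed result is
# harder to explain by chance alone, i.e. stronger evidence against the null.
p_value = sum(count >= observed for count in simulated_counts) / reps

# With significance level 0.05, reject the null when p_value < 0.05.
reject_null = p_value < 0.05
```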
- The idea that statistical inference depends on random (or at least representative) samples from the population
- The idea that confidence intervals and hypothesis tests give consistent results
- The distinction between statistical significance and practical importance
Here’s a final set of three aspects of statistical inference for which I hope that students do not have to check their notes. I’m not mentioning random assignment with the first one because my students have not yet studied inference for comparing two groups. For the middle one, I want students to realize that when a hypothesis test rejects a hypothesized value for a parameter, then the corresponding confidence interval should not include that value. And when the hypothesis test fails to reject the value, then the corresponding confidence interval should include that value. I don’t expect students to know the subtleties involved here, for example that the test needs to be two-sided and that this doesn’t always hold exactly for inference about a proportion. I just want the basic idea of this consistency to make sense and not require looking up.
Whew, this list is far longer than I anticipated when I began. Remember that my students and I are only halfway through a two-course sequence! I also strongly suspect that I’ve omitted several things that will cause me to shake my head vigorously when they come to me.
But also remember that my exams are open-notes, so my students can always look these things up. But it would certainly save them a lot of time if these 41 items truly become second nature to them. More importantly, I want them to know these things well enough to apply what they’ve learned far beyond their brief time in my course.