
#73 No notes needed

My exams have been open-book, open-notes for as long as I can remember.  I tell students from the outset that they should focus on understanding rather than memorization, and I think they take comfort in knowing that they can look up formulas and definitions during exams.

As we approach the end of the fall term*, I have often told students that some terms, symbols, and facts should have become so familiar that they need not refer to their notes, even though they never set out to memorize the term, symbol, or fact.

* I am teaching the first in a two-course sequence for business majors.  We study inference for a single mean and for a single proportion at the end of the course.  The next course will begin with inference for comparing two groups.

For this blog post I decided to make a list of things that I want students to know without looking at their notes*.  In the spirit of fairness, I am going to do this without looking at any of my course notes.  In the spirit of fun, I encourage you to compile your own list before reading mine.

* This is very different from my list of thirteen important topics for students to learn. See post #52, here.

  1. The term observational unit
  2. The term variable
  3. The term categorical variable
  4. The term numerical variable
  5. The term explanatory variable
  6. The term response variable

These are the building blocks of designing a study, analyzing data, and drawing conclusions.  I don’t want my students to memorize definitions for these terms, but I ask about these so often* that I hope they can answer my questions without looking anything up.

* See post #11, Repeat after me, here.

  7. The symbol n

If students need to look up in their notes that n is our symbol for sample size, then they’re missing out on a lot.

  8. The term population
  9. The term sample
  10. The term parameter
  11. The term statistic

Again, I don’t want my students to memorize definitions of these terms, and I certainly won’t ask them to define these on an exam.  But I also don’t want students to have to stop and look up the definitions whenever they encounter these words.  In the last week or two, I have often said something like “remember that statistical inference is about inferring something about a population parameter based on a sample statistic,” and I sure don’t want students to need to look up those four terms in a glossary to understand my point.

  12. The symbol p-hat
  13. The symbol x-bar
  14. The symbol π
  15. The symbol μ

I have told my students several times in the past few weeks that understanding what these symbols mean needs to be second-nature for them.  It’s hard enough to understand a statement such as E(X-bar) = μ without having to look up what each symbol means.  I agree that we can and should express this result in words as well as symbols: If you select a very large number of random samples from a population, then the average of the sample averages will be very close to the population average.  But that’s a lot of words, and it’s very handy to use symbols.  I was very tempted to include the symbols σ and s on this list also.
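A quick simulation can make the statement E(X-bar) = μ concrete. The sketch below is only an illustration, assuming a small made-up population of ten values and random sampling with replacement; it averages the means of many random samples and compares that average to the population mean.

```python
import random
import statistics

# A small, made-up population used only for illustration (mean 12.9).
population = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
mu = statistics.fmean(population)


def average_of_sample_means(pop, n, num_samples, seed=0):
    """Average the means of num_samples random samples of size n,
    drawn with replacement from pop."""
    rng = random.Random(seed)
    sample_means = [statistics.fmean(rng.choices(pop, k=n))
                    for _ in range(num_samples)]
    return statistics.fmean(sample_means)


estimate = average_of_sample_means(population, n=4, num_samples=20_000)
print(f"population mean mu = {mu:.2f}")
print(f"average of 20,000 sample means = {estimate:.2f}")
```

Any single sample mean can miss μ by quite a bit; it is the average over many samples that lands close to μ.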

  16. The term random sampling
  17. The term random assignment
  18. The term confounding variable

As I’ve written before*, I really want students to understand the difference between random sampling and random assignment.  In particular, I’d like students to understand the different kinds of conclusions that follow from these different uses of randomness.  Random sampling allows for findings about the sample to be generalized to the population, and random assignment opens the door to drawing cause-and-effect conclusions.  Confounding variables in observational studies provide an alternative to cause-and-effect explanations.  I hope that students learn these ideas well enough that they think of them as they read about statistical studies in their everyday lives.  Of course, I know that they will not refer to their notes from my class as they go about their everyday lives.

* See posts #19 and #20, Lincoln and Mandela, here and here.

  19. How to calculate an average

I always tell my students that they do not need to memorize formulas.  But I can expect them to know that calculating an average involves adding the values and dividing by the number of values, right?  I’d also like students to know that the median is the ordered value in position (n+1)/2, but that looks like a formula, so I’ll leave that off this list.
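For what it's worth, both ideas translate directly into a few lines of Python. The function names below are just illustrative; the median function follows the (n+1)/2 position rule, averaging the two middle values when that position falls halfway between them.

```python
import math


def average(values):
    """Add the values and divide by how many values there are."""
    return sum(values) / len(values)


def median(values):
    """Return the ordered value at position (n + 1) / 2, counting from 1.
    If that position is a half-integer (n even), average the two neighbors."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


print(average([2, 9, 4, 7, 3]))   # → 5.0
print(median([2, 9, 4, 7, 3]))    # → 4.0 (position 3 of [2, 3, 4, 7, 9])
print(median([1, 2, 3, 10]))      # → 2.5 (halfway between positions 2 and 3)
```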

  20. The idea that standard deviation is a measure of variability

I do not expect my students to know the formula for calculating standard deviation, and I rarely ask them to calculate a standard deviation by hand.  But I do want them to know, without referring to their notes, that a larger standard deviation indicates more variability.
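A tiny illustration, using two made-up datasets that share the same mean: the more spread-out data produce the larger standard deviation. Python's `statistics.stdev` uses the n - 1 divisor of the sample standard deviation.

```python
import statistics

# Two made-up datasets with the same mean (50) but different spread.
tightly_clustered = [48, 49, 50, 51, 52]
widely_spread = [30, 40, 50, 60, 70]

for name, data in [("tightly clustered", tightly_clustered),
                   ("widely spread", widely_spread)]:
    print(f"{name}: mean = {statistics.fmean(data):.1f}, "
          f"sd = {statistics.stdev(data):.2f}")
```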

  21. How to calculate proportions (marginal, conditional, joint) from a two-way table of counts

My students have performed these calculations when analyzing categorical data and also for calculating conditional probabilities.  I hope that they feel confident with such calculations without using their notes.
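As a reminder of how the three kinds of proportions differ, here is a sketch using a hypothetical two-way table (the group labels and counts are invented for illustration). Joint proportions divide one cell by the grand total, marginal proportions divide a row or column total by the grand total, and conditional proportions divide a cell by the total of the row being conditioned on.

```python
# Hypothetical two-way table of counts: rows are groups, columns are responses.
table = {
    "Group A": {"yes": 30, "no": 20},
    "Group B": {"yes": 10, "no": 40},
}
grand_total = sum(sum(row.values()) for row in table.values())  # 100


def joint(group, response):
    """Proportion of all observations that fall in this one cell."""
    return table[group][response] / grand_total


def marginal(response):
    """Column total for this response divided by the grand total."""
    return sum(row[response] for row in table.values()) / grand_total


def conditional(response, group):
    """Proportion with this response among observations in the given group."""
    return table[group][response] / sum(table[group].values())


print(joint("Group A", "yes"))        # 30 / 100
print(marginal("yes"))                # 40 / 100
print(conditional("yes", "Group A"))  # 30 / 50
print(conditional("yes", "Group B"))  # 10 / 50
```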

  22. The idea that a difference between two percentages is not a percentage difference.

I don’t care if students need to look up how to calculate a percentage difference, but I do want them to know that a difference in percentage points is not the same as a percent difference.  I don’t mean that I want them to be able to state that fact, but I want them to recognize it when they encounter it.  For example, I’d like students to realize that increasing your success rate from 10% to 15% is not a 5% improvement in the success rate (it is an increase of 5 percentage points, which amounts to a 50% improvement)*.

* I wrote an entire essay about this in post #28, A pervasive pet peeve, here.
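The 10% to 15% example can be spelled out in two lines of arithmetic; the function names are just labels for this sketch.

```python
def percentage_point_change(old_pct, new_pct):
    """Simple difference between two percentages."""
    return new_pct - old_pct


def percent_change(old_pct, new_pct):
    """Change relative to the starting value, expressed as a percentage."""
    return (new_pct - old_pct) / old_pct * 100


print(percentage_point_change(10, 15))  # → 5 (percentage points)
print(percent_change(10, 15))           # → 50.0 (percent)
```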

  23. How to interpret a z-score
  24. How to calculate a z-score

Perhaps I am violating my policy about not requiring students to learn formulas here.  But notice that I listed the interpretation first.  I want students to know, without looking it up, that a z-score reveals how many standard deviations a value is from the mean*.  This interpretation tells you how to calculate the z-score: [(value – mean) / standard deviation].  Granted, I suspect that most students learn the formula rather than think it through from the interpretation, but I think this one is important enough to know without referring to notes, because the idea is so useful and comes up so often.

* See post #8, End of the alphabet, here, for more about z-scores.
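The interpretation carries the formula with it: subtract the mean to get a distance, then divide by the standard deviation to express that distance in standard-deviation units. A minimal sketch, with made-up numbers:

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (positive)
    or below (negative) the mean."""
    return (value - mean) / sd


# Made-up examples.
print(z_score(85, 70, 10))  # → 1.5, i.e. 1.5 SDs above the mean
print(z_score(50, 60, 8))   # → -1.25, i.e. 1.25 SDs below the mean
```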

  25. That probabilities cannot be less than zero or greater than one.
  26. That the probability of an event is one minus the probability of its complement

These two do not require looking anything up, right?  If I ask what’s wrong with the statements that Pr(E) = -0.6 or Pr(E) = 1.7, I sure hope that a student does not need to refer to any rules to answer the question.  Similarly, if I say that the probability of rain tomorrow is 0.2 and then ask for the probability that it does not rain tomorrow, I’m counting on students to answer without using their notes.

  27. How to interpret an expected value

This is one of the first items that came to mind when I decided to create this list.  If I had been given a dime for every time I’ve reminded a student that expected value means long-run average, then I would have accrued a very large average number of dimes per year over my long teaching career.
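The long-run-average interpretation is easy to see in a simulation. This sketch assumes a fair six-sided die (expected value 3.5) and a fixed random seed; the running average of the rolls settles near 3.5 as the number of rolls grows.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
# Expected value: each face times its probability (1/6), summed.
expected = sum(Fraction(face, 6) for face in faces)  # 7/2

rng = random.Random(42)
total = 0
for rolls in range(1, 100_001):
    total += rng.choice(faces)
    if rolls in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {rolls:>7,} rolls: average = {total / rolls:.3f}")

long_run_average = total / 100_000
print(f"expected value = {float(expected)}")
```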

  28. The term mutually exclusive
  29. The term independent events

The meaning of these terms in probability closely mirrors their meaning in everyday use, so I hope students can answer questions about these terms without consulting their notes.  I am tempted to include the addition rule for mutually exclusive events and the multiplication rule for independent events on this list, but I’ll resist that temptation.

  30. The idea that about 95% of the data from a normal distribution fall within two standard deviations of the mean

I’m not asking that students know the 68% and 99.7% aspects of the empirical rule by heart, only the part about 95% falling within two standard deviations of the mean*.  Several times in the past few weeks I have said something like: The value of the test statistic is 3.21 (or perhaps 1.21).  Is that very far out in the tail of a normal curve? How do you know?  At a minimum I’d like students to realize that a z-score of greater than 2 (in absolute value) is far enough in the tail to be worth noting.

* I am tempted to include knowing, without looking it up, that the more precise multiplier is 1.96, but I won’t go that far.  I do reserve the right to say things like “you know what the value 1.96 means in our course, right?” to my students.
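Python's standard library includes a normal distribution object, `statistics.NormalDist`, that can confirm the 95% figure, recover the 1.96 multiplier, and show how far out in the tail the test statistics 3.21 and 1.21 sit. This is just a calculator-style sketch.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_two_sd = z.cdf(2) - z.cdf(-2)
multiplier_95 = z.inv_cdf(0.975)  # leaves 2.5% in each tail

print(f"P(-2 < Z < 2) = {within_two_sd:.4f}")   # about 0.9545
print(f"95% multiplier = {multiplier_95:.3f}")   # about 1.960
for stat in (3.21, 1.21):
    upper_tail = 1 - z.cdf(stat)
    print(f"P(Z > {stat}) = {upper_tail:.4f}")
```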

  31. The idea that averages vary less than individual values
  32. The idea that the variability in a sample statistic (proportion or mean) decreases as the sample size increases

Now I’m asking a lot.  Being back to full-time teaching after a year off has led me to rethink many things, but I have not wavered in my conviction that sampling distributions comprise the most challenging topic for students*.  I am trying to keep my expectations modest with these two items, starting with the basic idea that averages vary less than individual values.  Even that is challenging for students, because the even more fundamental idea that averages vary from sample to sample is non-trivial to wrap one’s mind around.

* See posts #41 and #42, Hardest topic, here and here.
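Both ideas show up in a simulation that records how much sample means vary for several sample sizes. The sketch assumes, purely for illustration, a population that is uniform between 0 and 100 (SD about 28.9); n = 1 corresponds to individual values, and the standard deviation of the simulated sample means shrinks roughly like 28.9 / sqrt(n).

```python
import random
import statistics

POP_SD = 100 / 12 ** 0.5  # SD of a uniform(0, 100) population, about 28.87


def sd_of_sample_means(n, reps=5_000, seed=1):
    """Simulate `reps` samples of size n from uniform(0, 100) and
    return the standard deviation of their sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.uniform(0, 100) for _ in range(n)])
             for _ in range(reps)]
    return statistics.stdev(means)


sd_by_n = {n: sd_of_sample_means(n) for n in (1, 4, 16, 64)}
for n, sd in sd_by_n.items():
    print(f"n = {n:2d}: simulated SD = {sd:5.2f}, "
          f"sigma / sqrt(n) = {POP_SD / n ** 0.5:5.2f}")
```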

  33. That a confidence interval estimates the value of a population parameter
  34. That a larger sample size produces a smaller margin of error and therefore a narrower confidence interval
  35. That a confidence interval for a population mean is not a prediction interval for a single observation

I’m not expecting students to know any confidence interval formulas off the top of their heads.  When it comes to confidence intervals, I only ask for these three things.  I consider the last of these three to be the most important misconception that we should address about confidence intervals*.

* See post #15, How confident are you, part 2, here.

  36. That null and alternative hypotheses are about population parameters
  37. That a smaller p-value indicates stronger evidence against the null hypothesis
  38. That the null hypothesis is rejected when the p-value is smaller than the significance level

Similarly, these are three things I’d like students to know about hypothesis testing without consulting their notes.  The first of these echoes my frequent reminder to students that statistical inference involves drawing conclusions about a population parameter based on a sample statistic.  I hope that relying on simulation-based inference* leads students to internalize the second of these points.  I try not to over-emphasize making test decisions, as compared to assessing strength of evidence, but I do want students to know how to determine whether to reject a null hypothesis.

* See post #12, Simulation-based inference, part 1, here.

  39. The idea that statistical inference depends on random (or at least representative) samples from the population
  40. The idea that confidence intervals and hypothesis tests give consistent results
  41. The distinction between statistical significance and practical importance

Here’s a final set of three aspects of statistical inference for which I hope that students do not have to check their notes.  I’m not mentioning random assignment with the first one because my students have not yet studied inference for comparing two groups.  For the middle one, I want students to realize that when a hypothesis test rejects a hypothesized value for a parameter, then the corresponding confidence interval should not include that value.  And when the hypothesis test fails to reject the value, then the corresponding confidence interval should include that value.  I don’t expect students to know the subtleties involved here, for example that the test needs to be two-sided and that this doesn’t always hold exactly for inference about a proportion.  I just want the basic idea of this consistency to make sense and not require looking up.

Whew, this list is far longer than I anticipated when I began.  Remember that my students and I are only halfway through a two-course sequence!  I also strongly suspect that I’ve omitted several things that will cause me to shake my head vigorously when they come to me.

But also remember that my exams are open-notes, so my students can always look these things up.  Still, it would certainly save them a lot of time if these 41 items truly came as second nature to them.  More importantly, I want them to know these things well enough to apply what they’ve learned far beyond their brief time in my course.

#72 Trade-offs

Making good decisions requires assessing trade-offs.  We encounter such situations frequently in everyday life as well as in professional settings.  As I am deciding what to do with myself at this very moment, I am weighing the trade-offs between writing this blog post and watching the Masters golf tournament.  If I watch golf, then I will have less time to write this post.  Its quality will suffer, and I will need to keep working on this post into Sunday evening.  Because I’m a morning person, that means that its quality will suffer even further.  But if I write this blog post now instead of watching golf, then I will miss out on a fun diversion that I look forward to every year.  What to do?  You could argue that I should try to do a bit of both: watch golf with one side of my brain and write this post with the other.  But multi-tasking is not my strong suit.  Because this particular golf tournament only comes around once per year, I think I’ll focus on that for a while.  I’ll be back, I promise …

Okay, where was I?  While I was away, I realized that you are probably wondering: What does this have to do with teaching statistics?  I recently asked my students to complete an assignment based on the activity I presented in post #40, Back to normal (here).  This assignment has three goals:

  • The immediate goal is for students to develop their ability to perform fairly routine calculations from normal probability distributions, calculating both probabilities and percentiles. 
  • A secondary goal is to introduce students to the topic of classification problems.
  • The big-picture goal is to lead students to think about trade-offs and how decision-making often requires striking a balance between competing interests.

Here’s the assignment:

Suppose that a bank uses an applicant’s score based on some criteria to decide whether or not to approve a loan for the applicant.  Also suppose that these scores follow normal distributions, both for people who would repay the loan and for those who would not:

  • Those who would repay the loan have a mean of 60 and standard deviation of 8;
  • Those who would not repay the loan have a mean of 40 and standard deviation of 12.

Consider this decision rule:

  • Approve a loan for applicants with a score above 50.
  • Deny the loan for applicants with a score of 50 or below.
  • a) Determine the z-score of the cut-off value 50 for each kind of applicant: those who would repay the loan and those who would not.  Show how to calculate these two z-scores by hand.  Also write a sentence interpreting each z-score.
  • b) Determine the probability that an applicant who would repay the loan is denied.  Also provide a shaded sketch.  (Feel free to use the applet here.)
  • c) Determine the probability that an applicant who would not repay the loan is approved.  (Again provide a shaded sketch.)

Now consider changing the cut-off value in the decision rule.

  • d) Determine the cut-off value needed to decrease to 0.05 the probability that an applicant who would repay the loan is denied.  (Also report the z-score and provide a shaded sketch.)
  • e) For this new cut-off value, what is the probability that an applicant who would not repay the loan is approved?  (Again report the z-score and provide a shaded sketch.)
  • f) Comment on how these two error probabilities with the new cutoff value compare to their counterparts with the original cutoff value.

Now consider changing the cut-off value in the decision rule again.

  • g) Determine the cut-off value needed to decrease to 0.05 the probability that an applicant who would not repay the loan is approved.  (Again report the z-score and provide a shaded sketch.)
  • h) For this new cut-off value, what is the probability that an applicant who would repay the loan is denied?  (Again report the z-score and provide a shaded sketch.)
  • i) For each of the three cut-off values that have been considered, calculate the average of the two error probabilities.  Which cut-off rule is the best according to this criterion?

The following table displays all of the probabilities in this assignment:

Question (f) is the key one that addresses the issue of trade-offs.  I want students to realize that decreasing one of the two error probabilities has the inescapable consequence of increasing the other error probability.  Then question (i) asks students to make a decision that balances those trade-offs.
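The probabilities in parts (a) through (i) can be recomputed with Python's `statistics.NormalDist`. This is a sketch under the assignment's stated assumptions (normal score distributions; approve if score > cutoff, deny if score <= cutoff), not a substitute for the shaded sketches the assignment asks for.

```python
from statistics import NormalDist

repay = NormalDist(mu=60, sigma=8)      # scores of applicants who would repay
no_repay = NormalDist(mu=40, sigma=12)  # scores of applicants who would not


def error_probs(cutoff):
    """Return (P(repayer denied), P(non-repayer approved)) for the rule:
    approve if score > cutoff, deny if score <= cutoff."""
    return repay.cdf(cutoff), 1 - no_repay.cdf(cutoff)


cutoffs = {
    "original rule": 50,
    "parts (d)-(e)": repay.inv_cdf(0.05),     # repayers denied = 0.05
    "parts (g)-(h)": no_repay.inv_cdf(0.95),  # non-repayers approved = 0.05
}

averages = {}
for label, c in cutoffs.items():
    denied, approved = error_probs(c)
    averages[label] = (denied + approved) / 2
    z_repay = (c - repay.mean) / repay.stdev
    z_no_repay = (c - no_repay.mean) / no_repay.stdev
    print(f"{label}: cutoff = {c:.2f} (z = {z_repay:.2f} for repayers, "
          f"{z_no_repay:.2f} for non-repayers), P(repayer denied) = {denied:.4f}, "
          f"P(non-repayer approved) = {approved:.4f}, average = {averages[label]:.4f}")

print("lowest average error:", min(averages, key=averages.get))
```

Reading down the printed rows shows the trade-off from question (f): each change of cut-off pushes one error probability down to 0.05 only by pushing the other one up.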

I think this assignment achieves its three goals to some extent.  My main concern is that many students struggle to see the big picture about trade-offs.  I think many students tend to adopt tunnel-vision, answering one question at a time without looking for connections between them.  This is especially true for students who find the meant-to-be-routine calculations to be challenging.

If you compare this assignment to the extensive activity described in post #40 (here), you’ll see that I left out a lot.  Why?  Because I had to assess trade-offs.  I give so many other quizzes and assignments, and the course goes by so quickly on the quarter system, that I thought assigning the full activity would overwhelm students.  In hindsight I do wish that I had asked two more questions in the assignment:

  • j) Suppose that you regard denying a loan to an applicant who would repay it as three times worse than approving a loan for someone who would not repay it.  For each of the three cut-off values, calculate a weighted average of the two error probabilities that assigns weights according to this criterion.  Which cut-off rule is the best?

I think this question could have helped students to realize that they need not regard the two kinds of error as equally serious, and that they can bring their own judgment and values into the decision.  I also think my students could have benefitted from more work with the concept of weighted average.
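Question (j) is easy to check numerically. In this sketch the 3:1 judgment becomes weights of 3/4 and 1/4 on the two error probabilities (any pair of weights in a 3:1 ratio ranks the rules the same way); the distributions and cut-offs are the ones from the assignment.

```python
from statistics import NormalDist

repay = NormalDist(mu=60, sigma=8)
no_repay = NormalDist(mu=40, sigma=12)

# Denying a would-be repayer counts three times as much as approving a non-repayer.
W_DENY_REPAYER, W_APPROVE_NONREPAYER = 3 / 4, 1 / 4

cutoffs = {
    "original rule": 50,
    "parts (d)-(e)": repay.inv_cdf(0.05),
    "parts (g)-(h)": no_repay.inv_cdf(0.95),
}

weighted = {}
for label, c in cutoffs.items():
    p_denied = repay.cdf(c)             # repayer scores at or below the cut-off
    p_approved = 1 - no_repay.cdf(c)    # non-repayer scores above the cut-off
    weighted[label] = W_DENY_REPAYER * p_denied + W_APPROVE_NONREPAYER * p_approved
    print(f"{label}: cutoff = {c:.2f}, weighted error = {weighted[label]:.4f}")

print("lowest weighted error:", min(weighted, key=weighted.get))
```

Under these weights the cut-off from parts (d) and (e) comes out best, whereas the original cut-off of 50 has the smallest unweighted average, which is exactly the point about bringing one's own judgment into the decision.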

  • k) Now suppose (perhaps unrealistically) that you could change the two probability distributions of scores.  What two changes could you make that would enable both error probabilities to decrease?  (Hint: Think of one change about their means, another about their standard deviations.)

With this question I would want students to realize that providing more separation between the means of the score distributions would reduce both error probabilities.  Reducing the standard deviations of the score distributions would also have this desired effect.  I hope that my hint would not make this question too easy and eliminate students’ need to think carefully, but I worry that the question would be too challenging without the hint.  I may use a multiple-choice version of this question on the final exam coming up after Thanksgiving.

I also wonder whether I should have asked students to produce a graph of one error probability versus the other for many different cut-off values in the decision rule.  I have not used R with my business students, but I could have asked them to use Excel.  I have in mind something like the following R code and graph:
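A rough Python analogue of that idea, assuming the same two normal distributions and using only the standard library, computes both error probabilities across a grid of cut-off values and prints them as a table rather than drawing a graph; the pairs could be pasted into Excel or any plotting tool.

```python
from statistics import NormalDist

repay = NormalDist(mu=60, sigma=8)
no_repay = NormalDist(mu=40, sigma=12)


def tradeoff_curve(cutoffs):
    """For each cut-off, return (cutoff, P(repayer denied), P(non-repayer approved))."""
    return [(c, repay.cdf(c), 1 - no_repay.cdf(c)) for c in cutoffs]


curve = tradeoff_curve(range(30, 71, 5))
print("cutoff  P(repayer denied)  P(non-repayer approved)")
for c, denied, approved in curve:
    print(f"{c:6d}  {denied:17.4f}  {approved:23.4f}")
```

Raising the cut-off moves along the curve in one direction only: fewer loans approved for people who would not repay, more denials for people who would.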

The issue of trade-offs also arises with other introductory statistics topics. My students learned about confidence intervals recently.  Here’s a favorite question of mine: Higher confidence is better than lower confidence, right?  So, why do we not always use 99.99% confidence intervals? 

The answer is that higher confidence levels produce wider confidence intervals.  Higher confidence is good, but wider intervals are bad.  In other words, there’s a trade-off between two desirable properties: high confidence and narrow intervals.

With a confidence interval for a population proportion, how many times wider is a 99.99% confidence interval than a 95% confidence interval?  The critical values are z* = 1.960 for 95% confidence, z* = 3.891 for 99.99% confidence.  This means that a 99.99% confidence interval is 3.891 / 1.960 ≈ 1.985 times wider than a 95% confidence interval.
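The two critical values and their ratio can be verified with the standard library's inverse normal CDF. Because the standard error is the same for both intervals (same data), the width ratio is just the ratio of critical values; a minimal sketch:

```python
from statistics import NormalDist


def critical_value(confidence):
    """z* that leaves (1 - confidence) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


z_95 = critical_value(0.95)
z_9999 = critical_value(0.9999)
print(f"z* for 95%:    {z_95:.3f}")
print(f"z* for 99.99%: {z_9999:.3f}")
print(f"width ratio:   {z_9999 / z_95:.3f}")
```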

How can a researcher achieve the best of both worlds – high confidence and a narrow interval?  The only way to achieve both is to use a very large sample size.  What are some trade-offs associated with a very large sample size?  Selecting a very large sample requires much more time, effort, and expense.  Also, increasing the sample size comes with diminishing returns: You must quadruple the sample size in order to cut the interval’s width in half.
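The quadrupling rule comes from the square root of n in the margin of error. A sketch for a proportion, assuming for illustration a sample proportion of 0.5 and the usual z-interval margin of error z* x sqrt(p-hat(1 - p-hat)/n); the last lines also show how much larger the sample must be for a 99.99% interval to be as narrow as a 95% one.

```python
import math
from statistics import NormalDist


def interval_width(p_hat, n, confidence):
    """Full width (2 x margin of error) of a z-interval for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)


w_100 = interval_width(0.5, 100, 0.95)
w_400 = interval_width(0.5, 400, 0.95)
print(f"n = 100: width = {w_100:.3f}")
print(f"n = 400: width = {w_400:.3f}  (ratio {w_400 / w_100:.2f})")

# Sample-size multiplier needed for a 99.99% interval to be as narrow as a 95% one.
z = NormalDist()
factor = (z.inv_cdf(0.99995) / z.inv_cdf(0.975)) ** 2
print(f"sample size must grow by a factor of about {factor:.2f}")
```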

I’m tempted, but have never dared, to ask students to write an essay about how they have assessed trade-offs when making a decision of their own.  The COVID-19 crisis, which we are all trying to navigate as best we can, involves many, many trade-offs.  My students had to weigh trade-offs in deciding whether to live on campus this term, and they’ll have to do so again as they decide where to live next term.  They may have to evaluate trade-offs in deciding whether to go to their grandparents’ house for Thanksgiving dinner.  If those topics are too personal, they could also write about much less serious trade-offs, perhaps about their strategy for playing a board or card game, or about deciding whether to have a salad or a cheeseburger for lunch.  I could also invite students to write about trade-offs from another’s perspective, such as a mayor deciding whether to open or close schools during the COVID-19 crisis, or a football coach deciding whether to “go for it” on fourth down*.

* This article (here) describes how analytics has led some football coaches to revise their conservative strategy on this question.

As you know, I was distracted by watching a golf tournament as I began writing this blog post.  While I was watching golf, I was also thinking about trade-offs.  Golfers have long debated whether it’s better to strive for distance or accuracy as the more important goal.  The trade-off is that you can hit the ball farther if you’re willing to sacrifice some accuracy.  On one hand, hitting the ball farther means that you’ll hit your next shot from closer to the hole.  But by giving up some accuracy, you’ll more often have to hit your next shot from the rough rather than from the fairway, so you’ll have less ability to control where your next shot goes.  On the other hand, prioritizing accuracy means achieving less distance.  With more accurate shots, you’ll more often hit your next shot from the smooth fairway where you can control your next shot better, but you’ll be farther from the hole and therefore diminish your chance to hit the next shot close.  Much of golf strategy involves navigating this trade-off.  The recent push toward analytics in sports has extended to golf, where statisticians gather and analyze lots of data to help players make decisions*.

* This article (here) from last week by golf writer Alan Shipnuck summarizes some of these developments.

And now, if you will excuse me, since the golf tournament is over and I have (ever-so-nearly) finished writing this blog post, I need to check on how my fantasy football team, the Domestic Shorthairs, is doing this week.

#71 An SBI quiz

During this past week, I introduced my students to simulation-based inference (SBI), as described in post #12, here.  I gave a follow-up quiz in our learning management system Canvas to assess how well they could apply what they learned to a study that they had not yet seen.  I give three of these application quizzes in a typical week, along with three quizzes that assess how well they followed the handout that we worked through*.  All of these quizzes consist of five questions that are auto-graded in Canvas.  I regard these quizzes as formative rather than summative, and I encourage students to help each other on the quizzes.

* Students could work through the handouts completely on their own, but most students either attend a live Zoom session, during which I lead them through the handout, or watch videos that I prepare for each handout**.

** For those of you who read about my ludicrous, comedy-of-errors experience with recording my first video (post #63, here), I am happy to report that I have recorded 83 more videos for my students since then.  Not many have gone smoothly, but all have gone much more smoothly than my first feeble attempts.

Writing auto-graded quizzes is a new experience for me.  For this blog post, I will present my auto-graded SBI quiz questions, describe my thinking behind each question, and discuss common student errors.  I will also discuss some questions that I did not ask, and I may very well second-guess some of my choices.  The quiz questions appear below in italics.

For the context in this quiz, I use a study that I described in an exam question presented in post #22, here.

Researchers presented young children (aged 5 to 8 years) with a choice between two toy characters who were offering stickers.  One character was described as mean, and the other was described as nice.  The mean character offered two stickers, and the nice character offered one sticker.  Researchers wanted to investigate whether children would tend to select the nice character over the mean character, despite receiving fewer stickers.  They found that 16 of the 20 children in the study selected the nice character.

1. What values would you enter for the inputs of a coin-tossing simulation analysis of this study?

  • Probability of heads
  • Number of tosses
  • Number of repetitions

I used the matching format in Canvas for this question.  The options presented for each of the three sub-parts were: 0.5, 0.8, 1, 10, 16, 20, and 10,000.  The correct answers are 0.5, 20, and 10,000, respectively.

Because 0.8 is the sample proportion of children who selected the nice character, it makes a good distractor for the probability of heads: some students believe that the simulation is conducted with the sample value rather than the null-hypothesized value.  I chose the value 16 as a good distractor for the number of tosses, because it is the number of children in the sample who selected the nice character.  I threw in the values 1 and 10 for good measure.

2. Consider the following graph of simulation results:

Based on this graph, which of the following is closest to the p-value?

The options presented were: 0.005, 0.100, 0.500, 0.800.  I had to keep these options pretty far apart, because we cannot determine the p-value very precisely from the graph.

Students need to realize that the p-value is approximated by the proportion of repetitions that produced 16 or more heads.  Although we cannot approximate the p-value very accurately from this graph, we can see that obtaining 16 or more heads did not happen very often.  The closest option is 0.005.
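The coin-tossing analysis in questions 1 and 2 can be mimicked in a few lines. This sketch uses the quiz's inputs (probability of heads 0.5, 20 tosses, 10,000 repetitions) with an arbitrary fixed seed; the approximate p-value is the proportion of repetitions with 16 or more heads. The exact binomial probability is included only for comparison.

```python
import math
import random


def coin_toss_simulation(prob_heads=0.5, num_tosses=20, num_reps=10_000, seed=2020):
    """Return the number of heads in each of num_reps sets of num_tosses tosses."""
    rng = random.Random(seed)
    return [sum(rng.random() < prob_heads for _ in range(num_tosses))
            for _ in range(num_reps)]


heads = coin_toss_simulation()
observed = 16
approx_p_value = sum(h >= observed for h in heads) / len(heads)

# Exact binomial tail probability P(X >= 16) for X ~ Binomial(20, 0.5), for comparison.
exact_p_value = sum(math.comb(20, k) for k in range(observed, 21)) / 2 ** 20

print(f"simulated p-value: {approx_p_value:.4f}")
print(f"exact p-value:     {exact_p_value:.4f}")
```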

I considered asking students to use an applet (here) to conduct a simulation analysis for themselves and report the approximate p-value.  But I wanted this question to focus on whether they could read a graph of simulation results correctly.

I also thought about asking students to indicate how to determine an approximate p-value from the graph of simulation results.  The correct answer would have been: count the number of repetitions that produce 16 or more heads, and then divide by the number of repetitions.  Some obvious incorrect options could have been to count the repetitions that produced 10 or more heads, or to count the number of repetitions that produced exactly 10 heads.  Perhaps that would have been better than the version I asked.  I am a bit concerned that some students might have answered my question correctly simply by selecting the smallest option presented for the p-value.  On the other hand, one of the two examples presented in the handout led to a large p-value close to 0.5, so I hope my students do not assume that the smallest option will always be the correct answer.

3. Based on this simulation analysis, do the data from this study provide strong evidence that children have a genuine preference for the nice character with one sticker rather than the mean character with two stickers?  Why?

The options presented were:

  • Yes, because it is very unusual to obtain 16 or more heads
  • Yes, because the distribution follows a bell-shaped curve
  • Yes, because the distribution is centered around 10
  • No, because it is very unusual to obtain 16 or more heads
  • No, because the distribution follows a bell-shaped curve
  • No, because the distribution is centered around 10

I like this one.  This question directly addresses the reasoning process of simulation-based inference.   The correct answer is the first one listed here.  I think the distractors are fairly tempting, because some students focus on the shape or center of the distribution, rather than thinking about where the observed result falls in the distribution.  Those misconceptions are common and important to address.

You could fault me, I suppose, for not adding “if the children actually had no preference” after “it is very unusual to obtain 16 or more heads” at the end of the correct answer.  But I think omitting that phrase from all of the options kept the question reasonable.  In hindsight perhaps I should have written the correct answer as: “Yes, because the simulation rarely produced 16 or more heads.”

4. The following graph pertains to the same simulation results, this time displaying the distribution of the proportion of heads:

Calculate the z-score for the sample proportion of children in the study who selected the nice character with one sticker.  Report your answer with one decimal place of accuracy.

This question calls for a numerical answer rather than multiple-choice.  The correct answer is: z = (.800 – .500) / .111 ≈ 2.7*.  I allowed an error tolerance of 0.05 for the auto-grading process, so as not to penalize students who ignored my direction to use one decimal place of accuracy in their answer.

* My students have not yet studied the general expression for the standard deviation of the sampling distribution of a sample proportion, so their only option is to use the standard deviation of the 10,000 simulated sample proportions, as reported in the output.
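The arithmetic for question 4 is shown below, using the simulated standard deviation of 0.111 from the output. For comparison only, the sketch also computes the theoretical standard deviation sqrt(0.5 x 0.5 / 20) ≈ 0.112, the expression that the one-proportion z-test uses; both lead to z ≈ 2.7 when rounded to one decimal place.

```python
import math

p_hat = 16 / 20        # sample proportion who chose the nice character
p_null = 0.5           # null-hypothesized proportion
n = 20
sd_simulated = 0.111   # SD of the 10,000 simulated proportions, as read from the output

z_simulated = (p_hat - p_null) / sd_simulated

# Theoretical SD of a sample proportion under the null hypothesis.
sd_formula = math.sqrt(p_null * (1 - p_null) / n)
z_formula = (p_hat - p_null) / sd_formula

print(f"z using simulated SD: {z_simulated:.2f}")
print(f"z using formula SD ({sd_formula:.4f}): {z_formula:.2f}")
```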

This z-score calculation is not directly related to simulation-based inference, I suppose.  But I think z-scores are worth emphasizing*, and this also foreshadows the one-proportion z-test to come.

* See post #8, End of the alphabet, here.

5. Suppose that the study had found that 13 of 20 children selected the nice character with one sticker.  How would the p-value have changed, as compared to the actual result that 16 of 20 children selected that character?

The options presented here were: larger, smaller, no change.  The correct answer is larger*, because the p-value would then be the proportion of repetitions that produced 13 or more heads, which includes every repetition that produced 16 or more heads along with (almost certainly) many others.

* You may have noticed that I have always presented the correct answer first in this post, but Canvas shuffled the options for my students, so different students saw different orderings.
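The direction of the change in question 5 can be confirmed with exact binomial tail probabilities, which the quiz's simulation approximates; this sketch simply compares P(X >= 13) with P(X >= 16) for 20 fair coin tosses.

```python
import math


def upper_tail_p_value(observed, n=20, p=0.5):
    """P(X >= observed) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(observed, n + 1))


p_16 = upper_tail_p_value(16)
p_13 = upper_tail_p_value(13)
print(f"P(16 or more heads) = {p_16:.4f}")   # small p-value, strong evidence
print(f"P(13 or more heads) = {p_13:.4f}")   # larger p-value, weaker evidence
```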

I considered asking how the strength of evidence would change, rather than how the p-value would change.  It’s certainly possible for a student to answer the p-value question correctly, without making the connection to strength of evidence.  But it’s also possible that a student could correctly answer about strength of evidence without thinking through what that means for the p-value.  In hindsight, I wish that I had asked both versions in one question, like this:

Suppose that the study had found that 13 of 20 children selected the nice character with one sticker.  How would the p-value have changed, as compared to the actual result that 16 of 20 children selected that character, and how would the strength of evidence that children genuinely prefer the nice character have changed?  [Options: larger p-value, stronger evidence; larger p-value, weaker evidence; smaller p-value, stronger evidence; smaller p-value, weaker evidence]

As I mentioned earlier, I confine myself to asking five questions on every quiz.  I like this consistency, and I hope students appreciate that too.  But I feel no such constraint with blog posts, so now I will present five other questions that I could have asked on this quiz, all based on the same study about children selecting toy characters.

6. What are the observational units and variable in this study?  I ask these questions very often in class*, and I also ask them fairly often on assessments.  This might have worked well in matching format, with options such as: children, toy characters, which character a child selected, number of children who selected nice character, proportion of children who selected nice character.

* See post #11, Repeat after me, here.

7. Which of the following describes the null model/hypothesis?  Options could have included:

  • that children have no genuine preference between these two characters,
  • that infants genuinely prefer the nice character with one sticker to the mean character with two stickers,
  • that 80% of all infants prefer the nice character with one sticker.

8. Which of the following graphs is based on a correct simulation analysis?

9. What does the p-value represent in this study?  Options could have included:

  • the probability that 16 or more children would have selected the nice character, if in fact children have no genuine preference between the two characters
  • the probability that 10 children would have selected the nice character, if in fact children have no genuine preference between the two characters
  • the probability that 10 children would have selected the nice character, if in fact children have a genuine preference for the nice character
  • the probability that children have no genuine preference between the two characters

10. How would the p-value change if the study had involved twice as many children, and the same proportion had selected the nice character with one sticker?  The options would be: smaller, larger, no change. Students would have needed to use the applet on this question, or else relied on their intuition, because we had not yet investigated the effect of sample size on p-value or strength of evidence.

The correct answers for these additional questions are: 6. children, which character a child selected; 7. no genuine preference; 8. the graph on the right, centered at 10 with a normal-ish shape; 9. the first option presented here; 10. smaller.
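For question 10, a quick exact check compares 16 of 20 with 32 of 40, which keeps the sample proportion at 0.80.  This sketch again uses exact binomial tail probabilities as a stand-in for the applet's simulation, and the helper name is hypothetical.

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """Exact Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p20 = upper_tail(16, 20)   # actual study: 16 of 20 (80%)
p40 = upper_tail(32, 40)   # hypothetical doubled study: 32 of 40 (also 80%)
print(f"n = 20: {p20:.6f}   n = 40: {p40:.6f}")
```

The same sample proportion based on twice as many children yields a smaller p-value, and therefore stronger evidence of a genuine preference.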

Confining myself to auto-graded questions on quizzes* is a new experience that requires considerable re-thinking of my assessment questions and strategies.  In this post I have given an example of one such quiz, on the topic of simulation-based inference.  I have also tried to provide some insights into my thought process behind these questions and the various answer options for multiple-choice ones.  I have also indicated some places where I think in hindsight that I could have asked better questions.

* Not all aspects of my students’ work are auto-graded.  I assign occasional investigation assignments, like the batch testing investigation that I wrote about in my previous blog post here, for which I provide a detailed rubric to a student grader.  On exams, I use a mix of auto-graded and open-ended questions that I grade myself, as I discussed in post #66, First step of grading exams, here.

P.S. The study about children’s toy character selections can be found here.

#70 Batch testing, part 2

I recently asked my students to analyze expected values with batch testing for a disease, which I discussed in some detail in post #39, here.  Rethinking this scenario led me to ask some new questions that I had not asked in that earlier post.

I will first re-introduce this situation, present the basic questions and analysis that my students worked through, and then ask the key question that I wish I had asked previously.  If you’d like to skip directly to the new part, scroll down to the next occurrence of “key question.” As always, questions that I pose to students appear in italics.

Suppose that 12 people need to be given a blood test for a certain disease.  Assume that each person has a 10% chance of having the disease, independently from person to person.  Consider two different plans for conducting the tests:

  • Plan A: Give an individual blood test to each person.
  • Plan B: Combine blood samples from all 12 people into one batch; test that batch.
    • If at least one person has the disease, then the batch test result will be positive, and then all 12 people will need to be tested individually.
    • If nobody has the disease, then the batch test result will be negative, and no additional tests will be needed.

Let the random variable X represent the total number of tests needed with plan B (batch testing).

a) Determine the probability distribution of X. [Hint: List the possible values of X and their probabilities.]

Even with the hint, some of my students were confused about where to begin, so I tried to guide them through the implications of the two sub-bullets describing how batch testing works.

The possible values of X are 1 (if nobody has the disease) and 13 (if at least one person has the disease).  The probabilities are: Pr(X = 1) = Pr(nobody has the disease) = (.9)^12 ≈ 0.2824 by the multiplication rule for independent events, and Pr(X = 13) = 1 – Pr(nobody has the disease) = 1 – (.9)^12 ≈ 0.7176.  This probability distribution can be represented in the following table:

b) If you implement plan B once, what is the probability that the number of tests needed will be smaller than it would be with plan A?

This question really stumps some students.  Because plan A always requires 12 tests, the answer is simply: Pr(X < 12) ≈ 0.2824.  My goal is for students to realize that batch testing reduces the required number of tests only about one-fourth of the time, so this criterion does not reveal any advantage of batch testing.  Maybe I need to ask the question differently, or ask a different question altogether, to direct students’ attention to this point.

c) Determine the expected value of X.

This calculation is straightforward: E(X) = 1(.9)^12 + 13[1 – (.9)^12] ≈ 9.61.
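Parts (a) through (c) can be checked with a short Python sketch, assuming a 10% disease probability and independence as stated.  The function name plan_b_distribution is just an illustrative label.

```python
def plan_b_distribution(group_size=12, p_disease=0.1):
    """Distribution of X = number of tests when the whole group is pooled into one batch."""
    p_none = (1 - p_disease) ** group_size   # Pr(nobody has the disease), by independence
    return {1: p_none, group_size + 1: 1 - p_none}

dist_x = plan_b_distribution()
expected_x = sum(x * prob for x, prob in dist_x.items())
print({x: round(prob, 4) for x, prob in dist_x.items()})  # {1: 0.2824, 13: 0.7176}
print(round(expected_x, 2))                                # 9.61
```

Part (b) is simply dist_x[1], because X = 1 is the only value smaller than the 12 tests that plan A always requires.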

d) Interpret what this expected value means in this context.

My students quickly realize that I want them to focus on long-run average when they interpret expected value (see post #18, here).  But a challenging aspect of this is to describe what would be repeated a large number of times.  In this case: If the batch testing plan were applied for a very large number of groups of 12 people, then the long-run average number of tests needed would be very close to 9.61 tests.
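The long-run interpretation can also be illustrated by simulation.  This sketch is an illustration rather than part of the original assignment: it simulates many independent groups of 12 people, each person with a 10% disease probability, and averages the number of tests needed under plan B.  The function name and seed are arbitrary.

```python
import random

def simulate_plan_b(num_groups=100_000, group_size=12, p_disease=0.1, seed=2020):
    """Average number of tests per group under plan B, over many simulated groups."""
    rng = random.Random(seed)
    total_tests = 0
    for _ in range(num_groups):
        # Does at least one person in this group have the disease?
        positive_batch = any(rng.random() < p_disease for _ in range(group_size))
        # One batch test, plus an individual test for everyone if the batch is positive
        total_tests += 1 + (group_size if positive_batch else 0)
    return total_tests / num_groups

avg = simulate_plan_b()
print(round(avg, 2))  # should land close to 9.61
```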

e) Which plan – A or B – requires fewer tests, on average, in the long run?

Maybe I should have asked this differently, perhaps in terms of choosing between plan A and plan B.  The answer is that plan B is better in the long run, because it will require about 9.61 tests on average, compared to 12 tests with plan A.

Now consider a third plan:

  • Plan C: Randomly divide the 12 people into two groups of 6 people each.  Within each group, combine blood samples from the 6 people into one batch.  Test both batches.
    • As before, a batch will test positive only if at least one person in the group has the disease.
      • Any batch that tests positive requires individual testing for the 6 people in that group.
    • As before, a batch will test negative if nobody in the group has the disease. 
      • Any batch that tests negative requires no additional testing.

Let the random variable Y represent the total number of tests needed with plan C (batch testing on two sub-groups).

f) Determine the probability distribution of Y.

Analyzing plan C is more challenging than plan B, because there are more uncertainties involved.  I advise my students to start with the best-case scenario, proceed to the worst-case, and finally tackle the remaining case. The best case is that only 2 tests are needed, because nobody has the disease. The worst case is that 14 tests are needed (the original 2 batch tests plus 12 individual tests), because at least one person in each sub-group has the disease. The remaining case is that 8 tests are needed, because at least one person in one sub-group has the disease and nobody in the other sub-group has the disease.

The most straightforward probability to determine is Pr(Y = 2), because this is the probability that none of the 12 people have the disease.  This equals (.9)^12 ≈ 0.2824, just as before.

The second easiest probability to calculate is Pr(Y = 14), which is the probability that both sub-groups have at least one person with the disease.  This probability is [1 – (.9)^6] for each sub-group.  The assumption of independence gives that Pr(Y = 14) = [1 – (.9)^6]^2 ≈ 0.2195.

At this point we could simply determine Pr(Y = 8) = 1 – Pr(Y = 2) – Pr(Y = 14) ≈ .4980.  But I encouraged my students to try to calculate Pr(Y = 8) directly and then confirm that the three probabilities sum to 1, as a way to check their work.  To do this, we recognize that Y = 8 when one of the sub-groups has nobody with the disease and the other sub-group has at least one person with the disease.  A common error is for students to neglect that there are two ways for this to happen, because either sub-group could be the one that is disease-free.  This gives: Pr(Y = 8) = 2 × [1 – (.9)^6] × (.9)^6 ≈ .4980.

The probability distribution of Y can therefore be represented in this table:

g) Determine the expected value of Y.

This calculation is straightforward: E(Y) = 2(.2824) + 8(.4980) + 14(.2195) ≈ 7.62 tests.
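The same reasoning for plan C can be written as a Python sketch.  Here p_clear, the probability that a sub-group of 6 is disease-free, is a label introduced for this sketch.

```python
q = 0.9              # probability that an individual does NOT have the disease
p_clear = q ** 6     # Pr(a sub-group of 6 is disease-free)

dist_y = {
    2: p_clear * p_clear,            # both batches negative
    8: 2 * p_clear * (1 - p_clear),  # exactly one batch positive (either sub-group)
    14: (1 - p_clear) ** 2,          # both batches positive
}
# Computing Pr(Y = 8) directly lets us confirm that the probabilities sum to 1
assert abs(sum(dist_y.values()) - 1) < 1e-12

expected_y = sum(y * prob for y, prob in dist_y.items())
print({y: round(prob, 4) for y, prob in dist_y.items()})  # {2: 0.2824, 8: 0.498, 14: 0.2195}
print(round(expected_y, 2))                                # 7.62
```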

h) Write a sentence or two summarizing your findings, with regard to an optimal plan for minimizing how many tests will be needed in the long run.

Students who correctly determined the expected values realize that the best of these three plans is Plan C.  If this procedure is applied for a very large number of groups, then Plan C will result in an average of about 7.62 tests per group of 12 people.  This is smaller than the average number of tests needed with Plan B (9.61) or Plan A (12.00).

Now comes the key question that I did not address in my earlier post about batch testing: Can we do even better (in terms of minimizing the average number of tests needed in the long run) than using 2 sub-groups of 6 people?  I chose the number 12 here on purpose, because it lends itself to several more possibilities: three sub-groups of 4, four sub-groups of 3, and six sub-groups of 2.

We can imagine groans emanating from our students at this prospect.  But we can deliver them some good news: We do not need to determine the probability distributions for the number of tests in all of these situations.  We can save ourselves a lot of bother by solving one general case and then using properties of expected values.

i) Let W represent the number of tests needed when an arbitrary number of people (n) are to be tested in a batch.  Determine the probability distribution of W and expected value of W, as a function of n.

The possible values are simply 1 and (n + 1).  We can calculate Pr(W = 1) = Pr(nobody has the disease) = (.9)^n.  Similarly, Pr(W = n + 1) = Pr(at least one person has the disease) = 1 – (.9)^n.  The expected value is therefore: E(W) = 1 × (.9)^n + (n + 1) × [1 – (.9)^n] = n + 1 – n(.9)^n.  This holds when n ≥ 2; with a single person, the batch test is already the individual test, so only 1 test is ever needed.
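The general expression translates directly into a small function.  The name expected_tests_one_batch and the default disease probability of 0.1 are illustrative choices, and the guard reflects the n ≥ 2 restriction.

```python
def expected_tests_one_batch(n, p_disease=0.1):
    """E(W) = n + 1 - n * q**n for one batch of n >= 2 people, with q = 1 - p_disease."""
    if n < 2:
        raise ValueError("formula assumes a batch of at least 2 people")
    q = 1 - p_disease
    return n + 1 - n * q ** n

print(round(expected_tests_one_batch(12), 2))  # 9.61, matching plan B
print(round(expected_tests_one_batch(6), 2))   # 3.81
```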

j) Confirm that this general expression gives the correct expected value for n = 12 people.

I encourage my students to look for ways to check their work throughout a complicated process. Plugging in n = 12 gives: E(W) = 12 + 1 – 12(.9)^12 ≈ 9.61 tests. Happily, this is the same value that we determined earlier.

k) Use the general expression to determine the expected value of the number of tests with a batch of n = 6 people. 

This gives: E(W) = 6 + 1 – 6(.9)^6 ≈ 3.81 tests.

l) How does this compare to the expected value for plan C (dividing the group of 12 people into two sub-groups of 6) above?  Explain why this makes sense.

This question holds the key to our short-cut. This expected value of 3.81 is equal to one-half of the expected number of tests with plan C, which was 7.62 tests.  This is not a fluke, because we can express Y (the total number of tests with two sub-groups of 6) as Y = Y1 + Y2, where Y1 is the number of tests with the first sub-group of 6 people, and Y2 is the number of tests with the second sub-group of 6 people.  Each of Y1 and Y2 has the same distribution as W with n = 6, and properties of expected value establish that E(Y1 + Y2) = E(Y1) + E(Y2), so E(Y) = 2 × 3.81 ≈ 7.62.

This same idea will work, and save us considerable time and effort, for all of the other sub-group possibilities that we mentioned earlier.

m) Determine the expected value of the number of tests for three additional plans: three sub-groups of 4 people each, four sub-groups of 3 people each, and six sub-groups of 2 people each.  [Hint: Use the general expression and properties of expected value.]

With a sub-group of 4 people, the expected number of tests with one sub-group is: 4 + 1 – 4(.9)^4 ≈ 2.3756.  The expected value of the number of tests with three sub-groups of 4 people is therefore: 3(2.3756) ≈ 7.13 tests.

With a sub-group of 3 people, the expected number of tests with one sub-group is: 3 + 1 – 3(.9)^3 ≈ 1.813.  The expected value of the number of tests with four sub-groups of 3 people is therefore: 4(1.813) ≈ 7.25 tests.

With a sub-group of 2 people, the expected number of tests with one sub-group is: 2 + 1 – 2(.9)^2 = 1.38.  The expected value of the number of tests with six sub-groups of 2 people is therefore: 6(1.38) = 8.28 tests.
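Parts (l) and (m) together amount to multiplying the single-batch expected value by the number of sub-groups.  This Python sketch assumes p = 0.1, independence, and equal-sized sub-groups; the function names are hypothetical.

```python
def expected_tests_one_batch(n, p_disease):
    """E(W) = n + 1 - n * q**n for one batch of n >= 2 people, with q = 1 - p_disease."""
    q = 1 - p_disease
    return n + 1 - n * q ** n

def expected_total_tests(group_size, sub_size, p_disease):
    """Expected total tests when the group is split into equal sub-groups of sub_size.

    Uses linearity of expectation: E(Y1 + ... + Yk) = k * E(W).
    """
    if group_size % sub_size != 0:
        raise ValueError("sub-groups must all be the same size")
    if sub_size == 1:
        return float(group_size)  # no batching: one individual test per person (plan A)
    k = group_size // sub_size
    return k * expected_tests_one_batch(sub_size, p_disease)

for sub_size in (12, 6, 4, 3, 2, 1):
    print(sub_size, round(expected_total_tests(12, sub_size, 0.1), 2))
```

The loop reproduces the values 9.61, 7.62, 7.13, 7.25, 8.28, and 12 for sub-group sizes 12, 6, 4, 3, 2, and 1.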

n) Write a paragraph to summarize your findings about the optimal sub-group composition for batch-testing in this situation.

The following table summarizes our findings about expected values:

With a group of 12 people, assuming independence and a disease probability of 0.1 per person, the optimal sub-group composition is to have 3 sub-groups of size 4 people each.  This produces an expected value of 7.13 for the number of tests to be performed.  This is 40.6% fewer tests than the 12 that would have to be conducted without batch testing.  This is also 25.8% fewer tests than would be performed with just one batch.  (See post #28, here, for my pet peeve about misconceptions involving percentage differences.)
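The percentage comparisons can be checked from the unrounded expected values, measuring each reduction relative to the plan being compared against.  This is a sketch with p = 0.1; the variable names are illustrative.

```python
q = 0.9                                # probability that an individual is disease-free
best = 3 * (4 + 1 - 4 * q ** 4)        # three sub-groups of 4 people
one_batch = 12 + 1 - 12 * q ** 12      # plan B: one batch of 12
no_batch = 12                          # plan A: individual tests only

pct_vs_a = 100 * (no_batch - best) / no_batch
pct_vs_b = 100 * (one_batch - best) / one_batch
print(f"{pct_vs_a:.1f}% fewer tests than plan A")  # 40.6% fewer tests than plan A
print(f"{pct_vs_b:.1f}% fewer tests than plan B")  # 25.8% fewer tests than plan B
```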

Let’s conclude with two more extensions of this batch testing problem:

o) How do you predict the optimal sub-group composition to change with a smaller probability that an individual has the disease?  Change the probability to 0.05 and re-calculate the expected values to test your prediction.

It makes sense that larger sub-groups would be more efficient with a more rare disease.  With p = 0.05, we obtain the following expected values for the total number of tests:

In this case with a more rare disease (p = 0.05), the optimal strategy is to divide the 12 people into two groups of 6 people each.  This results in 5.18 tests on average in the long run.
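Re-running the comparison with p = 0.05 is a small change in a sketch like the following.  The helper is the same hypothetical function used for the general expression, redefined here so that the block stands alone.

```python
def expected_tests_one_batch(n, p_disease):
    """E(W) = n + 1 - n * q**n for one batch of n >= 2 people, with q = 1 - p_disease."""
    q = 1 - p_disease
    return n + 1 - n * q ** n

p = 0.05
# Expected total tests for 12 people, keyed by sub-group size (1 = plan A, no batching)
plans = {size: (12 // size) * expected_tests_one_batch(size, p) for size in (12, 6, 4, 3, 2)}
plans[1] = 12.0
best_size = min(plans, key=plans.get)
print(best_size, round(plans[best_size], 2))  # 6 5.18
```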

p) How would the optimal sub-group composition change (if at all) if there were twice as many people (24) in the group?

We can simply double the expected values above.  We also have new possibilities to consider: three sub-groups of size 8, and two sub-groups of size 12.  For the p = 0.05 case, this produces the same optimal sub-group size as before, 6 people per sub-group, as shown in the following table of expected values:
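For 24 people, the candidate sub-group sizes are the divisors of 24 (size 1 corresponds to individual testing, which needs 24 tests).  A sketch, again assuming p = 0.05 and independence, with the same hypothetical helper:

```python
def expected_tests_one_batch(n, p_disease):
    """E(W) = n + 1 - n * q**n for one batch of n >= 2 people, with q = 1 - p_disease."""
    q = 1 - p_disease
    return n + 1 - n * q ** n

p = 0.05
# Expected total tests for 24 people, keyed by sub-group size
plans_24 = {size: (24 // size) * expected_tests_one_batch(size, p)
            for size in (24, 12, 8, 6, 4, 3, 2)}
plans_24[1] = 24.0
best_size_24 = min(plans_24, key=plans_24.get)
print(best_size_24, round(plans_24[best_size_24], 2))  # 6 10.36
```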

Batch testing provides a highly relevant application of expected values for discrete random variables that can also help students to develop problem-solving skills. Speaking of relevance, you may have noticed that COVID-19 and coronavirus did not appear in this post until now.  I did not want to belabor this connection with my students, but I trust that they could not help but recognize the potential applicability of this technique to our current challenges.  I also pointed my students to an interactive feature from the New York Times here, an article in the New York Times here, and an article in Significance magazine here.

P.S. I recorded a video presentation of this batch testing for the College Board, which you can find here.

#69 More probability questions – correction

I often tell my students that I make mistakes in class on purpose as a teaching strategy, to encourage them to pay close attention, check my work regularly rather than simply copy what I say into their notes, and speak up when they notice something that they question.

This is partially true, but most of the mistakes that I make in class are, of course, genuine ones rather than purposeful.  I admit that I sometimes try to bluff my way through, with tongue firmly planted in cheek, claiming that my mistake had been intentional, an application of that teaching strategy.

Thanks very much to the careful blog reader who spotted a mistake of mine in today’s post.  In a follow-up discussion to the first example, I wrote: If the marginal percentages had been 28% and 43%, then the largest possible value for the intersection percentage would have been 28% + 43% = 71%.  This is not true, because the intersection percentage can never exceed either of the marginal percentages.  With marginal percentages of 28% and 43%, the largest possible value for the intersection percentage would be 28%. 

Perhaps I was thinking of the largest possible percentage for the union of the two events, which would indeed be 28% + 43% = 71%.  Or perhaps I was not thinking much at all when I wrote that sentence.  Or perhaps, just possibly, you might be so kind as to entertain the notion that I made this mistake on purpose, as an example of a teaching strategy, which I am now drawing to your attention?

#69 More probability questions

My students and I have spent the last three weeks studying probability*.  At the end of Friday’s class session, one of the students asked a great question.  Paraphrasing a bit, she asked: We can answer some of these questions by thinking rather than calculating, right?  I was delighted by her question and immediately replied: Yes, absolutely!  I elaborated that some questions call for calculations, so it’s important to know how to use probability rules and tools.  Those questions usually require some thinking as well as calculating.  But other questions ask you to think things through without performing calculations. Let me show you some of the questions that I have asked in this unit on probability**.

* This course is the first of a two-course introductory sequence for business students. 

** Kevin Ross’s guest post (#54, here) provided many examples of probability questions that do not require calculations.

My students saw the following questions on a quiz near the beginning of the probability unit:

1. Suppose that 78% of the students at a particular college have a Facebook account and 43% have a Twitter account.

  • a) Using only this information, what is the largest possible value for the percentage who have both a Facebook account and a Twitter account?  Describe the (unrealistic) situation in which this occurs.
  • b) Using only this information, what is the smallest possible value for the percentage who have both a Facebook account and a Twitter account?  Describe the (unrealistic) situation in which this occurs.

Even though these questions call for a numerical response, and can therefore be auto-graded, they mostly require thinking rather than plugging into a rule.  We had worked through a similar example in class, in which I encouraged students to set up a probability table to think through such questions.  The marginal probabilities given here produce the following table:

For part (a), students need to realize that the percentage of students with both kinds of accounts cannot be larger than the percentage with either account individually.  The largest possible value for that intersection probability is therefore 0.43, so at most 43% of students could have had both kinds of accounts.  If this were not an auto-graded quiz, I would have also asked for a description of this (unrealistic) scenario: that every student with a Twitter account also has a Facebook account.

Part (b) is more challenging.  A reasonable first thought is that the smallest possible probability could be 0.  But then Pr(Facebook or Twitter) would equal 0.78 + 0.43, and 1.21 is certainly not a legitimate probability.  That calculation points to the correct answer: Because Pr(Facebook or Twitter) cannot exceed 1, the smallest possible value for Pr(Facebook and Twitter) is 1.21 – 1 = 0.21.  At least 21% of students must have both kinds of accounts.  This unrealistic scenario requires that every student have a Facebook account or a Twitter account.

Notice that if the two given probabilities had not added up to more than 1, then the smallest possible value for the intersection probability would have been 0%.
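The two bounds follow a general pattern: the intersection can be no larger than the smaller marginal probability, and no smaller than the amount by which the two marginals together exceed 1.  A minimal Python sketch (the function name is hypothetical):

```python
def intersection_bounds(p_a, p_b):
    """Smallest and largest possible Pr(A and B), given only Pr(A) and Pr(B)."""
    largest = min(p_a, p_b)              # the intersection cannot exceed either event
    smallest = max(0.0, p_a + p_b - 1)   # because Pr(A or B) cannot exceed 1
    return smallest, largest

low, high = intersection_bounds(0.78, 0.43)
print(round(low, 2), round(high, 2))  # 0.21 0.43
```

With marginal percentages of 28% and 43%, the same function returns 0 and 0.28: the sum does not exceed 1, so the smallest possible intersection is 0, and the largest is the smaller marginal.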

The remaining three parts of the quiz provided students with a specific value (36%) for the percentage of students with both a Facebook and Twitter account and then asked for routine calculations:

  • c) What percentage of students have at least one of these accounts?
  • d) What percentage of students have neither of these accounts?
  • e) What percentage of students have one of these accounts but not both?

These percentages turn out to be 85%, 15%, and 49%, respectively.  The easiest way to determine these is to complete the probability table begun above:
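Filling in that table amounts to three applications of basic rules.  Here is a Python sketch using the 36% intersection value (variable names are illustrative):

```python
p_fb, p_tw, p_both = 0.78, 0.43, 0.36
p_either = p_fb + p_tw - p_both      # addition rule: at least one account
p_neither = 1 - p_either             # complement rule
p_exactly_one = p_either - p_both    # at least one, but not both
print(f"{p_either:.0%} {p_neither:.0%} {p_exactly_one:.0%}")  # 85% 15% 49%
```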

The following questions appear on a practice exam that I gave my students to prepare for this coming Friday’s exam:

2. Suppose that a Cal Poly student is selected at random.  Define the events E = {randomly selected student is an Engineering major} and T = {randomly selected student is taking STAT 321 this term}.  For each of the following pairs of probabilities, indicate which probability is larger, or if the two probabilities are the same value.  You might want to consider the following information: A few thousand students at Cal Poly are Engineering majors. A few dozen students are taking STAT 321 this term.  Less than a handful of current STAT 321 students are not Engineering majors.

  • a) Pr(E), Pr(T)
  • b) Pr(E), Pr(E and T)
  • c) Pr(T), Pr(E or T)
  • d) Pr(E), Pr(E | T)
  • e) Pr(T | E), Pr(E | T)

These questions require only thinking, no calculations.  I purposefully avoided giving specific numbers at the end of this question.

Part (a) is an easy one, because there are a lot more Engineering majors than there are STAT 321 students.  For part (b), students are to realize that (E and T) is a subset of E, so Pr(E) must be larger than Pr(E and T).  Similarly in part (c), T is a subset of (E or T), so Pr(E or T) must be larger than Pr(T).  For part (d), most STAT 321 students are Engineering majors, so Pr(E | T) is larger than Pr(E).  Finally, relatively few Engineering majors take STAT 321 in any one term, so Pr(E | T) is also larger than Pr(T | E).

My students completed a fairly long assignment that asked them to apply the multiplication rule for independent events to calculate various probabilities that a system functions successfully, depending on whether components are connected in series (which requires all components to function successfully) or in parallel (which requires at least one component to function successfully).  The final two parts of this assignment were:

3. Suppose that three components are connected in a system.  Two of the components form a sub-system that is connected in parallel, which means that at least one of these two components must function successfully in order for the sub-system to function successfully.  This sub-system is connected in series with the third component, which means that both the sub-system and the third component must function successfully in order for the entire system to function successfully.  Suppose that the three components function independently and that the probabilities of functioning successfully for the three components are 0.7, 0.8, and 0.9.  Your goal is to connect the system to maximize the probability that the system functions successfully.

  • i) Which two components would you select to form the sub-system, and which would you select to be connected in series with the sub-system?  Explain your choice.
  • j) Determine the probability that the system functions successfully with your choice.  Justify the steps of your calculation with the appropriate probability rules.

The first of these questions can be answered without performing calculations.  Because the component connected in series must function successfully in order for the system to function successfully, that component should be the most reliable one: the one with probability 0.9 of functioning successfully.  The remaining two components, with success probabilities 0.8 and 0.7, should be connected in parallel.

The calculation for part (j) certainly does require applying probability rules correctly.  The probability that this system functions successfully can be written as*: Pr[(C7 or C8) and C9].  The multiplication rule for independent events allows us to write this as: Pr(C7 or C8) × Pr(C9).  Applying the addition rule on the first term gives: [Pr(C7) + Pr(C8) – Pr(C7 and C8)] × Pr(C9).  Then one more application of the multiplication rule gives: [Pr(C7) + Pr(C8) – Pr(C7) × Pr(C8)] × Pr(C9).  Plugging in the probability values gives: [0.7 + 0.8 – 0.7×0.8] × 0.9, which is 0.846. 

* I’m hoping that my notation here will be clear without my having to define it.  I consider this laxness on my part a perk of blog writing as opposed to more formal writing.

Notice that a student could have avoided thinking through the answer to (i) by proceeding directly to (j) and calculating probabilities for all possible arrangements of the components.  I do not condone that strategy, but I do encourage students to answer probability questions in multiple ways to check their work.  The other two probabilities (for the system functioning successfully) turn out to be 0.776 if the 0.8 probability component is connected in series and 0.686 if the 0.7 probability component is connected in series.
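Checking all three arrangements takes only a few lines.  This Python sketch assumes independent components, as stated in the question; the function name is illustrative.

```python
from itertools import permutations

def system_reliability(parallel_a, parallel_b, series):
    """Pr(system works) for (A or B) connected in series with C, all independent."""
    sub_system = parallel_a + parallel_b - parallel_a * parallel_b  # addition rule + independence
    return sub_system * series                                     # multiplication rule

probs = (0.7, 0.8, 0.9)
for a, b, c in permutations(probs):
    if a < b:  # each parallel pair appears twice among the permutations; keep one ordering
        print(f"series component {c}: {system_reliability(a, b, c):.3f}")
```

This prints 0.846, 0.776, and 0.686 for series components 0.9, 0.8, and 0.7, confirming that the most reliable component belongs in series.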

Finally, here’s the in-class example that prompted my student’s question at the top of this blog post:

4. Suppose that Zane has a 20% chance of earning a score of 0 and an 80% chance of earning a score of 5 when he takes a quiz.  Suppose also that Zane must choose between two options for calculating an overall quiz score: Option A is to take one quiz and multiply the score by 10, Option B is to take ten (independent) quizzes and add their scores.

  • a) Which option would you encourage Zane to take?  Explain.
  • b) Which option do you suspect has a larger expected value, or do you suspect that the expected values will be the same?
  • c) Use properties of expected value to determine the expected value of his overall score with each option.  Comment on how they compare.
  • d) Which option do you suspect has a larger standard deviation, or do you suspect that the standard deviations will be the same?
  • e) Use properties of variance to determine the standard deviation of his overall score with each option.  Comment on how they compare.
  • f) If Zane’s goal is to maximize his probability of obtaining an overall score of 50 points, which option should he select?  Explain.
  • g) Calculate the probability, for each option, that Zane scores 50 points.  Comment on how they compare.
  • h) The following graphs display the probability distributions of Zane’s overall quiz score with these two options.  Which graph goes with which option?  Explain.

The key idea here is that multiplying a single quiz score by 10 is a much riskier, all-or-nothing proposition than adding scores for 10 independent quizzes.  A secondary goal is for students to learn how to apply rules of expected values and variances to multiples and sums of random variables.

The expected value of Zane’s score on a single quiz is 4.0, and the standard deviation of his score on a single quiz is 2.0.  The expected value of the random variable (10×Z) is the same as for the random variable (Z1 + Z2 + … + Z10), namely 40.0 quiz points.  This means that neither option is better for Zane in terms of long-run average. 

But this certainly does not mean that the two options yield identical distributions of results.  The variance of (10×Z) is 10^2 × 4.0 = 400, so the standard deviation is 20.0.  The variance for (Z1 + Z2 + … + Z10) is much smaller: 10 × 4.0 = 40, so the standard deviation is approximately 6.325.

Zane has an 80% chance of obtaining an overall quiz score of 50 with option A, because he simply needs to score a 5 on one quiz.  With option B, he only achieves a perfect overall score of 50 if he earns a 5 on all 10 quizzes, which has probability (0.8)^10 ≈ 0.107.
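The expected value, standard deviation, and probability calculations for both options can be sketched in Python, starting from the single-quiz distribution given in the question (variable names are illustrative):

```python
from math import sqrt

scores = {0: 0.2, 5: 0.8}                                   # one quiz: score -> probability
mean = sum(s * p for s, p in scores.items())                # E(Z) = 4.0
var = sum((s - mean) ** 2 * p for s, p in scores.items())   # Var(Z) = 4.0

# Option A: 10 * Z, so E = 10 * E(Z) and Var = 10**2 * Var(Z)
mean_a, sd_a = 10 * mean, sqrt(10 ** 2 * var)
# Option B: Z1 + ... + Z10 (independent), so E = 10 * E(Z) and Var = 10 * Var(Z)
mean_b, sd_b = 10 * mean, sqrt(10 * var)

prob50_a = scores[5]          # needs a 5 on the single quiz
prob50_b = scores[5] ** 10    # needs a 5 on all ten quizzes
print(mean_a, sd_a, prob50_a)                       # 40.0 20.0 0.8
print(mean_b, round(sd_b, 3), round(prob50_b, 3))   # 40.0 6.325 0.107
```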

The graph on the left above shows the probability distribution for option B, and the graph on the right corresponds to option A.  The graphs reveal the key idea that option A is all-or-nothing, while option B provides more consistency in Zane’s overall quiz score.

The great mathematician Laplace reportedly said that “probability theory is nothing but common sense reduced to calculations.”  I wish I had thought quickly enough on my feet to mention this in response to my student’s comment in class on Friday.  I’ll have to settle for hoping that my probability questions lead students to develop a habit of mind to think clearly and carefully about randomness and uncertainty, along with the ability to perform probability calculations.

#68 Knowing or guessing?

I told my students at the beginning of our last class session that I was especially excited about class that day for several reasons:

  1. It was a Friday.
  2. We were about to work through our thirteenth handout of the term, a lucky number.
  3. The date was October 16, the median day for the month of October.
  4. We had reached the end of week 5 of our 10-week quarter, the halfway point.
  5. The topic for the day was my favorite probability rule, in fact my favorite mathematical theorem: Bayes’ rule.

The first two examples that we worked through concerned twitter use and HIV testing, as described in post #10, My favorite theorem, here.

The third and final example of the day presented this scenario: Suppose that Jasmine* has a 70%** chance of knowing (with absolute certainty) the answer to a randomly selected question on a multiple-choice exam.  When she does not know the answer, she guesses randomly among the five options. 

* I had always used the name Brad with this made-up example.  But I realized that I had used an example earlier in the week with the names Allan, Beth, Chuck, Donna, and Ellen, so I thought that I should introduce a bit of diversity into the names of fictional people in my made-up probability examples.  I did a Google search for “popular African-American names” and selected Jasmine from the list that appeared.

** When I first rewrote this example with Jasmine in place of Brad, my first thought was to make Jasmine a stronger student than Brad, so I wrote that she has an 80% rather than a 70% chance of knowing the answer for sure.  Later I realized that this change meant that the value 20% was being used for the probability of her guessing and also for the probability of her answering correctly given that she is guessing.  I wanted to avoid this potential confusion, so I changed back to a 70% chance of Jasmine knowing the answer.

a) Before we determine the solution, make a prediction for the probability that Jasmine answers a randomly selected question correctly.  In other words, make a guess for the long-run proportion of questions that she would answer correctly.

I hope students realize that this probability should be a bit larger than 0.7.  I want them to reason that she’s going to answer 70% correctly based on her certain knowledge, and she’s also going to answer some correctly when she’s guessing just from blind luck.  I certainly do not expect students to guess the right answer, but it’s not inconceivable that some could reason that she’ll answer correctly on 20% of the 30% that she guesses on, which is another 6% in addition to the 70% that she knows for sure, so her overall probability of answering correctly is 0.76.

Next I ask students to solve this with a table of counts for a hypothetical population, just as we did for the previous two examples (again see post #10, here).  This time I only provide them with the outline of the table rather than giving row and column labels.  b) Fill in the row and column labels for the table below:

To figure out what labels to put on the rows and columns, I remind students that the observational units here are 100 multiple-choice questions, and they need to think about the two variables that we record about each question.  It takes most students a little while to realize that the two variables are: 1) whether Jasmine knows the answer or guesses, and 2) whether Jasmine answers the question correctly or not.  This leads to:

c) Fill in the table of counts for a hypothetical population of 100 questions.  We proceed through the following calculations:

  1. Jasmine will know the answer for 70% of the 100 questions, which is 70.
  2. She will guess at the answer for 100 – 70 = 30 questions.
  3. For the 70 questions where she knows the answer, she will correctly answer all 70, leaving 0 that she will answer incorrectly.
  4. For the 30 questions on which she guesses, we expect her to answer correctly on one-fifth of them, which is 6.
  5. That leaves 30 – 6 = 24 questions for which she will guess and answer incorrectly.
  6. The column totals are therefore 76 correctly answered questions and 24 incorrect.
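If you would like to check these six steps with a few lines of code, here is a minimal Python sketch.  It assumes only the numbers stated in the example (a 70% chance of knowing, a one-in-five chance that a guess is correct, and a hypothetical population of 100 questions); the function name jasmine_table is my own invention for this illustration.  Using Fraction keeps every count exact.

```python
from fractions import Fraction


def jasmine_table(n=100, p_know=Fraction(7, 10), p_correct_if_guess=Fraction(1, 5)):
    """Expected (correct, incorrect, total) counts for a hypothetical population of n questions."""
    know = n * p_know                            # step 1: questions she knows for sure
    guess = n - know                             # step 2: questions she guesses on
    know_correct, know_incorrect = know, 0       # step 3: knowing means always correct
    guess_correct = guess * p_correct_if_guess   # step 4: one-fifth of the guesses are right
    guess_incorrect = guess - guess_correct      # step 5: the rest of the guesses are wrong
    return {
        "know": (know_correct, know_incorrect, know),
        "guess": (guess_correct, guess_incorrect, guess),
        # step 6: column totals
        "total": (know_correct + guess_correct, know_incorrect + guess_incorrect, n),
    }


table = jasmine_table()
print(f"{'':8}{'correct':>9}{'incorrect':>11}{'total':>7}")
for row, (correct, incorrect, total) in table.items():
    print(f"{row:8}{int(correct):>9}{int(incorrect):>11}{int(total):>7}")
```

The rows come out as 70/0/70 for knowing, 6/24/30 for guessing, and 76/24/100 in total, matching the steps above.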

The completed table is shown here:

d) Use the table to report the probability that Jasmine answers a randomly selected question correctly.  This can be read from the table as: Pr(correct) = 76/100 = 0.76.

e) Show how this unconditional probability (of answering a randomly selected question correctly) can be calculated directly as a weighted average of two conditional probabilities.  This is more challenging for students, but I think the idea of weighted average is an important one.  I want them to realize that the two conditional probabilities are: Pr(correct | know) = 1.0 and Pr(correct | guess) = 0.2.  The weights attached to these are the probabilities of knowing and of guessing in the first place: Pr(know) = 0.7 and Pr(guess) = 0.3.  The unconditional probability of answering correctly can be expressed as the weighted average 0.7×1.0 + 0.3×0.2 = 0.76.

f) Determine the conditional probability, given that Jasmine answers a question correctly, that she actually knows the answer.  Some students think at first that this conditional probability should equal one, but they realize their error when they are asked whether it’s possible to answer correctly even when guessing.  Returning to the table, this conditional probability is calculated to be: 70/76 ≈ 0.921. 

g) Interpret this conditional probability in a sentence.  Jasmine actually knows the answer to about 92.1% of all questions that she answers correctly in the long run.
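The phrase “in the long run” can be made concrete with a quick simulation.  This is a sketch of my own rather than something I used in class: the function simulate_questions and its parameters are hypothetical, and it assumes that questions are independent, that Jasmine knows each answer with probability 0.7, and that a guess picks one of five options at random.

```python
import random


def simulate_questions(num_questions=200_000, p_know=0.7, num_options=5, seed=2020):
    """Estimate Pr(correct) and Pr(know | correct) by simulating many questions."""
    rng = random.Random(seed)
    num_correct = 0
    num_know_and_correct = 0
    for _ in range(num_questions):
        knows = rng.random() < p_know
        # A blind guess is right when it lands on the one correct option (labeled 0 here).
        correct = knows or rng.randrange(num_options) == 0
        if correct:
            num_correct += 1
            if knows:
                num_know_and_correct += 1
    return num_correct / num_questions, num_know_and_correct / num_correct


p_correct, p_know_given_correct = simulate_questions()
print(f"Pr(correct) ≈ {p_correct:.3f}")
print(f"Pr(know | correct) ≈ {p_know_given_correct:.3f}")
```

With this many simulated questions, the two long-run proportions should land very close to 0.76 and 0.921.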

h) Show how to calculate this conditional probability directly from Bayes’ rule.  The calculation is: Pr(know | correct) = [Pr(correct | know) × Pr(know)] / [Pr(correct | know) × Pr(know) + Pr(correct | guess) × Pr(guess)] = (1×0.7) / (1×0.7 + 0.2×0.3) = 0.70 / 0.76 ≈ 0.921.  I try to impress upon students that even though this calculation looks more daunting with the formula than from filling in the table, the calculations are exactly the same, as seen by our ending up with 0.70/0.76 from the formula and 70/76 from the table.  I also emphasize that I think the table provides an effective and understandable way to organize the calculations.
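To underline that the formula and the table perform exactly the same arithmetic, here is a small Python sketch of Bayes’ rule using exact fractions.  The function name is hypothetical; the inputs are the prior Pr(know) = 0.7 and the two conditional probabilities from part e), and the denominator is the weighted average from that part.

```python
from fractions import Fraction


def pr_know_given_correct(p_know, p_correct_if_know, p_correct_if_guess):
    """Bayes' rule for the two-level knowing-versus-guessing example."""
    p_guess = 1 - p_know
    # Denominator: Pr(correct) as the weighted average from part e)
    p_correct = p_correct_if_know * p_know + p_correct_if_guess * p_guess
    # Numerator: Pr(correct | know) x Pr(know)
    return p_correct_if_know * p_know / p_correct


posterior = pr_know_given_correct(Fraction(7, 10), Fraction(1), Fraction(1, 5))
print(posterior, round(float(posterior), 3))  # 35/38, the same fraction as 70/76
```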

Here’s a fun extension of this example: Continue to suppose that Jasmine has a 70% chance of knowing (with absolute certainty) the answer to a randomly selected question on a multiple-choice exam.  But now there’s also a 20% chance that she can eliminate three incorrect options, and then she guesses randomly between the remaining two options, one of which is correct.  For the remaining 10% chance, she has no clue and so guesses randomly among all five options.

i) Before conducting the analysis, do you expect the probability that she answers a question correctly to increase, decrease, or remain the same?  Explain.  Then do the same for the conditional probability that she knows the answer given that she answers correctly.

Most students have correct intuition for the first of these questions: If Jasmine can eliminate some incorrect options, then her probability of answering correctly must increase.  The second question is more challenging to think through: Now that she has a better chance of guessing the correct answer, the conditional probability that she knows the answer, given that she answers correctly, decreases.

j) Modify the table of hypothetical counts to determine these two probabilities.  Students must first realize that the table now needs three rows to account for Jasmine’s three levels of knowledge.  The completed table becomes:

The requested probabilities are: Pr(correct) = 82/100 = 0.82 and Pr(know | correct) = 70/82 ≈ 0.854.  Jasmine’s ability to eliminate some incorrect options has increased her probability of answering correctly by six percentage points from 76% to 82%.  But our degree of belief that she genuinely knew the answer, given that she answered correctly, has decreased by a bit more than six percentage points, from 92.1% to 85.4%.
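The three-row table can be checked the same way.  This Python sketch assumes only the probabilities stated in the extension (0.7 of knowing, 0.2 of eliminating three options and choosing between the remaining two, 0.1 of guessing among all five) and the same hypothetical population of 100 questions.

```python
from fractions import Fraction

# (knowledge level, Pr(level), Pr(correct | level)) from the extended scenario
LEVELS = [
    ("know", Fraction(7, 10), Fraction(1)),
    ("eliminate three", Fraction(2, 10), Fraction(1, 2)),
    ("no clue", Fraction(1, 10), Fraction(1, 5)),
]

n = 100  # hypothetical population of questions
rows = {}
for level, p_level, p_correct_given_level in LEVELS:
    count = n * p_level
    correct = count * p_correct_given_level
    rows[level] = (correct, count - correct, count)  # (correct, incorrect, total)

total_correct = sum(correct for correct, _, _ in rows.values())
p_correct = total_correct / n
p_know_given_correct = rows["know"][0] / total_correct

for level, (correct, incorrect, total) in rows.items():
    print(f"{level:16}{int(correct):>9}{int(incorrect):>11}{int(total):>7}")
print(f"Pr(correct) = {float(p_correct):.2f}")                     # 0.82
print(f"Pr(know | correct) = {float(p_know_given_correct):.3f}")   # 0.854
```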

I confess that I did not have time to ask students to work through this extension during Friday’s class.  I may give it as an assignment, or as a practice question for the next exam, or perhaps as a question on the next exam itself.

I have mentioned before that I give lots and lots of quizzes in my courses (see posts #25 and 26, here and here).  This is even more true in my first-ever online course.  I generally assign three handout quizzes and three application quizzes per week.  The handout quiz aims to motivate students to work through the handout, either by attending a live Zoom session with me, or on their own, or by watching a video that I prepare for each handout.  The application quiz asks students to apply some of the topics from the handout to a new situation.  I also occasionally assign miscellaneous quizzes.  With regard to Bayes’ rule, I have asked my students to watch a video (here) that presents the idea behind Bayes’ rule in an intuitive and visually appealing way.  I wrote a miscellaneous quiz to motivate students to watch and learn from this video.

The author of this video, Grant Sanderson, argues that the main idea behind Bayes’ rule is that “evidence should not determine beliefs but update them.”  I think the Jasmine example of knowing versus guessing can help students to appreciate this viewpoint.  We start with a prior probability that Jasmine knows the answer to a question, and then we update that belief based on the evidence that she answers a question correctly.  Do I know with absolute certainty that this example helps students to understand Bayes’ rule?  Of course not, but I like the example anyway.  More to the point, the evidence of my students’ reactions and performances on assessments has not persuaded me to update my belief in a pessimistic direction.

#67 Interviews

One of my favorite professional activities has been interviewing statistics teachers and statistics education researchers for the Journal of Statistics Education.  I have conducted 26 such interviews for JSE over the past ten years.  I have been very fortunate to chat with some of the leaders in statistics education from the past few decades, including many people who have inspired me throughout my career.  I encourage you to take a look at the list and follow links (here) to read some of these interviews.

Needless to say, I have endeavored to ask good questions in these interviews.  Asking interview questions is much easier than answering them, so I greatly appreciate the considerable time and thoughtfulness that my interview subjects have invested in these interviews.  I hope that my questions have provided an opportunity to:

1. Illuminate the history of statistics education, both in recent years and going back a few decades.  A few examples:

  • Dick Scheaffer describes how the AP Statistics program began. 
  • Mike Shaughnessy talks about how NCTM helped to make statistics more prominent in K-12 education. 
  • Chris Franklin and Joan Garfield discuss how ASA developed its GAISE recommendations for K-12 and introductory college courses. 
  • Jackie Dietz describes the founding of the Journal of Statistics Education.
  • Dennis Pearl explains how CAUSE (Consortium for the Advancement of Undergraduate Statistics Education) came to be.
  • George Cobb describes his thought processes behind his highly influential writings about statistics education.
  • Nick Horton shares information about the process through which ASA developed guidelines for undergraduate programs in statistical science.
  • David Moore, Roxy Peck, Jessica Utts, Ann Watkins, and Dick De Veaux talk about how their successful textbooks for introductory statistics came about.

2. Illustrate different pathways into the field of statistics education.  Many of these folks began their careers with statistics and/or teaching in mind, but others started or took a detour into engineering or physics or psychology or economics.  Some even studied fields such as dance and Russian literature.

3. Indicate a variety of ways to contribute to statistics education.  Some interviewees teach in high schools, others in two-year colleges.  Some teach at liberal arts colleges, others in research universities.  Some specialize in teaching, others in educational research.  All have made important contributions to their students and colleagues.

4. Provide advice about teaching statistics and about pursuing careers in statistics education.  My last question of every interview asks specifically for advice aimed at those just starting out in their careers.  Many of my other questions throughout the interviews solicit suggestions on a wide variety of issues related to teaching statistics well.

5. Reveal fun personal touches.  I have been delighted that my interviewees have shared fun personal tidbits about their lives and careers.  Once again, a few examples:

  • George Cobb describes his experience as the victim of an attempted robbery, which ended with his parting company on good terms with his would-be assailant.
  • David Moore tells of losing an annual bet for 18 consecutive years, which required him to treat his friend to dinner at a restaurant of the friend’s choosing, anywhere in the world.
  • Ron Wasserstein shares that after he and his wife raised their nine children, they adopted two ten-year-old boys from Haiti.
  • Deb Nolan mentions a dramatic career change that resulted from her abandoning plans for a New Year’s Eve celebration.
  • Joan Garfield reveals that she wrote a memoir/cookbook about her life and love of food.
  • Dennis Pearl mentions a challenge that he offers to his students, which once ended with his delivering a lecture while riding a unicycle.
  • Chris Franklin relates that her favorite way to relax is to keep a personal scorebook at a baseball game.
  • Larry Lesser shares an account of his epic contest on a basketball court with Charles Barkley.

My most recent interview (here) is with Prince Afriyie, a recent Cal Poly colleague of mine who now teaches at the University of Virginia.  Prince is near the beginning of his teaching career as a college professor, and his path has been remarkable.  He started in Ghana, where he was inspired to study mathematics by a teacher whom he referred to as Mr. Silence.  While attending college in Ghana, Prince came to the United States on a summer work program; one of his roles was a paintball target at an amusement park in New Jersey.  Serendipity and initiative enabled Prince to stay in the United States to complete his education, with stops in Kentucky, Indiana, and Pennsylvania on his way to earning a doctorate in statistics.  Throughout his education and now into his own career, Prince has taught and inspired students, as he was first inspired by Mr. Silence in his home country.  Prince supplies many fascinating details about his inspiring journey in the interview.  I also asked Prince for his perspective on the two world-changing events of 2020 – the COVID-19 pandemic and the widespread protests for racial justice.

As I mentioned earlier, I conclude every interview with a request for advice aimed at those just beginning their career in statistics education.  Jessica Utts paid me a very nice compliment when she responded that teachers who read these interviews might benefit from asking themselves some of the more general questions that I ask of my interviewees.  Here are some questions that I often ask, which may lead to productive self-reflection:

  • Which came first – your interest in statistics or your interest in education?
  • What were your career aspirations at age 18?
  • What have you not changed about your teaching of statistics over the years?
  • On what do you pride yourself in your teaching?
  • What do you regard as the most challenging topic for students to learn, and how do you approach this topic?
  • What is your favorite course to teach, and why?
  • In this time of data science, are you optimistic or pessimistic about the future of statistics?
  • What do you predict as the next big thing in statistics education?
  • What advice do you offer for those just beginning their career in statistics education?

You might also think about how you would answer two fanciful questions that I often ask for fun:

  • If time travel were possible, and you could travel to the past or future without influencing the course of events, what point in time would you choose?  Why?
  • If I offer to treat you and three others to dinner anywhere in the world, with the condition that the dinner conversation would focus on statistics education, whom would you invite, and where would you dine?

P.S. If you have a chance to read some of these interviews, I would appreciate hearing your feedback (here) on questions such as:

  • Who would you like me to interview in the near future?
  • What questions would you like me to ask?
  • Would you prefer shorter interviews?
  • Would you prefer to listen to interviews on a podcast?

P.P.S. For those wondering if I graded my exams last week after finally concluding the all-important procrastination step (see post #66, First step of grading exams, here): Thanks for asking, and I happily report that I did.

#66 First step of grading exams

I gave my first exam of the term, my first online exam ever, this past Friday.  As I sat down to grade my students’ responses for the first time in almost sixteen months, I realized that I had almost forgotten the crucial first step of grading exams: Procrastinate!

I have bemoaned the fact that I have so much less time available to concentrate on this blog now that I have returned to full-time teaching, as compared to last year while I was on leave.  So, what better way to procrastinate from my grading task than by engaging in the much more enjoyable activity of writing a blog post? 

What should I write about?  That’s easy: I will tell you a bit about the exam whose grading I am neglecting at this very moment.

Students took this exam through Canvas, our learning management system*.  This is a first for me, as my students in previous years took exams with paper and pencil.  I included a mix of questions that were auto-graded (multiple choice and numerical answer) and free-response questions that I will grade after I finish the all-important first step of procrastinating.  Roughly two-thirds of the points on the exam were auto-graded.  I wrote several different versions of many questions in an effort to discourage cheating.  Students had 90 minutes to complete the exam, and they were welcome to select any continuous 90-minute period of time between 7am and 7pm.  Students were allowed to use their notes.  Topics tested on the exam included basic ideas of designing studies and descriptive statistics.

* In post #63 (My first video, here), I referred to Canvas as a course management system.  Since then I realized that I was using an antiquated term, and I have been looking for an opportunity to show that I know the preferred term is now learning management system.

Some of the questions that I asked on this exam appear below (in italics):

1. Suppose that the observational units in a study are the national parks of the United States.  For each of the following, indicate whether it is a categorical variable, a numerical variable, or not a variable.

  a) the area (in square miles) of the national park
  b) whether or not the national park is in California
  c) the number of national parks that are to the east of the Mississippi River
  d) whether there are more national parks to the east of the Mississippi River than to the west of the Mississippi River
  e) the number of people who visited the national park in September of 2020

I give my students lots of practice with this kind of question (see post #11, Repeat after me, here), but some continue to struggle with it.  Especially challenging is noticing the ones that are not variables for these observational units (parts c and d).  Each student saw one of four variations on this question.  The observational units in the other versions were patients who visited the emergency room at the local hospital last week, the commercial flights that left the San Luis Obispo airport last month, and customers at a local In-N-Out fast food restaurant on a particular day.  I posed this as a “matching” question in Canvas, where each of the five parts had the same three options available.

2. Suppose that the ten players on basketball team A have an average height of 76 inches, and the ten players on basketball team B have an average height of 80 inches.  Now suppose that one player leaves team A to join team B, and one player leaves team B to join team A.  How would the average heights of the two teams change?  The options that I presented were: No change, Both averages would increase, Both averages would decrease, The average would increase for A and decrease for B, The average would decrease for A and increase for B, It is impossible to say without more information.

The correct option is the last one: It is impossible to say without more information.  My goal here was for students to understand that players’ heights vary on both teams, so we cannot state any conclusions about how the averages would change without knowing more about the heights of the individual players who changed teams. 
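A one-line calculation (my own addition, not part of the exam) shows why more information is needed.  Call h_A the height of the player who leaves team A and h_B the height of the player who leaves team B; these symbols are introduced here only for illustration.  Each team still has ten players, and the original totals are 10 × 76 = 760 and 10 × 80 = 800 inches:

```latex
\bar{x}_A^{\text{new}} = \frac{760 - h_A + h_B}{10} = 76 + \frac{h_B - h_A}{10},
\qquad
\bar{x}_B^{\text{new}} = \frac{800 - h_B + h_A}{10} = 80 - \frac{h_B - h_A}{10}.
```

The two averages shift by the same amount in opposite directions (or not at all if h_A = h_B), and which way they shift depends on the sign of h_B − h_A, which the question does not reveal.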

3. San Diego State’s admission rate for Fall 2019 was 34.13%, compared to 28.42% for Cal Poly – SLO.  Determine the percentage difference between these admission rates.  In other words, San Diego State’s admission rate was higher than Cal Poly – SLO’s by ___ %.  Enter your answer as a number, with two decimal places of accuracy.  Do not enter the % symbol.

As I mentioned throughout post #28 (A pervasive pet peeve, here), I emphasize how a difference in proportions is not equivalent to a percentage difference.  This question assessed whether students took my emphasis to heart.  Each student answered one of four versions of this question, with different campuses being compared.  I obtained the data on admission rates from the dashboard here.
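For the version of the question shown above, the contrast is easy to see in a few lines of Python (a sketch; the function name is my own).  The difference in admission rates is 5.71 percentage points, but the percentage difference divides that by Cal Poly – SLO’s rate, giving about 20.09%.

```python
def percentage_difference(rate, reference_rate):
    """Percent by which rate exceeds reference_rate, relative to reference_rate."""
    return (rate - reference_rate) / reference_rate * 100


sdsu, slo = 34.13, 28.42  # Fall 2019 admission rates (%) from the question
print(f"Difference in rates: {sdsu - slo:.2f} percentage points")        # 5.71
print(f"Percentage difference: {percentage_difference(sdsu, slo):.2f}")  # 20.09
```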

4. A series of questions referred to the following graph from a recent Gallup study (here):

The most challenging question in this series was a very basic one: How many variables are represented in this graph?  The correct answer is 2, race and preference for how much time police spend in the area.  The other options that I presented were 1, 3, 4, and 12.

5. Another series of questions was based on this study (available here): Researchers surveyed 120 students at Saint Lawrence University, a liberal arts college with about 2500 students in upstate New York.  They asked students whether or not they have ever pulled an all-nighter (stayed up all night studying).  Researchers found that students who claimed to have never pulled an all-nighter had an average GPA (grade point average) of 3.1, compared to 2.9 for students who claimed to have pulled an all-nighter.  Some basic questions included identifying the type of study, explanatory variable, and response variable.  These led to a question about whether a cause-and-effect conclusion can legitimately be drawn from this study, with a follow-up free-response question* asking students to explain why or why not.

* Oh dear, I just reminded myself of the grading that I still need to do.  This procrastination step is fun but not entirely guilt-free.

Some other free-response questions waiting for me to grade asked students to:

6. Create a hypothetical example in which IQR = 0 and the mean is greater than the median.  I think this kind of question works well on an online exam.  Different students should give different responses, so I hope this question encourages independent thinking and discourages cheating.  (See post #31, Create your own example, part 1, here, for many more questions of this type.)
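As one illustration of a response that would satisfy this question (certainly not the only one), here is a tiny Python check.  It uses the standard library’s statistics.quantiles, whose default “exclusive” method is just one of several quartile conventions; with eight of the nine values equal to 1, both quartiles should equal 1 under the usual conventions.

```python
import statistics


def iqr_mean_median(data):
    """Return (IQR, mean, median) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (default 'exclusive' method)
    return q3 - q1, statistics.mean(data), statistics.median(data)


example = [1, 1, 1, 1, 1, 1, 1, 1, 10]  # one hypothetical student-style answer
iqr, mean, median = iqr_mean_median(example)
print(f"IQR = {iqr}, mean = {mean}, median = {median}")
```

The single large value pulls the mean up to 2 while the median and both quartiles stay at 1.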

7. Write a paragraph comparing and contrasting the distribution of money winnings in 2019 on three professional golf tours – men’s, women’s, and senior men’s, as displayed in the boxplots:

I am looking for students to compare center, variability, and shape across the three distributions.  They should also comment on outliers and relate their comments to the context.

8. Describe and explain the oddity concerning which hospital performed better, in terms of patients experiencing a complete recovery, for the data shown in the following tables of counts:

I expect this to be one of the more challenging questions on the exam.  Students need to calculate correct proportions, comment on the oddity that Hospital A does worse overall despite doing better for each condition, and explain that Hospital A sees most of the patients in poor condition, who are less likely to experience a full recovery than those in fair condition.
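The exam’s tables are not reproduced here, so the Python sketch below uses made-up counts (purely hypothetical, not the exam’s data) that show the same oddity: Hospital A has the higher recovery rate within each condition but the lower rate overall, because most of its patients arrive in poor condition.

```python
# Hypothetical counts, NOT the exam's data: (complete recoveries, patients)
COUNTS = {
    "A": {"fair": (90, 100), "poor": (450, 900)},
    "B": {"fair": (700, 800), "poor": (80, 200)},
}


def rate(recovered, patients):
    return recovered / patients


def overall_rate(hospital):
    """Pool both conditions before dividing."""
    recovered = sum(r for r, _ in COUNTS[hospital].values())
    patients = sum(p for _, p in COUNTS[hospital].values())
    return rate(recovered, patients)


for hospital, by_condition in COUNTS.items():
    for condition, (recovered, patients) in by_condition.items():
        print(f"Hospital {hospital}, {condition} condition: {rate(recovered, patients):.3f}")
    print(f"Hospital {hospital}, overall: {overall_rate(hospital):.3f}")
```

With these counts, pooling gives 540/1000 = 0.54 for A versus 780/1000 = 0.78 for B, even though A wins 0.900 to 0.875 among fair-condition patients and 0.500 to 0.400 among poor-condition patients.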

Writing my exam questions in Canvas, and preparing several versions for many questions, took considerably more time than my exam writing in the past.  But of course Canvas has already saved me some time by auto-grading many of the questions.  I should also be pleased that Canvas will add up students’ scores for me, but I always enjoyed that aspect of grading, largely because it was the last part and provided a sense of completion and accomplishment.

Hmm, I probably should not be imagining an upcoming sense of completion and accomplishment while I am still happily immersed in the procrastination step of the exam-grading process.  I must grudgingly accept that it’s time for me to proceed to step two.  If only I could remember what the second step is …

#65 Matching variables to graphs

On Friday of last week I asked my students to engage with an activity in which I presented them with these seven graphs:

I’m sure you’ve noticed that these graphs include no labels or scales on the axes.  But you can still discern some things about these seven distributions even without that crucial information.  I told my students the seven variables whose distributions are displayed in these graphs:

  • (A) point values of letters in the board game Scrabble
  • (B) prices of properties on the Monopoly game board
  • (C) jersey numbers of Cal Poly football players
  • (D) weights of rowers on the U.S. men’s Olympic team
  • (E) blood pressure measurements for a sample of healthy adults
  • (F) quiz percentages for a class of students (quizzes were quite straightforward)
  • (G) annual snowfall amounts for a sample of cities taken from around the U.S.

But I did not tell students which variable goes with which graph.  Instead I asked them to work in groups* with these instructions: Make educated guesses for which variable goes with which graph.  Be prepared to explain the reasoning behind your selections.

* This being the year 2020, the students’ groups were breakout rooms in Zoom.

Before I invited the students to join breakout rooms, I emphasized that it’s perfectly fine if they know nothing about Scrabble or Monopoly or rowing or even snowfall*.  For one thing, that’s why they’re working with a group.  Maybe they know about some of these things and a teammate knows about others.  For another thing, I do not expect every group to match all seven pairs perfectly, and this activity is not graded.

* Most of my students are natives of California, and some have never seen snowfall.

I think you can anticipate the next sentence of this blog post: Please take a few minutes to match up the graphs and variables for yourself before you read on*.

* Don’t worry, I do not expect you to get them all right, and remember – this is not for a grade!

Also before I continue, I want to acknowledge that I adapted this activity from Activity-Based Statistics, a wonderful collection based on an NSF-funded project led by Dick Scheaffer in the 1990s.  This variation is also strongly influenced by Beth Chance’s earlier adaptations of this activity, which included generating the graphs from data collected from her students on various variables.

I only gave my students 5-6 minutes to discuss this in their breakout rooms.  When they came back to the main Zoom session, I asked for a volunteer to suggest one graph/variable pair that they were nearly certain about, maybe even enough to wager tuition money.  The response is always the same: Graph #4 displays the distribution of football players’ jersey numbers.  I said this is a great answer, and it’s also the correct answer, but then I asked: What’s your reasoning for that?  One student pointed out that there are no repeated values, which is important because every player has a distinct jersey number.  Another student noted that there are a lot of dots, which is appropriate because college football teams have a lot of players.

Next I asked for another volunteer to indicate a pairing for which they are quite confident, perhaps enough to wager lunch money.  I received two different answers to this.  In one session, a student offered that graph #1 represents the quiz percentages.  What’s your reasoning for that?  The student argued that quizzes were generally straightforward, so there should be predominantly high scores.  The right side of graph #1 could be quiz percentages in the 80s and 90s, with just a few low values on the left side.

In the other session, a student suggested that graph #2 goes with point values of letters in Scrabble.  What’s your reasoning for that?  The student noticed that the spacing between dots on the graph is very consistent, so the values could very well be integers.  It also makes sense that the leftmost value on the graph could be 1, because many letters are worth just 1 point in Scrabble.  This scale would mean that the large values on the right side of the graph are 8 (for 2 letters) and 10 (also for 2 letters).  Another student even noted that there are 26 dots in graph #2, which matches up with 26 letters in the alphabet.
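The students’ reasoning is easy to verify, assuming the graph shows the tile values of the standard English-language edition of Scrabble.  This Python sketch lists those values, confirms the 26 dots, the ten letters worth 1 point, and the two letters each at 8 and 10, and shows that the mean exceeds the median, consistent with the long tail toward large values on the right side of graph #2.

```python
import statistics

# Tile values in the standard English-language edition of Scrabble
POINTS_TO_LETTERS = {
    1: "AEILNORSTU",
    2: "DG",
    3: "BCMP",
    4: "FHVWY",
    5: "K",
    8: "JX",
    10: "QZ",
}

values = sorted(points for points, letters in POINTS_TO_LETTERS.items() for _ in letters)
print(len(values), "letters")
print({points: len(letters) for points, letters in POINTS_TO_LETTERS.items()})
print(f"mean = {statistics.mean(values):.2f}, median = {statistics.median(values)}")
```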

When I asked for another volunteer, a student suggested that graph #7 corresponds to Monopoly prices.  What’s your reasoning for that?  The student commented that Monopoly properties often come in pairs, and this graph includes many instances of two dots at the same value.  Also, the distance between the dots is mostly uniform, suggesting a common increment between property prices.  I asked about the largest value on this graph, which is separated a good bit from the others, and a student responded that this dot represents Boardwalk.
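A similar check works for Monopoly, assuming the graph shows the 22 colored street properties on the standard U.S. board (the post does not say whether railroads and utilities were included, so this sketch leaves them out).  It counts the repeated prices and the gaps between consecutive distinct prices.

```python
from collections import Counter

# Assumption: the 22 colored streets on the standard U.S. board (railroads and utilities omitted)
STREET_PRICES = {
    "Mediterranean Avenue": 60, "Baltic Avenue": 60,
    "Oriental Avenue": 100, "Vermont Avenue": 100, "Connecticut Avenue": 120,
    "St. Charles Place": 140, "States Avenue": 140, "Virginia Avenue": 160,
    "St. James Place": 180, "Tennessee Avenue": 180, "New York Avenue": 200,
    "Kentucky Avenue": 220, "Indiana Avenue": 220, "Illinois Avenue": 240,
    "Atlantic Avenue": 260, "Ventnor Avenue": 260, "Marvin Gardens": 280,
    "Pacific Avenue": 300, "North Carolina Avenue": 300, "Pennsylvania Avenue": 320,
    "Park Place": 350, "Boardwalk": 400,
}

prices = sorted(STREET_PRICES.values())
distinct = sorted(set(prices))
gaps = [high - low for low, high in zip(distinct, distinct[1:])]
repeated = {price: k for price, k in Counter(prices).items() if k > 1}
most_expensive = max(STREET_PRICES, key=STREET_PRICES.get)

print("prices shared by two streets:", sorted(repeated))
print("most common gap (gap, how often):", Counter(gaps).most_common(1)[0])
print("largest gap:", max(gaps), "up to", most_expensive, STREET_PRICES[most_expensive])
```

Under this assumption, seven prices are shared by two streets, most consecutive distinct prices differ by $20, and the largest gap ($50, from Park Place at $350) leads up to Boardwalk at $400.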

After those four variables and graphs were matched up, students got much quieter when I asked for another volunteer.  I wish that I had set up a Zoom poll in advance to ask them to express their guesses for the rest, but I did not think of that before class.  Instead I asked for a description of graph #3.  A student said that there are a lot of identical values on the low end, and then a lot of different values through the high end.  When I asked about which variable that pattern of variation might make sense for, a student suggested snowfall amounts.  What’s your reasoning for that?  The student wisely pointed out that I had said that the cities were taken from around the U.S., so that should include cities such as Los Angeles and Miami that see no snow whatsoever.

Then I noted that the only graphs left were #5 and #6, and the only variables remaining were blood pressure measurements and rower weights.  I asked for a student to describe some differences between these graphs to help us decide which is which.  This is a hard question, so I pointed out that the smallest value in graph #6 is considerably smaller than all of the others, and there’s also a cluster of six dots fairly well separated from the rest in graph #6.  One student correctly guessed that graph #6 displays the distribution of rower weights.  What’s your reasoning for that?  The student knew enough about rowing to say that one member of the team calls out the instructions to help the others row in sync, without actually rowing himself.  Why does the team want that person to be very light?  Because he’s adding weight to the boat but not helping to row!

That leaves graph #5 for the blood pressure measurements.  I suggested that graph #5 is fairly unremarkable and that points are clustered near the center more than on the extremes.

You might be wondering why I avoided using the terms skewness, symmetry, and even outlier in my descriptions above.  That’s because I introduced students to these terms at the conclusion of this activity.  Then I asked students to look back over the graphs and: Identify which distributions are skewed to the left, which are skewed to the right, and which are roughly symmetric.  I gave them just three minutes to do this in the same breakout rooms as before.  Some students understandably confused skewed to the left and skewed to the right at first, but they quickly caught on.  We reached a consensus as follows:

  • Skewed to the left: quiz percentages (sharply skewed), rower weights (#1, #6)
  • Skewed to the right: Scrabble points, snowfall amounts (#2, #3)
  • Symmetric (roughly): jersey numbers, blood pressure measurements, Monopoly prices (#4, #5, #7)

I admitted to my students that while I think this activity is very worthwhile, it’s somewhat contrived in that we don’t actually start a data analysis project by making guesses about what information a graph displays.  In practice we know the context of the data that we are studying, and we produce well-labelled graphs that convey the context to others.  Then we examine the graphs to see what insights they provide about the data in context.

With that in mind, I followed the matching activity with a brief example based on the following graph of predicted high temperatures for cities around California, as I found them in my local newspaper (San Luis Obispo Tribune) on July 8, 2012:

I started with some basic questions about reading a histogram, such as what temperatures are contained in the rightmost bin and how many cities had such temperatures on that date.  Then I posed three questions that get to the heart of what this graph reveals:

  • What is the shape of this distribution?
  • What does this shape reveal about high temperatures in California in July?
  • Suggest an explanation for the shape of this distribution, using what you know about the context.

Students responded that the temperature distribution displays a bimodal shape, with one cluster of cities from about 65 to 80 degrees and another cluster from about 90 to 100 degrees.  This reveals that California has at least two distinct kinds of locations with regard to high temperatures in July.

For the explanation of this phenomenon, a student suggested that there’s a split between northern California and southern California.  I replied that this was a good observation, but I questioned how this split would produce the two clusters of temperature values that we see in the graph.  The student quickly followed up with a different explanation that is spot-on: California has many cities near the coast and many that are inland.  When I asked how this would explain the bimodality in the graph, the student elaborated that cities near the coast stay fairly cool even in July, while inland and desert cities are extremely hot.

My students and I then worked through three more examples to complete the one-hour session.  First I showed them the following boxplots of daily high temperatures in February and July of 2019 for four cities*:

* I discuss these data in more detail in post #7, Two dreaded words, part 2, here.

The students went back to their breakout rooms with the task of arranging these four cities from smallest to largest in terms of:

  • center of February temperature distributions;
  • center of July temperature distributions;
  • variability of February temperature distributions; and
  • variability of July temperature distributions.
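A boxplot shows center through the median and variability through the width of the box, the interquartile range (IQR = Q3 − Q1).  The sketch below makes the students' ordering task concrete by computing those two numbers and sorting.  The city names and temperatures are hypothetical placeholders, not the 2019 data in the boxplots, and statistics.quantiles uses one particular quartile convention; graphing software may draw box edges using a slightly different rule.

```python
import statistics

# Hypothetical February daily highs (degrees F) for four unnamed cities.
# These values are invented for illustration; they are NOT the data in the post's boxplots.
february_highs = {
    "City A": [60, 62, 63, 65, 66, 68, 70],
    "City B": [40, 45, 50, 55, 60, 65, 70],
    "City C": [30, 31, 32, 33, 34, 35, 36],
    "City D": [50, 52, 55, 58, 61, 64, 67],
}

def median_and_iqr(values):
    """Return (median, IQR): the center and spread that a boxplot displays."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
    return statistics.median(values), q3 - q1

summaries = {city: median_and_iqr(temps) for city, temps in february_highs.items()}

# Arrange cities from smallest to largest center, then from smallest to largest variability.
by_center = sorted(summaries, key=lambda city: summaries[city][0])
by_spread = sorted(summaries, key=lambda city: summaries[city][1])

print("Center, smallest to largest:     ", by_center)
print("Variability, smallest to largest:", by_spread)
```

Note that the two orderings need not agree: a city can have a high center but little variability, which is one reason the task asks about center and variability separately.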

After we discussed their answers and reached a consensus, I briefly introduced the idea of a log transformation in the context of closing prices of Nasdaq-100 stocks on September 15, 2020:
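Why does a log transformation help with data like stock prices?  Taking base-10 logs turns tenfold differences into steps of 1, which pulls in a long right tail.  Here is a minimal sketch with made-up prices (not the actual Nasdaq-100 closing prices), reusing the moment-based skewness coefficient to compare shape before and after the transformation.

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness: average cubed deviation divided by the cubed (population) SD."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)

# Hypothetical closing prices in dollars, invented for illustration (NOT the Nasdaq-100 data).
# Most prices are modest and a few are very large, so the distribution is skewed to the right.
prices = [10, 20, 30, 50, 80, 120, 200, 400, 1000, 3000]

# Base-10 logs: $10 -> 1, $100 -> 2, $1000 -> 3, so each unit is a tenfold change in price.
log_prices = [math.log10(p) for p in prices]

print(f"skewness of prices:        {skewness(prices):+.2f}")
print(f"skewness of log10(prices): {skewness(log_prices):+.2f}")
```

With these made-up values the skewness drops from a little over 2 to roughly 0.4, so the log scale shows the bulk of the prices far more clearly than the raw scale, where one or two very expensive stocks stretch the axis.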

Finally, we discussed the example of cancer pamphlets’ readability that I described in post #4, Statistics of illumination, part 2, here.

As you can tell, the topic of the class session that I have described here was graphing numerical data.  I think the matching activity set the stage well, providing an opportunity for students to talk with each other about data in a fun way.  I also hope that this activity helped to instill in students a mindset that they should always think about context when examining graphs and analyzing data.