#78 Two fun (and brief) items

Thanks for reading this, my final blog post for the infamous year 2020.  In contrast to this seemingly unending year*, I will keep this post very brief.  I will conclude this decidedly not-fun year by presenting two fun items that I recently encountered.

* Even though today is December 28th, it feels more like March 303rd.  (I can’t take credit for this joke, but I regret that I cannot remember where I first saw a version of it.)


The first fun item is a quote from American educator Alice Wellington Rollins.  Even though I just learned of this quote within the past two weeks, it’s actually 122 years old, having appeared in the Journal of Education in 1898 (volume 47, issue 22, page 339, available here).  Stacey Hancock brought this to my attention, as she cites this quote in an article about teaching statistics that she has written for the March 2021 issue of the Notices of the American Mathematical Society.  I think this quote offers a valuable perspective on my “ask good questions” refrain:

The test of a good teacher is not how many questions he can ask his pupils that they will answer readily, but how many questions he inspires them to ask him which he finds it hard to answer.

Alice Wellington Rollins, Journal of Education, 1898

The second fun item is a very recent addition to the brilliant* collection of xkcd comics. 

* I like to think that I do not use the adjective brilliant casually.  If you have not seen these comics, consider taking a look.  Some particularly clever ones that address statistical ideas include: Convincing (here), Correlation (here), and Significant (here).

When I look back on this horrible but memorable year, I hope to think of this image and advice from a recent xkcd comic (available here):


Many thanks and best wishes to all who have read this blog in 2019 and 2020.  I hope that you have found something that helps you to ask good questions of your students.  My aspiration remains to write essays about teaching introductory statistics that are practical, thought-provoking, and fun*.

* And, perhaps just this once, brief.

#77 Discussing data ethics

This guest post has been contributed by Soma Roy.  You can contact her at soroy@calpoly.edu.

Soma Roy is a colleague of mine in the Statistics Department at Cal Poly – San Luis Obispo. Soma is an excellent teacher and has been so recognized with Cal Poly’s Distinguished Teaching Award.  She also served as editor of the Journal of Statistics Education.  I recently learned about some of Soma’s ideas for generating student discussions in online statistics courses, and I am delighted that she agreed to write this guest blog post about one such idea, which introduced students to data ethics.


The GAISE (Guidelines for Assessment and Instruction in Statistics Education) College Report (available here) recommends the use of real data with a context and purpose in statistics classes*. One of the ways I achieve this throughout the course, regardless of what statistics topic we are studying at the time, is by always using data (either in raw or summarized form) from research studies published in peer-reviewed journals.

* Just because the recommendation comes in the college report doesn’t mean that the advice couldn’t apply to K-12 classes.

For example, a study I use to motivate the comparison of means between two groups was conducted by Gendreau et al. and published in the Journal of Abnormal Psychology in 1972 (here). In this study, 20 inmates at a Canadian prison were randomly assigned either to be in solitary confinement or to remain non-confined (that is, have contact with others around them) for seven days. Researchers measured each inmate’s EEG alpha frequency on several days* in order to investigate the effect that sensory deprivation can have on one’s EEG alpha frequency**.

* The article provides data for the 20 inmates at three different time periods, but my students only analyze the data from the final (seventh) day of the experiment.

** Alpha waves are brain waves, the predominance of which is believed to indicate that the individual is in a relaxed but aware state. High frequency of alpha waves is considered to be better than low frequency of alpha waves (Wikipedia).
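The comparison of means that this study motivates can be illustrated with a short simulation-based sketch. The data values below are hypothetical placeholders, not the article’s actual measurements, and the randomization test shown is just one reasonable way to compare the two group means:

```python
import random
import statistics

random.seed(1)

# Hypothetical day-7 EEG alpha frequencies (Hz) for 10 inmates per group.
# These are illustrative numbers, NOT the data from Gendreau et al.
solitary = [9.6, 10.1, 9.4, 9.9, 10.0, 9.5, 9.8, 9.3, 9.7, 10.2]
non_confined = [10.4, 10.7, 10.2, 10.9, 10.5, 10.8, 10.3, 10.6, 10.1, 11.0]

observed = statistics.mean(non_confined) - statistics.mean(solitary)

# Randomization test: re-randomize the 20 values into two groups of 10
# many times, and see how often chance alone produces a difference in
# means at least as large as the one observed.
combined = solitary + non_confined
reps = 10000
count = 0
for _ in range(reps):
    random.shuffle(combined)
    diff = statistics.mean(combined[10:]) - statistics.mean(combined[:10])
    if diff >= observed:
        count += 1

p_value = count / reps
print(f"observed difference: {observed:.2f} Hz, approximate p-value: {p_value:.4f}")
```

With only 10 inmates per group, an exact test over all 184,756 possible group assignments is also feasible; the shuffling approach above simply approximates it.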

Without fail, one of the first things that students do when they read about this study is ask: How could they just put someone in solitary confinement? That becomes a jumping-off point for our discussion on data ethics. This discussion covers the ethics of study design, data collection, data analysis, and publication of findings.


When the COVID-19 pandemic turned my in-person class into an online class, I decided to turn our brief, in-class discussion into an asynchronous, week-long discussion in our learning management system, Canvas. Borrowing from Allan’s style, the questions that I posted appear in italics, below, accompanied by short blurbs on what I was hoping to address with each of the questions, as well as some student responses and comments.

You have read about an experiment conducted on inmates of a Canadian prison, where 20 inmates were randomly split into two groups. One group of 10 inmates was placed in solitary confinement, and the other group was allowed to remain non-confined. 

Are you as struck as I was, the first time I read about this experiment, by how unethical and cruel it was that people were randomly assigned to be placed in solitary confinement!?

Unfortunately, there have been many, many experiments in the past that violated human rights. That realization has brought about the requirement that all research projects involving human subjects be reviewed before any data can be collected.

This discussion is about the ethics to be considered when one decides to carry out a study with human subjects (specifically an experiment that involves manipulating treatment conditions), collect data, or analyze data and publish results from any study. The first few questions below focus on historical studies; the next few look into the process for proposing and carrying out human subjects studies, as well as ethical practices for data analysis and publication of study results.

I hope that, going forward, this discussion helps you think critically about any studies that you may be involved in as a researcher, and keep in mind that (to borrow from the great American poet Maya Angelou) when we “know better, (we should) do better.” 

For this discussion, you need to make two (2) posts:

Part 1: First, you will post a response to one of the questions (1) – (10) below. Be sure to copy and paste the question that you are responding to. 

1. Google “Tuskegee Syphilis Study” – describe the study (year(s), methods, participants, objective, etc.). Why is it considered unethical? Cite your source(s). (e.g., Wikipedia link)

2. Google “US apologizes to Guatemalans, 1940s” – describe the study or studies conducted in the 1940s (year(s), methods, participants, objective, etc.). Why are the studies considered unethical? Cite your source(s). (e.g., Wikipedia link)

3. Google “Human Radiation Experiments in the US, 1940s” – describe the study or studies conducted in the 1940s and even later (year(s), methods, participants, objective, etc.). Why are the studies considered unethical? Cite your source(s). (e.g., Wikipedia link) 

4. Google “Project Bluebird, Project Artichoke” – describe the study or studies (year(s), methods, participants, objective, etc.). Why are the studies considered unethical? Cite your source(s). (e.g., Wikipedia link) 

5. Google “The Monster Study” – describe the study (year(s), methods, participants, objective, etc.). Why is the study considered unethical? Cite your source(s). (e.g., Wikipedia link) 

6. Google “Brown eyes, Blue eyes experiment, Jane Elliot” – describe the study (year(s), methods, participants, objective, etc.). What was the objective of the study? Why do some people consider the study to be unethical? Cite your source(s). (e.g., Wikipedia link) 

This first part of my discussion assignment requires students to read up on a particular historical study and identify some of its key elements: the objective of the study, on whom it was conducted, when and how it was conducted, and why it is considered unethical. Students are required to cite their sources.

All six of these studies have a plethora of information available from multiple reliable sources on the internet. My hope is that as students read about these studies, they will recognize the shortcomings in the study design – where the researchers went wrong in how they treated their subjects or how they recruited their subjects, or just who their subjects were. I also hope that students will recognize the need for an institutional review board (IRB), the need for informed consent, and the need to protect vulnerable populations.

The Tuskegee study, understandably the most infamous of the lot, draws the most outrage from students. Students find the experiment “crazy and insane,” “a great example of raging biases and racism,” and “lacking in decency.”  Students are appalled that little to no information was shared with the participants, that a study that was supposed to last only 6 months lasted 40 years, and that even after penicillin was established as a standard treatment for syphilis, it was not administered to the participants.  Students are saddened that the researchers exploited the participants’ poverty by offering incentives such as free meals and free treatment for other ailments in return for their participation in the study.

Students have similar reactions to the other studies as well. Some of their common responses include:

  • Subjects in any study should be told whether any negative outcomes were to be expected.
  • Participation should be voluntary; leaving the study should be easy and come at no cost to the participant.
  • Children should not be experimented on, at least not without permission from a parent or guardian who can make decisions in the child’s best interest.
  • People who are vulnerable, such as children, prisoners, pregnant women, and people from racial and ethnic minorities, should be protected, and not taken advantage of.

The “Brown eyes, blue eyes” experiment draws some interesting responses*. Some of my students write that while the experiment was well meaning, and was trying to teach students about discrimination on the basis of color, conducting an experiment on impressionable children, especially without the consent of their parents, was unethical. 

* For anyone unfamiliar with this experiment: On the day after the assassination of Dr. Martin Luther King, Jr., teacher Jane Elliot repeatedly told students in her all-white third-grade class that brown-eyed people were better than blue-eyed people.  On the next day, she switched to saying that blue-eyed people were better than brown-eyed people. She observed her students’ behaviors toward each other on both days.


Through their answers to the questions above, sometimes directly and sometimes indirectly, students arrive at recognizing the need for an institutional review board, the need for informed consent, and the need to protect vulnerable populations. This leads to the next set of questions in my discussion assignment:    

7. When you conduct research on human subjects, your research protocol needs to be reviewed by an institutional review board, and you need to obtain informed consent from your subjects. Explain what the bold terms mean, when these procedures started being enforced in the U.S., and why the review and informed consent are needed. Cite your source(s). (e.g., Wikipedia link)

8. When you conduct research on human subjects, certain sections of the population are referred to as “vulnerable populations” or “protected groups.”  What are these groups, and why do they need to be protected? Give one or two historical examples of studies that were unethically performed on vulnerable populations. Cite your source(s). (e.g., a link from the National Institutes of Health)

For the question about the IRB and informed consent, students are required to describe the terms, why they are needed, and report what year these procedures were put in place in the U.S. Again they are required to provide references. Students discover that concerns about many of the studies referred to in (1) – (6), specifically the Tuskegee Syphilis study and the human radiation experiments, led to the creation of IRBs.

In the wrap-up of this discussion, we revisit the study about the Canadian prisoners, in which some inmates were assigned to solitary confinement to study the effect of sensory deprivation on brain function. The research article mentions that the subjects volunteered to participate and were told that there were no incentives (e.g., monetary payment or a parole recommendation), and that their status in prison would remain unchanged, except for a note in their file mentioning their cooperation. Students discuss whether this constitutes sufficient protection, or sufficient informed consent.


The next two questions touch upon what happens to data after they have been collected. Should the person analyzing the data get to pick and choose which data to include in the analysis, based on what creates a more sensational story? Should studies be published only if they show statistically significant findings? Who stands to lose from violations of the ethics of data analysis? Who stands to lose from publication bias*?

* For class examples, I intentionally use studies that showed statistically significant results as well as studies that didn’t. I also have a separate week-long discussion topic in which students read article abstracts from various peer-reviewed journals, where they see both statistically significant and non-significant study results; that discussion touches on one more aspect of data ethics: who funded the study, and why it is important to disclose and to know that.

9. What is publication bias? When does it arise? Who stands to benefit from it? More importantly, who stands to lose from it? Give an example of any study or studies where publication bias was present. Cite your source(s). (e.g., Wikipedia link)

10. What is data manipulation (including “selective reporting” and “data fabrication”)? How is it done? Who stands to benefit from it? More importantly, who stands to lose from it? Give an example of any study or studies where the researchers were accused of wrongful data manipulation. Cite your source(s). (e.g., Wikipedia link)
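Publication bias can be made concrete with a small simulation: if journals publish only statistically significant results, then even when a treatment has no effect at all, the published record shows sizable effects. This toy sketch (entirely hypothetical, not drawn from any study mentioned here) simulates 2,000 small studies under a true effect of zero:

```python
import random
import statistics

random.seed(2)

def one_study(n=20):
    """Simulate one study of a treatment with NO real effect (true mean 0)."""
    data = [random.gauss(0, 1) for _ in range(n)]
    m = statistics.mean(data)
    se = statistics.stdev(data) / n ** 0.5
    # |t| > 2.09 corresponds roughly to p < 0.05 with df = 19
    return m, abs(m / se) > 2.09

all_effects, published_effects = [], []
for _ in range(2000):
    effect, significant = one_study()
    all_effects.append(effect)
    if significant:
        published_effects.append(effect)  # journals "publish" only these

print(f"mean effect across all studies:  {statistics.mean(all_effects):+.3f}")
print(f"mean |effect| among 'published': {statistics.mean(abs(e) for e in published_effects):+.3f}")
```

The mean effect across all simulated studies hovers near zero (the truth), while the effects that clear the significance filter are systematically far from zero: the readers of the “published” studies see an effect that does not exist.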


To earn full credit for the discussion assignment, students must also reply to another student’s post.  This is just my way of encouraging them to read and reflect on what other students posted. Students can only reply after they have first submitted their own initial post:

Part 2: Second, respond/reply to a post by another student – adding more detail/insight to their post. (Note: You will need to first post an answer to part 1 before you can see anybody else’s posts.)


I grade these student discussions very generously. Students almost always get full credit as long as they follow the instructions and make reasonable posts, cite their sources, and don’t just copy-and-paste a Wikipedia article.

On my end-of-quarter optional survey about the class this term, students noted this ethics discussion as the discussion they liked the most. Some students said that this discussion topic was the topic from the course that made the biggest impression on them – describing it as “thought-provoking,” “interesting,” and “eye opening.”

In the past I have used this discussion assignment only in introductory classes. But now that I have the online discussion set up in Canvas, I will also use it in my upper-level courses on design of experiments.

Even though I have used these questions as a discussion topic, I can also see using them as a homework assignment, mini-project, or student presentation. For now, I will stick with the online discussion format because my students said they liked reading what other students wrote. While the pandemic keeps us in remote online classrooms, this format provides one more way for students to connect with their peers, as well as learn about some ethical issues associated with collecting and analyzing data.

This guest post has been contributed by Soma Roy.  You can contact her at soroy@calpoly.edu.

#76 Strolling into serendipity

This post is going to meander.  I’ll get to the point right away, but then I’m going to take a long detour before I return to the point.

The point of this post is to let you know about the 2021 U.S. Conference on Teaching Statistics (USCOTS), encourage you to attend and participate in this conference, and urge you to help with spreading the word.  The conference theme is Expanding Opportunities.  It will be held virtually on June 28 – July 1, with pre-conference workshops beginning on June 24.  The conference sessions will be thought-provoking, directly relevant to teaching statistics, and fun!  See the conference website here for more information.

Now I’m going to indulge in a stroll down memory lane before I return to the point.  If you’re in a hurry or don’t feel like accompanying me on this journey, I understand completely and encourage you to skip ahead past the next several sections.  You can search for “And then 2020 happened” to find the spot where I conclude my reminiscences and return to discussing the 2021 USCOTS.


I like conferences.  Even though I’m an introvert who feels much more comfortable in a small town than in a big city, I have greatly enjoyed and learned a lot from attending conferences across the country and around the world.  The best part has been meeting, learning from, and befriending people with similar professional goals and interests.

My first conference was the Joint Mathematics Meetings (JMM) held in San Francisco in 1991.  I had never been to San Francisco, and I had only been to California when I was nine years old.  I was in my second year of teaching at Dickinson College in Pennsylvania.  I roomed with my good friend from graduate school Tom Short, who was on the academic job market.  We walked around the city, taking in the sights and remarking that San Francisco is an even hillier city to walk than Pittsburgh, where we had attended Carnegie Mellon University together.  A conference highlight for me was attending a presentation by Tom Moore, whom I had never met.  Tom had written an article with Rosemary Roberts, titled “Statistics at Liberal Arts Colleges” (here), which had inspired me as I finished graduate school and before I started teaching at Dickinson.  I also gave a presentation at the conference, titled “Using HyperCard to teach statistics.”  I remember being extremely nervous before my presentation.  As I refresh my memory by checking the conference program here, I am surprised at not remembering that my presentation was apparently given at 7:05 on a Saturday morning!*

Another memorable conference from early in my career was the ASA’s 1992 Winter Conference, held in Louisville, Kentucky.  I was amazed and delighted to find an entire conference devoted to the theme of Teaching Statistics.  By this time Tom Short was teaching at Villanova University, so he and I drove to Louisville together.  I gave my first conference talk about an early version of Workshop Statistics.  Two presentations had a huge impact on my teaching and stand out in my mind to this day.  Bob Wardrop described his highly innovative introductory course that reimagined the sequencing of topics by using simulation-based inference to present topics of statistical inference from the beginning of the course.   Joan Garfield gave the plenary address, invited and introduced by David Moore, on educational research findings about how students learn statistics.  Joan later wrote an article based on this presentation titled “How Students Learn Statistics” (available here), the general principles of which hold up very well more than 25 years later.

Returning to San Francisco for the Joint Statistical Meetings (JSM) in 1993, I met and chatted with Jeff Witmer, convener of the “isolated statisticians” group and editor of Stats magazine, to which I had recently submitted an article.  I also interacted with Robin Lock for the first time at that conference; he and I have presented in the same sessions of conferences, sometimes with a joint presentation, many times over the years.  The 1993 JSM was also the occasion in which I met a graduate student from Cornell University who was studying both statistics and education, and who had a perfect name for a statistics teacher*.

* Of course, I had no clue at the time that Beth Chance and I would write articles and textbooks together, give conference presentations and conduct workshops together, coordinate the grading of AP Statistics exams, become colleagues in the same department, and eat ice cream together more times than I could count.

In 1994 I traveled outside of North America for the first time, to attend the International Conference on Teaching Statistics (ICOTS) in Marrakech.  Despite tremendously troublesome travel travails*, I greatly enjoyed the exotic locale and the eye-opening experience of meeting and hearing from statistics teachers and education researchers from around the world.  I gave another presentation about Workshop Statistics.  Some specific memories include George Cobb’s talk about workshops for mathematicians who teach statistics and Dick Scheaffer’s presentation about Activity-Based Statistics.

* Try saying (or typing) that ten times fast.

Oh dear, I really could keep writing a full paragraph (or more) about every conference that I’ve attended over the past thirty years.  But I need to remember that I’m writing a blog post, not a memoir.  I hope I’ve made my point that I benefitted greatly from attending and presenting at conferences as I embarked on my career as a teacher of statistics.  Especially for a small-town introvert, these conferences greatly expanded my horizons.  I’m incredibly fortunate and grateful that some of the people I met at these conferences, whose work I admired and had a big impact on me, went on to become lifelong friends and valued collaborators.

I hasten to add that I have continued to enjoy and benefit from conferences throughout my career.  Since 1995, the only JSM that I have missed was in 2016 due to illness.  It took me a few months to recover from my surgery that year, and I considered myself fully recovered when I was able to attend the AMATYC conference in Denver in November of 2016.  I remember feeling very happy to be well enough to walk around a conference hotel and be able to participate in a conference again.  I also recall feeling somewhat silly to consider conference attendance as an important marker of my recovery.


As I continue this stroll down memory lane, I now turn toward USCOTS.  I have attended all eight USCOTS conferences*, which have been held in odd-numbered years since 2005, and I have come to regard USCOTS as my favorite conference. 

* I realize that the word “conference” here is redundant with the C in USCOTS, but I fear that “USCOTSes” looks and sounds ridiculous.

The organizers of the first USCOTS, Dennis Pearl and Deb Rumsey and Jack Miller, did a terrific job of establishing a very welcoming and supportive environment.  Conference sessions were designed to engage participants, and the conference provided many opportunities for interaction among attendees, outside of sessions as well as during them.

The inaugural USCOTS in 2005 was the most influential conference of my career.  The lineup of plenary speakers was star-studded: Dick Scheaffer and Ann Watkins, Roxy Peck, Cliff Konold, Robin Lock and Roger Woodard, and George Cobb (see the program here).  Roxy’s talk was memorable not only for its enticing title (How did teaching introductory statistics get to be so complicated?) but also for the insights about teaching statistics that Roxy garnered from a famous video of a selective attention test (here).  George’s banquet presentation at this conference, which also featured a provocative title* (Introductory statistics: A saber tooth curriculum?), has achieved legendary status for inspiring a generation of statistics teachers to pursue simulation-based inference**.

* Of course, I admire that both of these titles ask good questions.

** See here for a journal article that George wrote, based on this presentation, in which he subtly revised the title to ask: A Ptolemaic curriculum?

The next three USCOTS were also very engaging and informative.  I will mention just one highlight from each:

  • In 2007 Dick De Veaux gave a terrific banquet presentation, titled “Math is music; statistics is literature,” that was almost the equal of George’s for its cleverness and thought-provoking-ness. 
  • Chris Wild inspired us in 2009, and provided a glimpse of even more impressive things to come, with his demonstration of dynamic software that introduces young students to statistics, and excites them about the topic, through data visualization. 
  • Rob Gould challenged us in 2011 to think about how best to prepare students to be “citizen statisticians,” arguing that they come to our classes having already had immersive experiences with data.

My point here is that USCOTS was designed from the outset as a very engaging and interactive conference, ideal for statistics teachers looking to meet like-minded peers and exchange ideas for improving their teaching.


Following the 2011 USCOTS, I was quite surprised and honored when Deb and Dennis asked me to take on the role of USCOTS program chair.  I have now served in this capacity for four conferences, from 2013 to 2019.  I have tried to maintain the distinctive features that make USCOTS so valuable and worthwhile.  My primary addition to the program has been a series of five-minute talks that comprise opening and closing sessions.  I have been thrilled that so many top-notch statistics educators have accepted my invitations to give these presentations.

If you’ve never given a five-minute presentation, let me assure you that it can be very challenging and nerve-wracking.  Condensing all that you want to say into five minutes forces you to focus on a single message and also to organize your thoughts to communicate that message in the brief time allotted.  

For my first year as program chair in 2013, I went so far as to insist on the “Ignite” format, which requires each presenter to use 20 slides that automatically advance every 15 seconds.  I have loosened this restriction in subsequent years.  The opening five-minute talks have launched the conferences with energy and fun, generating thought-provoking discussions among attendees.  The closing talks have recapped the conference experience and inspired participants to depart with enthusiasm for implementing some of what they’ve learned with their own students*. 

* You can find slides and recordings for these five-minute talks, along with other conference presentations and materials, by going here, clicking on “years” on the right side, going to the year of interest, then clicking on “program,” and finally clicking on the session link within the program page.  As you peruse the lists of presenters for an opening or closing session, you may notice that I like to arrange the order of presentation alphabetically by first name.

My point in this section is that since I have been entrusted with the keys to the USCOTS program, I have tried to maintain USCOTS as a welcoming, engaging, and valuable conference.  Serving as program chair for the past four incarnations of USCOTS has provided me with considerable helpings of both professional pride and enjoyment.


After the 2019 USCOTS, I decided to pass the program chair baton to someone of the next generation who would infuse the conference with new ideas and vitality.

I asked Kelly McConville to take on this role.  Even though Kelly is early in her career as a statistics professor*, she already has considerable experience as a successful program chair.  She has served as program chair for ASA’s Statistics and Data Science Education section at JSM, for the Electronic Undergraduate Statistics Research Conference, and for the Symposium on Data Science and Statistics (see here).  Kelly has attended several USCOTS conferences and gave one of the five-minute talks at the closing session for USCOTS in 2017.

* Congratulations are in order, because Kelly was informed just last week that she has earned tenure in her faculty position at Reed College.

Kelly replied by asking if I would consider co-chairing USCOTS with her in 2021, and I happily agreed.


And then 2020 happened*.

* There’s obviously no need for me to describe how horrible 2020 has been in myriad ways.  But I can’t resist noting that a vaccine has been developed, tested, and approved in less than one year.  This is an incredible achievement, one in which the field of statistics has played an important role. The vaccine is being administered for the first time in the U.S. (outside of trials) on the day that this post appears.

The pandemic required Dennis (who continues to serve as director of CAUSE, the organization that puts on USCOTS) and Kelly and me to decide whether to plan for an in-person, virtual, or hybrid USCOTS.  Spurred on by Camille Fairbourne, Michigan State University had agreed to host USCOTS in late June of 2021.  In August of 2020, we asked statistics teachers to answer survey questions about planning for USCOTS.  Among 372 responses, 50.3% recommended a virtual conference and only 11.8% recommended in-person, with the remaining 37.9% preferring a hybrid.  Mindful of drastic cuts to many schools’ budgets as well as continuing uncertainty about public health, we made the difficult decision to forego an in-person conference and hold USCOTS virtually.

We quickly selected a conference theme: Expanding Opportunities.  Aspects of this timely theme that conference sessions will explore include:

  • How can we increase participation and achievement in the study of statistics by students from under-represented groups?
    • What classroom practices can help with this goal?
    • How can curriculum design increase such participation and achievement?
    • What role can extra-curricular programs play?
    • How can remote learning and new technologies help?
    • How can we collaborate more effectively with colleagues and students in other disciplines to achieve this goal?
  • How can we support and encourage students and colleagues who are beginning, or contemplating, careers in statistics education?
  • Can the emerging discipline of data science help to democratize opportunities for students from under-represented groups?
  • What does educational research reveal about the effectiveness of efforts to expand opportunities?

The conference will feature thought-provoking plenary sessions, interactive breakout sessions, informative posters-and-beyond sessions, and opening and closing sessions with inspiring and lively five-minute presentations. Other highlights include birds-of-a-feather discussions, a speed mentoring session, an awards ceremony*, extensive pre-conference workshops, and sponsor technology demonstrations.

* The USCOTS Lifetime Achievement Award has been renamed the George Cobb Lifetime Achievement Award in Statistics Education, in honor of George, the first recipient of the USCOTS Award, who passed away on May 6, 2020.

One of the plenary sessions will be a panel discussion about fostering diversity in our discipline.  Kelly and I plan to ask the panelists questions such as:

  • What are some barriers to pursuing study of statistics, and succeeding in study of statistics, for students from under-represented groups?
  • What are some strategies for eliminating barriers and expanding opportunities for students from under-represented groups in the following areas?
    • Recruitment
    • Curriculum
    • Individual courses
    • Program/department culture
    • Other?
  • How (if at all) does the emerging discipline of data science offer potential solutions for expanding opportunities and fostering diversity?
  • What are some strategies for encouraging and supporting people from diverse backgrounds to pursue and succeed in careers as statistics teachers and statistics education researchers?

We are determined to reproduce the welcoming, engaging, interactive, and fun aspects of USCOTS as much as possible in a virtual setting.  We also hope that the virtual format will encourage participation from statistics teachers who might not have been able to invest the time required to travel to an in-person conference.


One of my favorite words is serendipity.  I like the definition from Google’s dictionary almost as much as the word itself: the occurrence or development of events by chance in a happy or beneficial way.  The benefits that I gained from attending conferences early in my career resulted from chance encounters more than from planned meetings.  Serendipity is one of the best aspects of any conference*. 

* Heck, serendipity is one of the best things in life.  Sadly, serendipity has also been one of the biggest casualties of the pandemic.

By definition, serendipity is impossible to plan in advance.  Serendipity is especially challenging to arrange with a virtual conference that people can attend without leaving their homes.  But we’re going to do everything we can to infuse the 2021 USCOTS with opportunities for serendipity, and we welcome suggestions about how to create such opportunities.  I hope that all USCOTS participants in 2021 make new acquaintances and renew friendships with colleagues who are united by a common desire to teach statistics effectively to the next generation of citizens and scholars.

How can you help?  First, mark the dates June 28 – July 1, 2021 on your calendar and plan to attend USCOTS.  Second, consider submitting a proposal to conduct a workshop, lead a breakout session, present a virtual poster, or facilitate a birds-of-a-feather discussion.  Third, please let others know about USCOTS and encourage them to participate.  Spreading the word broadly can expand opportunities to participate in USCOTS, where we can share ideas about expanding opportunities for others to engage in our profession. 

Once again, more information is available at the conference website here.

#75 More final exam questions

I gave my first asynchronous online final exam this past week.  I find writing online exams to be much more time-consuming and stressful than writing good, old-fashioned in-person exams*.  I’ve identified five aspects of writing online exams that take considerable time and effort:

  1. Writing good multiple-choice questions and answer options;
  2. Creating multiple versions of most questions in an effort to reduce cheating;
  3. Thinking of questions where googling does not provide much of an advantage;
  4. Entering all of the questions into the format required by the learning management system;
  5. Double- and triple- and quadruple-checking everything**.

* I’m finding it hard to remember the days of photocopying exams and handing them to students on paper.

** I became obsessed with this last one, because typos and other errors are so much more problematic now than they used to be.  I may not remember photocopying, but I fondly recall the good old days when a student would point out a mistake and I simply had to say: Excuse me, class, please look on the board to see a correction for part c) of question #3.  I really stressed and lost sleep over this.  And somehow I still managed to mess up!  I’m embarrassed to report that despite my efforts, students found an error on both the Wednesday and Friday versions of my final exams.  I was especially grateful to the student who started the exam at 7am on Wednesday and let me know about the error as soon as she finished, so I was able to make the correction before most students began the exam.

Now I’m in the throes of grading.  You may know that when it comes to grading, I enjoy procrastination*.  But the timeline is tight because grades are due on Tuesday.  Without further preamble, I will now discuss some of the multiple-choice questions that I asked my students on this exam.  I will provide answers at the end.

* See post #66, First step of grading exams, here.


1. Suppose that you want to investigate whether Cal Poly students tend to watch more movies than Cal Poly faculty.  Would you collect data to investigate this question using random sampling, random assignment, or both? [Options: A) Random sampling only; B) Random assignment only; C) Both random sampling and random assignment]

I like this question because I try to emphasize the distinction between random sampling and random assignment.  This is meant to be an easy question.  Students should realize that it’s not reasonable to randomly assign people to the roles of faculty or student.

2. Suppose that the nine current members of the U.S. Supreme Court are still the same nine members of the Supreme Court two years from now. Indicate how the following values will change from now until then (two years from now). a) Mean of ages; b) Standard deviation of ages; c) Median of ages; d) Inter-quartile range of ages [Options: A) Increase; B) Decrease; C) Remain the same]

This is also intended as an easy question.  The mean and median will increase by two years.  But as measures of variability, the standard deviation and inter-quartile range will not change when everyone becomes two years older.
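A quick check of this shift property, using made-up ages rather than the justices’ actual ages, might look like:

```python
# Adding a constant to every value shifts the mean and median by that
# constant but leaves the standard deviation (and IQR) unchanged.
from statistics import mean, median, stdev

ages_now = [48, 50, 52, 56, 62, 66, 70, 72, 82]   # hypothetical ages
ages_later = [age + 2 for age in ages_now]

print(mean(ages_later) - mean(ages_now))      # shifts by exactly 2
print(median(ages_later) - median(ages_now))  # shifts by exactly 2
print(stdev(ages_now) == stdev(ages_later))   # True: spread is unchanged
```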

3. a) Which would be larger – the mean weight of 10 randomly selected people, or the mean weight of 1000 randomly selected cats (ordinary domestic housecats)?  b) Which would be larger – the standard deviation of the weights of 1000 randomly selected people, or the standard deviation of the weights of 10 randomly selected cats (ordinary domestic housecats)? [Options: A) Cats; B) People]

I have written about this question before*.  Part (b) is very challenging for students.  Unfortunately, many students come to believe that a larger sample size produces a smaller standard deviation, without realizing that this result applies to the variability of a sample statistic, such as a sample mean, not to variability in the original measurements, such as weights of people and cats.

* See post #16, Questions about cats, here.

4. Suppose that a fair coin is flipped 10 times.  Which is more likely – that the flips result in 5 heads and 5 tails, or that the flips result in 6 of one outcome and 4 of the other? [Options: A) 5 of each; B) 6-4 split; C) These are equally likely.]

Students could answer this by calculating the relevant binomial probabilities.  But they might also realize the key point that a 6-4 split can happen in two different ways.  Even though a particular 6-4 split is less likely than a 5-5 result, a 6-4 split in either direction is more likely than a 5-5 result.  These probabilities turn out to be 0.246 for obtaining 5 heads and 5 tails and 0.410 for achieving a 6-4 split.
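These binomial probabilities are easy to verify; here is a quick sketch in Python:

```python
# Exact binomial probabilities for 10 flips of a fair coin.
from math import comb

p_five_five = comb(10, 5) * 0.5**10      # exactly 5 heads and 5 tails
p_six_four = 2 * comb(10, 6) * 0.5**10   # a 6-4 split in either direction

print(round(p_five_five, 3))  # 0.246
print(round(p_six_four, 3))   # 0.41
```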

5. Suppose that Chiara has a 10% chance of making an error when she conducts a test. If she conducts 10 independent tests, which of the following is closest to the probability that she makes at least one error? [Options: A) 0.10; B) 0.25; C) 0.50; D) 0.65; E) 0.99]

I intend for students to perform the calculation: Pr(at least one error) = 1 – Pr(no errors) = 1 – (0.9)^10 ≈ 0.651.  I chose options far enough apart that some students might use their intuition to determine the correct answer, if they realize that making at least one error would be more likely than not without being extremely likely.
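A one-line check of this complement calculation:

```python
# Pr(at least one error) = 1 - Pr(no errors in 10 independent tests)
p_at_least_one_error = 1 - 0.9**10
print(round(p_at_least_one_error, 3))  # 0.651
```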


6. The United States has about 330 million residents.  Suppose that you want to estimate the proportion of Americans who wore socks yesterday to within a margin-of-error of 3.5 percentage points with 95% confidence.  Which of the following is closest to the number of people that you would need to randomly sample? [Options: A) 30; B) 1000; C) 30,000; D) 1,000,000]

I also discussed this question, which I ask on every final exam, in post #21 here.  Influenced by the 330 million number, many students mistakenly believe that a sample of 1 million people, or at least 30,000, is required.
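The required sample size follows from the conservative formula n = z²(0.5)(0.5)/E², which uses 0.5 as the proportion that maximizes the margin of error; a quick sketch:

```python
# Conservative sample size for estimating a proportion, using p = 0.5.
z = 1.96     # multiplier for 95% confidence
e = 0.035    # desired margin of error

n = z**2 * 0.25 / e**2
print(round(n))  # 784, which is far closer to 1,000 than to 30,000
```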

7. Suppose that Carlos, Dwayne, and Elsa select separate and independent random samples of 50 Cal Poly students each.  They ask each student in the sample how much sleep they got last night, in minutes.  Then they calculate the average amount of sleep for the students in their sample.  How likely is it that Carlos, Dwayne, and Elsa obtain the same value for their sample average? [Options: A) This is very likely. B) There’s about a 50% chance of this. C) There’s a 1 in 3 chance of this. D) This is very unlikely.]

This question addresses the concept of sampling variability, which is even more fundamental than that of sampling distribution.  This is meant to be an easy question that students can answer based on their intuition or by remembering what we discovered when simulating the drawing of random samples with an applet such as this one (here) that randomly samples words from the Gettysburg Address.

8. Suppose that Yasmin and Jade want to select a random sample of San Luis Obispo county residents and ask each person whether or not they spent Thanksgiving in their own home.  Suppose also that Yasmin wants to estimate the population proportion to within ± 0.04 with 95% confidence, and Jade wants to estimate the population proportion to within ± 0.02 with 95% confidence.  Who would need to use a larger sample size?  (You need not calculate any sample sizes to answer this question.)  [Options: A) Jade; B) Yasmin; C) They would both need the same sample size.]

Here is another question for which students could spend a good bit of time performing calculations, but they’re better served by thinking this through.  They need only realize that obtaining a smaller margin-of-error requires a larger sample size.
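Although the question doesn’t require any calculation, the same conservative sample-size formula shows why Jade needs more data: halving the margin of error quadruples the required sample size.

```python
# Conservative sample sizes for Yasmin (±0.04) and Jade (±0.02) at 95% confidence.
def sample_size(margin_of_error, z=1.96):
    return z**2 * 0.25 / margin_of_error**2

print(round(sample_size(0.04)))  # about 600 for Yasmin
print(round(sample_size(0.02)))  # 2401 for Jade, four times as many
```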

9. Suppose that you conduct a hypothesis test about a population mean and calculate the t-test statistic to equal 0.68.  Which of the following is the best interpretation of this value?  [Options: A) If the null hypothesis were true, the probability would be 0.68 of obtaining a sample mean as far as observed from the hypothesized value of the population mean. B) The probability is 0.68 that the null hypothesis is true. C) The sample mean is 0.68 standard errors greater than the hypothesized value of the population mean. D) The sample mean is equal to 0.68 times the standard error.]

Students’ ability to interpret the value of a test statistic is worth assessing.  You no doubt realize that I purposefully chose a value less than 1 for the t-test statistic here, partly to see whether students might confuse the interpretation of a test statistic and a p-value.

10. Suppose that you take a random sample of 100 books from a large library.  For each of the following questions, indicate the appropriate inference procedure. a) How old, on average, is a book from this library? b) Are 75% of books in this library less than 20 years old? c) What percentage of books in this library contain fewer than 300 pages? d) How many pages, on average, are contained in a book from this library? e) What percentage of books in this library have been borrowed at least once in the past 10 years? [Options: A) z-interval for proportion; B) z-test for proportion; C) t-interval for mean; D) t-test for mean]

This series of questions is very similar to the questions that I discussed in last week’s post (A sneaky quiz, here), so my students should have expected questions of this type.  I think these questions are a bit harder than the ones I presented in class and on that quiz, though.  Parts (b) and (c) involve a categorical variable, but students might be tempted to think of a numerical variable because the context also refers to a book’s age and number of pages.


I’m selfishly glad that the time I invested into writing multiple-choice questions for my final exam has now served double-duty by providing me with the basis for this blog post.  But I really do need to get back to grading the open-ended questions …

P.S. The correct answers are: 1. A; 2. A, C, A, C; 3. B, B; 4. B; 5. D; 6. B; 7. D; 8. A; 9. C; 10. C, B, A, C, A.

#74 A sneaky quiz

Last summer I participated as a student in an online course for the first time.  The topic was how to teach an online course.  The course was delivered asynchronously, but it was not self-paced because there were regular due dates on assignments.  Somewhat to my embarrassment, I found that I was highly motivated by the assignments and those due dates.

As I have been teaching my own students online this term, I decided to give even more quizzes than usual to motivate my students to keep up.  For each topic that we have studied, I have given a handout quiz and an application quiz.  The handout quizzes have often asked the same questions that we answered as we worked through the handout, while the application quizzes have asked students to apply what they learned to a new study or situation.  As long as a student attended one of my live zoom sessions or watched the videos that I prepared, and paid a modest amount of attention, they should have done very well on the handout quizzes.  I even allowed two attempts on these handout quizzes, recording the average score.


My final class meeting of the term occurred on Monday of Thanksgiving week.  I told my students in advance that we would not study new material on that day.  Instead I provided them with practice questions about identifying which inference procedure to apply for a particular question.  As this is the first course in a two-course sequence, and we spent about half of the term studying probability, we have only studied inference for a single mean or a single proportion.  Here’s how I summarized things at the start of the handout for this class:

  • Statistical inference draws a conclusion (i.e., infers something) about a population parameter based on a sample statistic. 
    • A confidence interval estimates the value of a parameter with a range of values.
      • A population proportion π can be estimated with a z-interval.
      • A population mean μ can be estimated with a t-interval.
    • A hypothesis test assesses the plausibility of a particular claim about the parameter.
      • A claim about a population proportion π can be tested with a z-test.
      • A claim about a population mean μ can be tested with a t-test.

The instructions that I provided for the task were: For each of the following research questions, identify which of these four inference procedures would be appropriate.  Furthermore, if the research question calls for a hypothesis test, state the appropriate null and alternative hypotheses.  If the research question calls for a confidence interval, clearly identify the parameter to be estimated.

The ten questions that we analyzed were:

  • a) How many hours does a full-time Cal Poly student spend studying, per week, on average?
  • b) Does a full-time Cal Poly student spend an average of more than 25 hours per week studying?
  • c) Does the percentage of full-time Cal Poly students who were born in California differ from 80%?
  • d) What proportion of full-time Cal Poly students were born in California?
  • e) What proportion of people with a driver’s license in California have indicated a willingness to be an organ donor?
  • f) Have less than two-thirds of all people with a driver’s license in California indicated a willingness to be an organ donor?
  • g) What is the price of an average transaction at the Subway on campus?
  • h) What proportion of transactions at the Subway on campus include a soft drink?
  • i) Do most transactions at the Subway on campus include a soft drink?
  • j) Do weekday employees at a company take sick days disproportionately often on Mondays and Fridays?

We worked through the first four of these together.  I advised students to start by identifying the observational units, variable, and type of variable for each question.  I emphasized that deciding whether the parameter is a mean or a proportion boils down to determining whether the variable is numerical or categorical.  I also admitted that the question itself often contains a key (giveaway) word, such as average in parts (a) and (b), percentage in (c), and proportion in (d).

Next I asked students to discuss parts (e)-(j) together in zoom breakout rooms of 4-5 students per group.  Then we came back together to discuss these.  I pointed out that questions (f) and (i) do not use a giveaway word, so they require more careful thought.  Students need to realize that the variable in (f) is whether or not the person has indicated a willingness to be an organ donor, which is categorical, so the parameter is the proportion of all people with a California driver’s license who have indicated such a willingness.  The word most carries a lot of weight in (i), revealing that the alternative hypothesis is that the proportion of all Subway transactions that include a soft drink is greater than one-half.

Question (j) is a favorite of mine. Its impetus is an old* Dilbert cartoon, available here.  The joke is that the pointy-haired boss expresses outrage upon learning that two-fifths of all sick days at his company are taken on Mondays and Fridays.  The observational units are sick days, and the variable is whether or not the sick day was taken on Monday or Friday.  The null hypothesis asserts that two-fifths of all sick days are taken on Monday or Friday, which is what would be expected if sick days were not being mis-used to produce long weekends.  The alternative hypothesis is that more than two-fifths of all sick days are taken on Monday or Friday.

* I just realized that very few, if any, of my students were alive when this particular cartoon appeared in 1996.  Hmm, I wonder if my university’s special incentive to take early retirement is still available.


The title of this post promised something sneaky.  You might be thinking that unless sneaky has been redefined as boring, what you’ve read so far does not even come close.  Please keep reading …

I have mentioned before that my course this term is asynchronous, even though I strongly encourage students to attend my optional live zoom sessions on MWF mornings.  Because of the asynchronous listing, I feel obligated to make videos to accompany the handouts for the students who cannot, or choose not to, attend the live sessions.  These videos show me working through the examples in the handout.  I always begin by saying something like: I strongly encourage you to pause the video, answer the handout questions on your own first, and then resume the video to watch my discussion of the questions.  The videos usually show me writing answers directly in the handout file, especially when performing calculations. 

But this time I purposefully did not write on the handout for the video recording.  Instead I only talked about the ten questions (a) – (j).  For the students who ignored my advice to answer the questions for themselves before watching the video, I wanted them at a minimum to take their own notes based on what I was saying.  I hope that active listening and writing might have activated their learning to some extent.

That’s a bit of sneakiness on my part, but that does not constitute the sneaky quiz mentioned in the title of this post.


Most of my handout quiz questions throughout this term have repeated questions that were asked directly in the handout.  But this time students could not answer questions on the handout quiz merely by copying answers from their notes.  Here are the quiz questions:

  1. For how many of the ten questions in this handout are Cal Poly students the observational units?
  2. How many of the ten questions in this handout involve a categorical variable?
  3. How many of the ten questions in this handout involve inference for a population mean?
  4. How many of the ten questions in this handout ask for a confidence interval?
  5. How many of the ten questions in this handout ask for a hypothesis test with a two-sided alternative hypothesis?

I hope that this sneaky approach of mine forced students to review their notes and also reinforced some ideas about how to decide on an inference method.  I hope that these quiz questions reminded students, perhaps sub-consciously, to think about the observational units (question #1), the type of variable (#2), the parameter (#3), whether the question calls for a confidence interval or hypothesis test (#4), and whether an alternative hypothesis is one- or two-sided (#5).

My writing teachers from college might be disappointed that my previous two sentences both began with “I hope …”  Nevertheless, I return to that construction once more for my conclusion: I hope you agree that sneakiness is forgivable, perhaps even desirable, as a pedagogical strategy when the intent is to prompt students to think without their realizing it.

#73 No notes needed

My exams have been open-book, open-notes for as long as I can remember.  I tell students from the outset that they should focus on understanding rather than memorization, and I think they take comfort in knowing that they can look up formulas and definitions during exams.

As we approach the end of the fall term*, I have often told students that some terms, symbols, and facts should have become so familiar that they need not refer to their notes, even though they never set out to memorize the term, symbol, or fact.

* I am teaching the first in a two-course sequence for business majors.  We study inference for a single mean and for a single proportion at the end of the course.  The next course will begin with inference for comparing two groups.

For this blog post I decided to make a list of things that I want students to know without looking at their notes*.  In the spirit of fairness, I am going to do this without looking at any of my course notes.  In the spirit of fun, I encourage you to compile your own list before reading mine.

* This is very different from my list of thirteen important topics for students to learn. See post #52, here.


  1. The term observational unit
  2. The term variable
  3. The term categorical variable
  4. The term numerical variable
  5. The term explanatory variable
  6. The term response variable

These are the building blocks of designing a study, analyzing data, and drawing conclusions.  I don’t want my students to memorize definitions for these terms, but I ask about these so often* that I hope they can answer my questions without looking anything up.

* See post #11, Repeat after me, here.

  1. The symbol n

If students need to look up in their notes that n is our symbol for sample size, then they’re missing out on a lot.

  1. The term population
  2. The term sample
  3. The term parameter
  4. The term statistic

Again, I don’t want my students to memorize definitions of these terms, and I certainly won’t ask them to define these on an exam.  But I also don’t want students to have to stop and look up the definitions whenever they encounter these words.  In the last week or two, I have often said something like “remember that statistical inference is about inferring something about a population parameter based on a sample statistic,” and I sure don’t want students to need to look up those four terms in a glossary to understand my point.

  1. The symbol p-hat
  2. The symbol x-bar
  3. The symbol π
  4. The symbol μ

I have told my students several times in the past few weeks that understanding what these symbols mean needs to be second-nature for them.  It’s hard enough to understand a statement such as E(X-bar) = μ without having to look up what each symbol means.  I agree that we can and should express this result in words as well as symbols: If you select a very large number of random samples from a population, then the average of the sample averages will be very close to the population average.  But that’s a lot of words, and it’s very handy to use symbols.  I was very tempted to include the symbols σ and s on this list also.

  1. The term random sampling
  2. The term random assignment
  3. The term confounding variable

As I’ve written before*, I really want students to understand the difference between random sampling and random assignment.  In particular, I’d like students to understand the different kinds of conclusions that follow from these different uses of randomness.  Random sampling allows for findings about the sample to be generalized to the population, and random assignment opens the door to drawing cause-and-effect conclusions.  Confounding variables in observational studies provide an alternative to cause-and-effect explanations.  I hope that students learn these ideas well enough that they think of them as they read about statistical studies in their everyday lives.  Of course, I know that they will not refer to their notes from my class as they go about their everyday lives.

* See posts #19 and #20, Lincoln and Mandela, here and here.

  1. How to calculate an average

I always tell my students that they do not need to memorize formulas.  But I can expect them to know that calculating an average involves adding the values and dividing by the number of values, right?  I’d also like students to know that the median is the ordered value in position (n+1)/2, but that looks like a formula, so I’ll leave that off this list.

  1. The idea that standard deviation is a measure of variability

I do not expect my students to know the formula for calculating standard deviation, and I rarely ask them to calculate a standard deviation by hand.  But I do want them to know, without referring to their notes, that a larger standard deviation indicates more variability.

  1. How to calculate proportions (marginal, conditional, joint) from a two-way table of counts

My students have performed these calculations when analyzing categorical data and also for calculating conditional probabilities.  I hope that they feel confident with such calculations without using their notes.

  1. The idea that a difference between two percentages is not a percentage difference.

I don’t care if students need to look up how to calculate a percentage difference, but I do want them to know that a difference in percentage points is not the same as a percent difference.  I don’t mean that I want them to be able to state that fact, but I want them to recognize it when they encounter it.  For example, I’d like students to realize that increasing your success rate from 10% to 15% is not a 5% improvement in the success rate*.

* I wrote an entire essay about this in post #28, A pervasive pet peeve, here.

  1. How to interpret a z-score
  2. How to calculate a z-score

Perhaps I am violating my policy about not requiring students to learn formulas here.  But notice that I listed the interpretation first.  I want students to know, without looking it up, that a z-score reveals how many standard deviations a value is from the mean*.  This interpretation tells you how to calculate the z-score: [(value – mean) / standard deviation].  Granted, I suspect that most students learn the formula rather than think it through from the interpretation, but I think this one is important enough to know without referring to notes, because the idea is so useful and comes up so often.

* See post #8, End of the alphabet, here, for more about z-scores.
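The interpretation really does dictate the calculation; a tiny sketch with made-up numbers:

```python
# A z-score counts how many standard deviations a value lies from the mean.
def z_score(value, mean, sd):
    return (value - mean) / sd

# Hypothetical example: a value of 85, with mean 70 and standard deviation 10
print(z_score(85, 70, 10))  # 1.5, i.e., 1.5 standard deviations above the mean
```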

  1. That probabilities cannot be less than zero or greater than one.
  2. That the probability of an event is one minus the probability of its complement

These two do not require looking anything up, right?  If I ask what’s wrong with the statements that Pr(E) = -0.6 or Pr(E) = 1.7, I sure hope that a student does not need to refer to any rules to answer the question.  Similarly, if I say that the probability of rain tomorrow is 0.2 and then ask for the probability that it does not rain tomorrow, I’m counting on students to answer without using their notes.

  1. How to interpret an expected value

This is one of the first items that came to mind when I decided to create this list.  If I had been given a dime for every time I’ve reminded a student that expected value means long-run average, then I would have accrued a very large average number of dimes per year over my long teaching career.

  1. The term mutually exclusive
  2. The term independent events

The meaning of these terms in probability closely mirrors their meaning in everyday use, so I hope students can answer questions about these terms without consulting their notes.  I am tempted to include the addition rule for mutually exclusive events and the multiplication rule for independent events on this list, but I’ll resist that temptation.

  1. The idea that about 95% of the data from a normal distribution fall within two standard deviations of the mean

I’m not asking that students know the 68% and 99.7% aspects of the empirical rule by heart, only the part about 95% falling within two standard deviations of the mean*.  Several times in the past few weeks I have said something like: The value of the test statistic is 3.21 (or perhaps 1.21).  Is that very far out in the tail of a normal curve? How do you know?  At a minimum I’d like students to realize that a z-score of greater than 2 (in absolute value) is far enough in the tail to be worth noting.

* I am tempted to include knowing, without looking it up, that the more precise multiplier is 1.96, but I won’t go that far.  I do reserve the right to say things like “you know what the value 1.96 means in our course, right?” to my students.
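Both the “about 95%” figure and the more precise 1.96 multiplier are easy to confirm from the standard normal distribution, whose CDF can be written with the error function:

```python
# P(-z < Z < z) for a standard normal Z, via math.erf.
from math import erf, sqrt

def prob_within(z):
    return erf(z / sqrt(2))

print(round(prob_within(2), 4))     # 0.9545, "about 95%"
print(round(prob_within(1.96), 4))  # 0.95, the more precise multiplier
```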

  1. The idea that averages vary less than individual values
  2. The idea that the variability in a sample statistic (proportion or mean) decreases as the sample size increases

Now I’m asking a lot.  Being back to full-time teaching after a year off has led me to rethink many things, but I have not wavered on my conviction that sampling distributions comprise the most challenging topic for students*.  I am trying to keep my expectations modest with these two items, starting with the basic idea that averages vary less than individual values.  Even that is challenging for students, because the even more fundamental idea that averages vary from sample to sample is non-trivial to wrap one’s mind around.

* See posts #41 and #42, Hardest topic, here and here.
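A small simulation can make both of these ideas concrete.  This sketch uses made-up sleep times, not real data:

```python
# Sample means vary less than individual values, and larger samples
# produce sample means that vary less still.
import random
from statistics import stdev

random.seed(1)
population = [random.gauss(420, 60) for _ in range(10_000)]  # minutes of sleep

def sd_of_sample_means(n, reps=2000):
    # Standard deviation of many sample means, each from a sample of size n
    return stdev(sum(random.sample(population, n)) / n for _ in range(reps))

print(round(stdev(population)))        # near 60: individual values
print(round(sd_of_sample_means(10)))   # near 60/sqrt(10), about 19
print(round(sd_of_sample_means(50)))   # near 60/sqrt(50), about 8
```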

  1. That a confidence interval estimates the value of a population parameter
  2. That a larger sample size produces a smaller margin-of-error, a narrower confidence interval
  3. That a confidence interval for a population mean is not a prediction interval for a single observation

I’m not expecting students to know any confidence interval formulas off the top of their heads.  When it comes to confidence intervals, I only ask for these three things.  I consider the last of these three to be the most important misconception that we should address about confidence intervals*.

* See post #15, How confident are you, part 2, here.

  1. That null and alternative hypotheses are about population parameters
  2. That a smaller p-value indicates stronger evidence against the null hypothesis
  3. That the null hypothesis is rejected when the p-value is smaller than the significance level

Similarly, these are three things I’d like students to know about hypothesis testing without consulting their notes.  The first of these is part of my frequent reminder to students that part of statistics involves making inferences about a population parameter based on a sample statistic.   I hope that relying on simulation-based inference* leads students to internalize the second of these points.  I try not to over-emphasize making test decisions, as compared to assessing strength of evidence, but I do want students to know how to determine whether to reject a null hypothesis.

* See post #12, Simulation-based inference, part 1, here.

  1. The idea that statistical inference depends on random (or at least representative) samples from the population
  2. The idea that confidence intervals and hypothesis tests give consistent results
  3. The distinction between statistical significance and practical importance

Here’s a final set of three aspects of statistical inference for which I hope that students do not have to check their notes.  I’m not mentioning random assignment with the first one because my students have not yet studied inference for comparing two groups.  For the middle one, I want students to realize that when a hypothesis test rejects a hypothesized value for a parameter, then the corresponding confidence interval should not include that value.  And when the hypothesis test fails to reject the value, then the corresponding confidence interval should include that value.  I don’t expect students to know the subtleties involved here, for example that the test needs to be two-sided and that this doesn’t always hold exactly for inference about a proportion.  I just want the basic idea of this consistency to make sense and not require looking up.


Whew, this list is far longer than I anticipated when I began.  Remember that my students and I are only halfway through a two-course sequence!  I also strongly suspect that I’ve omitted several things that will cause me to shake my head vigorously when they come to me.

But also remember that my exams are open-notes, so my students can always look these things up.  Still, it would certainly save them a lot of time if these 40 items truly come as second nature to them.  More importantly, I want them to know these things well enough to apply what they’ve learned far beyond their brief time in my course.

#72 Trade-offs

Making good decisions requires assessing trade-offs.  We encounter such situations frequently in everyday life as well as in professional settings.  As I am deciding what to do with myself at this very moment, I am weighing the trade-offs associated with writing this blog post and watching the Masters golf tournament.  If I watch golf, then I will have less time to write this post.  Its quality will suffer, and I will need to keep working on this post into Sunday evening.  Because I’m a morning person, that means that its quality will suffer even further.  But if I write this blog post now instead of watching golf, then I will miss out on a fun diversion that I look forward to every year.  What to do?  You might suggest that I try to do a bit of both: watch golf with one side of my brain and write this post with the other.  But multi-tasking is not my strong suit.  Because this particular golf tournament only comes around once per year, I think I’ll focus on that for a while.  I’ll be back, I promise …

Okay, where was I?  While I was away, I realized that you are probably wondering: What does this have to do with teaching statistics?  I recently asked my students to complete an assignment based on the activity I presented in post #40, Back to normal (here).  This assignment has three goals:

  • The immediate goal is for students to develop their ability to perform fairly routine calculations from normal probability distributions, calculating both probabilities and percentiles. 
  • A secondary goal is to introduce students to the topic of classification problems.
  • The big-picture goal is to lead students to think about trade-offs and how decision-making often requires striking a balance between competing interests.

Here’s the assignment:

Suppose that a bank uses an applicant’s score based on some criteria to decide whether or not to approve a loan for the applicant.  Also suppose that these scores follow normal distributions, both for people who would repay the loan and for those who would not:

  • Those who would repay the loan have a mean of 60 and standard deviation of 8;
  • Those who would not repay the loan have a mean of 40 and standard deviation of 12.

Consider this decision rule:

  • Approve a loan for applicants with a score above 50.
  • Deny the loan for applicants with a score of 50 or below.

  • a) Determine the z-score of the cut-off value 50 for each kind of applicant: those who would repay the loan and those who would not.  Show how to calculate these two z-scores by hand.  Also write a sentence interpreting each z-score.
  • b) Determine the probability that an applicant who would repay the loan is denied.  Also provide a shaded sketch.  (Feel free to use the applet here: www.rossmanchance.com/applets/NormCalc.html.)
  • c) Determine the probability that an applicant who would not repay the loan is approved.  (Again provide a shaded sketch.)
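For parts (a) through (c), a quick check in Python, using the score distributions given in the assignment, can be sketched with the standard library’s NormalDist:

```python
from statistics import NormalDist

repay = NormalDist(mu=60, sigma=8)      # scores of those who would repay
no_repay = NormalDist(mu=40, sigma=12)  # scores of those who would not

cutoff = 50
z_repay = (cutoff - repay.mean) / repay.stdev          # (50 - 60)/8 = -1.25
z_no_repay = (cutoff - no_repay.mean) / no_repay.stdev # (50 - 40)/12 ≈ 0.83

p_denied = repay.cdf(cutoff)            # repayer denied ≈ 0.1056
p_approved = 1 - no_repay.cdf(cutoff)   # non-repayer approved ≈ 0.2023
print(z_repay, round(z_no_repay, 2), round(p_denied, 4), round(p_approved, 4))
```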

Now consider changing the cut-off value in the decision rule.

  • d) Determine the cut-off value needed to decrease to 0.05 the probability that an applicant who would repay the loan is denied.  (Also report the z-score and provide a shaded sketch.)
  • e) For this new cut-off value, what is the probability that an applicant who would not repay the loan is approved?  (Again report the z-score and provide a shaded sketch.)
  • f) Comment on how these two error probabilities with the new cutoff value compare to their counterparts with the original cutoff value.

Now consider changing the cut-off value in the decision rule again.

  • g) Determine the cut-off value needed to decrease to 0.05 the probability that an applicant who would not repay the loan is approved.  (Again report the z-score and provide a shaded sketch.)
  • h) For this new cut-off value, what is the probability that an applicant who would repay the loan is denied?  (Again report the z-score and provide a shaded sketch.)
  • i) For each of the three cut-off values that have been considered, calculate the average of the two error probabilities.  Which cut-off rule is the best according to this criterion?
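Parts (d) through (i) can likewise be checked with a short Python sketch, again using the score distributions given in the assignment:

```python
from statistics import NormalDist

repay = NormalDist(60, 8)
no_repay = NormalDist(40, 12)

def error_probs(cutoff):
    # (P(repayer denied), P(non-repayer approved))
    return repay.cdf(cutoff), 1 - no_repay.cdf(cutoff)

c_d = repay.inv_cdf(0.05)      # part (d): cut-off ≈ 46.84
c_g = no_repay.inv_cdf(0.95)   # part (g): cut-off ≈ 59.74

# part (i): average of the two error probabilities for each cut-off
for c in (50, c_d, c_g):
    denied, approved = error_probs(c)
    print(round(c, 2), round((denied + approved) / 2, 4))
```

Among the three cut-off values, the original value of 50 yields the smallest average of the two error probabilities.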

The following table displays all of the probabilities in this assignment:

Question (f) is the key one that addresses the issue of trade-offs.  I want students to realize that decreasing one of the two error probabilities has the inescapable consequence of increasing the other error probability.  Then question (i) asks students to make a decision that balances those trade-offs.


I think this assignment achieves its three goals to some extent.  My main concern is that many students struggle to see the big picture about trade-offs.  I think many students tend to adopt tunnel-vision, answering one question at a time without looking for connections between them.  This is especially true for students who find the meant-to-be-routine calculations to be challenging.

If you compare this assignment to the extensive activity described in post #40 (here), you’ll see that I left out a lot.  Why?  Because I had to assess trade-offs.  I give so many other quizzes and assignments, and the course goes by so quickly on the quarter system, that I thought assigning the full activity would overwhelm students.  In hindsight I do wish that I had asked two more questions in the assignment:

  • j) Suppose that you regard denying a loan to an applicant who would repay it as three times worse than approving a loan for someone who would not repay it.  For each of the three cut-off values, calculate a weighted average of the two error probabilities that assigns weights according to this criterion.  Which cut-off rule is the best?

I think this question could have helped students to realize that they need not consider two trade-offs to be equally valuable, that they can incorporate their own judgment and values into consideration.  I also think my students could have benefitted from more work with the concept of weighted average.

  • k) Now suppose (perhaps unrealistically) that you could change the two probability distributions of scores.  What two changes could you make that would enable both error probabilities to decrease?  (Hint: Think of one change about their means, another about their standard deviations.)

With this question I would want students to realize that providing more separation between the means of the score distributions would reduce both error probabilities.  Reducing the standard deviations of the score distributions would also have this desired effect.  I hope that my hint would not make this question too easy and eliminate students’ need to think carefully, but I worry that the question would be too challenging without the hint.  I may use a multiple-choice version of this question on the final exam coming up after Thanksgiving.

I also wonder whether I should have asked students to produce a graph of one error probability versus the other for many different cut-off values in the decision rule.  I have not used R with my business students, but I could have asked them to use Excel.  I have in mind something like the following R code and graph:


The issue of trade-offs also arises with other introductory statistics topics. My students learned about confidence intervals recently.  Here’s a favorite question of mine: Higher confidence is better than lower confidence, right?  So, why do we not always use 99.99% confidence intervals? 

The answer is that higher confidence levels produce wider confidence intervals.  Higher confidence is good, but wider intervals are bad.  In other words, there’s a trade-off between two desirable properties: high confidence and narrow intervals.

With a confidence interval for a population proportion, how many times wider is a 99.99% confidence interval than a 95% confidence interval?  The critical values are z* = 1.960 for 95% confidence and z* = 3.891 for 99.99% confidence.  This means that a 99.99% confidence interval is 3.891 / 1.960 ≈ 1.985 times as wide as a 95% confidence interval, nearly twice as wide.

How can a researcher achieve the best of both worlds – high confidence and a narrow interval?  The only way to achieve both is to use a very large sample size.  What are some trade-offs associated with a very large sample size?  Selecting a very large sample requires much more time, effort, and expense.  Also, increasing the sample size comes with diminishing returns: You must quadruple the sample size in order to cut the interval’s width in half.
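Both of these claims are easy to verify with a few lines of Python (the sample proportion 0.5 and sample size 400 below are arbitrary illustrative values):

```python
from math import sqrt
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)      # critical value ≈ 1.960
z9999 = NormalDist().inv_cdf(0.99995)  # critical value ≈ 3.891
print(round(z9999 / z95, 3))           # ratio of widths ≈ 1.985

# Margin of error for a proportion: z* sqrt(p-hat(1 - p-hat)/n).
def margin(z, phat, n):
    return z * sqrt(phat * (1 - phat) / n)

m1 = margin(z95, 0.5, 400)
m2 = margin(z95, 0.5, 1600)   # quadruple the sample size
print(round(m1 / m2, 3))      # width is cut exactly in half: ratio = 2.0
```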


I’m tempted, but have never dared, to ask students to write an essay about how they have assessed trade-offs when making a decision of their own.  The COVID-19 crisis, which we are all trying to navigate as best we can, involves many, many trade-offs.  My students had to weigh trade-offs in deciding whether to live on campus this term, and they’ll have to do so again as they decide where to live next term.  They may have to evaluate trade-offs in deciding whether to go to their grandparents’ house for Thanksgiving dinner.  If those topics are too personal, they could also write about much less serious trade-offs, perhaps about their strategy for playing a board or card game, or deciding whether to have a salad or a cheeseburger for lunch. I could also invite students to write about trade-offs from another’s perspective, such as a mayor deciding whether to open or close schools during the COVID-19 crisis, or whether a football coach should “go for it” on fourth down*.

* This article (here) describes how analytics has led some football coaches to revise their conservative strategy on this question.


As you know, I was distracted by watching a golf tournament as I began writing this blog post.  While I was watching golf, I was also thinking about trade-offs.  Golfers have long debated whether it’s better to strive for distance or accuracy as the more important goal.  The trade-off is that you can hit the ball farther if you’re willing to sacrifice some accuracy.  On one hand, hitting the ball farther means that you’ll hit your next shot from closer to the hole.  But by giving up some accuracy, you’ll more often have to hit your next shot from the rough rather than from the fairway, so you’ll have less ability to control where your next shot goes.  On the other hand, prioritizing accuracy means achieving less distance.  With more accurate shots, you’ll more often hit your next shot from the smooth fairway where you can control your next shot better, but you’ll be farther from the hole and therefore diminish your chance to hit the next shot close.  Much of golf strategy involves navigating this trade-off.  The recent push toward analytics in sports has extended to golf, where statisticians gather and analyze lots of data to help players make decisions*.

* This article (here) from last week by golf writer Alan Shipnuck summarizes some of these developments.

And now, if you will excuse me, since the golf tournament is over and I have (ever-so-nearly) finished writing this blog post, I need to check on how my fantasy football team, the Domestic Shorthairs, is doing this week.

#71 An SBI quiz

During this past week, I introduced my students to simulation-based inference (SBI), as described in post #12, here.  I gave a follow-up quiz in our learning management system Canvas to assess how well they could apply what they learned to a study that they had not yet seen.  I give three of these application quizzes in a typical week, along with three quizzes that assess how well they followed the handout that we worked through*.  All of these quizzes consist of five questions that are auto-graded in Canvas.  I regard these quizzes as formative rather than summative, and I encourage students to help each other on the quizzes.

* Students could work through the handouts completely on their own, but most students either attend a live Zoom session, during which I lead them through the handout, or watch videos that I prepare for each handout**. 

** For those of you who read about my ludicrous, comedy-of-errors experience with recording my first video (post #63, here), I am happy to report that I have recorded 83 more videos for my students since then.  Not many have gone smoothly, but all have gone much more smoothly than my first feeble attempts.

Writing auto-graded quizzes is a new experience for me.  For this blog post, I will present my auto-graded SBI quiz questions, describe my thinking behind each question, and discuss common student errors.  I will also discuss some questions that I did not ask, and I may very well second-guess some of my choices.  The quiz questions appear below in italics.


For the context in this quiz, I use a study that I described in an exam question presented in post #22, here.

Researchers presented young children (aged 5 to 8 years) with a choice between two toy characters who were offering stickers.  One character was described as mean, and the other was described as nice.  The mean character offered two stickers, and the nice character offered one sticker.  Researchers wanted to investigate whether children would tend to select the nice character over the mean character, despite receiving fewer stickers.  They found that 16 of the 20 children in the study selected the nice character.

1. What values would you enter for the inputs of a coin-tossing simulation analysis of this study?

  • Probability of heads
  • Number of tosses
  • Number of repetitions

I used the matching format in Canvas for this question.  The options presented for each of the three sub-parts were: 0.5, 0.8, 1, 10, 16, 20, and 10,000.  The correct answers are 0.5, 20, and 10,000, respectively.

As the sample proportion of children who selected the nice character, the value 0.8 makes a good option for the probability of heads.  Some students believe that the simulation is conducted with the sample value rather than the null-hypothesized value.  I chose the value 16 as a good distractor for the number of tosses, because it is the number of children in the sample who selected the nice character.  I threw in the values 1 and 10 for good measure.


2. Consider the following graph of simulation results:

Based on this graph, which of the following is closest to the p-value?

The options presented were: 0.005, 0.100, 0.500, 0.800.  I had to keep these options pretty far apart, because we cannot determine the p-value very precisely from the graph.

Students are to realize that the p-value is approximated by determining the proportion of repetitions that produced 16 or more heads.  Although we cannot approximate the p-value very accurately from this graph, we can see that obtaining 16 or more heads did not happen very often.  The closest option is 0.005.
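A coin-tossing simulation along these lines, with the inputs from question 1, can be sketched in Python:

```python
import random

random.seed(0)

reps, n, observed = 10000, 20, 16
count = 0
for _ in range(reps):
    heads = sum(random.random() < 0.5 for _ in range(n))  # toss 20 fair coins
    if heads >= observed:
        count += 1

p_value = count / reps   # close to the exact binomial probability ≈ 0.006
print(p_value)
```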

I considered asking students to use an applet (here) to conduct a simulation analysis for themselves and report the approximate p-value.  But I wanted this question to focus on whether they could read a graph of simulation results correctly.

I also thought about asking students to indicate how to determine an approximate p-value from the graph of simulation results.  The correct answer would have been: count the number of repetitions that produce 16 or more heads, and then divide by the number of repetitions.  Some obvious incorrect options could have been to count the repetitions that produced 10 or more heads, or to count the number of repetitions that produced exactly 10 heads.  Perhaps that would have been better than the version I asked.  I am a bit concerned that some students might have answered my question correctly simply by selecting the smallest option presented for the p-value.  On the other hand, one of the two examples presented in the handout led to a large p-value close to 0.5, so I hope my students do not necessarily think that the smallest p-value will always be the correct answer.


3. Based on this simulation analysis, do the data from this study provide strong evidence that children have a genuine preference for the nice character with one sticker rather than the mean character with two stickers?  Why?

The options presented were:

  • Yes, because it is very unusual to obtain 16 or more heads
  • Yes, because the distribution follows a bell-shaped curve
  • Yes, because the distribution is centered around 10
  • No, because it is very unusual to obtain 16 or more heads
  • No, because the distribution follows a bell-shaped curve
  • No, because the distribution is centered around 10

I like this one.  This question directly addresses the reasoning process of simulation-based inference.   The correct answer is the first one listed here.  I think the distractors are fairly tempting, because some students focus on the shape or center of the distribution, rather than thinking about where the observed result falls in the distribution.  Those misconceptions are common and important to address.

You could fault me, I suppose, for not adding “if the children actually had no preference” after “it is very unusual to obtain 16 or more heads” at the end of the correct answer.  But I think omitting that from all of the options kept the question reasonable.  In hindsight perhaps I should have written the correct answer as: Yes, because the simulation rarely produced 16 or more heads.


4. The following graph pertains to the same simulation results, this time displaying the distribution of the proportion of heads:

Calculate the z-score for the sample proportion of children in the study who selected the nice character with one sticker.  Report your answer with one decimal place of accuracy.

This question calls for a numerical answer rather than multiple-choice.  The correct answer is: z = (.800 – .500) / .111 ≈ 2.7*.  I allowed an error tolerance of 0.05 for the auto-grading process, so as not to penalize students who ignored my direction to use one decimal place of accuracy in their answer.

* My students have not yet studied the general expression for the standard deviation of the sampling distribution of a sample proportion, so their only option is to use the standard deviation of the 10,000 simulated sample proportions, as reported in the output.

This z-score calculation is not directly related to simulation-based inference, I suppose.  But I think z-scores are worth emphasizing*, and this also foreshadows the one-proportion z-test to come.

* See post #8, End of the alphabet, here.
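A sketch of this calculation in Python, approximating the standard deviation from simulated sample proportions much as students would read it from the applet output:

```python
import random
import statistics

random.seed(0)

reps, n = 10000, 20
props = [sum(random.random() < 0.5 for _ in range(n)) / n for _ in range(reps)]

sd_props = statistics.stdev(props)  # close to sqrt(.5 * .5 / 20) ≈ 0.112
z = (0.8 - 0.5) / sd_props          # z ≈ 2.7
print(round(sd_props, 3), round(z, 1))
```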


5. Suppose that the study had found that 13 of 20 children selected the nice character with one sticker.  How would the p-value have changed, as compared to the actual result that 16 of 20 children selected that character?

The options presented here were: larger, smaller, no change.  The correct answer is larger*, because the p-value would entail repetitions that produced 13 or more heads, which will certainly be more than those that produced 16 or more heads.

* You may have noticed that I have always presented the correct answer first in this post, but Canvas shuffled the options for my students, so different students saw different orderings.

I considered asking how the strength of evidence would change, rather than how the p-value would change.  It’s certainly possible for a student to answer the p-value question correctly, without making the connection to strength of evidence.  But it’s also possible that a student could correctly answer about strength of evidence without thinking through what that means for the p-value.  In hindsight, I wish that I had asked both versions in one question, like this:

Suppose that the study had found that 13 of 20 children selected the nice character with one sticker.  How would the p-value have changed, as compared to the actual result that 16 of 20 children selected that character, and how would the strength of evidence that children genuinely prefer the nice character have changed?  [Options: larger p-value, stronger evidence; larger p-value, weaker evidence; smaller p-value, stronger evidence; smaller p-value, weaker evidence]


As I mentioned earlier, I confine myself to asking five questions on every quiz.  I like this consistency, and I hope students appreciate that too.  But I feel no such constraint with blog posts, so now I will present five other questions that I could have asked on this quiz, all based on the same study about children selecting toy characters.

6. What are the observational units and variable in this study?  I ask these questions very often in class*, and I also ask them fairly often on assessments.  This might have worked well in matching format, with options such as: children, toy characters, which character a child selected, number of children who selected nice character, proportion of children who selected nice character.

* See post #11, Repeat after me, here.

7. Which of the following describes the null model/hypothesis?  Options could have included:

  • that children have no genuine preference between these two characters,
  • that children genuinely prefer the nice character with one sticker to the mean character with two stickers,
  • that 80% of all children prefer the nice character with one sticker.

8. Which of the following graphs is based on a correct simulation analysis?

9. What does the p-value represent in this study?  Options could have included:

  • the probability that 16 or more children would have selected the nice character, if in fact children have no genuine preference between the two characters
  • the probability that 10 children would have selected the nice character, if in fact children have no genuine preference between the two characters
  • the probability that 10 children would have selected the nice character, if in fact children have a genuine preference for the nice character
  • the probability that children have no genuine preference between the two characters

10. How would the p-value change if the study had involved twice as many children, and the same proportion had selected the nice character with one sticker?  The options would be: smaller, larger, no change.  Students would have needed to use the applet on this question, or else rely on their intuition, because we had not yet investigated the effect of sample size on p-value or strength of evidence.

The correct answers for these additional questions are: 6. children, which character a child selected; 7. no genuine preference; 8. the graph on the right, centered at 10 with a normal-ish shape; 9. the first option presented here; 10. smaller.


Confining myself to auto-graded questions on quizzes* is a new experience that requires considerable re-thinking of my assessment questions and strategies.  In this post I have given an example of one such quiz, on the topic of simulation-based inference.  I have also tried to provide some insights into my thought process behind these questions and the various answer options for multiple-choice ones.  I have also indicated some places where I think in hindsight that I could have asked better questions.

* Not all aspects of my students’ work are auto-graded.  I assign occasional investigation assignments, like the batch testing investigation that I wrote about in my previous blog post here, for which I provide a detailed rubric to a student grader.  On exams, I use a mix of auto-graded and open-ended questions that I grade myself, as I discussed in post #66, First step of grading exams, here.

P.S. The study about children’s toy character selections can be found here.

#70 Batch testing, part 2

I recently asked my students to analyze expected values with batch testing for a disease, which I discussed in some detail in post #39, here.  Rethinking this scenario led me to ask some new questions that I had not asked in that earlier post.

I will first re-introduce this situation, present the basic questions and analysis that my students worked through, and then ask the key question that I wish I had asked previously.  If you’d like to skip directly to the new part, scroll down to the next occurrence of “key question.” As always, questions that I pose to students appear in italics.


Suppose that 12 people need to be given a blood test for a certain disease.  Assume that each person has a 10% chance of having the disease, independently from person to person.  Consider two different plans for conducting the tests:

  • Plan A: Give an individual blood test to each person.
  • Plan B: Combine blood samples from all 12 people into one batch; test that batch.
    • If at least one person has the disease, then the batch test result will be positive, and then all 12 people will need to be tested individually.
    • If nobody has the disease, then the batch test result will be negative, and no additional tests will be needed.

Let the random variable X represent the total number of tests needed with plan B (batch testing).

a) Determine the probability distribution of X. [Hint: List the possible values of X and their probabilities.]

Even with the hint, some of my students were confused about where to begin, so I tried to guide them through the implications of the two sub-bullets describing how batch testing works.

The possible values of X are 1 (if nobody has the disease) and 13 (if at least one person has the disease).  The probabilities are: Pr(X = 1) = Pr(nobody has the disease) = (.9)^12 ≈ 0.2824 by the multiplication rule for independent events, and Pr(X = 13) = 1 – Pr(nobody has the disease) = 1 – (.9)^12 ≈ 0.7176.  This probability distribution can be represented in the following table:

b) If you implement plan B once, what is the probability that the number of tests needed will be smaller than it would be with plan A?

This question really stumps some students.  Because plan A always requires 12 tests, the answer is simply: Pr(X < 12) ≈ 0.2824.  My goal is for students to realize that batch testing reduces the required number of tests only about one-fourth of the time, so this criterion does not reveal any advantage of batch testing.  Maybe I need to ask the question differently, or ask a different question altogether, to direct students’ attention to this point.

c) Determine the expected value of X.

This calculation is straightforward: E(X) = 1(.9)^12 + 13(1 – (.9)^12) ≈ 9.61 tests.
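This distribution and expected value are easy to verify with a few lines of Python:

```python
p_disease = 0.1
n = 12

p_neg = (1 - p_disease) ** n          # Pr(X = 1): nobody has the disease ≈ 0.2824
dist = {1: p_neg, n + 1: 1 - p_neg}   # Pr(X = 13) ≈ 0.7176

expected = sum(x * p for x, p in dist.items())
print(round(expected, 2))   # ≈ 9.61 tests
```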

d) Interpret what this expected value means in this context.

My students quickly realize that I want them to focus on long-run average when they interpret expected value (see post #18, here).  But a challenging aspect of this is to describe what would be repeated a large number of times.  In this case: If the batch testing plan were applied for a very large number of groups of 12 people, then the long-run average number of tests needed would be very close to 9.61 tests.

e) Which plan – A or B – requires fewer tests, on average, in the long run?

Maybe I should have asked this differently, perhaps in terms of choosing between plan A and plan B.  The answer is that plan B is better in the long run, because it will require about 9.61 tests on average, compared to 12 tests with plan A.


Now consider a third plan:

  • Plan C: Randomly divide the 12 people into two groups of 6 people each.  Within each group, combine blood samples from the 6 people into one batch.  Test both batches.
    • As before, a batch will test positive only if at least one person in the group has the disease.
      • Any batch that tests positive requires individual testing for the 6 people in that group.
    • As before, a batch will test negative if nobody in the group has the disease. 
      • Any batch that tests negative requires no additional testing.

Let the random variable Y represent the total number of tests needed with plan C (batch testing on two sub-groups).

f) Determine the probability distribution of Y.

Analyzing plan C is more challenging than plan B, because there are more uncertainties involved.  I advise my students to start with the best-case scenario, proceed to the worst-case, and finally tackle the remaining case. The best case is that only 2 tests are needed, because nobody has the disease. The worst case is that 14 tests are needed (the original 2 batch tests plus 12 individual tests), because at least one person in each sub-group has the disease. The remaining case is that 8 tests are needed, because at least one person in one sub-group has the disease and nobody in the other sub-group has the disease.

The most straightforward probability to determine is Pr(Y = 2), because this is the probability that none of the 12 people have the disease.  This equals (.9)^12 ≈ 0.2824, just as before.

The second easiest probability to calculate is Pr(Y = 14), which is the probability that both sub-groups have at least one person with the disease.  This probability is [1 – (.9)^6] for each sub-group.  The assumption of independence gives that Pr(Y = 14) = [1 – (.9)^6]^2 ≈ 0.2195.

At this point we could simply determine Pr(Y = 8) = 1 – Pr(Y = 2) – Pr(Y = 14) ≈ 0.4980.  But I encouraged my students to try to calculate Pr(Y = 8) directly and then confirm that the three probabilities sum to 1, as a way to check their work.  To do this, we recognize that Y = 8 when one of the sub-groups has nobody with the disease and the other sub-group has at least one person with the disease.  A common error is for students to neglect that there are two ways for this to happen, because either sub-group could be the one that is disease-free.  This gives: Pr(Y = 8) = 2 × [1 – (.9)^6] × (.9)^6 ≈ 0.4980.

The probability distribution of Y can therefore be represented in this table:

g) Determine the expected value of Y.

This calculation is straightforward: E(Y) = 2(.2824) + 8(.4980) + 14(.2195) ≈ 7.62 tests.
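Again, a few lines of Python confirm the distribution and expected value for plan C:

```python
p_clear6 = 0.9 ** 6                 # one sub-group of 6 is disease-free
p2 = p_clear6 ** 2                  # Pr(Y = 2) ≈ 0.2824
p14 = (1 - p_clear6) ** 2           # Pr(Y = 14) ≈ 0.2195
p8 = 2 * (1 - p_clear6) * p_clear6  # Pr(Y = 8) ≈ 0.4980 (two ways to happen)

assert abs(p2 + p8 + p14 - 1) < 1e-12   # probabilities sum to 1
expected = 2 * p2 + 8 * p8 + 14 * p14
print(round(expected, 2))               # ≈ 7.62 tests
```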

h) Write a sentence or two summarizing your findings, with regard to an optimal plan for minimizing how many tests will be needed in the long run.

Students who correctly determined the expected values realize that the best of these three plans is Plan C.  If this procedure is applied for a very large number of groups, then Plan C will result in an average of about 7.62 tests per group of 12 people.  This is smaller than the average number of tests needed with Plan B (9.61) or Plan A (12.00).


Now comes the key question that I did not address in my earlier post about batch testing: Can we do even better (in terms of minimizing the average number of tests needed in the long run) than using 2 sub-groups of 6 people?  I chose the number 12 here on purpose, because it lends itself to several more possibilities: three sub-groups of 4, four sub-groups of 3, and six sub-groups of 2.

We can imagine groans emanating from our students at this prospect.  But we can deliver them some good news: We do not need to determine the probability distributions for the number of tests in all of these situations.  We can save ourselves a lot of bother by solving one general case and then using properties of expected values.

i) Let W represent the number of tests needed when an arbitrary number of people (n) are to be tested in a batch.  Determine the probability distribution of W and expected value of W, as a function of n.

The possible values are simply 1 and (n + 1).  We can calculate Pr(W = 1) = Pr(nobody has the disease) = .9^n.  Similarly, Pr(W = n + 1) = Pr(at least one person has the disease) = 1 – .9^n.  The expected value is therefore: E(W) = (1 × .9^n) + (n + 1) × (1 – .9^n) = n + 1 – n(.9^n).  This holds when n ≥ 2.

j) Confirm that this general expression gives the correct expected value for n = 12 people.

I encourage my students to look for ways to check their work throughout a complicated process. Plugging in n = 12 gives: E(W) = 12 + 1 – 12(.9^12) ≈ 9.61 tests. Happily, this is the same value that we determined earlier.

k) Use the general expression to determine the expected value of the number of tests with a batch of n = 6 people. 

This gives: E(W) = 6 + 1 – 6(.9^6) ≈ 3.81 tests.
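The general expression lends itself to a one-line function.  Here is a Python sketch (the function name expected_tests is my own) that confirms the values for n = 12 and n = 6:

```python
def expected_tests(n, p_healthy=0.9):
    """Expected number of tests for a single batch of n people (n >= 2)."""
    return n + 1 - n * p_healthy ** n

print(round(expected_tests(12), 2))  # 9.61, as in part j)
print(round(expected_tests(6), 2))   # 3.81, as in part k)
```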

l) How does this compare to the expected value for plan C (dividing the group of 12 people into two sub-groups of 6) above?  Explain why this makes sense.

This question holds the key to our short-cut. The expected value of 3.81 is equal to one-half of the expected number of tests with plan C, which was 7.62 tests.  This is not a fluke, because we can express Y (the total number of tests with two sub-groups of 6) as Y = Y1 + Y2, where Y1 is the number of tests with the first sub-group of 6 people, and Y2 is the number of tests with the second sub-group of 6 people.  Properties of expected value then establish that E(Y1 + Y2) = E(Y1) + E(Y2).

This same idea will work, and save us considerable time and effort, for all of the other sub-group possibilities that we mentioned earlier.

m) Determine the expected value of the number of tests for three additional plans: three sub-groups of 4 people each, four sub-groups of 3 people each, and six sub-groups of 2 people each.  [Hint: Use the general expression and properties of expected value.]

With a sub-group of 4 people, the expected number of tests with one sub-group is: 4 + 1 – 4(.9^4) ≈ 2.3756.  The expected value of the number of tests with three sub-groups of 4 people is therefore: 3(2.3756) ≈ 7.13 tests.

With a sub-group of 3 people, the expected number of tests with one sub-group is: 3 + 1 – 3(.9^3) ≈ 1.813.  The expected value of the number of tests with four sub-groups of 3 people is therefore: 4(1.813) ≈ 7.25 tests.

With a sub-group of 2 people, the expected number of tests with one sub-group is: 2 + 1 – 2(.9^2) = 1.38.  The expected value of the number of tests with six sub-groups of 2 people is therefore: 6(1.38) = 8.28 tests.
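All of these plans can be compared in a few lines.  This Python sketch (the names are my own) applies the general expression to one sub-group and then multiplies by the number of sub-groups, just as the properties of expected value justify:

```python
def expected_total(group_size, sub_size, p_healthy=0.9):
    # expected tests for one sub-group, times the number of sub-groups
    per_sub = sub_size + 1 - sub_size * p_healthy ** sub_size
    return (group_size // sub_size) * per_sub

for sub_size in (6, 4, 3, 2):
    print(sub_size, round(expected_total(12, sub_size), 2))
# 6 7.62, 4 7.13, 3 7.25, 2 8.28
```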

n) Write a paragraph to summarize your findings about the optimal sub-group composition for batch-testing in this situation.

The following table summarizes our findings about expected values:

Plan                                  Expected number of tests
A: test all 12 people individually            12.00
B: one batch of all 12                         9.61
C: two sub-groups of 6                         7.62
three sub-groups of 4                          7.13
four sub-groups of 3                           7.25
six sub-groups of 2                            8.28

With a group of 12 people, assuming independence and a disease probability of 0.1 per person, the optimal sub-group composition is to have three sub-groups of size 4 people each.  This produces an expected value of 7.13 for the number of tests to be performed.  This is 40.6% fewer tests than the 12 that would have to be conducted without batch testing.  This is also 25.8% fewer tests than would be performed with just one batch.  (See post #28, here, for my pet peeve about misconceptions involving percentage differences.)


Let’s conclude with two more extensions of this batch testing problem:

o) How do you predict the optimal sub-group composition to change with a smaller probability that an individual has the disease?  Change the probability to 0.05 and re-calculate the expected values to test your prediction.

It makes sense that larger sub-groups would be more efficient with a more rare disease.  With p = 0.05, we obtain the following expected values for the total number of tests:

Plan                                  Expected number of tests
one batch of all 12                            6.52
two sub-groups of 6                            5.18
three sub-groups of 4                          5.23
four sub-groups of 3                           5.71
six sub-groups of 2                            7.17

In this case with a more rare disease (p = 0.05), the optimal strategy is to divide the 12 people into two groups of 6 people each.  This results in 5.18 tests on average in the long run.
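The prediction can be checked by re-running the same comparison with each person healthy with probability 0.95.  A minimal Python sketch (variable names my own), where sub_size = 12 means one batch of all 12:

```python
p_healthy = 0.95  # disease probability of 0.05 per person
for sub_size in (12, 6, 4, 3, 2):
    per_sub = sub_size + 1 - sub_size * p_healthy ** sub_size
    total = (12 // sub_size) * per_sub
    print(sub_size, round(total, 2))
# 12 6.52, 6 5.18, 4 5.23, 3 5.71, 2 7.17
```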

p) How would the optimal sub-group composition change (if at all) if there were twice as many people (24) in the group?

We can simply double the expected values above.  We also have new possibilities to consider: three sub-groups of size 8, and two sub-groups of size 12.  For the p = 0.05 case, this produces the same optimal sub-group size as before, 6 people per sub-group, as shown in the following table of expected values:

Plan                                  Expected number of tests
two sub-groups of 12                          13.03
three sub-groups of 8                         11.08
four sub-groups of 6                          10.36
six sub-groups of 4                           10.45
eight sub-groups of 3                         11.42
twelve sub-groups of 2                        14.34
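The same calculation extends directly to a group of 24 people.  A Python sketch (names my own), covering the new sub-group sizes along with the doubled ones:

```python
p_healthy = 0.95  # disease probability of 0.05 per person
for sub_size in (12, 8, 6, 4, 3, 2):
    per_sub = sub_size + 1 - sub_size * p_healthy ** sub_size
    total = (24 // sub_size) * per_sub
    print(sub_size, round(total, 2))
# 12 13.03, 8 11.08, 6 10.36, 4 10.45, 3 11.42, 2 14.34
```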


Batch testing provides a highly relevant application of expected values for discrete random variables that can also help students to develop problem-solving skills. Speaking of relevance, you may have noticed that COVID-19 and coronavirus did not appear in this post until now.  I did not want to belabor this connection with my students, but I trust that they could not help but recognize the potential applicability of this technique to our current challenges.  I also pointed my students to an interactive feature from the New York Times here, an article in the New York Times here, and an article in Significance magazine here.

P.S. I recorded a video presentation of this batch testing for the College Board, which you can find here.

#69 More probability questions – correction

I often tell my students that I make mistakes in class on purpose as a teaching strategy, to encourage them to pay close attention, check my work regularly rather than simply copy what I say into their notes, and speak up when they notice something that they question.

This is partially true, but most of the mistakes that I make in class are, of course, genuine ones rather than purposeful.  I admit that I sometimes try to bluff my way through, with tongue firmly planted in cheek, claiming that my mistake had been intentional, an application of that teaching strategy.

Thanks very much to the careful blog reader who spotted a mistake of mine in today’s post.  In a follow-up discussion to the first example, I wrote: If the marginal percentages had been 28% and 43%, then the largest possible value for the intersection percentage would have been 28% + 43% = 71%.  This is not true, because the intersection percentage can never exceed either of the marginal percentages.  With marginal percentages of 28% and 43%, the largest possible value for the intersection percentage would be 28%. 

Perhaps I was thinking of the largest possible percentage for the union of the two events, which would indeed be 28% + 43% = 71%.  Or perhaps I was not thinking much at all when I wrote that sentence.  Or perhaps, just possibly, you might be so kind as to entertain the notion that I made this mistake on purpose, as an example of a teaching strategy, which I am now drawing to your attention?