#90 Two more final exam questions
As they prepare for a final exam, I always advise my students to try to focus on the big picture rather than small details. I’m pretty sure that they find this advice to be unsatisfying, perhaps worthless. I don’t think they know what I mean when I say to focus on the big picture. I also admit that this is much easier said than done.
I just gave a final exam to my students, as the Winter quarter has now ended at Cal Poly*. I think I asked some final exam questions that succeed at focusing on the big picture. I will present and discuss two such free-response questions here. As always, questions that I pose to students appear in italics.
* Well, perhaps I should clarify that the Winter quarter has ended for students, but it continues for faculty like me who still have final exams to grade and course grades to assign.
My students were randomly assigned to receive one or the other of these two versions:
1a. Suppose that a friend of yours says that they were reading about confidence intervals, and they encountered the symbols x-bar and mu (μ). How would you respond if they ask: What’s the difference between what these symbols represent, and what does that have to do with confidence intervals?
1b. Suppose that a friend of yours says that they were reading about confidence intervals, and they encountered the symbols p-hat and pi (π). How would you respond if they ask: What’s the difference between what these symbols represent, and what does that have to do with confidence intervals?
My goal here was to assess whether students could provide a big-picture overview of the distinction between parameter and statistic, along with explaining how that distinction relates to the topic of confidence intervals. I’m fairly pleased with how this question turned out.
Before I continue, let me say that students were allowed to use their notes and my handouts on this exam. This is not a new policy of mine related to the pandemic and remote teaching; I have used open-notes exams for a long time. It’s also possible, of course, that some students performed Google searches during my unproctored final exam.
As I’m sure you can imagine, many students copied sentences directly from their notes or my handouts into their response. As you can also imagine, this question was not a routine one to grade. The grading went fairly smoothly, though, once I settled on the four things that I would look for:
- that p-hat/x-bar represents sample proportion/mean;
- that pi/mu represents population proportion/mean;
- that the goal of a confidence interval is to estimate the unknown value of pi/mu with a high level of confidence;
- that the confidence interval uses p-hat/x-bar as its midpoint and then extends a certain amount on either side of that midpoint.
Each of these four aspects was worth one point. The first two of these should have been easy points. Most students earned these points successfully, but some did not. For example, one student wrote that p-hat represents a population proportion and pi represents a population mean.
For the third component, I awarded a half-point for conveying the idea that a confidence interval estimates the value of pi/mu. The word “estimates” was not needed for this half-point. Many students earned this half-point with fairly loose language such as “the confidence interval is for mu.” The other half-point was for communicating the idea that the value of the parameter is unknown, or estimated with a high level of confidence. This half-point proved elusive for many students.
Students could earn a half-point for the fourth component by saying that the confidence interval is calculated from the value of the statistic. The response needed to mention the midpoint, which most responses failed to do, in order to earn full credit.
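To make the fourth rubric item concrete, here is a minimal Python sketch of the conventional (Wald) interval for a proportion. The function name `wald_interval` and the counts (60 successes in a sample of 100) are hypothetical, chosen only for illustration, and 1.96 is the usual approximate multiplier for 95% confidence.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Conventional (Wald) interval for a population proportion pi.

    z=1.96 is the approximate multiplier for 95% confidence.
    """
    p_hat = successes / n                             # sample proportion (statistic)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # margin-of-error
    return p_hat, (p_hat - margin, p_hat + margin)

# Hypothetical data: 60 successes in a sample of 100
p_hat, (lower, upper) = wald_interval(successes=60, n=100)
midpoint = (lower + upper) / 2

print(f"p-hat = {p_hat:.3f}")
print(f"interval = ({lower:.3f}, {upper:.3f})")
print(f"midpoint = {midpoint:.3f}")
```

The midpoint of the interval equals p-hat, and the interval extends the same margin on either side of it. The interval is an estimate of the unknown pi, built around the known p-hat.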
I had also wanted to insist upon a fifth aspect for full credit. I had hoped that strong responses would say something about “proportion of the sample having a characteristic of interest” or “sample mean value for the variable of interest.” But very few responses included something along these lines, so I decided against requiring it.
I was skeptical about whether this question would provide helpful information about students’ understanding, but I decided that it worked well. Grading the question was not easy, but I think the four aspects described above provided a good rubric. When I use a variation of this question again, I might explicitly say not to use formulas as part of the response, and I also might say that responses should be limited to 3-5 sentences.
Here is one of six versions of another question on my students’ final exam:
2a. Suppose that the manager of a Walmart store collects data on the following variables for a random sample of transactions/receipts at the store:
- Total amount spent
- Number of items purchased
- Day of week
- Time of day (morning, afternoon, evening)
- Payment type (credit card, cash, other)
a) State a research question that could be addressed by applying analysis of variance (ANOVA) to (some of) the data.
b) State an additional variable for which data could be collected, and classify it as categorical or numerical.
Two other versions presented similar scenarios followed by the same questions (a) and (b):
2b. Suppose that a restaurant manager collects data on the following variables for a random sample of parties who dine at the restaurant:
- Total amount spent on meal
- Time of day (breakfast, lunch, dinner)
- Day of week
- Amount spent on drinks
- Number of people in the party
- Number of children (younger than age 18) in the party
2c. Suppose that a hotel manager collects data on the following variables for a random sample of customers’ stays at the hotel:
- Number of people staying in the room
- Distance from their home
- Total amount spent at the hotel during the stay
- Type of reservation (online, telephone, none)
- Day of week on which stay began
The other three versions arose by repeating the same scenarios and variables, but with simple linear regression replacing ANOVA as the procedure in part (a).
I often give my students practice with identifying which procedure is the relevant one to address a particular research question. In fact, we spent the last day of class this term doing nothing else, as we discussed 15 questions for which my students were to identify the appropriate analysis procedure. I always tell my students that the key to identifying the correct procedure is to identify the variables and their types.
This final exam question asks students to do the opposite: state a research question for which a particular procedure would be appropriate. The same key applies here. For example, students needed to realize that ANOVA applies when the explanatory variable is categorical and the response variable is numerical. With that in mind, a reasonable answer for part (a) of version 2a is: “do Walmart customers tend to spend different amounts on their transaction, on average, depending on whether they shop in the morning, afternoon, or evening?”
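The idea that variable types determine the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `choose_procedure` is hypothetical, it covers just the two procedures that appear in this exam question, and the type labels for the Walmart variables are my own classification.

```python
def choose_procedure(explanatory_type, response_type):
    """Match variable types to a procedure.

    Covers only the two procedures in this exam question;
    any other combination of types returns None.
    """
    if response_type != "numerical":
        return None
    if explanatory_type == "categorical":
        return "ANOVA"
    if explanatory_type == "numerical":
        return "simple linear regression"
    return None

# Assumed classification of the Walmart variables from version 2a
walmart_types = {
    "total amount spent": "numerical",
    "number of items purchased": "numerical",
    "day of week": "categorical",
    "time of day": "categorical",
    "payment type": "categorical",
}

response = "total amount spent"
for explanatory in ("time of day", "number of items purchased", "payment type"):
    procedure = choose_procedure(walmart_types[explanatory], walmart_types[response])
    print(f"{explanatory} -> {response}: {procedure}")
```

Writing a research question for a given procedure amounts to running this lookup in reverse: pick a pair of variables whose types match what the procedure requires.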
Coming up with a research question is often challenging for students. I made it easier this time by presenting many variables to them. I suspect that part (a) of this question would have been substantially harder if students had needed to think of variables for themselves.
Part (b) is meant to be fairly easy, but some students struggle with the ideas of observational units and variables despite my emphasizing those ideas frequently. Two common, correct answers for the Walmart scenario have been:
- the amount of time spent in the store prior to completing the transaction, which is numerical
- whether the transaction was completed with a cashier or self-service, which is categorical
This question was worth four points, two points for each part. Students generally did very well on this question. I graded fairly strictly; incorrect responses received zero points. For example, an answer of “does payment type help to predict total amount spent?” for the regression version of the question earned zero points, because the explanatory variable given is categorical, not numerical. Examples of incorrect responses for part (b) often followed from misunderstanding the observational units, such as “how many customers shopped at Walmart that day?” and “what part of the country was the Walmart located in?”
For essentially correct responses with poor or unclear wording, I deducted a half-point. For example, some students answered the regression version of part (a) with: “what is the correlation between number of items and total amount paid?” I deducted a half-point for this response, on the grounds that there’s a lot more to regression than calculating the value of a single statistic. I also deducted a half-point for using causal language inappropriately, for example by answering the ANOVA version of part (a) with: “does type of payment affect total amount spent?”
In hindsight, I wish that I had worded these questions a bit more clearly myself. I should have been more clear that responses to part (a) were to be based only on the variables that I presented. Part (b) could have been more clear by specifying that the variable proposed needed to be based on the same observational units as the ones presented.
I provide my students with practice questions before midterm exams but not for the final exam, mostly because I try to keep final exam questions secure. But I might consider providing these questions to students before the final exam in the future, to help them understand my advice about focusing on the big picture. The drawback is that I’ll then have to come up with new and better questions to use on the final exam.
I’ve used many of your questions on my own assignments. I’m always afraid that students will Google and find YOUR BLOG. I wonder if your students might do the same?
Great point, thanks. I’ve wondered that too, but I have not seen any evidence that my students have searched my blog posts during exams. If they did, I might be more impressed than alarmed. 🙂
Are you being too picky in insisting on “midpoint” for full credit in rubric item #4 for question 1? If students are using a bootstrap CI with a percentile method, the sample statistic might not be the midpoint of the interval.
I was strict about this, because my students have only seen the conventional (Wald) interval, for which p-hat is the midpoint of the CI. My main goal is for students to say more than “p-hat is used to calculate the CI.” I wanted them to acknowledge that p-hat is also a point estimate (although I did not expect my students to use that term) of pi, not just another component of the CI calculation, such as the sample size n.