#89 An exam question
It’s hard to imagine a more boring title for a blog post, isn’t it? I’m going to present an open-ended, five-part exam question that I used in the past week. I will describe my thought process behind writing and grading the question, and I will discuss what I learned from common student responses. I think the question turned out to be quite revealing, so I hope that this post will turn out to be less boring than its title and first paragraph.
This was my third exam of the term. I was not entirely pleased with how the first two exams worked out. In hindsight the first exam was too hard, the second one too easy. I was really hoping for a Goldilocks result (just right) for the third exam. It can be quite challenging to write and grade exams, and assessments in general, that distinguish students with a very thorough understanding of fundamental ideas from those with a modest level of understanding.
The topic of this exam question is multiple regression. I do not teach this topic very often, so I have not developed a large bank of questions that I like to pose. Also, I am less aware of common student misunderstandings with this topic than I am with more introductory topics. I spent a lot of time writing this exam, and now I am taking a break from grading it* to write this post.
* In post #66, I proposed that the first step of grading exams is: Procrastinate!
This question is based on the same dataset about prices for pre-owned Prius cars that I described in post #86. My students should have been familiar with the dataset from the assignment that I described in that post. But in that assignment I asked students to predict a car’s price based on a single variable: its age in years, or its number of miles driven. For this exam question, I asked students to consider a multiple regression model for predicting price from both age and miles.
The exam question presented students with output but did not provide the datafile or ask students to analyze the data themselves. Students were allowed to use their notes on the exam. Here’s the background and output for the question:
Consider the following output from a multiple regression model that uses both age (in years) and number of miles driven to predict price (in dollars), based on a sample of 32 pre-owned Prius cars advertised for sale in February of 2021:
Now I will present and discuss one part of the question at a time. The entire question was worth 10 points, on a 40-point exam. Each of the five parts was worth 2 points.
a) Write out the regression equation for predicting price from the two predictor variables.
This is as basic as it gets, right? I would not quite call this part free points, but I intended it to provide two easy points to students who had simply learned how to read computer output well enough to express a regression equation. The correct answer is: predicted price = 22,076.66 – 0.0619 × miles – 579.21 × age.
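For readers who like to experiment, here is a minimal Python sketch of the part (a) equation. It uses only the rounded coefficients quoted above, so its predictions may differ slightly from JMP’s unrounded ones; the function name predicted_price is hypothetical, not anything produced by JMP.

```python
def predicted_price(miles: float, age: float) -> float:
    """Predicted price (dollars) from the part (a) regression equation.

    Uses the rounded coefficients quoted in the text, so results may differ
    slightly from predictions based on the unrounded estimates.
    """
    intercept = 22_076.66
    b_miles = -0.0619   # dollars per additional mile driven
    b_age = -579.21     # dollars per additional year of age
    return intercept + b_miles * miles + b_age * age


# Example: a 5-year-old Prius with 50,000 miles
print(round(predicted_price(miles=50_000, age=5), 2))  # prints 16085.61
```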
Most, but not all, of my students earned full credit for this part. The most common error surprised me a bit: neglecting to include the left-hand side of the equation. Several students only wrote: 22,076.66 – 0.0619 × miles – 579.21 × age. I don’t like to be a stickler for mathematical notation, but omitting the response variable strikes me as failing to communicate that the goal of this regression analysis is to predict price. I deducted a half-point for this error.
A few students wrote: The regression equation = 22,076.66 – 0.0619 × miles – 579.21 × age. I also deducted a half-point for this, because of the missing response variable. But I did give full credit to responses that included a colon rather than an equal sign: predicted price: 22,076.66 – 0.0619 × miles – 579.21 × age.
I usually insist on using the word predicted or a caret (“hat”) symbol with the response variable, but this time I did not deduct a half-point for omitting that.
b) Identify and interpret the value of the residual standard error.
Almost all students identified the correct value in the output, the root mean square error value of 1841.708. A few students mistakenly answered the standard error of the intercept term, 651.9749.
I was looking for an interpretation along the lines of: A typical predicted price from this model differs from the actual price of a Prius in this sample by about $1841.71. I realize that “typical” is a vague word, but using a more precise word like “average” is not technically correct. I did award full credit to students who used “average” or “on average” in their response, though.
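Why isn’t “average” quite right? The residual standard error is a root-mean-square quantity: square the residuals, add them up, divide by the degrees of freedom (n − 3 here, with two predictors plus an intercept), and take the square root. The sketch below uses five made-up residuals (not from the Prius data) to show that this generally differs from the literal average size of a residual.

```python
import math


def residual_standard_error(residuals, n_predictors):
    """sqrt(SSE / (n - k - 1)) for a model with k predictors plus an intercept."""
    sse = sum(r * r for r in residuals)
    df = len(residuals) - n_predictors - 1
    return math.sqrt(sse / df)


def mean_absolute_residual(residuals):
    """The literal 'average' distance between predicted and actual values."""
    return sum(abs(r) for r in residuals) / len(residuals)


# Hypothetical residuals (dollars), invented for illustration. They sum to zero,
# as least-squares residuals from a model with an intercept always do.
resid = [-2500.0, 300.0, 1200.0, -900.0, 1900.0]

print(residual_standard_error(resid, n_predictors=2))  # about 2469.8
print(mean_absolute_residual(resid))                   # 1360.0
```

Squaring gives large residuals extra weight, and dividing by degrees of freedom rather than n inflates the result a bit more, so “typical” is the safer word.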
This question makes me worry that I am rewarding students simply for copying phrases from their notes without thinking. (In fact, one student expressed the interpretation in terms of age and number of bidders for an auction of grandfather clocks, which was one of the examples we had worked through in class.) But I hope that students demonstrate some understanding by selecting the correct interpretation and also by revising the generic interpretation to fit the context.
Some students mistakenly said that the residual standard error is a typical amount by which the regression line deviates from predicted values. I did not penalize them for referring to a line instead of a plane, but I did deduct a half-point for not talking about deviations from the actual prices.
A few students did not use the measurement units (dollars) in their interpretation. If they also failed to mention dollars in their response to part (c), I deducted only one half-point for the two omissions combined.
I did not ask for an interpretation of R² in this question, only because I had asked for that on the previous exam, which included simple linear regression among its topics.
c) Interpret what the value -579.2092 means in this context.
This is the coefficient of the age variable. I was looking for students to say something like: The predicted price of a Prius decreases by about $579.21 for each additional year of age on the car, after accounting for number of miles driven. Another version is: Among pre-owned Prius cars with the same number of miles, we predict the price to decrease by about $579.21 for each additional year of age.
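The “after accounting for miles” caveat can be checked directly with the part (a) equation: hold miles fixed, add one year of age, and the predicted price changes by exactly the age coefficient, no matter what the mileage is. A minimal sketch, again with the rounded coefficients (the function name is hypothetical):

```python
def predicted_price(miles: float, age: float) -> float:
    # Rounded coefficients from the part (a) equation
    return 22_076.66 - 0.0619 * miles - 579.21 * age


# Hold miles fixed at several values and add one year of age each time.
for miles in (20_000, 50_000, 100_000):
    change = predicted_price(miles, age=6) - predicted_price(miles, age=5)
    print(miles, round(change, 2))  # change is -579.21 for every mileage
```

In a simple regression of price on age alone, the slope would typically differ from -579.21 whenever age and miles are correlated, because age would then also be carrying some of the information in miles.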
Many students neglected to include the caveat about accounting for the number of miles driven. This is the key difference between interpreting coefficients with multiple versus simple regression. Such responses earned 1 of 2 points. Some gave a more generic interpretation that mentioned accounting for all other variables. I deducted a half-point for this, on the grounds that such a response did not fully describe the context.
A few students did not include the direction (decrease) in their interpretation, and some did not clearly express the “for each additional year of age” part of the interpretation. Each of these errors earned a half-point deduction.
d) JMP produced the following under “Mean Confidence Interval” with a setting of 95%, for input values of 5 and 50,000: ($15,321, $16,854). Interpret what this interval means.
I really wrestled with how to word this question. My main goal was to assess whether a student can distinguish between a confidence interval for a mean, as opposed to a prediction interval for an individual observation. I worried that I was giving too much away by using the word mean in my statement about the output. But I couldn’t figure out how else to identify which confidence interval I was providing.
I need not have worried. Many of my students interpreted this interval in terms of the price of an individual car. Such a response earned 1 of 2 points, if the other components of the response were correct. Of course, I don’t know whether such responses indicated a lack of understanding or simply poor communication by omitting the word mean or average. Needless to say, there’s a big difference between an average and an individual value. I regret that so many of my students failed to answer this part correctly, but this is a big idea that is worthy of assessing, so I’m glad that I asked the question.
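To see how big that difference is here, we can back out a rough prediction interval for an individual car from numbers in this post. This is a back-of-the-envelope sketch, not JMP output. It assumes the reported interval has the form estimate ± t* × SE(fit) with 32 − 3 = 29 degrees of freedom, takes t* ≈ 2.045 from a t table, and uses the residual standard error of 1841.708 from part (b). The standard error for an individual car adds the residual variance to the variance of the estimated mean.

```python
import math

T_STAR = 2.045                   # approx. 97.5th percentile of t with 29 df (t table)
RSE = 1841.708                   # residual standard error from part (b), dollars
ci_low, ci_high = 15_321, 16_854  # JMP "Mean Confidence Interval", 95%

center = (ci_low + ci_high) / 2
se_fit = (ci_high - ci_low) / 2 / T_STAR   # SE of the estimated mean price
se_pred = math.sqrt(RSE**2 + se_fit**2)    # SE for predicting one individual car
pi_low = center - T_STAR * se_pred
pi_high = center + T_STAR * se_pred

print(f"center of interval:         {center:,.1f}")
print(f"95% CI for mean price:      ({ci_low:,}, {ci_high:,})")
print(f"approx. 95% PI for one car: ({pi_low:,.0f}, {pi_high:,.0f})")
```

Under these assumptions the interval for a single 5-year-old car with 50,000 miles runs from roughly $12,200 to $19,900, about five times as wide as the interval for the mean price of all such cars. The center, $16,087.50, also agrees with the part (a) equation up to coefficient rounding.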
When writing this part of the question, I also struggled with how to express that the confidence interval was generated for 5-year-old cars with 50,000 miles. At the last minute, I decided to make that wording more vague by simply referring to input values of 5 and 50,000. I figured that I could reasonably expect students to realize that 5 referred to age in years, and 50,000 pertained to miles.
I’m glad that I made this change, because it revealed that some of my students did not understand that these were inputs for two different predictor variables. A few responses talked about cars with between 5 and 50,000 miles.
I was surprised by a somewhat common error in which students did not refer to the input values at all. Several responses interpreted the interval as estimating the population mean price of all pre-owned Prius cars listed for sale online in February 2021, with no regard for the car’s age or number of miles.
A few students made clear that they thought the interpretation applied to a sample mean rather than a population mean. I only deducted a half-point for this error, if the rest of the interpretation was fine, because they at least recognized that the interval estimates a mean rather than an individual value.
e) How would you respond to someone who says: “Age and miles must be related to each other, because older cars have been driven for more miles than newer cars. Therefore, it’s not necessary or helpful to include both age and miles together in a model for predicting price.”
This is, by far, my favorite part of this question. I think it addresses a very important aspect of multiple regression analysis: investigating whether an additional variable is worth including in a model.
I wanted students to notice that individual t-tests for both predictor variables produce very large (in absolute value) test statistics and therefore very small p-values: t = -5.77 and t = -4.83 for miles and age, respectively, with p-values considerably less than 0.0001. Those test results reveal that each variable is helpful to include in the model, even in the presence of the other. Even though age and miles may very well be positively correlated, the individual t-tests reveal that both variables are worth including in the model for predicting a car’s price.
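Each t statistic is simply the coefficient divided by its standard error. The sketch below backs out the implied standard errors from the coefficients and t statistics quoted in this post and compares each |t| with t* ≈ 2.045 (29 degrees of freedom, two-sided 5% level). The critical value comes from a t table, and the standard errors are derived here rather than read from the output.

```python
T_STAR = 2.045  # approx. two-sided 5% critical value for t with 29 df

# (coefficient, t statistic) as quoted in the post
terms = {
    "miles": (-0.0619, -5.77),
    "age": (-579.2092, -4.83),
}

for name, (coef, t_stat) in terms.items():
    implied_se = coef / t_stat      # since t = coefficient / SE
    useful = abs(t_stat) > T_STAR   # significant even with the other variable in the model
    print(f"{name}: implied SE = {implied_se:.4g}, |t| > t*: {useful}")
```

The miles coefficient looks tiny only because it is measured in dollars per mile; its t statistic is what shows that it is clearly distinguishable from zero once age is in the model.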
Again I struggled mightily with how to word this part of the question. In particular, I debated with myself about whether to prompt students to refer to output in their response. As you can see, I decided against including that, and I’m glad that I did.
Many students did not refer to output at all. I think it’s telling that these students opted to rely on their own impressions of the context rather than look at what the data revealed. In the words of David Moore: Data beat anecdotes. I’m pleased that I asked this question in a way that assessed whether students would look to data, rather than their own opinions or suppositions, to answer the question. I graded this fairly harshly: Students who did not refer to output could only earn 0.5/2 points for this part.
Some students used this part of the question to remind me of some things I had said in class. For example, several repeated my comment that fitting regression models is an inexact science, and a few cited George Box’s famous saying: All models are wrong; some models are useful. I’m glad that this saying made enough of an impression that some students wanted to write it on an exam, but I wish that they had instead looked at the data, as reflected in the output provided.
I suspect that few students have any idea how much time and thought go into writing and grading exam questions. Speaking of which, I need to get back to grading the second question on this exam. I’ll spare you a 2000-word analysis* of that one.
* Actually, in case you are keeping track, I believe that this post fell just short of containing 2000 words, until this sentence put it over the top.