# #77 Discussing data ethics

This guest post has been contributed by Soma Roy.  You can contact her at soroy@calpoly.edu.

Soma Roy is a colleague of mine in the Statistics Department at Cal Poly – San Luis Obispo. Soma is an excellent teacher and has been so recognized with Cal Poly’s Distinguished Teaching Award. She also served as editor of the Journal of Statistics Education. I recently learned about some of Soma’s ideas for generating student discussions in online statistics courses, and I am delighted that she agreed to write this guest blog post about one such idea, which introduced students to data ethics.

The GAISE (Guidelines for Assessment and Instruction in Statistics Education) College Report (available here) recommends the use of real data with a context and purpose in statistics classes*. One of the ways I achieve this throughout the course, regardless of what statistics topic we are studying at the time, is by always using data (either in raw or summarized form) from research studies published in peer-reviewed journals.

* Just because the recommendation comes in the college report doesn’t mean that the advice couldn’t apply to K-12 classes.

For example, a study I use to motivate the comparison of means between two groups was conducted by Gendreau et al. and published in the Journal of Abnormal Psychology in 1972 (here). In this study, 20 inmates at a Canadian prison were randomly assigned either to be in solitary confinement or to remain non-confined (that is, have contact with others around them) for seven days. Researchers measured each inmate’s EEG alpha frequency on several days* in order to investigate the effect that sensory deprivation can have on one’s EEG alpha frequency**.

* The article provides data for the 20 inmates at three different time periods, but my students only analyze the data from the final (seventh) day of the experiment.

** Alpha waves are brain waves, the predominance of which is believed to indicate that the individual is in a relaxed but aware state. High frequency of alpha waves is considered to be better than low frequency of alpha waves (Wikipedia).
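For readers who want to see what the random assignment in a design like this looks like in practice, here is a minimal Python sketch. It is only an illustration: the participant labels and the seed are hypothetical, and nothing here is taken from the article or from how the original researchers actually carried out their assignment.

```python
import random

# Hypothetical participant labels; these are NOT identifiers from the article.
participants = [f"inmate_{i:02d}" for i in range(1, 21)]

rng = random.Random(1972)  # arbitrary fixed seed so the sketch is reproducible
shuffled = participants[:]  # copy, so the original list is left untouched
rng.shuffle(shuffled)

# After shuffling, the first 10 go to one condition and the remaining 10 to the other.
solitary = sorted(shuffled[:10])
non_confined = sorted(shuffled[10:])

print("Solitary confinement:", solitary)
print("Non-confined:", non_confined)
```

Shuffling the full list and then splitting it guarantees two groups of exactly 10, with every participant equally likely to land in either condition.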

Without fail, one of the first things that students do when they read about this study is ask: How could they just put someone in solitary confinement? That becomes a jumping-off point for our discussion on data ethics. This discussion covers the ethics of study design, data collection, data analyses, and publication of findings.

When the COVID-19 pandemic turned my in-person class into an online class, I decided to turn our brief, in-class discussion into an asynchronous, week-long discussion in our learning management system, Canvas. Borrowing from Allan’s style, the questions that I posted appear in italics, below, accompanied by short blurbs on what I was hoping to address with each of the questions, as well as some student responses and comments.

You have read about an experiment conducted on inmates of a Canadian prison, where 20 inmates were randomly split into two groups. One group of 10 inmates was placed in solitary confinement, and the other group was allowed to remain non-confined.

Are you as struck as I was the first time I read about this experiment, by how unethical and cruel this experiment was, in that people were randomly assigned to be placed in solitary confinement!?

Unfortunately, there have been many, many experiments in the past that violated human rights. That realization has brought about the requirement for all research projects involving human subjects to be reviewed before any data can be collected.

This discussion is about the ethics to be considered when one decides to carry out a study with human subjects (specifically an experiment that involves manipulating treatment conditions), collect data, or analyze data and publish results from any study. The first few questions below focus on historical studies, while the next few questions look into the process for proposing and carrying out human subjects studies, and into ethical practices for data analysis and publication of study results.

I hope that, going forward, this discussion helps you think critically about any studies that you may be involved in as a researcher, and keep in mind that (to borrow from the great American poet Maya Angelou) when we “know better, (we should) do better.”

For this discussion, you need to make two (2) posts:

Part 1: First, you will post a response to one of the questions (1) – (10) below. Be sure to copy and paste the question that you are responding to.

1. Google “Tuskegee Syphilis Study” – describe the study (year(s), methods, participants, objective, etc.). Why is it considered unethical? Cite your source(s). (e.g., Wikipedia link)

2. Google “US apologizes to Guatemalans, 1940s” – describe the study or studies conducted in the 1940s (year(s), methods, participants, objective, etc.). Why are the studies considered unethical? Cite your source(s). (e.g., Wikipedia link)

3. Google “Human Radiation Experiments in the US, 1940s” – describe the study or studies conducted in the 1940s and even later (year(s), methods, participants, objective, etc.). Why are the studies considered unethical? Cite your source(s). (e.g., Wikipedia link)

4. Google “Project Bluebird, Project Artichoke” – describe the study or studies (year(s), methods, participants, objective, etc.). Why are the studies considered unethical? Cite your source(s). (e.g., Wikipedia link)

5. Google “The Monster Study” – describe the study (year(s), methods, participants, objective, etc.). Why is the study considered unethical? Cite your source(s). (e.g., Wikipedia link)

6. Google “Brown eyes, Blue eyes experiment, Jane Elliott” – describe the study (year(s), methods, participants, objective, etc.). What was the objective of the study? Why do some people consider the study to be unethical? Cite your source(s). (e.g., Wikipedia link)

This first part of my discussion assignment requires students to read up on a particular historical study and identify some of its key elements: the objective of the study, on whom it was conducted, when and how it was conducted, and why it is considered unethical. Students are required to cite their sources.

All six of these studies have a plethora of information available from multiple reliable sources on the internet. My hope is that as students read about these studies, they will recognize the shortcomings in the study design – where the researchers went wrong in how they treated their subjects or how they recruited their subjects, or just who their subjects were. I also hope that students will recognize the need for an institutional review board (IRB), the need for informed consent, and the need to protect vulnerable populations.

The Tuskegee study, understandably the most infamous of the lot, draws the most outrage from students. Students find the experiment “crazy and insane,” “a great example of raging biases and racism,” and “lacking in decency.” Students are appalled that little to no information was shared with the participants, that a study that was supposed to last only 6 months lasted 40 years, and that even after penicillin was established to be a standard treatment for syphilis, it wasn’t administered to the participants. Students are saddened by the fact that the researchers abused the knowledge that the participants were impoverished by offering incentives such as free meals and free treatment for other ailments in return for their participation in the study.

Students have similar reactions to the other studies as well. Some of their common responses include:

• Subjects in any study should be told whether any negative outcomes are to be expected.
• Participation should be voluntary; leaving the study should be easy and come at no cost to the participant.
• Children should not be experimented on, at least not without permission from a parent or guardian who can make decisions in the child’s best interest.
• People who are vulnerable, such as children, prisoners, pregnant women, and people from racial and ethnic minorities, should be protected, and not taken advantage of.

The “Brown eyes, blue eyes” experiment draws some interesting responses*. Some of my students write that while the experiment was well meaning, and was trying to teach students about discrimination on the basis of color, conducting an experiment on impressionable children, especially without the consent of their parents, was unethical.

* For anyone unfamiliar with this experiment: On the day after the assassination of Dr. Martin Luther King, Jr., teacher Jane Elliott repeatedly told students in her all-white third-grade class that brown-eyed people were better than blue-eyed people. On the next day, she switched to saying that blue-eyed people were better than brown-eyed people. She observed her students’ behaviors toward each other on both days.

Through their answers to the questions above, sometimes directly and sometimes indirectly, students arrive at recognizing the need for an institutional review board, the need for informed consent, and the need to protect vulnerable populations. This leads to the next set of questions in my discussion assignment:

7. When you conduct research on human subjects, your research protocol needs to be reviewed by an **institutional review board**, and you need to obtain **informed consent** from your subjects. Explain what the bold terms mean, when these procedures started being enforced in the U.S., and why you need the review or informed consent. Cite your source(s). (e.g., Wikipedia link)

8. When you conduct research on human subjects, certain sections of the population are referred to as “vulnerable populations” or “protected groups.” What are these groups, and why do they need to be protected? Give one or two historical examples of studies that were unethically performed on vulnerable populations. Cite your source(s). (e.g., link from National Institutes of Health)

For the question about the IRB and informed consent, students are required to describe the terms, explain why they are needed, and report what year these procedures were put in place in the U.S. Again, they are required to provide references. Students discover that concerns about many of the studies referred to in (1) – (6), specifically the Tuskegee Syphilis Study and the human radiation experiments, led to the creation of IRBs.

In the wrap-up of this discussion, we revisit the study about the Canadian prisoners, in which some inmates were assigned to solitary confinement to study the effect of sensory deprivation on brain function. The research article mentions that the subjects volunteered to participate, and were told that there were no incentives (e.g. monetary or parole recommendation), that their status in prison would remain unchanged, except for a note in their file mentioning their cooperation. Students discuss whether this is enough of a protection, or enough of an informed consent.

The next two questions touch upon what happens to data after they have been collected. Should the person analyzing the data get to pick and choose which data to include in the analysis, based on what creates a more sensational story? Should studies be published only if they show statistically significant findings? Who stands to lose from violations of the ethics of data analysis? Who stands to lose from publication bias*?

* For class examples, I intentionally use studies that showed statistically significant results as well as studies that didn’t. I also have a separate week-long discussion topic in which students read article abstracts from various peer-reviewed journals, where they see both statistically significant and not significant study results; that discussion touches on one more aspect of data ethics – who funded the study, and why that is important to disclose and to know.

9. What is publication bias? When does it arise? Who stands to benefit from it? More importantly, who stands to lose from it? Give an example of any study or studies where publication bias was present. Cite your source(s). (e.g., Wikipedia link)

10. What is data manipulation (including “selective reporting” and “data fabrication”)? How is it done? Who stands to benefit from it? More importantly, who stands to lose from it? Give an example of any study or studies where the researchers were accused of wrongful data manipulation. Cite your source(s). (e.g., Wikipedia link)
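For readers who would like to see why publication bias matters, here is a minimal simulation sketch in Python. It is an illustration only, not part of the discussion assignment. It assumes a hypothetical setting in which there is no true difference between two groups of 10, and in which a study is "published" only when a pooled two-sample t-test is significant at the 5% level. The number of studies, the seed, and the normal model are all assumptions made for the example.

```python
import random
import statistics

rng = random.Random(77)  # arbitrary fixed seed for reproducibility

N_STUDIES = 2000    # hypothetical number of simulated studies
GROUP_SIZE = 10     # two groups of 10 per study
T_CRITICAL = 2.101  # two-sided 5% critical value for a t distribution with 18 df


def simulate_study():
    """Simulate one two-group study with NO true difference between the groups."""
    a = [rng.gauss(0, 1) for _ in range(GROUP_SIZE)]
    b = [rng.gauss(0, 1) for _ in range(GROUP_SIZE)]
    diff = statistics.mean(a) - statistics.mean(b)
    # Pooled two-sample t statistic (valid here because group sizes are equal).
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    t_stat = diff / (pooled_var * 2 / GROUP_SIZE) ** 0.5
    return diff, t_stat


results = [simulate_study() for _ in range(N_STUDIES)]

# Size of the observed difference in every study vs. only the "published" ones.
all_abs = [abs(d) for d, _ in results]
published_abs = [abs(d) for d, t in results if abs(t) > T_CRITICAL]

share_published = len(published_abs) / N_STUDIES
print(f"Share of studies 'published': {share_published:.3f}")
print(f"Mean |difference|, all studies: {statistics.mean(all_abs):.2f}")
print(f"Mean |difference|, published only: {statistics.mean(published_abs):.2f}")
```

Even though the true difference is zero in every simulated study, the subset that clears the significance filter shows a noticeably larger average difference. A reader who sees only the published studies would therefore come away believing in an effect that does not exist.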

To earn full credit for the discussion assignment, students must also reply to another student’s post.  This is just my way of encouraging them to read and reflect on what other students posted. Students can only reply after they have first submitted their own initial post:

Part 2: Second, respond/reply to a post by another student – adding more detail/insight to their post. (Note: You will need to first post an answer to part 1 before you can see anybody else’s posts.)

I grade these student discussions very generously. Students almost always get full credit as long as they follow the instructions and make reasonable posts, cite their sources, and don’t just copy-and-paste a Wikipedia article.

On my end-of-quarter optional survey about the class this term, students noted this ethics discussion as the discussion they liked the most. Some students said that this discussion topic was the topic from the course that made the biggest impression on them – describing it as “thought-provoking,” “interesting,” and “eye opening.”

In the past I have used this discussion assignment only in introductory classes. But now that I have the online discussion set up in Canvas, I will also use it in my upper-level courses on design of experiments.

Even though I have used these questions as a discussion topic, I can also see using them as a homework assignment, mini-project, or student presentation. For now, I will stick with the online discussion format because my students said they liked reading what other students wrote. While the pandemic keeps us in remote online classrooms, this format provides one more way for students to connect with their peers, as well as learn about some ethical issues associated with collecting and analyzing data.

