Friday, June 19, 2020

Bloom’s Grading: Reduce Cheating and Improve Student Outcomes

As more courses move to virtual delivery, particularly during the COVID-19 era, I've had more frequent conversations with my faculty peers about how to prevent cheating on online exams. Here, I explain how I've addressed this concern while also improving student learning. In summary, I've made three simple changes to my course design:
  • I changed from a 60/70/80/90% point cutoff for D/C/B/A letter grades to 20/40/60/80%
  • I aligned each letter grade (A, B, C, D, F) with the levels of Bloom's Taxonomy (I'll refer to these as "Bloom's levels")
  • I designed my assessments (especially exams) to provide an equal number of available points for each Bloom's level (and thus for each letter grade).
The outcome is that the instructor becomes more conscientious about assigning more point weight to exam questions that require higher-level Bloom's skills (like Analyze, Evaluate, Create). These question types are inherently more cheat-resistant, because their answers tend not to be easily found by a search engine query, and because student responses to such prompts are expected to be unique, which resists peer plagiarism. This approach also tends to reduce the number and/or point weight of exam questions that are easier to cheat on.

Together, this suite of changes, described more fully below, has not only discouraged cheating but also given students clearer resources for knowing what I expect them to be able to do in order to demonstrate their understanding of course topics and concepts.

This post is lengthy, for those who want details for implementing this approach. For the TL;DR version, here is a PDF of an essay that condenses this post into a version I pitched to the Chronicle of Higher Education for publication.

Introduction

A history of letter grading

Humans do have a tendency to organize and categorize, which isn't always a good thing. Somewhere along the way, categorizing learning by ranking students, either against each other or against an instructor's standard, became the norm. According to Durm's (1993) historical reconstruction of letter grading (download a PDF of my graphical interpretation of the history of letter grading, also shown below), Yale is the first university, at least in the USA, with evidence of such a system: in 1785, students were sorted into four tiers. Multiple higher ed institutions tinkered with grading concepts over the next century, including the introduction of the Pass/No Pass concept, which morphed into Pass/Conditional/Absent at the University of Michigan.

In the late 1800s, Harvard switched from a six-tier grading system, in which placement was, for the first time, based on a percentile scale, to a five-tier system, in which the bottom tier (Roman numeral "V") was not considered passing. In 1897, Mt. Holyoke finally arrived at a five-letter-grade scale based on particular percentages…but this was A/B/C/D/E grading! At some point, E disappeared and was broadly replaced by F.


A timeline of years that particular universities adopted different grading scales
One of the more notable contemporary changes was Brown's decision in 1969 to allow A/B/C/no pass and Satisfactory/no pass grading. What was innovative here is that Brown does not report "no pass" grades on student transcripts, so that failed courses do not count against the grade point average (GPA)!

This brief survey of grading underscores one key conclusion. I hope you either already recognize this or you will now agree with me:

Letter grading is arbitrary

Let's face it: instructors spend a lot of time grading coursework. I suspect that most of us have realized that a single letter grade for each student in each course integrates multiple aspects of that student's performance. The myriad variables under the instructor's control include:
  • The types of point-earning activities (e.g. attendance, participation, homework, exams)
    • and the relative weights of those activities
  • The particular questions asked on assessments
    • and the relative weights of those questions
  • When, if ever, point values are rounded
    • and the decimal place to which any rounding occurs
  • Ultimately: the score thresholds separating letter grades
Making things worse, I suppose that most instructors (at least in higher ed) make many of the above decisions on their own, without consulting other instructors (past or contemporary) of the same course. As such:

Letter grades have poor inter-rater reliability

What earns a student a "B" in my genetics course is not what would earn that student a "B" in somebody else's course. When I evaluate the transcript of a student who wants to join my research group, I have no real sense of what the grades they earned actually indicate! We could throw up our hands in despair and conclude that it isn't worth the additional effort to try to rectify the arbitrariness of grading. We could just agree to give all of our students "A"s and save ourselves so much time and effort. Instead, I argue that we should use the fact that letter grading is arbitrary, and historically has been very flexible, to our advantage and to our students' benefit!

My philosophy of assessment and grading

To understand my perspective, a bit of an introduction is in order. I'm an Associate Professor of Biology at California State University, Fresno, a regional, minority-serving institution in central California with mainly undergraduate and Master's programs. If you're not a biologist or scientist, please don't hold that against me. Indeed, the grading approach I describe here can easily be adapted to any discipline, so please keep reading!

My overarching philosophy on teaching and evaluating students is that I hope all of my students will earn an A grade. I don't curve my grades, and I also believe that students should have a clear rubric for earning a letter grade, beyond just describing what percent of points must be earned to achieve particular grades.

I tend to teach upper-division courses required of biology majors, particularly genetics and molecular biology. The course I'm going to describe as a case study today is our required genetics course, which tends to be attended mainly by junior (third-year) students. The course typically enrolls fifty to sixty students in two sections, offered by two different faculty members, each semester. The main point here is that all biology students, including students who may be more interested in topics other than genetics (e.g. ecology), take this course, and that the enrollment is moderate - this is normally a lecture-style class in a relatively large room with fixed, stadium-style seating. I've taught this course about eight times in the last seven years, but I began to design and adopt the reforms I'm suggesting here about three years ago.

I'm also keen to leverage technology to support student learning. I'm not talking about technology for technology's sake, but about using technology to provide students with on-demand resources for learning, to reduce the cost of course materials (I create many of my own written materials and videos), and, most importantly from an academic standpoint, to give students "authentic" disciplinary training. In my field, for example, that means students should be trained to access online databases containing genetic information and to use computer programs to perform analyses and statistical tests.

In this face-to-face course (before COVID-19), I had adopted a blended learning ("flipped") course design, in which students would read the textbook and watch introductory videos before coming to class. During class, I would hold question-and-answer sessions, provide additional practice activities, and conduct other active learning exercises. Because I expect students to learn to use online resources specific to genetics, I felt it necessary for them to have access to computers during exams. I then began to redesign my class to figure out how best to encourage academic honesty with open-internet exams. I do not use any technological tools (like restricting browser access via campus wi-fi to a short list of websites) to prevent cheating on exams. I just tell students my expectations:
  • exams are open-book, open-note, and open-internet
  • thus, I make my exams a bit harder than they would otherwise be
  • to prevent distractions, I don't allow students to access the audio of any videos they might want to watch
  • real-time interactions and collaborations with others (e.g. chat) are not allowed (but I have no way to catch this; I hold students to the academic honesty code of the University)
  • students may only use one mobile device (laptop, tablet, or smartphone) during an exam
Over the years, I've learned how to design open-note/text/internet exams that students can complete in their fifty minutes and that, even if some cheating occurs, still produce a bell-curve grade distribution (assuming that is desirable…). I'm writing today to share some of the approaches I've taken. One of the main insights I will share is that instructors should be aware of at least two aspects of their assessments (e.g. exams), especially for preventing or discouraging cheating: the relative weighting of points per question, and the Bloom's Taxonomy level of each question.

Students taking an open-resource (notes, internet, textbook) exam in one of my courses

Bloom's Taxonomy

Here is a link to a fantastic resource with a background on Bloom's Taxonomy. Briefly, Bloom described a hierarchy of six educational goals, from basic to advanced. The 2001 revised taxonomy lists them as: Remember, Understand, Apply, Analyze, Evaluate, and Create (Anderson and Krathwohl 2001). This structure has been widely used as a framework to scaffold instruction. For example, students first Remember key concepts and facts, and then they can demonstrate Understanding by comparing or classifying those concepts and facts. The next higher goal is Applying the concepts, facts, and understandings - perhaps in a novel situation - and so on. Many agree that, in practice, it is ideal to have students demonstrate their facility with course content and practices by Creating (e.g. a computer program, or a poem, or a song, or a laboratory report, or an analysis of different stock market investing strategies).

The most critical aspects of Bloom's taxonomy for our current purpose are to appreciate that:
  • each assignment or quiz/exam question in your course can be categorized by its Bloom's level, and
  • the higher up Bloom's Taxonomy a question sits, the harder it is to cheat on

Instructor Actions

So, what does it look like, in practice, when letter grades and assessment questions are aligned to Bloom's Taxonomy?

1. Student Knowledge Survey (SKS)

Although this first step is optional, I highly recommend it as a valuable pedagogical practice. The basic concept of an SKS is that you give students an "exam" at the start of the term. The exam (actually described and delivered as a survey) comprises a task for each student learning outcome, and each question asks the student to self-rate how well they think they could perform that task. Some, like Bowers et al. (2005), suggest giving the SKS at both the start and the end of the term as a way to measure student learning, based on whether students' confidence in their abilities improved. I use it for a different purpose.

Here is a link to a PDF of my SKS for my genetics course. Hopefully you'll see that the first task,
"Arrange nucleotides by chemical structure and hydrogen-bonding capability"
is a specific task that begins with a verb, "Arrange" (the relevance of the verb will be explained shortly!). Even though you might not be a biologist or a chemist, I hope you can see that this task is clearly assessable. However, it is also clearly not an exam question, because it provides no specific content or details for a student to work with.

By providing students with this exhaustive list of tasks on the first day of class, I accomplish a few things. First, I have provided them with a complete study guide for the course. I am being transparent by sharing my expectations of what I think students should be able to do (tasks) to demonstrate their mastery of course topics and concepts. Have you ever had a student ask you, "What's going to be on the test?" From here on, you'll be able to point them to the SKS, available from the first day of class, as a great resource!

Second, building an SKS essentially creates a checklist that focuses my instruction on what I've told the students I expect them to be able to do. Likewise, the SKS serves as a template for building assessments (e.g. exams). All I need to do to write an exam is to look at the SKS tasks my class has covered. I then fill in the details of each task to generate an exam question that is directly aligned with what I've told the students I expect them to be able to do. You may recognize this process as "Backward Design," which Wiggins and McTighe (1998) have helped popularize. Read more about Backward Design in the context of my course at my prior post. Ultimately, our student learning outcomes and tasks should drive our assessment content, and our assessment content should drive our instruction. Some negatively call this "teaching to the test" - and it absolutely is, and we should strive to do so!

Third, and most critically here, by writing that exhaustive list of tasks, each of which begins with a verb, I can easily look over all of my student learning outcomes/expectations/tasks and categorize them by Bloom's level.

The big idea here is to ensure that my tasks are roughly evenly distributed across all of the Bloom's levels, so that I am providing multiple opportunities from the basic (factual recall in the Remember level) to the advanced (synthesizing and generating information in the Create level) to allow students to demonstrate the extent of their mastery.

If you are interested in building an SKS, don't panic! It can be overwhelming to think about doing this. Don't try to do it all at once - consider giving yourself a few terms to develop this resource. The way I proceeded was to build a document where I collected tasks in real time, over the course of a semester, as I assigned homework and created exams, and then I built an SKS from there.

Remember, you do not need to have an SKS to achieve the major goal of reducing cheating described below, but it will eventually help the process be more efficient.

2. Create a letter grade scheme using Bloom's levels

Most instructors have the power to place their letter grade thresholds wherever they like. In the USA, it seems most common (and arbitrary) for 0-59% of points to equate to an F, 60-69% a D, 70-79% a C, 80-89% a B, and 90-100% an A.

I use the point-to-grade alignment shown in the image below (a PDF version of this graphic is available here), where 0-19% of points is an F, 20-39% is a D, 40-59% is a C, 60-79% is a B, and 80-100% is an A. In other words, the letter grades are evenly distributed across the full percentage range.
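
For readers who want the cutoffs spelled out programmatically, here is a minimal sketch, in Python, of the percentage-to-letter-grade mapping described above; the function name letter_grade is my own placeholder, not something from my course materials.

def letter_grade(percent_earned):
    """Map the percentage of points earned to a letter grade using the
    evenly spaced 20/40/60/80% cutoffs described above."""
    # Each letter grade spans 20 percentage points and is aligned with
    # successively higher Bloom's levels (Remember up through Evaluate/Create).
    if percent_earned >= 80:
        return "A"
    elif percent_earned >= 60:
        return "B"
    elif percent_earned >= 40:
        return "C"
    elif percent_earned >= 20:
        return "D"
    else:
        return "F"

print(letter_grade(85))  # A
print(letter_grade(35))  # D

Under the traditional 60/70/80/90% cutoffs, that 35% score would simply be an F; here it maps to a D, because a student at that level has demonstrated some Understand-level work.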

Diagram of how the six Bloom's taxonomy levels are aligned with point percentages, letter grades, and Bloom's tasks

There are a few aspects of this letter grade scheme that I think are worth additional exploration.

First, and most importantly, spreading the letter grades out (equally) over the entire percentage range makes tremendous sense. Why should we cram the "passing" letter grades (D through A, or maybe C through A, depending on your program) into the top half or quartile of possible percentages? What this traditional practice indicated to me, when I started teaching (and writing exams), was that I had to include questions, worth about 60% of the points, that would be relatively easy for most students to answer. This would ensure that most students would get at least to the D level (60%+ of points in most "normal" courses) and hopefully then produce a bell curve of letter grades. If you reflect on your own assessments and exams, I wonder whether this is also true for you?

To address this shortcoming, I aligned my new letter grade scheme with Bloom's levels. This is perhaps the most fundamental paradigm shift I am proposing. For example, if all a student can accomplish on my exam is Bloom's level 1 (Remember) work, e.g. "define," "label," then they will earn an F. If they can also complete some of the Understand level tasks, then they move into the D letter grade range.

By the way, you might have noticed above that I combined two of the six Bloom's levels (Create and Evaluate) so that there are the same number of levels as letter grades. If you like the general concept I'm sharing here, you can certainly make other such arbitrary alignments to modify this approach.

Second, by aligning the letter grades with Bloom's Taxonomy, which is a well-described and widely used concept, we have a letter grade rubric! Does your course syllabus go beyond displaying how many points have to be earned to achieve each letter grade? Heck, even the USDA has a rubric for grading beef, which depends on the amount of fat marbling in the muscle! Imagine how much more useful it is for students to be given an actual rubric, one that explicitly defines what students are expected to do (tasks, verbs) to earn particular letter grades! Leveraging the verbs that accompany Bloom's Taxonomy is the real strength of this approach.

Aligning letter grades with Bloom's will also improve inter-rater reliability, I suspect. For example, now, when I write recommendation letters, I can efficiently explain what types of work a student did to earn their letter grade in my class.

Third, I want to note that this is not a curve! My grade rubric doesn't mandate that a particular number or percent of students will earn a particular letter grade.

Finally, here is a mindset hack that I decided to include: you might also have noticed that I gave names to each of the letter grades. In my classes, an F grade doesn't stand for Failing; it stands for Foundational. Students who can perform Bloom's level 1 work have provided me with evidence mainly of Foundational accomplishment. From there, students move up to Developing, Competent, Burgeoning, and Accomplished levels of evidence.

3. Provide an equal number of points per letter grade (Bloom's level)

Step 1 (Student Knowledge Survey or SKS) was optional. Step 2, adding the new letter grading rubric to your class, is easy to do - just add it to your syllabus. Together, those two steps can help improve student outcomes by providing them resources and concrete expectations. That achieves one of the goals from the title of this post, "Bloom's Grading: Reduce Cheating and Improve Student Outcomes."

The first two steps laid the foundation for designing cheat-resistant exams. Step 3 will be the main focus of your efforts, and this is where the "Reduce Cheating" goal is addressed. Here's how.

When you write an assessment (e.g. an exam), make sure that roughly the same number of points is available from questions at each Bloom's level: 20% of points from level 1 questions, 20% from level 2 questions, and so on. This doesn't have to be a difficult process. Here's the ideal approach (a short script, sketched after this list, can handle the bookkeeping in step 4):
  1. Write your exam questions
  2. Assign each question to its Bloom's level
  3. Choose the total points available for the exam
  4. Assign point values to each question so that the sum of points in each level is about 20% of the total points available
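
To make the bookkeeping in step 4 concrete, here is a minimal sketch, in Python, of how the point distribution might be checked; the question list, Bloom's level assignments, and point values below are hypothetical placeholders for illustration, not my actual exam.

from collections import defaultdict

# Hypothetical exam: each entry is (Bloom's level, points for that question).
# Level 1 = Remember ... Level 5 = the combined Evaluate/Create level.
exam_questions = [
    (1, 2), (1, 2), (1, 2), (1, 2),   # four Remember questions
    (2, 4), (2, 4),                   # two Understand questions
    (3, 4), (3, 4),                   # two Apply questions
    (4, 8),                           # one Analyze question
    (5, 8),                           # one Evaluate/Create question
]

total_points = sum(points for _, points in exam_questions)
points_per_level = defaultdict(int)
for level, points in exam_questions:
    points_per_level[level] += points

# Report the share of total points available at each Bloom's level;
# the target is roughly 20% per level (one letter grade's worth).
for level in sorted(points_per_level):
    share = 100 * points_per_level[level] / total_points
    print(f"Level {level}: {points_per_level[level]} pts ({share:.0f}% of total)")

If any level's share drifts far from 20%, adjust individual question point values (or add or remove questions) until the distribution is roughly even.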
Here's an example from one of my exams. If you follow this link to a PDF, you will see all of the questions, Bloom's level assignments, and point values. Below, I summarize this document. My exam had 15 questions, and the image of my spreadsheet table below shows how I weighted the question point values. For example, 27% of the questions were Bloom's level 1, and those four questions accounted for 20% of the available points.

Table listing the number of exam questions and their point values available in each of the Bloom's levels
Clearly, I did not succeed on this exam in creating a truly even distribution of point values across the five letter grades, with only 7% of points available in Level 4-type questions.

Now let's see how this approach translated into student grades:

Table listing the number of exam questions and their point values available in each of the Bloom's levels, including the percent of students earning each letter grade

On this exam, five percent of students earned an F, and nineteen percent earned an A. More than half of the students earned an A or a B. We can debate whether this type of grade distribution is desirable (it is for me)! As I mentioned earlier, my goal as an instructor is to help students succeed in learning and in expressing their learning. It might be that I write easy exams, but the exam items didn't change from prior semesters, when my grade distributions were lower. Instead, it might be that the SKS study guide and the letter grade rubric actually helped students prepare better for the exams. Perhaps most likely, going through the process of creating the SKS led me to focus my instruction on the student learning outcomes, which is why students did so well on the exam!

How might all of this instructor effort reduce cheating?

In my grading scheme, one letter grade's worth of points (20%) is available from Create+Evaluate questions, which are essentially cheat-proof. When a student is asked to create something or to provide an evaluation of something, the instructor should have every expectation that each response will be unique. When two responses match or are very similar, plagiarism has been identified. At the very least, most of our students already know to avoid this kind of cheating, because it is so easily detected.

Likewise, the Analyze and Apply type questions, depending on how you word them, are highly cheat-resistant. For these types of questions, I often append a "Briefly explain your response" requirement, to elicit a unique response from each student.

The above question types are also resistant to cheating because they're often questions for which answers are not readily available by querying a search engine. That's not necessarily true of the lower Bloom's question types, which are more often (in my discipline) delivered as labeling, matching, true/false, multiple-choice, or fill in the blank questions.

However, please note: because of the even distribution of points across Bloom's levels, even if students can (and do) Google the answers to your questions, that still won't earn them a good grade. For example, if they succeed in looking up the answers to all of your Level 1 and Level 2 questions and get every one of them correct (40% of total points), their cheating still leaves them only at the D/C boundary.

Moreover, cheating takes time! On a timed exam, students can't afford to look up the answers to every question, or they won't have enough time left to address the 40-60% of available points on your cheat-resistant and cheat-proof upper Bloom's level questions. As I mentioned above, I give open-resource (notes, internet) exams, and even under those circumstances, where I expect students to look information up online during the exam, many students have mentioned that they spent too much time looking things up and did not have enough time to complete some of the other questions.

So, modifying the number of questions on a timed exam is another aspect of exam preparation, independent of the question score weighting I'm proposing, that you can also use to discourage cheating.

I hasten to confess that this approach isn't ideal in all circumstances; it has flaws and drawbacks. For example, in any online exam, students can cheat by communicating answers to each other. Higher-level Bloom's questions are still safe, but chat/messaging can provide a faster method of cheating on the lower-level questions than web searching. Also, the instructor has to make a judgment about how much effort they're willing to put into enforcing academic honesty. There is no equitable way to make any exam cheat-proof, especially online. There is a clear and direct trade-off between how cheatable an exam is and how much effort the instructor puts into creating the questions and grading the responses. The process I've described here is not great for a multi-hundred-student section of an introductory class, where there is not enough time available to grade free-response, higher-Bloom's-level questions.

I hope you find some or all components of this process useful to implement and/or at least to think about! Please share comments with me - I'll enjoy reading additional perspectives on the topic of academic dishonesty in online (presumably open-internet) exams.

References

  • Anderson and Krathwohl, eds. (2001) A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. New York: Longman.
  • Bowers, Brandon, and Hill (2005) "The Use of a Knowledge Survey as an Indicator of Student Learning in an Introductory Biology Course." Cell Biol. Ed. 4:311-322.
  • Durm (1993) "An A Is Not an A Is Not an A: A History of Grading." Educational Forum 57.
  • Wiggins and McTighe (1998) "What Is Backward Design?" In Understanding by Design (pp. 7–19). Merrill Prentice Hall.
