At a recent Danielson professional development session (see posts around March 26th), the presenter made the following remark, which I saved on a blue Post-It note:
“Of course, we would have a rubric so it has validity,” she said.
Statistical definition of validity: A test is valid if it measures what it is supposed to measure. In science and statistics, validity tells us how well a conclusion or measurement corresponds to the real world.
My mother’s old bathroom scale weighs in at about six pounds lighter than my actual weight. It’s reliable: if I step on it carefully, I’ll always be six pounds lighter than I am at home. These mom-scale weights lack validity, however. The doctor’s scale matches my home scale, not my mom’s. In the real world, I believe the home scale provides a valid measurement of my weight. My mom’s scale does not, although I’ll confess I love looking at its numbers.
A tool by itself does not make a measurement valid or invalid. A rubric does not automatically create a valid measurement, any more than my mom’s scale with its rusty, forty-year-old springs can create that valid measurement. Any validity a tool offers proceeds from that tool’s ability to reflect the world’s ACTUAL truths.
I just created a lousy rubric, and a quick search for bad rubrics will turn up a fair number of funny, if pathetic, examples. Do I have validity? Am I measuring student learning? What if my student wrote down four facts about Ulysses S. Grant’s favorite types of whiskey? Do I give all four points in that category? What if my student correctly used grammar and punctuation in describing Grant’s taste in whiskey? Do I give all four points? What if my student has strong artistic skills and drew the whiskey bottles beautifully, using calligraphy to lay out his facts? Let’s say the same student thinks the South won the Civil War. That yields 13 points out of a possible 16: four each for facts, conventions, and artistry, plus a single point for accuracy. My student has 81%, a “B” on his project.
Eduhonesty: Rubrics are not magic. Rubrics can even be silly. They don’t have to be valid, either. Teacher Scott’s “attractive” can be Teacher Mary’s “very attractive.” Teacher Mary may give Emilia a “very attractive” because Emilia threw lots of glitter and stickers onto her project, while giving Johnny only “attractive” because Johnny didn’t add extra touches — even if both students have pretty much the same content. Does that validly reflect student learning? Using this rubric, two students with the same content could end up 3 points apart due to glitter usage. That’s 18.75% — or almost two full grades according to the standard 60-70-80-90-100% grading scale.
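For the arithmetic-minded, the grading math above can be sketched in a few lines of Python. The `grade` helper and the 16-point rubric are stand-ins for the hypothetical rubric in the example, graded on the standard 60-70-80-90-100% scale:

```python
# Hypothetical rubric from the example: four categories, four points each,
# 16 points total, graded on the standard 60-70-80-90-100 scale.
MAX_POINTS = 16

def grade(points, max_points=MAX_POINTS):
    """Return (percentage, letter grade) for a rubric score."""
    pct = 100 * points / max_points
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if pct >= cutoff:
            return pct, letter
    return pct, "F"

# Whiskey-facts project: full marks in three categories, one point for accuracy.
print(grade(13))                    # (81.25, 'B')

# Same content, three fewer points of glitter:
print(grade(10))                    # (62.5, 'D')
print(grade(13)[0] - grade(10)[0])  # 18.75 percentage points
```

Three rubric points on a 16-point scale is 18.75 percentage points, which is enough to drop the same content from a B to a D.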
Validity in social science is too often in the eye of the beholder, and rubrics per se cannot and will not solve this problem. At their worst, they encourage glitter consumption rather than learning.