Costs from the Manufactured Crisis

We don’t know how well teaching to the test works. We cannot measure the costs and benefits of a test-focused curriculum because we cannot know the results that alternative strategies might have provided. How would our students have tested if we had provided a more general, student-focused education for them? Teaching to the test removes the focus from students, and puts that focus on a measurement instrument instead.

For all the time and effort we are now putting into testing, we don’t know how the results of those teaching efforts compare to those of past years, when the curriculum was determined by districts based on what leaders thought students needed to know for their futures, rather than on what students needed to know for the annual state test. Thanks to the Common Core Initiative, we may never be able even to approximate that data.

The new PARCC and Smarter Balanced tests, along with other state tests that are being rewritten to match the Common Core, ensure that we cannot compare today’s apples to yesterday’s apples. How are today’s students doing compared to students from the past? With the new tests, it’s impossible to say because students are taking significantly different tests. They are also taking those tests differently in many cases, as computers replace paper and pencil. For analysis purposes, these changes in testing instruments effectively damage or even destroy the ability to make comparisons over time.

If a million students took a test in 1975, and a million students took the same or a very similar test in 2005, we could comb our data (assuming we had saved enough of it) to compare educational results for 1975 and 2005. We could say that Nebraska’s students had answered 67% of a section’s math questions correctly in 1975 and only 52% in 2005. (I made those numbers up for purposes of illustration.) When the same test is employed over time, results can be compared across years. Questions that were changed during that time period can be eliminated from the analysis, as long as the remaining questions make up a large enough sample to use for comparison.
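To make that idea concrete, here is a minimal sketch, in Python, of the kind of comparison described above: keep only the questions that appear on both years’ tests, then compare percent correct on that shared subset. The question IDs and counts are invented for illustration, just like the percentages in the previous paragraph.

```python
# Hypothetical year-over-year comparison on the shared question set.
# All question IDs and counts below are made up for illustration.

results_1975 = {  # question_id -> (number correct, number attempted)
    "math_q1": (720_000, 1_000_000),
    "math_q2": (650_000, 1_000_000),
    "math_q3": (640_000, 1_000_000),  # later dropped from the test
}

results_2005 = {
    "math_q1": (560_000, 1_000_000),
    "math_q2": (480_000, 1_000_000),
    "math_q4": (530_000, 1_000_000),  # new question, not asked in 1975
}

def percent_correct(results: dict, question_ids) -> float:
    """Percent correct across the given questions."""
    correct = sum(results[q][0] for q in question_ids)
    attempted = sum(results[q][1] for q in question_ids)
    return 100.0 * correct / attempted

# Only questions present in both years are comparable.
shared = sorted(results_1975.keys() & results_2005.keys())

print(f"Shared questions: {shared}")
print(f"1975: {percent_correct(results_1975, shared):.1f}% correct")
print(f"2005: {percent_correct(results_2005, shared):.1f}% correct")
```

The point of the sketch is simply that a comparison like this only works when enough questions survive unchanged from one test to the next; once the test itself is replaced, there is no shared subset left to compare.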

Once those students started taking the PARCC test instead, making useful comparisons over time became vastly more complicated. We don’t have apples to apples now; we have apples to watermelons, or even apples to shellfish. With the new emphasis on critical thinking and scenario-based problems, we may have shifted to testing different student attributes as well as different test content.

If we wanted to examine the effectiveness of teaching to state tests, we could go back and look at NCLB numbers, which provide slightly over a decade’s slice of teaching to identical or very similar tests* (once we had queried school administrators to identify the districts that actually adapted instruction to teach to their state test). For comparison purposes, though, we have another set of probably insurmountable problems: many of America’s strongest districts never switched to that test-based strategy. They continued with their older curricula, which had always worked well. Their students continued to receive broad-based instruction grounded in what administrators believed those students needed to be prepared for the world, instruction with a long-term rather than a short-term view.

Other considerations need to be factored in as well. Different schools’ effectiveness at teaching to annual state tests will vary enormously, depending both on how well the teacher and district anticipate and focus on test content, and on how closely the students’ level of academic understanding matches the content of the test. Teaching to the test has always been a matter of degree, with some districts even cheating to learn test content in advance, while others merely choose which topics to teach based on it.

Individual student achievement levels are often ignored in discussions of teaching to the test, but my experience tells me those levels may be vitally important to the big picture. If I teach the content of an eighth-grade test to two students, one operating at an eighth-grade level and the other at a third-grade level, I will have prepared the first student, but I may not have done much for the second student, who probably understood little of the material I was presenting. Even with lengthy before-school and after-school tutoring, my lower student may simply be too far behind to get ready for a test set five years above his or her academic operating level. (In the meantime, while I was desperately tutoring that lower student, what other learning opportunities did my other students miss?) The net effect of what I just described will be that a district with students near grade level may gain significant points in the test-score game, while a district with students performing much below grade level may sacrifice useful instructional time to a goal that cannot be reached. In the first case, teaching to the test “worked.” In the second, inappropriate instruction may even have prevented learning from taking place.

*A few states changed their tests during these years to make attaining NCLB targets easier. Teasing out valid comparisons over time in these states will be challenging. That was the whole idea behind changing the tests, of course.