Editor’s note: Aaron M. Pallas is Professor of Sociology and Education at Teachers College, Columbia University. He writes for the blog "A Sociological Eye on Education" for The Hechinger Report.
(CNN) – It's the dead of summer, and many states are releasing the results of testing done in the spring. A lot happens between the time that a student fills in the last bubble and a score is produced.
Some things are just general logistics: collecting the exams, routing them to the appropriate destination and processing students' responses to multiple-choice questions with high-speed scanners. Others require more judgment: scorers must review and rate students' responses to open-ended questions, for which students construct their own answers, using a grading rubric, a guide to the elements of a good response.
But even more judgment is required after that, much of which takes place behind closed doors.
Testing firms such as CTB/McGraw-Hill or Pearson, which contract with states to produce the scores, must look at the patterns of students' responses and see if they "behave" in predictable ways. Most tests are assumed to measure a single skill, such as mathematics proficiency, reading ability or science knowledge, with some test items designed to be somewhat easier and others more difficult. Students who get both the easy and the difficult questions correct are assumed to have more "ability" than those who get the easy questions right and the hard questions wrong.
If a student gets the hard questions right and the easy ones wrong, that's a red flag that something may be amiss, either for the student or for the test. And if a test item seems to behave differently for test-takers of different groups, such as poor students and affluent students, the item may need to be dropped from the test. For example, if smart poor kids have never seen a yacht, then a math test item about a yacht may not reveal their mathematical knowledge.
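For readers who like to see the idea concretely, the "red flag" logic can be sketched in a few lines of code. This is purely illustrative, not any testing firm's actual procedure: items are ordered from easiest to hardest, and a pattern in which a harder item is answered correctly while an easier one is missed counts as an unexpected pair.

```python
# Illustrative sketch only (not an actual scoring program): count
# "unexpected pairs," where a harder item is correct but an easier
# item is wrong. Items are ordered from easiest to hardest.

def unexpected_pairs(responses):
    """Count pairs (easier wrong, harder right) in a 0/1 response list.

    `responses` is ordered from easiest item to hardest item.
    """
    errors = 0
    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            if responses[i] == 0 and responses[j] == 1:
                errors += 1
    return errors

typical = [1, 1, 1, 0, 0]   # gets the easy items, misses the hard ones
flagged = [0, 0, 1, 1, 1]   # misses the easy items, gets the hard ones

print(unexpected_pairs(typical))  # 0: behaves as expected
print(unexpected_pairs(flagged))  # 6: something may be amiss
```

A pattern full of such pairs does not say *why* something is off, only that the student's responses do not fit the expected easy-to-hard ordering.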
The most challenging part of this process, though, is trying to place this year's test results on the same scale as last year's results, so that a score of 650 on this year's test represents the same level of performance as a score of 650 on last year's test. It's this process of equating the tests from one year to the next that allows us to judge whether scores this year went up, declined or stayed the same.
But it's not straightforward, because the test questions change from one year to the next, and even the format and content coverage of the test may change.
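To make the equating idea concrete, here is a minimal sketch of one classical textbook approach, linear (mean-sigma) equating, which shifts and stretches this year's score distribution to match last year's. The numbers and the method are hypothetical illustrations, not any state's actual procedure, which would be far more elaborate.

```python
# A toy sketch of linear (mean-sigma) equating: place a score from this
# year's form on last year's scale by matching means and standard
# deviations. Illustrative assumption only, not an actual state procedure.
import statistics

def linear_equate(score, new_scores, old_scores):
    """Map `score` from this year's distribution onto last year's scale."""
    m_new = statistics.mean(new_scores)
    s_new = statistics.pstdev(new_scores)
    m_old = statistics.mean(old_scores)
    s_old = statistics.pstdev(old_scores)
    return m_old + (s_old / s_new) * (score - m_new)

last_year = [600, 640, 650, 660, 700]   # hypothetical scores on last year's form
this_year = [590, 630, 640, 650, 690]   # the same group on this year's form

# The new form ran 10 points harder, so a 640 this year
# corresponds to a 650 on last year's scale.
print(round(linear_equate(640, this_year, last_year)))  # 650
```

Even this toy version shows why the process invites judgment calls: the answer depends on which students' scores anchor the two forms and on the statistical model chosen to link them.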
Different test companies even have different computer programs and statistical techniques to estimate a student's score and, hence, the overall picture of how a student, school or state is performing. (Teachers too, but that's a subject for another day.)
All of these variables (different test questions from year to year; variations in test length, difficulty and content coverage; and different statistical procedures to calculate the scores) introduce some uncertainty about what the "true" results are.
I think it's similar to the problem a doctor faces when interpreting a patient's blood work. Different labs will produce different results for the exact same sample of blood; in fact, different labs may have different "reference ranges" to indicate when a particular value is flagged as being unusual. For this reason, doctors prefer to rely consistently on the same lab. Doing so allows them to minimize the possibility that a change from one lab test to the next is due solely to the measuring equipment rather than a true change in, say, a patient's cholesterol level.
In testing, every year is like changing labs, in somewhat unpredictable ways, even if a state hires the same testing contractor from one year to the next. For this reason, I urge readers not to react too strongly to changes from last year to this year, or to consider them a referendum on whether a particular set of education policies – or worse, a particular initiative – is working.
One-year changes have many uncertainties built into them; if there's a real positive trend, it will persist over a period of several years. Schooling is a long-term process, the collective and sustained work of students, teachers and administrators; and there are few "silver bullets" that can be counted on to elevate scores over the period of a single school year.
Better to keep one's eyes attuned to the long run, and not jump to conclusions.
The opinions expressed in this commentary are solely those of Aaron Pallas.