Stanford researcher Linda Darling-Hammond talks at a TED conference about the trouble with too much high-stakes testing, and promising moves toward a more effective and meaningful standardized assessment system.
In this post, I'll compare my experience with high-stakes testing at an international school in Mexico City to reports about high-stakes testing at public schools in the U.S. This comparison is especially relevant for me now, as it looks like next year I'll be moving back to the U.S. to teach. To be honest, high-stakes testing is one of the things that worries me the most about shifting to the U.S. teaching environment, and I'm hoping that recent moves toward a more balanced approach to high-stakes testing, such as the performance-based assessment pilot program that came as part of the Obama administration's Every Student Succeeds Act (2015), will gain momentum.
A brief history of high-stakes testing in the U.S.
High-stakes testing has been around in one form or another for more than a century (Nichols & Berliner, 2007), but it really came to the fore with the passage of the Elementary and Secondary Education Act (ESEA) in 1965. Under this act, all public school students had to pass basic assessments in reading and math in order to graduate. The ESEA high-stakes testing requirements came out of U.S. concern over the Soviet Union's advances in the space race, such as the launch of the Sputnik satellite. It was felt at the time that a national system of high-stakes testing would spur educational advances and help us compete with the Soviets.
In time, the ESEA tests came under criticism for being too easy, establishing a floor for educational achievement rather than inspiring growth. After the economic stagnation of the 1970s, the seminal report A Nation at Risk, published in 1983, warned that unless we overhauled our education system, we would lose our status as a global leader and innovator. Since its publication, many eloquent and damning critiques have been written about the flawed logic of A Nation at Risk, but the report had its desired effect. The government took steps to reform public education, including another increase in high-stakes testing.
Testing was expanded even further with the passage in 2001 of No Child Left Behind, the most sweeping, and many would argue intrusive, piece of U.S. education legislation ever passed. The legislation required that tests be administered once a year in 3rd-8th grades, and once again in high school, and school funding was tied to the outcomes. Growth targets needed to be reached, or schools would be in danger of losing funding, having to pay to transfer students to other schools, and eventually being taken over by state education agencies. The Race to the Top grant program, introduced under the Obama administration, added incentives for states to tie teachers' pay to test scores. Now, as No Child Left Behind has been left behind, and the Every Student Succeeds Act takes its place, the U.S. is a nation in which high-stakes testing plays a pivotal role in school funding, teacher evaluation and in some cases teacher pay, student promotion and graduation, and the public's perception of the effectiveness of our education system (Nichols & Berliner, 2007).
Colegio Peterson, Mexico City
I currently work at a bilingual international school in Mexico City, where I am the English coordinator of the primary section. At our school, we administer a standardized test called MAP three times a year, in August, January, and May. Students from pre-first through first grade are tested in reading and math, and second through fifth graders are tested in reading, math, and language. Each round of testing takes up an average of four to five hours of class time. One interesting feature of the MAP test is that it is adaptive: the computer gives students harder or easier questions based on the accuracy of their responses. In this way, the program zeroes in on the student's current level.
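For readers curious about how an adaptive test can zero in on a level, here is a minimal sketch in Python. It is not NWEA's actual algorithm, which I have not seen. It is a simple "staircase" rule that I am assuming for illustration: move up one difficulty level after a correct answer and down one level after a miss. The function names, the 1-10 difficulty scale, and the imaginary student are all hypothetical.

```python
def run_adaptive_test(answers_correctly, start_level=5,
                      min_level=1, max_level=10, num_questions=8):
    """Staircase rule: step up after a correct answer, down after a miss.

    `answers_correctly(level)` is a hypothetical callback that returns True
    when the student answers a question of that difficulty correctly.
    """
    level = start_level
    history = []  # (difficulty asked, whether the answer was correct)
    for _ in range(num_questions):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            level = min(max_level, level + 1)  # harder question next
        else:
            level = max(min_level, level - 1)  # easier question next
    return level, history


# Hypothetical student who reliably answers questions up to difficulty 7.
def student(level):
    return level <= 7


final_level, history = run_adaptive_test(student)
print(final_level)                        # 7
print([asked for asked, _ in history])    # [5, 6, 7, 8, 7, 8, 7, 8]
```

In this toy run the difficulty climbs from 5 and then bounces between 7 and 8, which is the sense in which an adaptive test settles around a student's current level.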
The test results are shared with teachers so that they can adapt their instructional strategies to meet the diverse needs of their learners. Teachers are sent a basic report that shows the students' overall scores and a rating of low, low average, average, high average, or high in each subcategory. For example, the subcategories in reading are literature, informational text, and vocabulary acquisition and use. To move beyond this basic report, teachers can sign into their NWEA accounts and obtain information on each student's progress. The information I personally find most helpful is the learning continuum, which shows the skills that each student needs to learn next in each subject.
The results of each campus are shared with the other campuses, with a comparison against U.S. norms and the norms of our conference (the Tri-Association), but the scores do not affect teacher evaluation or pay. Truly, their primary purpose is to enhance student learning. The testing does not seem to affect teachers' instructional focus. Curricular focus is much more affected by department expectations, such as the use of our literacy curriculum, Core Ready, our curriculum for word work, Words Their Way, and our STEM-based approach to science instruction. Student promotion is not affected generally, although MAP results are one factor parents, teachers, and administrators look at when evaluating a student's academic progress. The test results also affect the way that teachers differentiate for learners, so students might be placed in a particular reading group because of the Lexile score obtained from MAP. Anecdotal observation suggests that students feel a healthy amount of pressure related to the tests, but not too much. They tend to express distaste for the testing process, while at the same time showing genuine excitement when their scores improve.
Interestingly, the MAP test has revealed inequities in student achievement. For example, boys tend to outperform girls in math, and girls tend to outperform boys in reading and language. Students born outside Mexico tend to outperform students from Mexico, when looking at the results overall. Further investigation is merited to determine why these inequities exist. In general, the MAP test would seem to be a useful measure of student progress and, most importantly, a data source for improving instruction.
U.S. public schools
In writing about the current prevalence and impacts of high-stakes testing in the U.S., I am reporting on what I have read rather than what I have witnessed firsthand. My own experience with high-stakes testing as a child consisted of taking the Iowa Test of Basic Skills once a year, and if my teachers or administrators were stressed about the test, I wasn't aware of it. Personally, I enjoyed the testing experience as a novel, once-a-year challenge. When I entered the teaching profession, I taught in the university, so I have not experienced high-stakes testing from a teacher's perspective in the U.S.
What I have read about it, though, gives me pause. While the rationale behind high-stakes testing is to provide accountability, motivate schools via a system of rewards and consequences, and increase educational equity (High-stakes test, 2014), the reality is often very different. What follows is a brief description of the effects of high-stakes testing in the U.S. in recent years, with respect to student learning, teacher evaluation and pay, and educational equity.
Student learning
An over-reliance on high-stakes testing would seem to both narrow the scope of student learning and limit its depth. Because high-stakes testing has such sweeping impacts on students, teachers, and schools, the tendency is to target instruction toward the test, limiting the range of learning (WMHT, 2013). Schools tend to administer practice tests to get students ready for the high-stakes tests (Kamenetz, 2015), and less time remains for instruction in science, social studies, music, art, research and writing, physical education, and world language studies, pursuits that cognitive science has shown to expand our cognitive capacity and basic intelligence. The cognitive expansion sparked by this variety of subjects has been shown "[to raise] achievement and accomplishment in a variety of domains" (Darling-Hammond, 2015).
High-stakes testing, as it is currently undertaken in the U.S., can also limit the depth of learning. The tests themselves tend to ask questions at the remembering and understanding levels of Bloom's taxonomy, at the same time that the workplace demand for higher-order thinking skills and communicative competency is skyrocketing, and the demand for routine cognitive and manual skills is decreasing, due to the use of technology to automate basic tasks (Darling-Hammond, 2015).
Strangely, although students are spending more time on high-stakes testing, their results on these tests have stayed the same or even gone down slightly. Meanwhile, U.S. results on international standardized assessments have fallen. While in the 1970s the U.S. led the world in education, currently we rank between 21st and 32nd on the various parts of the PISA exam, largely because this exam calls for higher-order thinking skills and students' ability to apply their knowledge to new problems (Darling-Hammond, 2015).
U.S. teachers have expressed an internal conflict between their desire to teach using student-centered pedagogies such as inquiry, discovery, and problem solving, and their belief that traditional methods are the best way to raise test scores (Bulgar, 2012). In one survey 85% of teachers said that high-stakes testing undermines student learning (Darling-Hammond, 2015). In short, too much high-stakes testing limits rather than motivates student learning, and the very format of most high-stakes tests in the U.S. tends to elicit a shallow understanding.
Teacher evaluation and pay
In recent years, student performance on high-stakes tests has become an important factor in the teacher evaluation process, potentially affecting decisions related to compensation, tenure, hiring, and firing (High-stakes test, 2014). While linking teacher evaluation to test scores may be intended as a motivator for better teaching, the motivator often has unintended and harmful effects. One educational commentator argued, "They [high-stakes tests] can't count so much that you have teachers feeling that the last student they want to teach is a student that's challenged, because if that student doesn't get all the supports that he or she needs, then their career depends on it. And when we need the best teachers in the most challenged schools, we're not going to get them as long as they feel that their job is in jeopardy" (WMHT, 2013). In this example, we see the linking of teacher evaluation and test scores prompting teachers to avoid challenging teaching environments because of the risks involved for their careers. Linking teacher evaluation to test scores has also led some teachers to cheat outright on the tests (High-stakes test, 2014).
Whenever we put incentives in place to try to guide human behavior, we need to be careful that the incentives don't cause unexpected, negative reactions. In the case of linking teacher evaluation and test scores, experience has shown that the incentive system doesn't work the way it was intended.
Equity
Perhaps the most convincing argument in favor of high-stakes testing is that of educational equity. Another commentator in the news program cited above (WMHT, 2013) shared that, as a teacher, she had seen students from minority backgrounds being passed through the system without appropriate instructional support or accountability. Minority students have tended to be ill served by their schools, and schools with higher minority populations have tended to be ill served by state and federal government. For this reason, she advocated for high-stakes testing. It provides concrete evidence of the achievement gap, and thus makes it more likely that real, lasting change will happen.
Having said that, serious questions have been raised about the impact of high-stakes testing on traditionally underserved students. The Glossary of Education Reform asserts that high-stakes testing results in a narrowing of the curriculum, diminishing the quality of education for the very students high-stakes testing was intended to benefit. When teachers feel pressured to teach to the test, students of color and students from lower-income homes "may be more likely to receive narrowly focused, test-preparation heavy instruction instead of an engaging, challenging, well-rounded academic program" (High-stakes test, 2014). In support of this concern, the Glossary of Education Reform also points out that high-stakes testing "has been correlated in some research studies to increased failure rates, lower graduation rates, and higher dropout rates, particularly for minority groups, students from low-income households, students with special needs, and students with limited proficiency in English."
Another concern with high-stakes testing and equity is that teachers may feel pressure to teach to the middle, rather than appealing to all of their students' needs. Teachers in some states have their bonuses tied to the performance of a certain percentage of their students (40%, let's say). An unintended consequence of this motivator is that teachers tend to direct their instruction to on-level students, leaving behind gifted students and students with special needs (WMHT, 2013).
Finally, state and federal governments send a mixed message about equity and accountability. At the same time that they implement high-stakes testing requirements for the supposed purpose of increasing educational equity, they impose financial sanctions on schools that fail to meet the new requirements. Typically, the schools failing to meet high-stakes testing requirements are schools in low-SES neighborhoods. These schools receive less funding and face other disciplinary measures, which has a negative impact on the very students the new laws were supposed to assist. Schools in low-SES neighborhoods are further undermined by an emphasis on charter-school funding at the expense of public education (Croft, Roberts, & Stenhouse, 2015).
Perhaps for these reasons, No Child Left Behind, with its over-reliance on standardized testing, did not actually close the achievement gap as it was designed to do (Nichols & Berliner, 2007).
Finding the right balance: alternatives to the current approach to high-stakes testing
It's a basic tenet of research that the act of observation changes the phenomenon being observed. In physics, it's called the observer effect. In psychology, it's called the Hawthorne effect. In education, it could be called the testing effect. High-stakes testing is a form of observation intended to ensure the quality of the educational process. There's nothing wrong with observation in and of itself: in fact, observation is necessary to ensure that students are learning.
Unfortunately, this act of observation, especially as it has become more and more prevalent, has had a big impact on the phenomenon it's intended to observe. In many cases, students are learning more poorly because of high-stakes testing. The testing narrows the curriculum, pressures teachers to play it safe with traditional methods instead of innovating, and hurts the educational chances of underserved student populations, such as minorities, low-income students, special needs students, and students whose first language is not English.
One way to strike a better balance on high-stakes testing would be to de-emphasize the importance of the tests for major decision making, like school funding, teacher evaluation and pay, and student promotion and graduation. We could give fewer tests, and consider the results in a more holistic manner, taking into account a host of other factors that go into educational quality. This is the approach taken at the school where I work in Mexico City.
Another way to strike a better balance would be to follow the lead of researchers like Linda Darling-Hammond, who argue that we are testing for the wrong things. Instead of asking basic questions on the remembering and understanding levels of Bloom's taxonomy, we ought to be calling on students to apply, analyze, evaluate, and create. One example of this approach is the Graduation Portfolio System that's been adopted by several U.S. schools. Under this system, high school students complete projects in scientific investigation, literary analysis, social science research, mathematical application, world language proficiency, and artistic performance. They develop their work in light of a clearly described standard, and they revise and revise their work until it meets the standard. They then present the work as they would a dissertation, before an expert panel of judges, often professionals from the community. Kids from schools with the Graduation Portfolio System go to college at higher rates and graduate from college at twice the rate of the average American student (Darling-Hammond, 2015). When asked why they enjoy more success in college, students from these schools tend to talk about the Graduation Portfolio System, and how it taught them life skills, such as how to receive and make use of critical feedback, how to persevere, and how to be resourceful. This movement is in keeping with an international push, among the nations ranked highest in educational performance, toward the nurturing of higher-order thinking skills to address real-world problems. Darling-Hammond describes a shift that's beginning in the design of standardized tests in the U.S., toward more performance-based assessment, rather than just bubble filling with questions of lower cognitive demand.
Other alternatives include statistical sampling, as is used on the PISA test, rather than asking every student to take every test, as well as the use of big data, for example tracking student performance in computer-aided learning experiences, without the students even knowing that they are being "tested." This data could be interpreted in a longitudinal fashion to draw conclusions about the quality of U.S. education, and there would be no need to intrude into the teaching and learning process with frequent high-stakes testing (Kamenetz, 2015).
If enough people with enough power come to the conclusion that high-stakes testing is out of balance in the U.S., they will not lack viable alternatives. This is all about student learning, right? So let's do what's best for our students and bring balance back to high-stakes testing.
Reference list
Bulgar, S. (2012, May-July). The effects of high-stakes testing on teachers in N.J. Journal on Educational Psychology, 6(1), 34-44.
Croft, S. J., Roberts, M. A., & Stenhouse, V. L. (2015). The perfect storm of education reform: high-stakes testing and teacher evaluation. Social Justice, 42(1), 70-92.
Darling-Hammond, L. (2015, June 29). Testing, testing [online video]. Retrieved Jan. 28, 2017, from https://www.youtube.com/watch?v=2G_vWcS1NTA
High-stakes test. (2014, Aug. 18). In S. Abbott (Ed.), The glossary of education reform. Retrieved Jan. 28, 2017, from http://edglossary.org/high-stakes-testing/
Kamenetz, A. (2015, Jan. 22). The past, present, and future of high-stakes testing. NPR online. Retrieved Jan. 28, 2016, from http://www.npr.org/sections/ed/2015/01/22/377438689/the-past-present-and-future-of-high-stakes-testing
Nichols, S. L. & Berliner, D. C. (2007, March 4). A short history of high-stakes testing. In Collateral damage: how high-stakes testing corrupts America’s schools. Cambridge, MA: Harvard Education Press.
WMHT. (2013, Jan. 30). High stakes testing and student success [online video]. Retrieved Jan. 23, 2017, from https://www.youtube.com/watch?v=czlZG8brjC0