The Promise of Large-Scale Learning Assessments: Acknowledging Limits to Unlock Opportunities
In this NORRAG Highlights post, contributed by Sobhi Tawil and Maya Prince of Education Research and Foresight at UNESCO, the authors look at the impact that large-scale learning assessments (LSLAs) have on policy at both the local and international levels. They point out that while LSLAs serve a growing number of purposes, they also raise concerns about a range of unintended, and sometimes adverse, effects.
The last few decades have seen a rise in large-scale learning assessments (LSLAs) in the form of both cross-national and national standardized tests. They have been gaining currency across the world, permeating policy debates at global and national levels. Beyond more affluent contexts, a growing number of middle- and lower-income countries – frequently encouraged by multilateral and bilateral agencies – currently invest in various modalities of LSLAs. The recent adoption of Sustainable Development Goal 4 has further galvanized attention on the centrality of learning data in an unprecedented way, making LSLAs a crucial tool both to monitor learning and to guide policy action. As LSLAs continue to serve a growing number of purposes, they also raise concerns about a range of unintended, and sometimes adverse, effects. Evidence of such effects, however, most often comes from the experience of high-income countries and remains largely unsystematised. The promise of large-scale learning assessments: acknowledging limits to unlock opportunities (UNESCO 2019) reviews these experiences and highlights some of the limits in the design of LSLAs and the potential unintended consequences in the use of results. It argues that a fuller understanding of these concerns can help ensure that LSLAs contribute more effectively to improving learning quality and equity.
Valuing more than what can be measured
A first set of concerns stems directly from characteristics inherent to the design of LSLAs. One relates to the limited scope of learning habitually assessed, in great part conditioned by the need to ensure system-level or cross-country comparability. While this is not problematic in and of itself, the limited number of domains assessed may influence what learning is valued, particularly when LSLAs take prominence over other forms of assessment. That said, multiple efforts are underway to develop measures of learning beyond foundational skills to include ‘21st century skills’, ‘transversal competencies’, digital literacy and other knowledge, skills and values related to social and civic learning. However, the conceptual and technical challenges that these efforts face point to a tension between the wide scope of learning we value and the limited scope of learning that can (easily) be measured. Rather than more measurement, what is arguably needed is better integration of standardized testing within broader assessment systems, combining it with other forms of evaluation.
Insufficient consideration for diversity and disadvantage
An additional concern that stems from assessment design relates to equity. Assessment practices hold great potential to advance the equity agenda. But for this to be possible, the design of assessments needs to account for the diversity of learners and allow for disaggregation of data by a wide range of factors. This has implications in terms of the translation and adaptation of test materials, the accommodation of learners with special needs, and the incorporation of vulnerable and minority populations – ensuring that the number of students from these groups fulfils the minimum requirement for a meaningful analysis. However, such imperatives are not always observed. Out-of-school children, children and youth enrolled in unregistered schools, and those in hard-to-reach areas are generally not included in test samples. Likewise, children with cognitive or physical impairments, as well as those with insufficient language experience, are not always included in assessment efforts. In addition, assessments may not differentiate between pupils’ learning levels – creating ‘floor effects’ (when the assessment is too difficult) or ‘ceiling effects’ (when the assessment is too easy). This seemingly excessive emphasis on comparability, at the expense of local validity, limits the usefulness of assessments in providing a finer understanding of the situation of disadvantaged learners.
Beyond data as an end in itself
Another set of limitations of LSLAs is related to the under-use or over-use of the resulting data and their combination with (or subordination to) accountability measures.
LSLAs run the risk of becoming futile investments and/or lost opportunities when results are not properly analysed and used to inform policy action. A range of factors may explain the underuse of assessment data to inform policy debates and guide policy action at national/local level. These factors include financial and technical barriers, lack of political will, as well as limited levels of institutionalization of assessment practices. In low- and middle-income countries, the effective use of assessment data is also frequently constrained by limited ownership – that is, by limited participation in and control over their design, funding and implementation.
There are also risks related to the overuse of assessment data – when excessive focus may distort policy priorities or suggest ill-suited courses of policy action. This may occur in a variety of ways. For instance, the influence of assessment data on policy will be problematic if any ensuing policy action is aimed at improving figures rather than addressing the underlying causes of underperformance. Moreover, the use of assessment in results-based funding schemes, increasingly used by the donor community, entails an important risk, as such schemes can foster practices of “gaming the system” aimed at rapidly improving figures. In addition, the use of rankings within and between countries can encourage competition dynamics that, in turn, foster ill-informed policy borrowing practices.
Finally, a number of the risks typically associated with LSLAs are in fact the result of their combination with – or subordination to – accountability frameworks in which certain stakeholders, including teachers, principals and schools, are rewarded or sanctioned on the basis of student performance. The unintended effects of such arrangements include teaching to the test and narrowing of the curriculum. While such risks are frequently discussed as a product of high-stakes examinations used to make decisions about students, there is admittedly a more limited understanding and recognition of the consequences of LSLAs. Yet, the increased importance attached to globally comparable learning data derived from LSLAs runs the risk of unintentionally promoting competition among schools and/or countries, and, more worryingly, influencing teaching and learning strategies and content.
Overall, these dynamics suggest that while LSLAs hold great potential, the benefits need to be weighed against the potential risks. They call for greater vigilance to ensure a balanced design of tools, and controlled use of the results – if we are to ensure progress towards better learning for all.
About the authors: Sobhi Tawil, Head, Education Research and Foresight, UNESCO. Email: s.tawil@unesco.org; Maya Prince, Associate Project Officer, Education Research and Foresight, UNESCO. Email: m.prince@unesco.org
Editor’s note: This post is published in conjunction with UNESCO’s new report: The Promise of Large-scale Learning Assessments: Acknowledging Limits to Unlock Opportunities, which is available for download in English, French and Spanish (Arabic forthcoming).
Disclaimer: NORRAG’s blog offers a space for dialogue about issues, research and opinion on education and development. The views and factual claims made in NORRAG posts are the responsibility of their authors and are not necessarily representative of NORRAG’s opinion, policy or activities.
The blog makes very sensible points. Furthermore, most governments really cannot change curricula and instruction on the basis of test scores. Journalists have a field day with the results for a while, everyone gets horrified, and then life goes on. The brain uses mechanisms that cannot easily be attended to by test scores.
Testing is prevalent, partly because international organizations hire staff with quantitative skills, who love getting data to analyze. Also, testing is much easier and cheaper than actual teaching. Institutions such as the World Bank finance measurement, so the testing industry has burgeoned. Actual teaching, and all the cognitive science that can improve learning, has been left up to local educators. After all, the officials of the international organizations will not be held accountable for performance in poor countries. They are only accountable for tasks such as making presentations at international conferences.
The reservations are important: Valuing the Immeasurable; Diversity and Disadvantage; Data as an End in Itself. But underlying all of this is the perpetuation of the Dark Ages (i.e. pre-Digital) compulsion to calculate, categorise, compare and criticise. With Digitisation, a fundamental worldwide educational transformation is beginning to occur: One World One School. From now on and forever, PISA, league tables, failing schools and Large-Scale Learning Assessments are second-millennium hangovers, and the only thing in education worth measuring is the extent to which the world’s learners – and their teachers – are enjoying the learning experience. The rest is vanity, and outdated vanity at that.