The recent wave of randomized trials in development economics has provoked criticisms regarding external validity. We investigate two concerns—heterogeneity across beneficiaries and implementers—in a randomized trial of contract teachers in Kenyan schools. The intervention, previously shown to raise test scores in NGO-led trials in Western Kenya and parts of India, was replicated across all Kenyan provinces by an NGO and the government. Strong effects of short-term contracts produced in controlled experimental settings are lost in weak public institutions: NGO implementation produces a positive effect on test scores across diverse contexts, while government implementation yields zero effect. The data suggests that the stark contrast in success between the government and NGO arm can be traced back to implementation constraints and political economy forces put in motion as the program went to scale.
I believe the paper is already well known among development economists.
One can see those results as worrisome because they tell us how difficult it is to scale up interventions to improve education (and effective policies in general) across different environments within one country, let alone across countries or continents.
One can also see the findings of the paper as a signal of the enormous work ahead for economists who run RCTs: identifying the many circumstances under which a given intervention can be scaled up, which might itself require running RCTs to find out about RCTs (as the paper shows).
The findings of the paper are the tip of the iceberg. Many uncertainties lie beneath the surface, so to speak.
It is a very creative paper. It is by Tessa Bold, Mwangi Kimenyi, Germano Mwabu, Alice Ng’ang’a, and Justin Sandefur (March 2013) and it is titled: "Scaling Up What Works: Experimental Evidence on External Validity in Kenyan Education."
The authors conclude:
In the terminology of Shadish, Campbell and Cook’s (2002) classic text on generalizing experimental results, this is a question of ‘construct validity’ rather than external validity per se, i.e., of identifying the higher order construct represented by the experimental treatment. In most of the experimental evaluation literature in development economics, the treatment construct is defined to include only the school- or clinic-level intervention, abstracting from the institutional context of these interventions. Our findings suggest that the treatment in this case was not a “contract teacher”, but rather a multi-layered organizational structure including monitoring systems, payroll departments, long-run career incentives and political pressures.