References

R. Hanna, S. Mullainathan, and J. Schwartzstein (2014). "Learning Through Noticing: Theory and Evidence from a Field Experiment." Quarterly Journal of Economics, 129.
E. Field, R. Pande, J. Papp, and N. Rigol (2013). "Does the Classic Microfinance Model Discourage Entrepreneurship Among the Poor? Experimental Evidence from India." American Economic Review, 103.
M. Kremer, J. Leino, E. Miguel, and A. Zwane (2011). "Spring Cleaning: Rural Water Impacts, Valuation, and Property Rights Institutions." Quarterly Journal of Economics, 126.
A. Banerjee (2005). "New Development Economics and the Challenge to Theory." In New Directions in Development Economics: Theory or Empirics? A Symposium in Economic and Political Weekly, edited by R. Kanbur.
U. Gneezy, K. Leonard, and J. List (2009). "Gender Differences in Competition: Evidence from a Matrilineal and a Patriarchal Society." Econometrica, 77.
N. Bloom, B. Eifert, A. Mahajan, D. McKenzie, and J. Roberts (2013). "Does Management Matter? Evidence from India." Quarterly Journal of Economics, 128.
M. Blimpo (2014). "Team Incentives for Education in Developing Countries: A Randomized Field Experiment in Benin." American Economic Journal: Applied Economics, 6.
R. Di Tella and E. Schargrodsky (2013). "Criminal Recidivism after Prison and Electronic Monitoring." Journal of Political Economy, 121.
K. Muralidharan, P. Niehaus, and S. Sukhtankar (2017). "General Equilibrium Effects of (Improving) Public Employment Programs: Experimental Evidence from India." Working paper.
B. Olken, J. Onishi, and S. Wong (2014). "Should Aid Reward Performance? Evidence from a Field Experiment on Health and Education in Indonesia." American Economic Journal: Applied Economics, 6.
A. Deaton and N. Cartwright (2016). "Understanding and Misunderstanding Randomized Controlled Trials." NBER Working Paper.
C. Blattman, N. Fiala, and S. Martinez (2014). "Generating Skilled Self-Employment in Developing Countries: Experimental Evidence from Uganda." Quarterly Journal of Economics, 129.
L. Beaman and J. Magruder (2012). "Who Gets the Job Referral? Evidence from a Social Networks Experiment." American Economic Review, 102.
M. Bauer, J. Chytilová, and J. Morduch (2012). "Behavioral Foundations of Microcredit: Experimental and Survey Evidence from Rural India." American Economic Review, 102.
J. McCambridge, J. Witton, and D. Elbourne (2014). "Systematic Review of the Hawthorne Effect: New Concepts Are Needed to Study Research Participation Effects." Journal of Clinical Epidemiology, 67.
E. Roetman (2012). "A Can of Worms? Implications of Rigorous Impact Evaluations for Development Agencies." Working paper.
E. King and J. Behrman (2009). "Timing and Duration of Exposure in Evaluations of Social Programs." World Bank Research Observer, 24.
H. Cai, Y. Chen, and H. Fang (2009). "Observational Learning: Evidence from a Randomized Natural Field Experiment." American Economic Review, 99.
S. de Mel, D. McKenzie, and C. Woodruff (2008). "Returns to Capital in Microenterprises: Evidence from a Field Experiment." Quarterly Journal of Economics, 123.
S. de Mel, D. McKenzie, and C. Woodruff (2013). "The Demand for, and Consequences of, Formalization among Informal Firms in Sri Lanka." American Economic Journal: Applied Economics, 5.
D. Rodrik (2008). "The New Development Economics: We Shall Experiment, but How Shall We Learn?" Working paper.
K. Schulz, D. Altman, and D. Moher (2010). "CONSORT 2010 Statement: Updated Guidelines for Reporting Parallel Group Randomised Trials." BMJ, 340.
J. Temple (2010). "Aid and Conditionality." Handbook of Development Economics, 5.
O. Armantier and A. Boly (2013). "Comparing Corruption in the Laboratory and in the Field in Burkina Faso and in Canada." Economic Journal, 123.
K. Muralidharan and V. Sundararaman (2015). "The Aggregate Effect of School Choice: Evidence from a Two-Stage Experiment in India." Quarterly Journal of Economics, 130.
T. Besley, K. Burchardi, and M. Ghatak (2012). "Incentives and the De Soto Effect." Quarterly Journal of Economics, 127.
D. Coady and R. Harris (2004). "Evaluating Transfer Programmes within a General Equilibrium Framework." Economic Journal, 114.
T. Bold, M. Kimenyi, G. Mwabu, A. Ng'ang'a, and J. Sandefur (2013). "Scaling Up What Works: Experimental Evidence on External Validity in Kenyan Education." Working paper.
M. Ravallion (2012). "Fighting Poverty One Experiment at a Time: A Review of Abhijit Banerjee and Esther Duflo's Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty." Journal of Economic Literature, 50.
R. Dehejia (2015). "Experimental and Non-Experimental Methods in Development Economics: A Porous Dialectic." Journal of Globalization and Development, 6.
V. Alatas, A. Banerjee, R. Hanna, B. Olken, and J. Tobias (2012). "Targeting the Poor: Evidence from a Field Experiment in Indonesia." American Economic Review, 102.
P. Glewwe, M. Kremer, and S. Moulin (2009). "Many Children Left Behind? Textbooks and Test Scores in Kenya." American Economic Journal: Applied Economics, 1.
J. Hjort (2014). "Ethnic Divisions and Production in Firms." Quarterly Journal of Economics, 129.
A. Eble, P. Boone, and D. Elbourne (2017). "On Minimizing the Risk of Bias in Randomized Controlled Trials in Economics." World Bank Economic Review, 31.
J. Cilliers, O. Dube, and B. Siddiqi (2015). "The White-Man Effect: How Foreigner Presence Affects Behavior in Experiments." Journal of Economic Behavior and Organization, 118.
A. Banerjee, R. Banerji, J. Berry, E. Duflo, H. Kannan, S. Mukerji, M. Shotland, and M. Walton (2017). "From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application." Journal of Economic Perspectives, 31.
D. Evans and A. Popova (2016). "What Really Works to Improve Learning in Developing Countries? An Analysis of Divergent Findings in Systematic Reviews." World Bank Research Observer, 31.
J. Das, S. Dercon, J. Habyarimana, P. Krishnan, K. Muralidharan, and V. Sundararaman (2013). "School Inputs, Household Substitution, and Test Scores." American Economic Journal: Applied Economics, 5.
E. Stuart, S. Cole, C. Bradshaw, and P. Leaf (2011). "The Use of Propensity Scores to Assess the Generalizability of Results from Randomized Trials." Journal of the Royal Statistical Society: Series A, 174.
J. Peters, J. Langbein, and G. Roberts (2016). "Policy Evaluation, Randomized Controlled Trials, and External Validity: A Systematic Review." Economics Letters, 147.
E. Vivalt (2016). "How Much Can We Generalize from Impact Evaluations?" Working paper.
P. Dupas and J. Robinson (2013). "Savings Constraints and Microenterprise Development: Evidence from a Field Experiment in Kenya." American Economic Journal: Applied Economics, 5.
N. Ashraf, E. Field, and J. Lee (2014). "Household Bargaining and Excess Fertility: An Experimental Study in Zambia." American Economic Review, 104.
M. Bertrand, D. Karlan, S. Mullainathan, E. Shafir, and J. Zinman (2010). "What's Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment." Quarterly Journal of Economics, 125.
D. Evans and A. Popova (2017). "Cash Transfers and Temptation Goods." Economic Development and Cultural Change, 65.
M. Burke (2014). "Selling Low and Buying High: An Arbitrage Puzzle in Kenyan Villages." Working paper.
M. Voors, E. Nillesen, P. Verwimp, E. Bulte, R. Lensink, and D. van Soest (2012). "Violent Conflict and Behavior: A Field Experiment in Burundi." American Economic Review, 102.
D. Karlan and M. Valdivia (2011). "Teaching Entrepreneurship: Impact of Business Training on Microfinance Clients and Institutions." Review of Economics and Statistics, 93.
E. Duflo, M. Greenstone, R. Pande, and N. Ryan (2013). "Truth-Telling by Third-Party Auditors and the Response of Polluting Firms: Experimental Evidence from India." Quarterly Journal of Economics, 128.
A. Acharya, S. Vellakkal, F. Taylor, E. Masset, A. Satija, M. Burke, and S. Ebrahim (2013). "The Impact of Health Insurance Schemes for the Informal Sector in Low- and Middle-Income Countries: A Systematic Review." World Bank Research Observer, 28.
O. Attanasio, A. Kugler, and C. Meghir (2011). "Subsidizing Vocational Training for Disadvantaged Youth in Colombia: Evidence from a Randomized Trial." American Economic Journal: Applied Economics, 3.
A. Tarozzi, A. Mahajan, B. Blackburn, D. Kopf, L. Krishnan, and J. Yoong (2014). "Micro-loans, Insecticide-Treated Bednets, and Malaria: Evidence from a Randomized Controlled Trial in Orissa, India." American Economic Review, 104.
B. Crépon, E. Duflo, M. Gurgand, R. Rathelot, and P. Zamora (2013). "Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment." Quarterly Journal of Economics, 128.
T. Fujiwara and L. Wantchékon (2013). "Can Informed Public Deliberation Overcome Clientelism? Experimental Evidence from Benin." American Economic Journal: Applied Economics, 5.
K. Muralidharan and V. Sundararaman (2010). "The Impact of Diagnostic Feedback to Teachers on Student Learning: Experimental Evidence from India." Economic Journal, 120.
E. Duflo, M. Kremer, and J. Robinson (2011). "Nudging Farmers to Use Fertilizer: Theory and Experimental Evidence from Kenya." American Economic Review, 101.
M. Kremer, E. Miguel, and R. Thornton (2009). "Incentives to Learn." Review of Economics and Statistics, 91.
P. Vicente (2014). "Is Vote Buying Effective? Evidence from a Field Experiment in West Africa." Economic Journal, 124.
A. Kowalski (2016). "How to Examine External Validity within an Experiment." Mimeo.
A. Lucas and I. Mbiti (2014). "Effects of School Quality on Student Achievement: Discontinuity Evidence from Kenya." American Economic Journal: Applied Economics, 6.
M. Gechter (2016). "Generalizing the Results from Social Experiments: Theory and Evidence." Working paper, sites.bu.edu/neudc/files/2014/10/paper_472.pdf.
G. Saretsky (1972). "The OEO P.C. Experiment and the John Henry Effect." Phi Delta Kappan.
D. Karlan and J. Zinman (2009). "Observing Unobservables: Identifying Information Asymmetries with a Consumer Credit Field Experiment." Econometrica, 77.
K. Basu (2014). "Randomisation, Causality and the Role of Reasoned Intuition." Oxford Development Studies, 42.
S. Muller (2015). "Causal Interaction and External Validity: Obstacles to the Policy Relevance of Randomized Evaluations." World Bank Economic Review, 29.
P. Collier and P. Vicente (2014). "Votes and Violence: Evidence from a Field Experiment in Nigeria." Economic Journal, 124.
J. Cohen and P. Dupas (2010). "Free Distribution or Cost-Sharing? Evidence from a Randomized Malaria Prevention Experiment." Quarterly Journal of Economics, 125.
G. Aldashev, G. Kirchsteiger, and A. Sebald (2017). "Assignment Procedure Biases in Randomised Policy Experiments." Economic Journal, 127.
D. Moher, S. Hopewell, K. Schulz, V. Montori, P. Gøtzsche, P. Devereaux, D. Elbourne, M. Egger, and D. Altman (2010). "CONSORT 2010 Explanation and Elaboration: Updated Guidelines for Reporting Parallel Group Randomised Trials." BMJ, 340.
P. Dupas and J. Robinson (2013). "Why Don't the Poor Save More? Evidence from Health Savings Experiments." American Economic Review, 103.
K. Macours and R. Vakis (2014). "Changing Households' Investment Behaviour through Social Interactions with Local Leaders: Evidence from a Randomised Transfer Programme." Economic Journal, 124.
J. Pearl and E. Bareinboim (2014). "External Validity: From Do-Calculus to Transportability across Populations." Statistical Science, 29.
D. McKenzie and C. Woodruff (2014). "What Are We Learning from Business Training and Entrepreneurship Evaluations around the Developing World?" World Bank Research Observer, 29.
A. Drexler, G. Fischer, and A. Schoar (2014). "Keeping It Simple: Financial Literacy and Rules of Thumb." American Economic Journal: Applied Economics, 6.
A. Zwane, J. Zinman, E. Van Dusen, W. Parienté, C. Null, E. Miguel, M. Kremer, D. Karlan, R. Hornbeck, X. Giné, E. Duflo, F. Devoto, B. Crépon, and A. Banerjee (2011). "Being Surveyed Can Change Later Behavior and Related Parameter Estimates." Proceedings of the National Academy of Sciences, 108.
E. Duflo, R. Glennerster, and M. Kremer (2008). "Using Randomization in Development Economics Research: A Toolkit." Handbook of Development Economics, 4.
K. Casey, R. Glennerster, and E. Miguel (2012). "Reshaping Institutions: Evidence on Aid Impacts Using a Pre-Analysis Plan." Quarterly Journal of Economics, 127.
S. Baird, C. McIntosh, and B. Özler (2011). "Cash or Condition? Evidence from a Cash Transfer Experiment." Quarterly Journal of Economics, 126.
N. Ashraf (2009). "Spousal Control and Intra-household Decision Making: An Experimental Study in the Philippines." American Economic Review, 99.
R. Moffitt (2004). "The Role of Randomized Field Trials in Social Science Research." American Behavioral Scientist, 47.
M. Björkman and J. Svensson (2009). "Power to the People: Evidence from a Randomized Field Experiment on Community-Based Monitoring in Uganda." Quarterly Journal of Economics, 124.
J. Chinkhumba, S. Godlonton, and R. Thornton (2014). "The Demand for Medical Male Circumcision." American Economic Journal: Applied Economics, 6.
R. Jensen (2010). "The (Perceived) Returns to Education and the Demand for Schooling." Quarterly Journal of Economics, 125.
A. Deaton (2009). "Instruments of Development: Randomization in the Tropics, and the Search for the Elusive Keys to Economic Development." NBER Working Paper.
J. Robinson (2012). "Limited Insurance within the Household: Evidence from a Field Experiment in Kenya." American Economic Journal: Applied Economics, 4.
E. Duflo, P. Dupas, and M. Kremer (2011). "Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya." American Economic Review, 101.
M. Woolcock (2013). "Using Case Studies to Explore the External Validity of 'Complex' Development Interventions." Evaluation, 19.
S. Chassang, G. Padró i Miquel, and E. Snowberg (2012). "Selective Trials: A Principal-Agent Approach to Randomized Controlled Experiments." American Economic Review, 102.
J. Aker, C. Ksoll, and T. Lybbert (2012). "Can Mobile Phones Improve Learning? Evidence from a Field Experiment in Niger." American Economic Journal: Applied Economics, 4.
R. Jensen and N. Miller (2011). "Do Consumer Price Subsidies Really Improve Nutrition?" Review of Economics and Statistics, 93.
E. Oster and R. Thornton (2011). "Menstruation, Sanitary Products, and School Attendance: Evidence from a Randomized Evaluation." American Economic Journal: Applied Economics, 3.
K. Macours, N. Schady, and R. Vakis (2012). "Cash Transfers, Behavioral Changes, and Cognitive Development in Early Childhood: Evidence from a Randomized Experiment." American Economic Journal: Applied Economics, 4.
E. Duflo, R. Hanna, and S. Ryan (2012). "Incentives Work: Getting Teachers to Come to School." American Economic Review, 102.
P. Dupas (2011). "Do Teenagers Respond to HIV Risk Information? Evidence from a Field Experiment in Kenya." American Economic Journal: Applied Economics, 3.
M. Bates and R. Glennerster (2017). "The Generalizability Puzzle." Stanford Social Innovation Review.
A. Mosqueira (2014). "Technical Proposal for Replication of 'Power to the People: Evidence from a Randomized Field Experiment on Community-Based Monitoring in Uganda'."
M. Pradhan, D. Suryadarma, A. Beatty, M. Wong, A. Alisjahbana, A. Gaduh, and R. Artha (2014). "Improving Educational Quality through Enhancing Community Participation: Results from a Randomized Field Experiment in Indonesia." American Economic Journal: Applied Economics, 6.
X. Giné, J. Goldberg, and D. Yang (2012). "Credit Market Consequences of Improved Personal Identification: Field Experimental Evidence from Malawi." American Economic Review, 102.
A. Simons, T. Beltramo, G. Blalock, and D. Levine (2017). "Using Unobtrusive Sensors to Measure and Minimize Hawthorne Effects: Evidence from Cookstoves."
R. Jensen (2012). "Do Labor Market Opportunities Affect Young Women's Work and Family Decisions? Experimental Evidence from India." Quarterly Journal of Economics, 127.
D. Karlan, R. Osei, I. Osei-Akoto, and C. Udry (2014). "Agricultural Decisions after Relaxing Credit and Risk Constraints." Quarterly Journal of Economics, 129.
S. de Mel, D. McKenzie, and C. Woodruff (2009). "Are Women More Credit Constrained? Experimental Evidence on Gender and Microenterprise Returns." American Economic Journal: Applied Economics, 1.
L. Beaman, R. Chattopadhyay, E. Duflo, R. Pande, and P. Topalova (2009). "Powerful Women: Does Exposure Reduce Bias?" Quarterly Journal of Economics, 124.
E. Bulte, G. Beekman, S. Di Falco, J. Hella, and L. Pan (2014). "Behavioral Responses and the Impact of New Agricultural Technologies: Evidence from a Double-Blind Field Experiment in Tanzania." American Journal of Agricultural Economics, 96.
E. Miguel, C. Camerer, K. Casey, J. Cohen, K. Esterling, A. Gerber, R. Glennerster, D. Green, M. Humphreys, G. Imbens, D. Laitin, T. Madon, L. Nelson, B. Nosek, M. Petersen, R. Sedlmayr, J. Simmons, U. Simonsohn, and M. van der Laan (2014). "Promoting Transparency in Social Science Research." Science, 343.
B. Roe and D. Just (2009). "Internal and External Validity in Economics Research: Tradeoffs between Experiments, Field Experiments, Natural Experiments, and Field Data." American Journal of Agricultural Economics, 91.
A. Deaton (2010). "Instruments, Randomization, and Learning about Development." Journal of Economic Literature, 48.
N. Cartwright (2010). "What Are Randomised Controlled Trials Good For?" Philosophical Studies, 147.
B. Hirsch, B. Kaufman, and T. Zelenska (2011). "Minimum Wage Channels of Adjustment." Working paper.
A. Adhvaryu (2014). "Learning, Misallocation, and Technology Adoption: Evidence from New Malaria Therapy in Tanzania." Review of Economic Studies, 81.
B. Feigenberg, E. Field, and R. Pande (2013). "The Economic Returns to Social Interaction: Experimental Evidence from Microfinance." Review of Economic Studies, 80.
X. Giné, D. Karlan, and J. Zinman (2010). "Put Your Money Where Your Butt Is: A Commitment Contract for Smoking Cessation." American Economic Journal: Applied Economics, 2.
S. Levitt and J. List (2009). "Field Experiments in Economics: The Past, the Present, and the Future." European Economic Review, 53.
P. Dupas (2014). "Short-Run Subsidies and Long-Run Adoption of New Health Products: Evidence from a Field Experiment." Econometrica, 82.
O. Attanasio, C. Meghir, and A. Santiago (2005). "Education Choices in Mexico: Using a Structural Model and a Randomized Experiment to Evaluate Progresa." Working paper.
L. Bursztyn and L. Coffman (2012). "The Schooling Decision: Family Preferences, Intergenerational Conflict, and Moral Hazard in the Brazilian Favelas." Journal of Political Economy, 120.
O. Attanasio, A. Barr, J. Cardenas, G. Genicot, and C. Meghir (2012). "Risk Pooling, Risk Preferences, and Social Networks." American Economic Journal: Applied Economics, 4.
L. Pritchett and J. Sandefur (2015). "Learning from Experiments When Context Matters." American Economic Review, 105.
K. Muralidharan and V. Sundararaman (2011). "Teacher Performance Pay: Experimental Evidence from India." Journal of Political Economy, 119.
K. Basu and A. Foster (2014). "Development Economics and Method: A Quarter Century of ABCDE." World Bank Economic Review, 29.
F. Barrera-Osorio, M. Bertrand, L. Linden, and F. Pérez-Calle (2011). "Improving the Design of Conditional Transfer Programs: Evidence from a Randomized Education Experiment in Colombia." American Economic Journal: Applied Economics, 3.
D. Burde and L. Linden (2013). "Bringing Education to Afghan Girls: A Randomized Controlled Trial of Village-Based Schools." American Economic Journal: Applied Economics, 5.
P. Glewwe, N. Ilias, and M. Kremer (2010). "Teacher Incentives." American Economic Journal: Applied Economics, 2.
N. Ashraf, J. Berry, and J. Shapiro (2010). "Can Higher Prices Stimulate Product Use? Evidence from a Field Experiment in Zambia." American Economic Review, 100.
H. Allcott (2015). "Site Selection Bias in Program Evaluation." Quarterly Journal of Economics, 130.
P. Gertler, S. Martinez, and M. Rubio-Codina (2006). "Investing Cash Transfers to Raise Long-Term Living Standards." World Bank Policy Research Working Paper.
Abstract

When properly implemented, Randomized Controlled Trials (RCTs) achieve a high degree of internal validity. Yet, if an RCT is to inform policy, it is critical to establish external validity as well. This paper systematically reviews all RCTs conducted in developing countries and published in leading economics journals between 2009 and 2014 with respect to how they deal with external validity. Following Duflo, Glennerster, and Kremer (2008), we scrutinize the following hazards to external validity: Hawthorne effects, general equilibrium effects, specific sample problems, and special care in treatment provision. Based on a set of objective indicators, we find that the majority of published RCTs do not discuss these hazards, and many do not provide the information necessary to assess potential problems. The paper calls for a more systematic reporting of RCT results that includes external validity dimensions. This may create incentives to avoid overgeneralizing findings and help policy makers interpret results appropriately.

In recent years, an intense debate has taken place about the value of Randomized Controlled Trials (RCTs).1 Most notably in development economics, RCTs have assumed a dominant role. The striking advantage of RCTs is that they overcome self-selection into treatment, so their internal validity is indisputably high. This merit is sometimes contrasted with shortcomings in external validity (Basu 2014; Deaton and Cartwright 2016). Critics state that establishing external validity is more difficult for RCTs than for studies based on observational data (Moffitt 2004; Roe and Just 2009; Temple 2010; Dehejia 2015; Muller 2015; Pritchett and Sandefur 2015). This is particularly true for RCTs in the development context, which tend to be implemented at smaller scale and in a specific locality. Scaling an intervention is likely to change the treatment effects, because the scaled program is typically implemented by resource-constrained governments, while the original RCT is often implemented by effective NGOs or by the researchers themselves (Ravallion 2012; Bold et al. 2013; Banerjee et al. 2017; Deaton and Cartwright 2016).

This does not question the enormous contribution that RCTs have made to existing knowledge about the effectiveness of policy interventions. Rather, it underscores that "research designs in economics offer no free lunches—no single approach universally solves problems of general validity without imposing other limitations" (Roe and Just 2009). Indeed, Rodrik (2009) argues that RCTs require "credibility-enhancing arguments" to support their external validity—just as observational studies have to make a stronger case for internal validity.

Against this background, the present paper examines how the results of RCT-based evaluations are reported, whether design features relevant to external validity are made transparent, and whether potential limitations to transferability are discussed. To this end, we conduct a systematic review of policy evaluations based on RCTs published in top economics journals. We include all RCTs published between 2009 and 2014 in the American Economic Review, the Quarterly Journal of Economics, Econometrica, the Economic Journal, the Review of Economic Studies, the Review of Economics and Statistics, the Journal of Political Economy, and the American Economic Journal: Applied Economics. In total, we identified 54 RCT-based papers that appeared in these journals.
Since there is no uniform definition of external validity and its hazards in the literature, in a first step we establish a theoretical framework that derives the assumptions required to transfer findings from an RCT to another policy population. We do this based on a model from the philosophical literature on the probabilistic theory of causality provided by Cartwright (2010), and on a seminal contribution to the economics literature, the toolkit for the implementation of RCTs by Duflo, Glennerster, and Kremer (2008). We identify four hazards to external validity: (a) Hawthorne and John Henry effects; (b) general equilibrium effects; (c) specific sample problems; and (d) problems that occur when the treatment in the RCT is provided with special care compared to how it would be implemented under real-world conditions.

As a second step, we scrutinized the reviewed papers with regard to how they deal with these four external validity dimensions and whether the required assumptions are discussed. Along the lines of these hazards we formulated seven questions and then read all 54 papers carefully with an eye toward whether they address them. All questions can be objectively answered by "yes" or "no"; no subjective rating is involved.

External validity is not necessary in some cases. For example, when RCTs are used for accountability purposes by a donor or a government, the results are only interpreted within the evaluated population. Yet, as soon as these findings are used to inform policy elsewhere or at larger scale, external validity becomes a pivotal element. Moreover, test-of-a-theory or proof-of-concept RCTs that set out to disprove a general theoretical proposition speak for themselves and do not need to establish external validity (Deaton and Cartwright 2016). However, in academic research most RCTs presumably intend to inform policy, and as we will also confirm in the review, the vast majority of included papers appear to generalize findings from the study population to a different policy population.2 Indeed, RCT proponents in the development community advocate in favor of RCTs in order to create "global public goods" that "can offer reliable guidance to international organizations, governments, donors, and NGOs beyond national borders" (Duflo, Glennerster, and Kremer 2008). As early as 2005, during a symposium on "New directions in development economics: Theory or empirics?", Abhijit Banerjee acknowledged the requirement to establish external validity for RCTs and, like Rodrik, called for arguments that establish the external validity of RCTs (Banerjee 2005). Indeed, Banerjee and Rodrik seem to agree that external validity is never a self-evident fact in empirical research, and that RCTs in particular should discuss to what extent results are generalizable.

In the remainder of the paper we first present the theoretical framework and establish the four hazards to external validity. Following that, the methodological approach and the seven questions are discussed. The results are presented in the next section, followed by a discussion section. The subsequent section provides an overview of existing remedies for external validity problems and ways to deal with them in practice. The final section concludes.

Theoretical Background and Definition of External Validity

Theoretical Framework

What exactly external validity is and how it might be threatened is not clearly defined in the literature.
What we are interested in here is the degree to which an internally valid finding obtained in an RCT is relevant for policy makers who want to implement the same intervention in a different policy population. Cartwright (2010) defines external validity in a way that is similar to the understanding conveyed in Duflo, Glennerster, and Kremer (2008): "External validity has to do with whether the result that is established in the study will be true elsewhere." Cartwright provides a model based on the probabilistic theory of causality. Using this model, we identify the assumptions that have to be made when transferring the results from an RCT to what a policy maker can expect if she scales the intervention under real-world conditions.

Suppose we are interested in whether a policy intervention C affects a certain outcome E. We can state that C causes E if

\begin{equation*}
P(E \mid C \,\&\, K_i) > P(E \mid \bar{C} \,\&\, K_i),
\end{equation*}

where $K_i$ describes the environment and intervention particularities under which the observation is made, and $\bar{C}$ denotes the absence of the intervention. Assume this causal relationship was observed in population A and we want to transfer it to a situation in which C is introduced to another population, A'. In this case, Cartwright points out that those factors $K_i$ that interfere with the treatment effect have to be identical in both populations A and A'. More specifically, Cartwright formulates the following required assumptions: (a) A is a representative sample of A'; (b) C is introduced in A' as it was in the experiment in A; and (c) the introduction leaves the causal structure in A' unchanged.

In the following, we use the language that is widely used in the economics literature and refer to the toolkit for the implementation of RCTs by Duflo, Glennerster, and Kremer (2008). Similar to the Cartwright framework, Duflo, Glennerster, and Kremer introduce external validity as the question "[. . .] whether the impact we measure would carry over to other samples or populations. In other words, whether the results are generalizable and replicable". The four hazards to external validity that they identify are Hawthorne and John Henry effects, general equilibrium effects, the specific sample problem, and the special care problem. The following section presents these hazards in more detail. Under the assumption that observational studies mostly evaluate policy interventions that would have been implemented in any case, Hawthorne/John Henry effects and the special care problem are much more likely in RCTs, while general equilibrium effects and the specific sample problem occur equally in RCTs and observational studies.

Potential Hazards to External Validity

To guide the introduction to the different hazards to external validity, we use a stylized intervention: a cash transfer given to young adults in an African village. Suppose the transfer is randomly assigned among young male adults in the village. The evaluation examines the consumption patterns of the recipients. We observe that the transfer recipients use the money to buy some food for their families, football shirts, and airtime for their mobile phones. In comparison, those villagers who did not receive the transfer do not change their consumption patterns. What would this observation tell us about giving a cash transfer to people in different set-ups?
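To make the transfer problem concrete, the condition above can be restated for the stylized example. This is an illustrative rewriting, not Cartwright's own notation; the subscripts are ours. The experiment establishes

\begin{equation*}
P(E \mid C \,\&\, K_{\mathit{village}}) > P(E \mid \bar{C} \,\&\, K_{\mathit{village}}),
\end{equation*}

whereas the policy maker needs

\begin{equation*}
P(E \mid C \,\&\, K_{\mathit{target}}) > P(E \mid \bar{C} \,\&\, K_{\mathit{target}}),
\end{equation*}

where C is the cash transfer, E the consumption response, $K_{\mathit{village}}$ collects the support factors of the experiment (young male recipients, local prices, the experimental situation itself), and $K_{\mathit{target}}$ those of the population to which the result is transferred. The second inequality follows from the first only under Cartwright's assumptions (a) to (c); the hazards discussed below are the typical ways in which the two sets of support factors come apart.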
The answer to this question depends on the assumptions identified in Duflo, Glennerster, and Kremer's nomenclature.

Hawthorne and John Henry effects might occur if the participants in an RCT know or notice that they are part of an experiment and are under observation.3 This could lead to altered behavior in the treatment group (Hawthorne effect) and/or the control group (John Henry effect).4 In the stylized cash transfer example, the recipient of the transfer can be expected to spend the money differently if he knows that his behavior is under observation. Such behavioral responses also clearly differ across experimental set-ups. If the experiment is embedded into a business-as-usual setting, distortions of participants' behavior are less likely. In contrast, if the randomized intervention interferes noticeably with the participants' daily life (e.g., an NGO appearing in an African village to randomize a certain training measure among the villagers), participants will probably behave differently than they would under non-experimental conditions.5

The special care problem refers to the fact that in RCTs the treatment is provided differently from what would be done in a non-controlled program. In the stylized cash transfer example, a lump-sum payment that is scaled up would perhaps be provided by a larger implementing agency with less personal contact. Bold et al. (2013) provide compelling evidence for the special care effect in an RCT that was scaled up based on positive effects observed in a smaller RCT conducted by Duflo, Kremer, and Robinson (2011b). The major difference is that the program examined in Bold et al. was implemented by the national government rather than by an NGO, as was the case in the Duflo et al. study. The positive results observed in Duflo, Kremer, and Robinson (2011b) could not be replicated in Bold et al. (2013): "Our results suggest that scaling-up an intervention (typically defined at the school, clinic, or village level) found to work in a randomized trial run by a specific organization (often an NGO chosen for its organizational efficiency) requires an understanding of the whole delivery chain. If this delivery chain involves a government Ministry with limited implementation capacity or which is subject to considerable political pressures, agents may respond differently than they would to an NGO-led experiment." In a meta-analysis of published RCTs, Vivalt (2017) confirms that RCTs implemented by NGOs or by the researchers themselves show higher effectiveness than RCTs implemented by governments.

Further evidence on the special care problem is provided by Allcott (2015), who shows that electricity providers that implemented RCTs in cooperation with a large research program evaluating household energy conservation instruments are systematically different from electricity providers that did not participate in this program. This hints at what Allcott refers to as "site selection bias": organizations that agree to cooperate with researchers on an RCT can be expected to differ from those that do not, for example because their staff are more motivated. This difference could translate into higher general effectiveness. The effectiveness observed in RCTs is therefore probably higher than it will be once the evaluated program is scaled to organizations that did not initially cooperate with researchers.
The third identified hazard arises from potential general equilibrium effects (GEE).6 Typically, such GEE only become noticeable if the program is scaled to a broader population or extended over a longer time horizon. In the stylized cash transfer example provided above, GEE occur if not only a small number of people but many villagers receive the transfer payment. In this scaled version of the intervention, some of the products that young male villagers want to buy become scarcer, and thus more expensive. This also illustrates that GEE can affect non-treated villagers, as prices increase for them as well. Moreover, in the longer term, if the cash transfer program is implemented permanently, certain norms and attitudes towards labor supply or educational investment might change.7

This example indicates that GEE in their entirety are difficult to capture. The severity of GEE, though, depends on parameters such as the regional coverage of the RCT, the time horizon of the measurements, and the impact indicators that the study examines. Very small-scale RCTs, or those that measure outcomes after only a few months, are unlikely to portray the change in norms and beliefs that the intervention might entail. Furthermore, market-based outcomes like wages or employment status will certainly be affected by adjustments in the general equilibrium if an intervention is scaled and implemented over many years. As a matter of course, it is beyond the scope of most studies to comprehensively account for such GEE, and RCTs that cleanly identify partial equilibrium effects can still be informative for policy. A profound discussion of GEE-relevant features is nonetheless necessary to avoid ill-advised interpretations of results. Note that GEE are not particular to RCTs; all else being equal, the generalizability of results from observational studies is also exposed to potential GEE. Many RCTs, particularly in developing-country contexts, are, however, limited to a specific region, a relatively small sample size, and a short monitoring horizon, and are thus more prone to GEE than country-wide representative observational studies based on panel data.

In a similar vein, the fourth hazard to external validity, the specific sample problem, is not particular to RCTs but might be more pronounced in this setting. The problem occurs if the study population is different from the policy population in which the intervention will be brought to scale. Taking the cash transfer example, the treatment effect for young male adults can be expected to be different if the cash transfer is given to young female adults in the same village or to young male adults in a different part of the country.

Methods and Data

Review Approach

We reviewed all RCTs conducted in developing countries and published between 2009 and 2014 in the leading journals in economics. We included the five most important economics journals, namely the American Economic Review, Econometrica, the Quarterly Journal of Economics, the Journal of Political Economy, and the Review of Economic Studies, as well as further leading journals that publish empirical work using RCTs: American Economic Journal: Applied Economics, Economic Journal, and Review of Economics and Statistics.
We scrutinized all issues in this period, in particular all papers that mention the terms "field experiment", "randomized controlled trial", or "experimental evidence" in the title or the abstract, or that indicated in the abstract or the title that a policy intervention was randomly introduced. We excluded papers that examine interventions in an OECD member country.8 In total, 73 papers were initially identified. Our focus is on policy evaluation, and we therefore excluded mere test-of-a-theory papers.9 In most cases the demarcation was very obvious, and we subsequently excluded 19 papers. In total, we found 54 papers that use an RCT to evaluate a policy intervention in a developing country.10 The distribution across journals is uneven, with the vast majority published in American Economic Journal: Applied Economics, American Economic Review, and Quarterly Journal of Economics (see figure 1).

[Figure 1: Published RCTs between 2009 and 2014. Note: A total of 54 studies were included; frequencies appear in bold.]

Figure 2 depicts the regional coverage of the surveyed RCTs. The high number of RCTs implemented in Kenya is due to the strong connection that two of the most prominent organizations conducting RCTs, Innovations for Poverty Action (IPA) and the Abdul Latif Jameel Poverty Action Lab (J-PAL), have to the country. Most of these studies were implemented in Kenya's Western Province by the Dutch NGO International Child Support (ICS), IPA's and J-PAL's cooperation partner in the country.11

[Figure 2: Countries of implementation. Note: A total of 54 studies were included; frequencies appear in bold.]

We read all 54 papers carefully (including online appendices) to determine whether each paper addresses seven objective yes/no questions. An additional filter question addresses whether the paper has the ambition to generalize. This is necessary because it is sometimes argued that not all RCTs intend to generate generalizable results; some are rather designed to test a theoretical concept. In fact, 96 percent of the included papers do generalize (see the next section for details on the coding of this question). This is no surprise, since we intentionally excluded test-of-a-theory papers and focused on policy evaluations. The remaining seven questions all address the four hazards to external validity outlined in the previous section and examine whether "credibility-enhancing arguments" (Rodrik 2009) are provided to underpin the plausibility of external validity. Appendix A shows the answers to the seven questions for each surveyed paper individually. In general, we answered the questions conservatively; that is, when in doubt we answered in favor of the paper. We abstained from applying subjective ratings in order to avoid room for arbitrariness. A simple report on each paper documents the answers to the seven questions and the quote from the paper underlying the respective answer.
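To illustrate the structure of this coding exercise, the short sketch below shows how a grid of binary answers over papers and questions can be tabulated into per-question "yes" shares. It is a hypothetical illustration written for this purpose: the paper identifiers and answers are invented, and it is not the tooling used for the review.

# Hypothetical illustration of the review's coding grid: each paper
# receives a yes/no answer to each of the seven questions.
from collections import defaultdict

QUESTIONS = ["Q%d" % i for i in range(1, 8)]  # the seven questions

# Invented example records: paper id -> {question: True ("yes") / False ("no")}
answers = {
    "paper_001": {"Q1": True, "Q2": False, "Q3": True, "Q4": False,
                  "Q5": True, "Q6": False, "Q7": False},
    "paper_002": {"Q1": False, "Q2": False, "Q3": False, "Q4": True,
                  "Q5": False, "Q6": True, "Q7": False},
}

def share_yes(answers):
    """Return the share of papers answering "yes" for each question."""
    counts = defaultdict(int)
    for paper in answers.values():
        for question, yes in paper.items():
            counts[question] += int(yes)
    return {q: counts[q] / len(answers) for q in QUESTIONS}

print(share_yes(answers))  # e.g., {"Q1": 0.5, "Q2": 0.0, ...}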
We sent these reports to the lead authors of the included papers and asked them to review our answers for their paper(s).12 For 36 of the 54 papers we received feedback, based on which we changed an answer from "no" to "yes" in 9 cases (out of 378 questions and answers in total). The comments we received from the authors are included in the reports, where necessary followed by a short reply. The revised reports were sent again to the authors for their information and can be found in the online supplementary appendix to this paper.

Seven Questions

To elicit the extent to which a paper accounts for Hawthorne and John Henry effects, we first asked the following objective question:

1. Does the paper explicitly say whether participants are aware (or not) of being part of an experiment or a study?

This question captures whether a paper provides the minimum information required to assess whether Hawthorne and John Henry effects might occur. More would be desirable: in order to make a substantiated assessment of Hawthorne-like distortions, information on the implementation of the experiment, the way participants were contacted, the specific explanations they received, and the extent to which they were aware of an experiment should be presented. We assume (and confirmed in the review) that papers that receive a "no" for question 1 do not discuss these issues, because a statement on the participants' awareness of the study is the obvious point of departure for such a discussion. It is important to note that, unlike in laboratory or medical experiments, participants in social science RCTs are not always aware of their participation in an experiment. Only for those papers that receive a "yes" to question 1 do we additionally pose the following question:

2. If people are aware of being part of an experiment or a study, does the paper (try to) account for Hawthorne or John Henry effects (in the design of the study, in the interpretation of the treatment/mechanisms, or in the interpretation of the results)?