

In this special edition of Theory & Practice, six evaluation experts share their thoughts on how the field has progressed (or regressed) in the last 10 years and consider what the next steps should be.

Articles that appear in Theory & Practice occupy an important position and mission in The Evaluation Exchange. Tied directly to each issue’s theme, they lead off the issue and provide a forum for the introduction of compelling new ideas in evaluation with an eye toward their practical application. Articles identify trends and define topics that deserve closer scrutiny or warrant wider dissemination, and inspire evaluators and practitioners to work on their conceptual and methodological refinement.

An examination of the topics covered in Theory & Practice over the last decade reads like a chronicle of many of the major trends and challenges the evaluation profession has grappled with and advanced within that same timeframe—systems change, the rise of results-based accountability in the mid-1990s, advances in mixed methods, learning organizations, the proliferation of complex initiatives, the challenges of evaluating communications and policy change efforts, and the democratization of practices in evaluation methodology, among many others.

As this issue kicks off the tenth year of publication for The Evaluation Exchange, Harvard Family Research Project is devoting this section to a discussion of some of the trends—good and bad—that have impacted the theory and practice of evaluation over the last 10 years.

We asked six experts to reflect on their areas of expertise in evaluation and respond to two questions: (1) Looking through the lens of your unique expertise in evaluation, how is evaluation different today from what it was 10 years ago? and (2) In light of your response, how should evaluators or evaluation adapt to be better prepared for the future?

On Theory-Based Evaluation: Winning Friends and Influencing People
Carol Hirschon Weiss
Professor, Harvard University Graduate School of Education, Cambridge, Massachusetts

One of the amazing things that has happened to evaluation is that it has pervaded the program world. Just about every organization that funds, runs, or develops programs now calls for evaluation. This is true locally, nationally, and internationally; it is almost as true of foundations and voluntary organizations as it is of government agencies. The press for evaluation apparently arises from the current demand for accountability. Programs are increasingly called on to justify their existence, their expenditure of funds, and their achievement of objectives. Behind the calls for accountability is an awareness of the gap between almost unlimited social need and limited resources.

An exciting development in the last few years has been the emergence of evaluations based explicitly on the theory underlying the program. For some time there have been exhortations to base evaluations on program theory. I wrote a section in my 1972 textbook urging evaluators to use the program’s assumptions as the framework for evaluation.1 A number of other people have written about program theory as well, including Chen,2 Rossi (with Chen),3 Bickman,4 and Lipsey.5

For a while, nothing seemed to happen; evaluators went about their business in accustomed ways—or in new ways that had nothing to do with program theory. In 1997 I wrote an article, “How Can Theory-Based Evaluation Make Greater Headway?”6 Now theory-based evaluation seems to have blossomed forth. A number of empirical studies have recently been published with the words “theory-based” or “theory-driven” in the titles (e.g., Crew and Anderson,7 Donaldson and Gooler8). The evaluators hold the program up against the explicit claims and tacit assumptions that provide its rationale.

One of the current issues is that evaluators do not agree on what “theory” means. To some it refers to a down-to-earth version of social science theory. To others, it is the logical sequence of steps necessary for the program to get from here to there—say, the steps from the introduction of a new reading curriculum to improved reading. Or, when a violence prevention program is introduced into middle school, it’s what has to happen for students to reduce the extent of bullying and fighting. To others it is the plan for program activities from start to completion of program objectives, without much attention to intervening participant actions.

I tend to see “theory” as the logical series of steps that lays out the path from inputs to participant responses to further intervention to further participant responses and so on, until the goal is achieved (or the sequence breaks down along the way). But other evaluators have different conceptualizations. It is not necessary that we all agree, but it would be good to have more consensus on what the “theory” in theory-based evaluation consists of.

Program theory has become more popular, I think, because, first, it provides a logical framework for planning data collection. If a program is accomplishing what it intends to at the early stages, it is worth following it further. If the early phases are not realized (e.g., if residents in a low-income community do not attend the meetings that community developers call to mobilize their energies for school improvement), then evaluators can give early feedback to the program about the shortfall. They need not wait to collect data on intermediate and long-term outcomes if the whole process has already broken down.

A second reason has to do with complex programs where randomized assignment is impossible. In these cases, evaluators want some way to try to attribute causality. They want to be able to say, “The program caused these outcomes.” Without randomized assignment, causal statements are suspect. But if evaluators can show that the program moved along its expected sequence of steps, and that participants responded in expected ways at each step of the process, then they can claim a reasonable approximation of causal explanation.

A third advantage of theory-based evaluation is that it helps the evaluator tell why and how the program works. The evaluator can follow each phase posited by the theory and tell which steps actually connect to positive outcomes and which ones are wishful fancies.

I intended to end with an inspiring call to evaluators to test out theory-based evaluation more widely and share their experiences with all of us. When we understand the problems that beset efforts at theory-based evaluation, perhaps we can improve our understanding and techniques. But I’ve run out of space.

On Methodology: Rip Van Evaluation and the Great Paradigm War
Saumitra SenGupta
Research Psychologist, Department of Public Health, City & County of San Francisco

Rip Van Evaluation fell asleep in 1991 right after the American Evaluation Association (AEA) conference in Chicago. The qualitative-quantitative battle of the Great Paradigm War was raging all around him. Lee Sechrest, a supporter of the quantitative approach, had just delivered his presidential speech at the conference9 in response to the one given by qualitative advocate Yvonna Lincoln10 the year before.

Fast-forward a dozen years to 2003. At the AEA conference in Reno, the battle lines had become blurred and evaluators were no longer picking sides. David Fetterman was presenting on empowerment evaluation at the business meeting of the Theory-Driven Evaluation topical interest group,11 interspersed among presentations by Huey-Tsyh Chen,12 Stewart Donaldson,13 Mel Mark,14 and John Gargani.15 Rip was awakened by the loud applause while Jennifer Greene was receiving AEA’s esteemed Lazarsfeld award for contributions to evaluation theory and mixed methods. Rip’s shock at what he was witnessing caps the last 10 years in evaluation.

The past decade can be rightfully characterized as one where qualitative methods, and consequently mixed-method designs, came of age, coinciding with and contributing to a different understanding of what evaluative endeavor means. Understanding, recognizing, and appreciating the context and dimensions of such endeavors have become more salient and explicit. The “value-addedness” of qualitative methods has consequently become more apparent to evaluators, which in turn has made mixed-method designs commonplace—the currency of the day.

Rossi notes that the root of this debate lies in the 1960s16 and, while the “long-standing antagonism”17 was somewhat suppressed with the formation of the AEA, it became more prominent during the early 1990s through the Lincoln-Sechrest debate, characterized as “the wrestlers” by Datta.18 Reichardt and Rallis credit David Cordray, AEA President in 1992, for initiating a process for synthesis and reconciliation of the two traditions.19

House cautions the evaluator against becoming fixated with methods and the accompanying paradigm war. While acknowledging the importance of methods, House argues for giving the content of the evaluative endeavor the limelight it deserves.20 Datta provides a more historical perspective, arguing that the contrast between qualitative and quantitative methods was not as dichotomous as it was being made out to be.21 Nor was it accurate to portray a theorist-specific espousal of a particular methodology, as most of these theorists and practitioners have from time to time advocated for and employed other methods.

The successful integration started taking root with the pragmatists proposing incorporation of both types of methods for the purposes of triangulation, expansion, and validation, among others.22 These efforts were reinforced at a conceptual level by Cordray23 and were truly integrated by methods such as concept mapping whereby a fundamentally quantitative method is used in conjunction with focus groups and interpretive methods.24

The fruit of this early sowing can be seen in what Rip Van Evaluation woke up to in Reno. There have been many notable developments and advancements in evaluation theory and practice in areas such as ethics, cultural competence, use of evaluation, and theory-driven evaluation. The relationship between organizational development and evaluation theory has also been better defined, through areas such as learning organizations. But the acceptance of mixed-method designs and qualitative methods as coequal and appropriate partners has been a giant step.

On Evaluation Use: Evaluative Thinking and Process Use
Michael Quinn Patton
Faculty, Union Institute and University, Minneapolis, Minnesota

A major development in evaluation in the last decade has been the emergence of process use as an important evaluative contribution.25 Process use is distinguished from findings use and is indicated by changes in thinking and behavior, and program or organizational changes in procedures and culture stemming from the learning that occurs during the evaluation process. Evidence of process use is represented by the following kind of statement after an evaluation: “The impact on our program came not just from the findings, but also from going through the thinking process that the evaluation required.”

This means an evaluation can have dual tracks of impact: (1) use of findings and (2) helping people in programs learn to think and engage each other evaluatively.

Teaching evaluative thinking can leave a more enduring impact from an evaluation than use of specific findings. Specific findings typically have a small window of relevance. In contrast, learning to think evaluatively can have an ongoing impact. Those stakeholders actively involved in an evaluation develop an increased capacity to interpret evidence, draw conclusions, and make judgments.

Process use can contribute to the quality of dialogue in community and program settings as well as to deliberations in the national policy arena. It is not enough to have trustworthy and accurate information (the informed part of the informed citizenry). People must also know how to use information, that is, to weigh evidence, consider contradictions and inconsistencies, articulate values, and examine assumptions, to note but a few of the things meant by thinking evaluatively.

Philosopher Hannah Arendt was especially attuned to critical thinking as the foundation of democracy. Having experienced and escaped Hitler’s totalitarianism, she devoted much of her life to studying how totalitarianism is built on and sustained by deceit and thought control. In order to resist efforts by the powerful to deceive and control thinking, Arendt believed that people needed to practice thinking.

Toward that end she developed eight exercises in political thought. Her exercises do not contain prescriptions on what to think, but rather on the critical processes of thinking. She thought it important to help people think conceptually, to “discover the real origins of original concepts in order to distill from them anew their original spirit which has so sadly evaporated from the very keywords of political language—such as freedom and justice, authority and reason, responsibility and virtue, power and glory—leaving behind empty shells.”26 We might add to her conceptual agenda for examination and public dialogue such terms as performance indicators and best practices, among many evaluation jargon possibilities.

From this point of view, might we also consider every evaluation an opportunity for those involved to practice thinking? Every utilization-focused evaluation, by actively involving intended users in the process, would teach people how to think critically, thereby offering an opportunity to strengthen democracy locally and nationally.

This approach opens up new training opportunities for the evaluation profession. Most training is focused on training evaluators, that is, on the supply side of our profession. But we also need to train evaluation users, to build up the demand side, as well as broaden the general public capacity to think evaluatively.

On Evaluation Utilization: From Studies to Streams
Ray C. Rist
Senior Evaluation Officer, The World Bank, Washington, D.C.

For nearly three decades, evaluators have debated the variety of uses for evaluation. An evaluation has been generally understood to be a self-contained intellectual or practical product intended to answer the information needs of an intended user. The unit of analysis for much of this navel gazing has been the single evaluation, performed by either an internal or external evaluator and presumably used by stakeholders in expanding concentric circles. The debate about the use—and abuse—of evaluations has thus hinged on what evidence can be mustered to support evaluations’ direct, instrumental “impact” or “enlightenment.” Evaluators have attempted to identify other forms of use as well, such as conceptual/illuminative, persuasive, and interactional. These debates have reflected a notion of evaluations as discrete studies producing discrete “findings.”

The most recent developments have focused on new notions of process use or, notably, “influence.” Still, this debate among evaluators on the use of evaluations has been essentially a closed one, with evaluators talking only among themselves. Sadly, those involved seem oblivious to fundamental changes in the intellectual landscape of public management, organizational theory, information technology, and knowledge management.

But this era of endless debates on evaluation utilization should now end. New realities ask for, indeed demand, a different conceptualization about evaluation utilization.

We are in a new period where ever-accelerating political and organizational demands are reframing our thinking about the definition of what, fundamentally, constitutes evaluation and what we understand as its applications. This period is characterized by at least two important considerations. The first is the emergence of an increasingly global set of pressures for governments to perform effectively—to go beyond concerns with efficiency—and to demonstrate that their performance is producing desired results. The second is the spread of information technology, which allows enormous quantities of information to be stored, sorted, analyzed, and made available at little or no cost. The result is that where governments, civil societies, and policymakers are concerned, the value of individual evaluations is rapidly diminishing.

The issue is no longer the lack of evidence for instrumental use by those in positions of power who could make a difference. Rather, users of evaluative knowledge are now confronted with growing rivers of information and analysis systematically collected through carefully built monitoring systems. Users are fed with streams of information from the public, private, and nonprofit sectors in country after country across the globe. Witness the following four examples:

  1. The budget system in Chile (and soon in Mexico as well), which links evaluation performance information to budget allocations on a yearly basis.
  2. The 24-hour monitoring system in New York City on policing patterns and their relation to crime prevention and reduction.
  3. Databases in the United States continuously updated on the performance of medical devices.
  4. Continuous assessment of different poverty alleviation strategies in developing countries (Uganda being notable in this regard).

These examples suggest the range of evaluative information systems currently being built and deployed. None depends on individual, discrete evaluation studies.

Increasingly, a large proportion of contemporary organizations and institutions thrive on their participation in the knowledge processing cycle. (By focusing on the multiple dimensions of the knowledge processing cycle, they are seeking to bypass the endless discussions on what is knowledge and what is information. They want to stay out of that cul-de-sac.) These organizations and institutions understand and define themselves as knowledge-based organizations, whatever else they may do, be it sell insurance, teach medical students, fight AIDS, or build cell phones. In fact, and somewhat ironically, these organizations now talk not about scarcity, but about managing the information deluge. Use becomes a matter of applying greater and greater selectivity to great rivers of information.

Far from concentrating on producing more and more individual evaluation studies, we see that governments, nongovernmental organizations, and the private sector are all using new means of generating real-time, continuous flows of evaluative knowledge for management and corporate decisions. These new realities completely blur and make obsolete the distinctions between direct and indirect use, between instrumental and enlightenment use, and between short- and long-term use.

The views expressed here are those of the author and no endorsement by the World Bank Group is intended or should be inferred.

On Community-Based Evaluation: Two Trends
Gerri Spilka
Co-Director, OMG Center for Collaborative Learning, Philadelphia, Pennsylvania

With over 10 years of community-based evaluation experience under our belts at OMG, I look back at a range of evaluations—from formative to impact, and from ones focused on local and area-wide programs to broader national initiatives and “cluster evaluations” that reviewed entire areas of grantmaking intended to change a particular system in a region. Examples of these include evaluations of the Comprehensive Community Revitalization Program (CCRP) initiative in New York’s South Bronx, the Annie E. Casey Foundation’s Rebuilding Communities Initiative (RCI), the Fannie Mae Foundation’s Sustained Excellence Awards Program, and, more recently, a citywide cluster evaluation of grantmaking to support the development of various community responsive databases. As I look back, two big trends stand out. One represents a big shift; the other, a shift that has not gone far enough.

The big change has to do with who is listening to the conversations and the evaluation reports about community-based work. When OMG first completed the CCRP evaluation in 1993, we had a limited audience that included only a select group of other evaluators, grantmakers, and community-based activists. But around that time the knowledge dissemination potential of the Internet was rapidly becoming apparent, a change that helped support the expanding use of professional networks. Professionals in the field were developing a huge appetite for new practical knowledge of effective strategies and the Internet now provided the means to share it easily. As evaluators we became credible sources of opinion about effective community programs and in many cases we found ourselves brokering information as a new, valued commodity.

Also during this time, for a number of reasons, policymakers started listening to us more. They read our reports and checked our sites; we had their ear. Eager to advance change in their own communities, they wanted evidence of successful programs to turn them into policy. It became even more critical for us to demonstrate benefits in real outcomes—real numbers and real dollars saved.

Another trend that has appeared over the past decade, but that has thus far not borne enough fruit, is the increasing attention to outcomes thinking throughout the field. The problem here is that despite the new outcomes fascination, progress has been slow in harnessing this thinking to improve practice. Particularly troubling is our own and our clients’ inability to be realistic about what kinds of outcomes we can expect from the work of the best minds and hearts of community activists within the timeframes of most grants and programs.

Ten years ago, five million dollars in a community over eight years seemed like a lot of money and a long commitment. We hoped we would see incredible outcomes as the result of these investments. Our first-generation logic models for comprehensive community revitalization efforts included, among many others, changes such as “reductions in drug-related crime, existence of an effective resident governance of public human services, and increases in employment.” Our good intentions, sense of mission, and optimism set us up to expect dramatic neighborhood change in spite of decades of public neglect. Nor did we always realistically factor in other community variables at play. In many cases, we also expected inexperienced and under-capacitated community-based organizations to collect data for us—an assumption that, not surprisingly, led to great disappointment in the quality of the data collected.

Sadly, despite lots of experience to draw from, we have not yet developed a thorough understanding of what constitute reasonable outcomes for these initiatives, nor have we come to agree on the most effective ways to collect the data essential to sound evaluation. As a result, we still run the risk of making poor cases for the hard and passionate work of those struggling to improve communities. Gary Walker and Jean Baldwin Grossman recently captured this dilemma well. They argue that the outcomes movement has come of age and that never before have foundations and government been so focused on accountability and outcomes. Accountability and learning about what works are good things. However, “even successful…programs rarely live up to all the expectations placed in them.”27

As we look to the future, being realistic about outcomes and measuring them effectively remain challenges. In the near-term political environment it may prove harder to make our case. But we do now have the ears of policymakers in an unprecedented way. We have the means to influence more people about what is possible with the resources available. We must be rigorous, not just about measuring results, but also about setting expectations for what is possible.

On Evaluation and Philanthropy: Evaluation in a New Gilded Age
John Bare
Director of Planning and Evaluation, John S. and James L. Knight Foundation, Miami, Florida

If we’re in a new Gilded Age, it’s distinguished not by any new human frailty. Mark Twain’s 19th-century observation that man’s chief end is to get rich, “dishonestly if we can, honestly if we must,” certainly pertains in the 21st. What’s different are management and evaluation practices that help us construct falsely precise measures in order to allocate resources to our liking.

Today’s “corporate carnage,” as The Wall Street Journal puts it, lays bare the myth that Fortune 500 management, and evaluation, will deliver philanthropy from the wilderness. Philanthropy has been urged to adopt practices that have contributed to, or at least made possible, Fortune 500 thievery. Adopted by governments, these practices gave us the Houston dropout scandal. No longer protected by tenure, principals ordered to make dropouts disappear—or else—reacted as rationally as Wall Street executives: They reported phony numbers.

Yet promoters keep hawking management and evaluation games that promise relief from hard-nosed questions of validity, internal and external. And I’ll be damned if we aren’t biting. Like a sucker on the carnival midway, philanthropy’s booby prize is a cluster of pint-sized tables and graphics, called a “dashboard” for its mimicry of the gauge displays in cars. This innovation satisfies foundation trustees who refuse more than a page of explanation about knotty social change strategies.

The most promising remedies sound like riddles. To reject single-minded claims of measurement certainty does not require us to also reject the obligation to demonstrate value to society. Ducking traps at the other extreme, we can value results without devaluing process. Both matter—what we accomplish and how we accomplish it—because values matter. When one man gets rich by stealing and another by hard work, the only thing separating them is how they got it done. The how matters, but only to the degree that it’s connected to the what.

Wise voices are rising up. Michigan psychology professor Karl Weick tells Harvard Business Review that effective organizations “refuse to simplify reality.” These “high-reliability organizations,” or HROs, remain “fixed on failure. HROs are also fiercely committed to resilience and sensitive to operations.”28 Daniel Kahneman, the Princeton psychology professor who won the 2002 Nobel Prize in economics, explains in Harvard Business Review how an “outside view” can counter internal biases. Without it, Kahneman’s “planning fallacy” takes hold, giving us “decisions based on delusional optimism.”29

Delusions swell expectations, which in turn ratchet up pressure to cook the numbers, as illustrated by USA Today’s item on Coca-Cola whistle-blower Matthew Whitley: “Just before midnight at the end of each quarter in 2002, Whitley alleges, fully loaded Coke trucks ‘would be ordered to drive about two feet away from the loading dock’ so the company could book ‘phantom’ syrup sales as part of a scheme to inflate revenue by tens of millions of dollars.”30

Embracing the same management and evaluation practices, philanthropy will be ripe for the same whistle-blowing. Salvation lies in the evaluation paradox. Distilled, it is this: Our only hope for doing well rests on rewarding news about and solutions for whatever it is we’re doing poorly.

1 Weiss, C. H. (1972). Evaluation research: Methods of assessing program effectiveness. Englewood Cliffs, NJ: Prentice-Hall.
2 Chen, H. T. (1990). Issues in constructing program theory. New Directions for Program Evaluation, 47, 7–18.
3 Chen, H. T., & Rossi, P. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review, 7(3), 283–302.
4 Bickman, L. (Ed.). (1987). Using program theory in evaluation. New Directions for Program Evaluation, 33, 5–18.
5 Lipsey, M. W. (1993). Theory as method: Small theories of treatments. New Directions for Program Evaluation, 57, 5–38.
6 Weiss, C. H. (1997). How can theory-based evaluation make greater headway? Evaluation Review, 21(4), 501–524.
7 Crew, R. E., Jr., & Anderson, M. R. (2003). Accountability and performance in charter schools in Florida: A theory-based evaluation. American Journal of Evaluation, 24(2), 189–212.
8 Donaldson, S. I., & Gooler, L. E. (2003). Theory-driven evaluation in action: Lessons from a $20 million statewide work and health initiative. Evaluation and Program Planning, 26(4), 355–366.
9 Sechrest, L. (1992). Roots: Back to our first generations. Evaluation Practice, 13(1), 1–7.
10 Lincoln, Y. (1991). The arts and sciences of program evaluation. Evaluation Practice, 12(1), 1–7.
11 Fetterman, D. M. (2003). Theory-driven evaluation with an empowerment perspective. Paper presented at the American Evaluation Association annual conference, Reno, NV.
12 Chen, H. T. (2003). Taking the perspective one step further: Providing a taxonomy for theory-driven evaluators. Paper presented at the American Evaluation Association annual conference, Reno, NV.
13 Donaldson, S. I. (2003). The current status of theory-driven evaluation: How it works and where it is headed in the future. Paper presented at the American Evaluation Association annual conference, Reno, NV.
14 Mark, M. M. (2003). Discussant’s remarks presented at the Theory-Driven Evaluation Topical Interest Group business meeting at the American Evaluation Association annual conference, Reno, NV.
15 Gargani, J. (2003). The history of theory-based evaluation: 1909 to 2003. Paper presented at the American Evaluation Association annual conference, Reno, NV.
16 Rossi, P. (1994). The war between the quals and the quants: Is a lasting peace possible? New Directions for Program Evaluation, 61, 23–36.
17 Reichardt, C. S., & Rallis, S. F. (1994). Editors’ notes. New Directions for Program Evaluation, 61, 1.
18 Datta, L. E. (1994). Paradigm wars: A basis for peaceful coexistence and beyond. New Directions for Program Evaluation, 61, 53–70.
19 Reichardt & Rallis 1.
20 House, E. R. (1994). Integrating the quantitative and qualitative. New Directions for Program Evaluation, 61, 13–22.
21 Datta 53–70.
22 Mathison, S. (1988). Why triangulate? Educational Researcher, 17(2), 13–17; Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11(3), 255–274.
23 Cordray, D. S. (1993). Synthesizing evidence and practices. Evaluation Practice, 14(1), 1–8.
24 Trochim, W. M. K. (1989). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12(1), 1–16.
25 Patton, M. Q. (1997). Utilization-focused evaluation: The new century text (3rd ed.). Thousand Oaks, CA: Sage.
26 Arendt, H. (1968). Between past and future: Eight exercises in political thought (pp. 14–15). New York: The Viking Press.
27 Walker, G., & Grossman, J. B. (1999). Philanthropy and outcomes: Dilemmas in the quest for accountability. Philadelphia: Public Private Ventures.
28 Weick, K. E., & Coutu, D. L. (2003, April). Sense and reliability: A conversation with celebrated psychologist Karl E. Weick. Harvard Business Review, 84–90.
29 Lovallo, D., & Kahneman, D. (2003, July). Delusions of success: How optimism undermines executives’ decisions. Harvard Business Review, 56–63.
30 Smith, E. B. (2003, May 21). Coke investigates internal fraud allegations. USA Today, p. 1B.


© 2017 Presidents and Fellows of Harvard College
Published by Harvard Family Research Project