Standards A1 and A6 of The Program Evaluation Standards address Justified Conclusions and Decisions, and Sound Designs and Analyses, respectively. A1 asks that evaluation conclusions and decisions be explicitly justified in the cultures and contexts where they have consequences, while A6 asks that evaluations employ technically adequate designs and analyses appropriate for the evaluation purposes (Yarbrough et al., 2011, pp. 165–167). Both standards bear directly on the potential accuracy of an evaluation. Previous coverage of strategies for mitigating the hazards associated with these standards identified actions ranging from integrating stakeholder knowledge frameworks, to clarifying roles within the evaluation team, to defining precisely what accuracy means in the context of a given evaluation. Strategies already discussed also include selecting designs based on the evaluation’s purpose while retaining enough flexibility in the design that compromise and uncertainty can be accommodated during this iterative process. Here we move beyond those strategies and focus on mitigating the hazards associated with the accuracy standards through two further approaches: triangulation and establishing validation in practice.
Triangulation is employed across quantitative, qualitative, and mixed methods research alike as a means of preventing common errors, such as drawing conclusions from samples that are not representative of their stated population, and of reducing confirmation bias among findings. As Patton (2002) observes, “Triangulation is ideal. It can also be expensive. A study’s limited budget and time frame will affect the amount of triangulation that is practical, as will political constraints in an evaluation. Certainly, one important strategy for inquiry is to employ multiple methods, measures, researchers, and perspectives – but to do so reasonably and practically” (p. 247). Triangulation mitigates hazards among the accuracy standards because the primary intent of those standards is to produce context-specific conclusions that are defensible and inclusive of the stakeholders involved in the process. It addresses risks to accuracy directly: while a single data point or collection method may offer a pertinent picture of a program’s efficacy, the ability to further substantiate those results through additional methods and additional data only strengthens the conclusions reached. In addition to ensuring data integrity via a multi-method approach to data collection, establishing validation in practice remains an additional strategy for mitigating the hazards of the accuracy standards.
On establishing validation in practice, Goldstein and Behuniak (2011) comment, “In the Standards, validity is defined as the ‘degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests… The interrelationships among the interpretations and proposed uses of test scores and the sources of validity evidence define the validity argument for an assessment” (p. 180). It therefore becomes pertinent for the evaluation team to ensure that all relevant validity evidence is collected, alongside the proposed uses of the methods selected and employed for a particular evaluation. The team bears an onus not only to substantiate the conclusions borne of its data collection and analysis of the immediate program and environment, but equally to first establish the validity of the methods and instruments chosen for that collection and analysis. Simply requiring stakeholders to trust the expert judgment behind an evaluation team’s selection of methods and instruments is cause for concern, as it does not permit the inclusion of stakeholder knowledge frameworks mentioned above. Rather, upholding the accuracy standards requires an inclusive process of iterative review of the proposed design, and of its execution, with stakeholder groups; this provides the holism necessary to conclude that a design and its instruments are accurate. The evaluation team certainly brings the knowledge, experience, and technical prowess necessary to perform a successful evaluation, yet proceeding without consulting the knowledge and experience of stakeholders opens the door to research that is not aligned with the intentions of those who engaged the team.
Stakeholders, while not technical or content experts in the manner of AEA evaluators per se, have as much to contribute to the selection of data points, methods employed, and analyses performed as the team itself, owing to their direct involvement with the program and the experience that interaction brings. They would not serve as the primary source of design suggestions, yet they can help discern which proposed design might best serve the current situation. Of this tension, Booth, Colomb, and Williams (2008) note, “A responsible researcher supports a claim with reasons based on evidence. But unless your readers think exactly as you do, they may draw a different conclusion or even think of evidence you haven’t” (p. 112).
Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.
Goldstein, J., & Behuniak, P. (2011). Assumptions in alternate assessment: An argument-based approach to validation. Assessment for Effective Intervention, 36(3), 179–191.
Patton, M. Q. (2002). Qualitative research & evaluation methods. Thousand Oaks, CA: Sage Publications, Inc.
Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.