Post-Doc Blogpost: Engaging Evaluation Stakeholders

When it comes to the challenges of engaging stakeholders in the evaluation process, I believe the limitations exist along several dimensions: the organization, the program, and the program's evaluation. I begin with the organization because, regardless of the steps taken to account for needs, include stakeholders, and design strong evaluations, results can still fall flat if the organization is structured in a way that prevents stakeholders from working with you or sharing critical input. As Thompson (2004) puts it, "We have asserted that organizational goals are established through coalition behavior, we have done so on grounds that organizations are interdependent with task-environment elements and that organizational components are also interdependent with one another; unilateral action is not compatible with interdependence, and the pyramid headed by the single all-powerful individual has become a symbol of complex organization but through historical and misleading accident" (p. 132). I have mentioned Thompson's work before when writing on the task environment; this time I cite him regarding the structure of organizations themselves. At times the very structure of an organization can limit the evaluator's opportunity to reach all pertinent stakeholders. In highly structured, deeply hierarchical organizations, it may take nothing less than multi-level approvals to seek the input of one highly critical yet lower-level stakeholder. One inclusion strategy to correct for this is to first establish a matrix organization within the larger organization, formed for the sole purpose of sustaining the program evaluation effort. Involving an interdepartmental, matrix team drawn as a cross-section through many levels of the organization, and representing as many functions and divisions as possible, negates much of the limitation imposed by conventional command-and-control structures while ensuring representation from nearly every corner of the organization.

The program itself also presents limitations to stakeholder inclusion and therefore to engagement. As Rossi, Lipsey, and Freeman (2004) note, a common difficulty with information from these sources is a lack of the specificity and concreteness necessary to clearly identify specific outcome measures; for the evaluator's purposes, an outcome description must indicate the pertinent characteristic, behavior, or condition that the program is expected to change (p. 209). The very design of the program, and the way its intent is disseminated, can leave evaluators with a limited view of the program. This in turn limits stakeholder inclusion, because an under-representative description of the program yields an under-representative description of the stakeholders required. Where this is the case, the strategy for ensuring stakeholder inclusion rests on gaining a better understanding of the program itself. The strategy has less to do with identifying individuals and more to do with identifying pertinent processes and impacts; only then can relevant individuals be identified and included relative to those processes.

A third and final consideration for stakeholder engagement is the design of the evaluation itself, since many of the limitations of research are embedded in its design. On the complexities of interviewing, Booth, Colomb, and Williams (2008) advise that the more you plan by determining exactly what you want to know, the more efficiently you will get what you need; you need not script an interview around a set list of questions, but you should prepare so that you do not question your source aimlessly (p. 82). Just as specifying the program further allows the evaluator to know more about what is under evaluation, and therefore whom to ask about it, it is equally effective to consider exactly what you wish to learn from stakeholders before engaging them in the evaluation process. The difference between aimlessly questioning a stakeholder for an hour and purposively questioning a stakeholder for ten minutes is engagement. The latter interview begets an engaged stakeholder, where the former begets a rambling dialogue rife with detractors and partial information alike. With this in mind, the final strategy for engaging stakeholders more fully is to arrive better prepared, both for the evaluation and for each interview. While this strategy has little to do with how a stakeholder is selected, the literature suggests that what is done with a stakeholder once you have one makes all the difference.

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks, CA: Sage Publications, Inc.

Thompson, J. D. (2004). Organizations in action: Social science bases of administrative theory. New Brunswick, NJ: Transaction Publishers.

Post-Doc Blogpost: Machine Scoring & Student Performance

I believe one concept at the heart of considering the validity of a writing assessment scored via automated essay scoring (AES) as a measure of student performance in a graduate program is fidelity. Fidelity has to do with the degree to which the task, the response mode, and what is actually scored match the requirements found in the real world (Zane, 2009, p. 87). This includes consideration of such elements as context, structure, and the item's parameters. What lends validity to such an assessment is that much of what a writing assessment demands mirrors the college experience. From grammar and sentence structure to critical thinking and conceptual integration, a writing assessment permits greater fidelity because the exercise emulates the student experience in myriad ways. Where attention must be paid, however, is to the validity of the objective scoring provided by an AES system.

As Bridgeman, Trapani, and Attali (2012) explain, machine scores are based on a limited set of quantifiable features in an essay, while human holistic scores are based on a broader set of features, including many, such as the logical consistency of an argument, which cannot yet be evaluated by a machine (p. 28). The assessment itself, then, is what provides much of the fidelity to the student experience. What has yet to be developed is a way of interpreting and scoring essay responses holistically, with the same depth and consideration for elements outside a designed algorithm that human raters bring. Human raters can probe creativity, innovation, and integrative thinking. Human raters can identify off-topic content just as an AES system would, yet rather than reducing the score by default, as AES is designed to do, the human rater can form an individual judgment about the off-topic content and its relevance to the response. Where a writer asked about the concept of truth in an environment such as Accuplacer testing draws on the vernacular of art, for example, the scoring mechanism might deduct points for that vocabulary even though the writer invokes it because art and truth alike are subjective interpretations. What AES does well, and human raters do not, is score essay responses with an equal level of consistency across multiple raters and multiple ratings.

As AES models are often formed by using more than two raters, studies evaluating interrater agreement have usually shown that the agreement coefficients between the computer and human raters are at least as high as, or higher than, those among human raters themselves (Shermis, Burstein, Higgins, & Zechner, 2010, p. 22). This gives pause to the doubt cast on an AES system's ability to score work accurately and reliably. Yet, as previously discussed, this reliability does not necessarily denote validity. Thus an environment in which at least one AES score and one human rater score are considered in conjunction presents the most promising synthesis of both approaches for a single assessment item and score. Further understanding of rater cognition is necessary to understand more thoroughly what is implied by the direct calibration and evaluation of AES against human scores, and what, precisely, is represented in a human score for an essay (Ramineni & Williamson, 2013, p. 37). With this in mind, it is critical that those designing such assessments remain attentive to differences in scores among human raters, to differences in scores between machine and human raters, and to any differences in the AES model's ability to measure student performance against a particular rubric. Should the rubric cover concepts such as creativity, for which AES is at a disadvantage, that emphasis on content should drive the decision about whether to use AES. Ultimately, the context, the grading criteria as explained by the rubric, and the opportunity for joint assessment with a human rater will determine whether AES is a valid source of objective writing assessment.
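To make the notion of an agreement coefficient a little more concrete, the sketch below computes quadratic weighted kappa, one coefficient commonly reported in the AES literature, for a small set of invented scores. This is only an illustration of the calculation under stated assumptions; the 1–6 rubric, the score vectors, and the function name are hypothetical and are not drawn from the studies cited above.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two sets of ordinal scores, penalizing large
    disagreements more heavily than small ones (illustrative sketch)."""
    rater_a = np.asarray(rater_a)
    rater_b = np.asarray(rater_b)
    n_cats = max_score - min_score + 1

    # Observed joint distribution of the two score assignments.
    observed = np.zeros((n_cats, n_cats))
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score, b - min_score] += 1
    observed /= len(rater_a)

    # Expected joint distribution if the two raters were independent.
    hist_a = np.bincount(rater_a - min_score, minlength=n_cats) / len(rater_a)
    hist_b = np.bincount(rater_b - min_score, minlength=n_cats) / len(rater_b)
    expected = np.outer(hist_a, hist_b)

    # Quadratic disagreement weights: larger score gaps cost more.
    idx = np.arange(n_cats)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_cats - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical scores on a 1-6 rubric: two human raters and one AES engine.
human_1 = np.array([4, 5, 3, 6, 2, 4, 5, 3, 4, 5])
human_2 = np.array([4, 4, 3, 5, 2, 4, 6, 3, 3, 5])
machine = np.array([4, 5, 3, 5, 2, 4, 5, 3, 4, 4])

print("human vs. human  :", round(quadratic_weighted_kappa(human_1, human_2, 1, 6), 3))
print("machine vs. human:", round(quadratic_weighted_kappa(machine, human_1, 1, 6), 3))
```

In practice, comparing the machine-human coefficient against the human-human coefficient in this way is what allows designers to judge whether the AES model is "at least as consistent" as a second human rater for a given prompt and rubric.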

Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27–40.

Ramineni, C., & Williamson, D. M. (2013). Automated essay scoring: Psychometric guidelines and practices. Assessing Writing, 18(1), 25–39.

Shermis, M. D., Burstein, J., Higgins, D., & Zechner, K. (2010). Automated essay scoring: Writing assessment and instruction. In P. Peterson, E. Baker, and B. McGaw (Eds.), International encyclopedia of education (3rd ed.), 20–26. Oxford, UK: Elsevier.

Zane, T. W. (2009). Performance assessment design principles gleaned from constructivist learning theory (Part 2). TechTrends, 53(3), 86–94.

Post-Doc Blogpost: Validity & Reliability in Performing Assessments

Designing a question for an instrument is designing a measure; as Fowler (2009) notes, an answer given to a question is of no intrinsic interest, and the answer is valuable only to the extent that it can be shown to have a predictable relationship to facts or subjective states that are of interest (p. 87). Sweeping satisfaction questions, such as satisfaction with one's job or degree program, are inherently of little value to an assessment in their existing form. They contain limited value because a student may experience a number of differing subjective states over the course of a degree program. A student can find great value in his or her first courses, or perhaps an entire year, yet subsequently find little value in the courses that remain. This alone indicates the item's inability to measure these changes in a student's state; the item therefore does not exhibit a predictable relationship to the state of interest. An objective test item is defined as one for which the scoring rules are so exhaustive and specific that they do not allow scorers to make subjective inferences or judgments (Murayama, 2012, para. 1). Requiring students to infer which period of time the item refers to introduces subjectivity, and negates the item's ability to deliver anything other than a highly subjective, highly summative point of view.

Many teachers believe that they need strong measurement skills, and report that they are confident in their ability to produce valid and reliable tests (Frey, Petersen, Edwards, Pedrotti, & Peyton, 2005, p. 2). Yet this contention remains at issue, both because standards for establishing validity remain disparate and interspersed throughout the literature on item writing, and because the research shows that little assessment training is required in the curriculum of teaching certification programs. What is needed to determine whether items possess the required validity are standards for the validity of each assessment item. Of the 40 item-writing rules identified, each falls into one or more of a few categories, including potentially confusing wording or ambiguous requirements, guessing, rules addressing test-taking efficiency, and rules designed to control testwiseness (Frey, Petersen, Edwards, Pedrotti, & Peyton, 2005, p. 4). Each category comprises a number of item-writing rules intended to address differing concerns for validity. Potentially confusing wording or ambiguous requirements speaks to confidence that every respondent will understand a question the same way. Guessing in this instance refers to excluding responses in which respondents simply chose a correct answer by chance; the probability of this occurring must therefore be reduced. Rules addressing test-taking efficiency have to do with designing items so that their structure does not impede respondents: their form is simple, completing each is brief, and the options are clear. Finally, rules designed to control testwiseness refer to designing items so that, to the largest extent possible, they are answered using only knowledge, ability, or a combination of the two, rather than by spotting patterns or other unintended characteristics of an item that may lead respondents to identify a correct answer without knowing why it is correct. To infuse greater validity into the item discussed above, attention to all four categories is prudent. Yet paramount among them is to alter the item so that the ambiguous requirement is corrected and an appropriate span of time is delineated within the context of the question.

Where validity deals with the relationship between each item and an area of interest, reliability deals with the relationship between each item and the consistency of results each time a measurement is taken. In discussing whether scores resulting from an item demonstrate reliability, look for whether responses are consistent across constructs, whether scores are stable over time when the instrument is administered a second time, and whether there is consistency in test administration and scoring (Creswell, 2009, p. 149). While researchers often address validity and reliability as separate considerations, I feel their interrelationship cannot be stated strongly enough. Returning to the example item on program satisfaction above: if the validity of the measurement is compromised, as when respondents are confused about when, and how much of, the program the item is intended to describe, this heightens the probability of inconsistent responses, which in turn directly threatens reliability. If one respondent can answer the same question multiple ways, and do so defensibly each time while regarding a different aspect of the same context, we are measuring the same condition multiple times and arriving at multiple, quite different results. This is especially problematic with a true/false or multiple-choice item, as either presents a very limited list of potential responses. Altered response patterns driven by poorly worded items leave the reliability of the instrument in question, because with each subsequent administration it is entirely likely that different responses are selected from those available, and the percentage choosing each option (and therefore its description of the population assessed) is unreliable. Only after the ambiguities in the item's wording are addressed, and consistent responses are collected across multiple administrations, can this item begin to be described as valid, reliable, or both.
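As a concrete, if simplified, companion to Creswell's criteria, the sketch below estimates two of the three: internal consistency across items intended to tap the same construct (Cronbach's alpha) and stability across a repeated administration (a test-retest correlation). The respondent data, item count, and function names are hypothetical and serve only to illustrate the calculations.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Internal-consistency estimate for a set of items meant to measure
    the same construct (rows = respondents, columns = items)."""
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

def test_retest_r(time_1, time_2):
    """Stability estimate: correlation between two administrations."""
    return np.corrcoef(time_1, time_2)[0, 1]

# Hypothetical 5-point satisfaction items from ten respondents.
items = np.array([
    [4, 4, 5, 4],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 4, 3, 3],
])
composite_t1 = items.sum(axis=1)
# Hypothetical second administration of the same composite score.
composite_t2 = composite_t1 + np.array([0, 1, -1, 0, 1, 0, -1, 0, 1, 0])

print("Cronbach's alpha:", round(cronbach_alpha(items), 3))
print("Test-retest r   :", round(test_retest_r(composite_t1, composite_t2), 3))
```

An ambiguously worded item would tend to surface here as depressed internal consistency and an unstable test-retest correlation, which is exactly the interdependence of validity and reliability described above.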

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Thousand Oaks, CA: Sage.

Fowler, F. J. (2009). Survey research methods (4th ed.). Thousand Oaks, CA: Sage Publications, Inc.

Frey, B. B., Petersen, S., Edwards, L. M., Pedrotti, J. T., & Peyton, V. (2005). Item-writing rules: Collective wisdom. Teaching and Teacher Education, 21(4), 357–364.

Murayama, K. (2012). Objective test items. Retrieved December 24, 2013 from http://www.education.com/reference/article/objective-test-items/.

Post-Doc Blogpost: Reciprocal Recognition – Permitting Evaluation to Evoke Advocacy

I recently had the privilege of attending a presentation by Dr. Raymond Cheng, who spoke on the topic of degree equivalency across the world. The presentation covered the qualifiers that a degree program can be quick, cheap, or recognized, but never all three at once. The purpose of this emphasis is for learners and potential graduates to conduct an informed review of how equivalency is regarded for a given degree and its sister recognitions in other parts of the world. Where this becomes meaningful for program evaluation is when we look beyond the confines of evaluating process outcomes or program outcomes, and look at how program evaluation is designed either to prevent, or to permit, the evaluator acting as advocate for the learners served by these programs.

To describe our ultimate intent, Dane (2011) remarks, "Evaluation involves the use of behavioral research methods to assess the conceptualization, design, implementation, and utility of intervention programs. In order to be effectively evaluated, a program should have specific procedures and goals, although formative evaluation can be used to develop them. Summative evaluations deal with program outcomes" (p. 314). At this level Dane lets us understand that while many interests may converge on a program's process and programmatic outcomes, it is the program's goals that drive what is evaluated, and to what end. Do we wish to evaluate whether equivalency is a primary goal of a given degree program? Are we instead concerned with the political factors affecting the equivalency evaluation process? These and other considerations are potential intervening variables in the process of evaluating a program's efficacy. Yet it may be that student advocacy is the goal itself, not an outgrowth of the goal. We return to Dane (2011) to summarize: "Because the researcher is probably the most fully informed about the results, the researcher may be called upon to make policy recommendations. Personal interests may lead a researcher to adopt an advocate role. Although policy recommendations and advocacy are not unethical themselves, care must be taken to separate them from the results" (p. 314).

Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes (Yarbrough et al., 2011, p. 201). While many of us understand this point anecdotally, combining the concept with advocacy allows us to see that a program's learning outcomes may not be its greatest goal. What of the international student who wishes to complete an MBA in the US and have that degree recognized as an MBA in Hong Kong, so that he or she may make a global impact as consultant or scholar? If the extant evaluation process does not take this goal into account, learning outcomes centered on financial analysis, strategic planning, or marketing management, as an MBA program would include, could be for naught when considering the long-term prospects of this student in particular. We as program evaluation personnel are then given a critical task in the design phase, as well as in the formative evaluation phase: determining whether the stated goals of a program are in alignment with the goals of those who stand to benefit from that program.

Several factors can influence the role of an evaluator, including the purpose of the evaluation, stakeholders' information needs, the evaluator's epistemological preferences, and the evaluation approach used (Fleischer & Christie, 2009, p. 160). With this conclusion in mind, we see that the design equation is only complicated by the involvement of the evaluator him- or herself. How this person regards the creation of new knowledge is one of the considerations weighed during the design process. Yet where concepts such as reciprocal recognition, evaluator epistemology, negotiated purposes, and defensible evaluation design converge is upon the goals established not by a program's administrators alone, but upon stated goals inclusive of those set by the people who stand to benefit most from the program's existence. Thank you for this reminder, and for your apt coverage of this topic, Dr. Cheng.

Dane, F. C. (2011). Evaluating research: Methodology for people who need to read research. Thousand Oaks, CA: Sage Publications, Inc.

Fleischer, D. N., & Christie, C. A. (2009). Evaluation use: Results from a survey of U.S. American Evaluation Association members. American Journal of Evaluation, 30(2), 158–175.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: Issue Polarization & Evaluator Credibility

On the topic of ideology and polarization, Contandriopoulos and Brousselle (2012) note, "Converging theoretical and empirical data on knowledge use suggest that, when a user's understanding of the implications of a given piece of information runs contrary to his or her opinions or preferences, this information will be ignored, contradicted, or, at the very least, subjected to strong skepticism and low use" (p. 63). Program evaluation, like any other form of research and analysis, must be evaluated in context. Yet context is not simply the setting of the evaluation, nor its intent alone. Context must also include consideration of the evaluation's design and of the credibility of the evaluator him- or herself. Evaluations should be conducted by qualified people who establish and maintain credibility in the evaluation context (Yarbrough et al., 2011, p. 15). This points not only to the need for an audience capable of receiving the ideas and findings the evaluation brings forth, but equally to the evaluator's responsibility to preserve the credibility of the study by maintaining his or her own professional credibility as well.

An example of this in action came at a program evaluation session held as part of the Orange County Alliance for Community Health Research last year. This event, presented at UC Irvine, included a three-hour presentation on program evaluation delivered by Michelle Berelowitz, MSW (UC Irvine, 2012). Ms. Berelowitz spoke at length on the broader purpose of program evaluation, the process for designing and conducting program evaluation, and its potential applications. The event was attended by a multitude of program directors and other leaders of health and human services agencies near the university, intending both to learn about the process and to network with other agencies. Where polarization was introduced, and therefore the first instance in which evaluator credibility was called into question, was during the introduction of Ms. Berelowitz's presentation. She, in very plain language, asked the audience who among them felt motivated when it came time to perform evaluations of their programs each year. None replied as being motivated, and a general consensus of disregard for the annualized process instead loomed. This calls the evaluator's credibility into question, because the process itself is only as valuable as its audience perceives it to be, and program evaluation is only meaningful when it can inform decisions and effect change.

If, during a presentation intended to inform others of the very merits of program evaluation, the evaluator's credibility is called into question, strategies must be enacted to counteract this stifling critique and inattention to the process's value. To briefly return to the value of identifying and addressing polarization among stakeholders, Contandriopoulos and Brousselle (2012) remark, "as the level of consensus among participants drops, polarization increases and the potential for resolving differences through rational arguments diminishes as debates tend toward a political form wherein the goal is not so much to convince the other as to impose one's opinion" (p. 63). Thus, in a room where a presentation on the merits of program evaluation is to be received with tepid acceptance, the evaluator holds the responsibility to convey the process in a way that fosters consensus and restores credibility to the process.

One means of establishing greater evaluator credibility is ensuring inclusion. This comes as no surprise, as much of the literature regarding program evaluation centers on stakeholder inclusion. To address how this relates specifically to evaluator credibility, Yarbrough et al. (2011) write, "Build good working relationships, and listen, observe, and clarify. Making better communication a priority during stakeholder interactions can reduce anxiety and make the evaluation processes and activities more cooperative" (p. 18). This was masterfully exercised by Ms. Berelowitz: throughout the presentation she was engaging, she drew insights from multiple attendees, she worked to incorporate many of the attendees' own issues into the presentation's material, and she was thoughtfully responsive to attendee questions and further inquiry into her paradigm.

Another method by which evaluator credibility can be restored is ensuring that the research is designed in a way its audience can receive. On this topic Creswell (2009) writes, "In planning a research project, researchers need to identify whether they will employ a qualitative, quantitative, or mixed methods design. This design is based on bringing together a worldview or assumptions about research, the specific strategies of inquiry, and research methods" (p. 20). These conclusions affect not only the design of the research itself, and by extension the design of program evaluation, but also the evaluator's ability to design meaningful research that conveys information according to long-established assumptions about research. In this instance, Ms. Berelowitz delivered a presentation on program evaluation that was deeply supported by the extant literature, that articulated her worldview and assumptions about research and thus program evaluation quite clearly, and that allowed attendees to witness both the technical and practical merits of navigating the program evaluation process in the way presented. By ensuring the inclusion of stakeholders, and by making worldview, inherent assumptions, and defensible design explicit, the presentation was ultimately a success, and attendees left expressing motivation for the program evaluation process ahead.

Contandriopoulos, D., & Brousselle, A. (2012). Evaluation models and evaluation use. Evaluation, 18(1), 61–77.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. Thousand Oaks, CA: Sage Publications, Inc.

UC Irvine. (2012). Program evaluation. Retrieved October 22, 2013 from http://www.youtube.com/watch?v=XD-FVzeQ6NM.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: On Explicit Evaluation Reasoning

Evaluation reasoning leading from information and analyses to findings, interpretations, conclusions, and judgments should be clearly and completely documented (Yarbrough et al., 2011, p. 209). This standard arises not solely to ensure that one's conclusions are logical; it also functions as both a final filter and an ultimate synthesizer of the results of all other accuracy standards. A7, Explicit Evaluation Reasoning, as presented in The Program Evaluation Standards, serves to make known the efficacy of the process by which conclusions are reached. Of this standard, Yarbrough et al. (2011) continue, "If the descriptions of the program from our stakeholders are adequately representative and truthful, and if we have collected adequate descriptions from all important subgroups (have sufficient scope), then we can conclude that our documentation is (more) likely to portray the program accurately" (p. 209). This level of holism leaves us with a critical imperative: to serve the program we are evaluating well, and to serve the negotiated purposes of the evaluation to their utmost.

On the need to ensure clarity, logic, and transparency in one's process, Booth, Colomb, and Williams (2008) elucidate, "[Research] is a profoundly social activity that connects you both to those who will use your research and to those who might benefit – or suffer – from that use" (p. 273). We therefore have a responsibility, as evaluators and as researchers, to conduct ourselves and document our process explicitly. Doing so preserves attributes essential to quality research such as reproducibility, generalizability, and transferability. Yet there are also more specific considerations at play. To see this standard's importance to current and future professional practice, consider an extant job posting for a Program Evaluator with the State of Connecticut Department of Education. The description for this position includes the following: "A program evaluation, measurement, and assessment expert is sought to work with a team of professionals developing accountability measures for educator preparation program approval. Key responsibilities will include the development of quantitative and qualitative outcome measures, including performance-based assessments and feedback surveys, and the establishment and management of key databases for annual reporting purposes" (AEA Career, n.d., para. 2). This position covers a wide range of AEA responsibilities and makes clear, from only its second paragraph, the sheer scope of responsibility it carries. And while the required qualifications mention expertise in program evaluation, qualitative and quantitative data analyses, and research methods, they more importantly conclude with the need to 'develop and maintain cooperative working relationships' and to demonstrate skill in working 'collaboratively and cooperatively with internal colleagues and external stakeholders'. What is required, then, is not solely a researcher with broad technical expertise, nor simply a methodologist with a program evaluation background, but a member of the research community who can deliver defensible conclusions from explicit reasoning in a way that connects with a broad audience of users and stakeholders.

Explicit reasoning, expressed in a way that is digestible by readers, defensible to colleagues, and actionable by program participants, requires that the researcher be comfortable with where he or she is positioned in relation to the research itself when communicating both process and results. This is known in the literature as positionality. Andres (2012) speaks of this in saying, "This positionality usually involves identifying your many selves that are relevant to the research on dimensions such as gender, sexual orientation, race/ethnicity, education attainment, occupation, parental status, and work and life experience" (p. 18). Why so many admissions solely for the purpose of locating one's self within the research? Because positionality has as much to do with the researcher as it does with the researcher's position and its impact on program evaluation outcomes. An example of this need for clarity comes to us from critical action research. Kemmis and McTaggart (2005) describe it thus: "Critical action research is strongly represented in the literatures of educational action research, and there it emerges from dissatisfaction with classroom action research that typically does not take a broad view of the role of the relationship between education and social change… It has a strong commitment to participation as well as to the social analyses in the critical social science tradition that reveal the disempowerment and injustice created in industrialized societies" (p. 561). With this in mind, it stands to reason that one can only be successful in such a position if the researcher is made clear, if his or her position relative to the research is clear, if his or her stance on justice, as only one example, is considered, if the process by which the research is conducted is clear, and if it is clear how this person, in relation to this research, renders subsequent judgment on the data collected. The person in this Program Evaluator role, like many others, must be permitted to serve as both researcher and advocate, exercising objective candor throughout.

American Evaluation Association. (n.d.). Career. Retrieved October 9, 2013 from http://www.eval.org/p/cm/ld/fid=113.

Andres, L. (2012). Designing & doing survey research. London, England: Sage Publications Ltd.

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Kemmis, S., & McTaggart, R. (2005). Participatory action research. In N. K. Denzin & Y. S. Lincoln (Eds.), The Sage handbook of qualitative research (3rd ed., pp. 559–604). Thousand Oaks, CA: Sage Publications, Inc.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: Meaningful Products & Practical Procedures in Program Evaluation

Of the Program Evaluation Standards, standard U6 concerns meaningful processes and products, whereas standard F2 concerns practical procedures. To begin, the definition of U6 as described by Yarbrough et al. (2011) includes, "Evaluations should construct activities, descriptions, and judgments in ways that encourage participants to rediscover, reinterpret, or revise their understandings and behaviors" (p. 51). The authors continue when discussing F2, "Evaluation procedures should be practical and responsive to the way the program operates" (p. 87). While U6 is among the utility standards and F2 among the feasibility standards, I very sincerely believe they share much of the same intent for program evaluation. They can be viewed as facets of a singular whole: U6, on meaningful processes and products, speaks to the real need for evaluation audiences not only to be able to interpret the findings shared, but also to be able to make positive change from those results; F2 speaks pointedly to respecting the program's existing operations, asking that evaluators act in ways that are practical in comparison to what is already in place. In their combined essence, U6 asks for findings that mean something to audiences, and F2 asks that those findings take existing conditions into account. Yet is this not already a fundamental requirement of any successful change initiative?

A related position in the field that interests me greatly is the work that institutional research (IR) teams perform. Working simultaneously and directly with members of the Office of Institutional Research and Assessment at one university, and the Office of Assessment at another, I have garnered a deep respect for the work they are collectively performing for their respective institutions. As Howard, McLaughlin, and Knight (2012) define the profession, "two of the most widely accepted definitions are Joe Saupe's (1990) notion of IR as decision support – a set of activities that provide support for institutional planning, policy formation, and decision making – and Cameron Fincher's (1978) description of IR as organizational intelligence" (p. 22). In both cases the focus is on data, and on the use of data so that an institution knows more about itself in the future than it knows in the present, whether in the form of performance metrics, operational efficiency analysis, forward-looking planning exercises, or simply the evaluation of data that has been better culled, processed, cleaned, and presented than in the past. Yet amid each of these steps is the very real need not only to provide a value-added process that is practical, but also to speak to the existing system if the findings are to warrant merit. This is how we return to U6 and F2, the standards of meaningful processes and products and of practical procedures.

Forging a palpable relationship between the evaluation process and relevant stakeholders, including sponsors, evaluators, implementers, evaluation participants, and intended users, is key. This permits greater understanding of the processes employed and the products sought, and permits greater buy-in when results are later shared. Kaufman et al. (2006) present a review of the evaluation plan used to review the outcomes of a family violence initiative for the purpose of promoting positive social change. On meaningful products and practical procedures, Kaufman et al. (2006) remark, "Evaluations are most likely to be utilized if they are theory driven, emphasize stakeholder participation, employ multiple methods and have scientific rigor… In our work, we also place a strong emphasis on building evaluation capacity" (p. 191). This evaluation, focusing first on creating a logic model to articulate a highly defined program concept, then cascaded data-driven decisions from the model constructed. With the combined efforts of project management, the program's staff, and the evaluation team, an evaluation plan was crafted. This plan permitted broad buy-in based on the involvement of many stakeholders. In the end, it also enabled the work of the evaluation team to continue in a less acrimonious environment and reemphasized for the team the importance of working in collaboration with key stakeholders from the beginning so that stakeholders bought into and supported the evaluation process (Kaufman et al., 2006, p. 195). The results of this collaborative, synthesizing process included greater stakeholder participation, a heightened level of rigor in addition to increased capacity, and, most importantly for the program, the use of 'common measures' across the program.

Yet increased stakeholder involvement is not the only benefit of ensuring meaningful processes and products as well as practical procedures. This heightened pragmatism in process also lends itself to greater design efficacy. Of a study of 209 PharmD students at the University of Arizona College of Pharmacy (UACOP), Plaza et al. (2007) write, "Curriculum mapping is a consideration of when, how, and what is taught, as well as the assessment measures utilized to explain achievement of expected student learning outcomes" (p. 1). This curriculum mapping exercise was intended to review the juxtaposition of the 'designed curriculum' versus the 'delivered curriculum' versus the 'experienced curriculum'. The results of the study show great concordance between student and faculty perceptions, reinforcing not only a sound program evaluation design capable of detecting concordance, but also, as measured by these graphical outcomes, an effective program. Equally, regarding the design aspects of pragmatism in program evaluation, Berlowitz and Graco (2010) developed "a system-wide approach to the evaluation of existing programs… This evaluation demonstrates the feasibility of a highly coordinated 'whole of system' evaluation. Such an approach may ultimately contribute to the development of evidence-based policy" (p. 148). This study, with its rigorous data collection from existing datasets, was not designed to propose a new means of gathering and aggregating data, but rather a new method for taking advantage of data already largely available, while ensuring a broader yet more actionable series of conclusions strong enough to inform policy decisions.

Where the above considerations, the utility of meaningful processes and products and the feasibility of practical procedures, come together is in their application to a position held in an office of institutional research. Of the role IR plays in ensuring pragmatism in program evaluation, Howard et al. (2012) note, "Driven by the winds of accountability, accreditation, quality assurance, and competition, institutions of higher education throughout the world are making large investments in their analytical and research capacities" (p. 25). These investments remain critical to the sustainability of the investing institutions, and they require that what is discovered in these offices is then instituted throughout the very departments and programs evaluated. Institutional research is not a constituency that exists solely for performance evaluation, nor do IPEDS or accreditation reporting responsibilities overshadow the need for actionable data among the programs under review. Rather, we seek an environment where IR analysts and researchers are permitted to utilize practical processes for the purpose of ensuring greater stakeholder buy-in and effective design.

An environment where practical procedures are used is equally one where replicability, generalizability, and transferability all exist in a more stable ecosystem of program evaluation. Of this need, in a two-year follow-up study of needs and attitudes related to peer evaluation, DiVall et al. (2012) write, "All faculty members reported receiving a balance of positive and constructive feedback; 78% agreed that peer observation and evaluation gave them concrete suggestions for improving their teaching; and 89% felt that the benefits of peer observation and evaluation outweighed the effort of participating" (p. 1). These are not results achieved from processes in which faculty found no direct application of the peer review program; these data were gathered from a program in which respect for the existing process was maintained and the product was seen as having more value, expressed as opportunity cost, than the time and effort required to participate. Finally, using a portfolio evaluation tool that measured student achievement of a nursing program's goals and objectives, Kear and Bear (2007) state, "faculty reported that although students found writing the comprehensive self-assessment sometimes daunting, in the end, it was a rewarding experience to affirm their personal accomplishments and professional growth" (p. 113). This further affirms the very real need for evaluators to continue to discover means of collecting, aggregating, and analyzing data that speak to existing processes. It also reinforces the need for program evaluations to result in conclusions, or recommendations, that make the greatest use of existing process, allowing for the sweeping institutionalization seen in the Kear and Bear study. Finally, on this need for practicality in process and product within program evaluation and research as a whole, Booth, Colomb, and Williams (2008) remark, "When you do research, you learn something that others don't know. So when you report it, you must think of your reader as someone who doesn't know it but needs to and yourself as someone who will give her reason to want to know it" (p. 18).

Berlowitz, D. J. & Graco, M. (2010). The development of a streamlined, coordinated and sustainable evaluation methodology for a diverse chronic disease management program. Australian Health Review, 34(2), 148-51. Retrieved from http://search.proquest.com/docview/366860672?accountid=14872

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

DiVall, M., Barr, J., Gonyeau, M., Matthews, S. J., Van Amburgh, J., Qualters, D., & Trujillo, J. (2012). Follow-up assessment of a faculty peer observation and evaluation program. American Journal of Pharmaceutical Education, 76(4), 1-61. Retrieved from http://search.proquest.com/docview/1160465084?accountid=14872

Howard, R.D., McLaughlin, G.W., & Knight, W.E. (2012). The handbook of institutional research. San Francisco, CA: John Wiley & Sons, Inc.

Kaufman, J. S., Crusto, C. A., Quan, M., Ross, E., Friedman, S. R., O’Reilly, K., & Call, S. (2006). Utilizing program evaluation as a strategy to promote community change: Evaluation of a comprehensive, community-based, family violence initiative. American Journal of Community Psychology, 38(3-4), 191-200. doi:http://dx.doi.org/10.1007/s10464-006-9086-8

Kear, M. & Bear, M. (2007). Using portfolio evaluation for program outcome assessment. Journal of Nursing Education, 46(3), 109-14. Retrieved from http://search.proquest.com/docview/203971441?accountid=14872

Plaza, C., Draugalis, J. R., Slack, M. K., Skrepnek, G. H., & Sauer, K. A. (2007). Curriculum mapping in program assessment and evaluation. American Journal of Pharmaceutical Education, 71(2), 1-20. Retrieved from http://search.proquest.com/docview/211259301?accountid=14872

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

The Juxtaposition of Social/Ethical Responsibility across Disciplines

As I approach full speed in the post-doctoral program, I equally approach my first opportunity to share publicly the insights derived from this study of assessment, evaluation, and accountability. The American Evaluation Association (AEA) identifies the established social and ethical responsibilities of evaluators. In juxtaposition, the social and ethical responsibilities of institutional research, as an education-based area of interest, are expressed by the Association for Institutional Research (AIR). Yet first, a personal introduction, as requested by this assignment. I have chosen institutional research as my professional education-based area of interest, as research and analysis have been at the heart of much of what I have done for the past decade or more.

Spanning a period easily covering ten years, I have straddled industry and academe, not only to remain a lifelong learner but to keep leveraging what I take from each course and apply it as readily as possible to my work in industry and in the classroom, to the benefit of my employers and my students. As mentioned on my 'about' page, my work includes a multitude of projects focused on distilling a clear view of institutional effectiveness and program performance. Roles have included senior outcomes analyst, management analyst, operations analyst, assessor, and faculty member for organizations in industries ranging from higher education to hardware manufacturing and business intelligence. Each position brought a new opportunity to assimilate new methods for assessing data; each new industry brought a new opportunity to learn a new language, adhere to new practices, and synthesize the combined experience that is the sum of their parts. Yet in no instance do I walk away with the steadfast sense that I have learned more and therefore have less left to learn. In each instance I instead feel as though I know, and have experienced, even less of what the world has to offer. Focusing this indefinite thirst on assessment and evaluation specifically, the task becomes pursuing ever-greater growth and ever-greater success across a wide range of applications, industries, and instances, while remaining true to guiding principles that serve those who benefit from any lesson I learn or analysis I perform.

The Program Evaluation Standards articulate standards statements regarding propriety that include responsive and inclusive orientation, formal agreements, human rights and respect, clarity and fairness, transparency and disclosure, conflicts of interest, and fiscal responsibility. At the heart of these, Yarbrough, Shulha, Hopson, and Caruthers (2011) remark, "Ethics encompasses concerns about the rights, responsibilities, and behaviors of evaluators and evaluation stakeholders… All people have innate rights that should be respected and recognized" (p. 106). Compare this with a like-minded statement from the AEA itself: "Evaluators have the responsibility to understand and respect differences among participants, such as differences in their culture, religion, gender, disability, age, sexual orientation and ethnicity, and to account for potential implications of these differences when planning, conducting, analyzing, and reporting evaluations" (Guiding Principles for Evaluators, n.d., para. 40). Finally, in juxtaposition, we have Howard, McLaughlin, and Knight in The Handbook of Institutional Research (2012), who write, "All employees should be treated fairly, the institutional research office and its function should be regularly evaluated, and all information and reports should be secure, accurate, and properly reported… The craft of institutional research should be upheld by a responsibility to the integrity of the profession" (p. 42). Thus, in the end, while this post set out to explore a juxtaposition, a word implying at least some degree of paradox, it in actuality finds nothing of the sort. Rather, we find congruence, and we find agreement.

It is important to uphold standards for the ethical behavior of evaluators, as the profession is one steeped in a hard focus on data and on the answers data provide. We as human beings, however, tend to this profession while flawed. We make mistakes, we miscalculate, we deviate from design, and we inadvertently insert bias into our findings. None of this may be done on purpose, and certainly not all transgressions are present in every study. The implication remains, though, that we can make mistakes and are indeed fallible. At the same time, we belong to a profession tasked with identifying what is data and what is noise, which programs work and which curriculum does not, which survey shows desired outcomes and which employees are underperforming. These are questions that beget our best efforts, our most scientific endeavors, and our most resolute trajectories to identify only truths, however scarce, amid the many temptations to manufacture alternate, albeit perhaps more beneficial, realities for ourselves as evaluators and for our stakeholders. All participants have rights, all evaluators have rights, and all sponsors have rights. It is our task to serve the best collective interest, using the best methods available, to ensure a properly informed future.

American Evaluation Association. (n.d.). Guiding principles for evaluators. Retrieved September 11, 2013 from http://www.eval.org/p/cm/ld/fid=51

Howard, R.D., McLaughlin, G.W., & Knight, W.E. (2012). The handbook of institutional research. San Francisco, CA: Jossey-Bass.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

How Management Styles are Changing

An organization is defined as a goal-directed, boundary-maintaining, and socially constructed system of human activity (Aldrich & Ruef, 1979). In its essence, the construct of management is also met with a clear definition comprising the activities of planning, organizing, leading, and controlling. When considering these concepts in juxtaposition, the foundations of management do not waver, and the activities associated with it at a conceptual level remain the same. What has changed, however, is how each of these activities is carried out, for the sake of the sustainability of each goal-directed, boundary-maintaining system of activity. How the activities of management are performed is a question of style, and style has indeed evolved. This evolution in style, and the proliferation of additional, nuanced styles, is the result of a combination of advances in technology and organizational form, as well as shifts in the prevailing workforce demographic of each organization.

To say that technology has changed what management is would be giving credit where it is not due. Technology has instead changed the way management is performed, while its functions remain stable. The prevailing literature, both academic and practitioner, now discusses advances in technology as shepherds of a global, diverse, tightly linked, and transparent collection of organizations, home to managers who use every affiliation necessary to further their guiding objectives. Those objectives are no longer driven by a desire simply to control aggregated activity, as was the focus both during and immediately following the industrial revolution. Objectives, and the style of today's manager, are driven by purpose. This has given way to such works as Hamel's The Future of Management, Benko and Anderson's The Corporate Lattice, Hallowell's Shine, and Pascale, Sternin, and Sternin's The Power of Positive Deviance, to name a few. While these texts describe very different facets of organizational life, they share the common thread of managers doing everything they can to identify what is working in an organization and how best practice can be both identified and spread throughout it, placing the focus on the potential of a workforce rather than on controlling its activities. Management style has thus changed from choosing between varying levels of commanding and controlling resources to choosing between varying levels of interaction with an organization's value chain and the resources associated with it. In its essence, management style is now most affected by considerations of epistasis, where the critical question is how the manager will choose to leverage his or her unique talents to influence the organization's ecosystem. Rather than asking simply what part of an organization he or she is responsible for, the manager now seeks knowledge of influence networks, as both organizational knowledge and responsibility are interspersed.

– Justin

Amid the Tumult, the Purposive Manager

Management as a practice can be seen as a combination of art, craft, and science, which take place on an information plane, a people plane, and an action plane (Mintzberg, 2009). A good manager, then, is someone who moves beyond the traditional confines of seeing one’s function as planning/organizing/leading/controlling each in isolation at a specific point in time, and instead sees managing as using all aspects of one’s intuition, training, and talents at once and in perpetuity.

Where managers are now described in the literature as operating in an environment in which interruptions can be encountered as often as every 48 seconds, and the manager's attention is thus piecemeal and scattered across multiple tasks and decisions in a single hour, it remains important that a "good manager" use this as a strength rather than let it define the work. Rather than using the bustle of today's business environment as an excuse for surface-level consideration of every decision encountered, it is an opportunity to convey consistency in message and purpose with every new decision. A day can be filled with hundreds of isolated decisions made at a glance, or those decisions can all be made while guided by a single thread of focused purpose and attention to the direction the manager wishes to push the ecosystem within his or her sphere of influence. If the manager wishes to develop a team guided by thoughtful analysis, each decision made can be an interruption prior to returning to this task, or it can be a way to substantiate this wish by emphasizing thoughtful analysis in each decision. A good manager thus uses technical skills to facilitate the professional and technical aspects of daily work, while also using those same technical skills to describe how best to support and influence the technical aspects of others' work. Soft skills are equally important to ensure not only that influence is exerted, but that the purpose and direction the manager sees fit are communicated within his or her network in a way that leaves a lasting impression.

An Operations Manager uses his technical knowledge of the business to drive operations, while also using this technical knowledge to guide others’ work within the scope of process variation, selection, and retention. His soft skills are important as the potential global, and very likely diverse workforce with which he works must be influenced and led, not directed and controlled alone. The Finance Manager must use her technical skills to drive the fiduciary sustainability of an organization and her team, but must also use her technical skills to seek an efficacious value chain to sustain the organization’s competitive advantage. Her soft skills thus are what provide the conduit for this process, and technical skills the information necessary for her network to later develop the tacit knowledge necessary for this to occur. A “good manager”, then, is aware of the organization’s ecosystem, his/her influence on this ecosystem, and will put to use all intuition, training, and talents present, to help others see how an organization’s value chain drives its purpose.

– Justin