Speak Less, Say More, To Keep Your Strategy On-Track

This did not begin well. For the past two months I have been teaching a group of senior executives and high-level technical professionals the finer points of communicating strategy to their organizations. Day 1 was a travesty: I was all but drawn and quartered by their pointed comments about how confusing they found my assumption that having a strategy did not necessarily equal having the ability to communicate that strategy.

To some extent they were right, yet to a larger extent we all still had quite a lot to learn. While I’ve been teaching college students for the better part of the last 12 years, I still went home that first night entirely deflated. Though confident in my ability to both recall and relate the material, I had a larger lesson to learn about the inauthentic way in which I arrived that first night. This was not a class of first-year freshmen looking for strong leadership. These were seasoned leaders themselves, in their last class prior to graduating with a Master’s degree in, of all things, leadership.

So what did I do differently to turn things around? I stopped trying to talk my way out of it. Instead, I recognized my role as they did: a bridge between theory and practice, nothing more. I created the boundaries for their communication and learning and set them free within those boundaries. My communication with them has also since changed. I toned down the sage on the stage. I listened more intently. I adjusted both style and content on the fly. I adjusted my approach to leverage their experience and my own, all while tying in the content where I could through substantive yet bite-sized key takeaways to keep it memorable. In the process I realized that as I was teaching them to become more effective communicators, so that they might communicate their strategies more effectively, I too was learning to be more effective in my own communication. With this in mind, I want to share a dozen of my key takeaways from this class on communicating strategy, which I think you may find helpful across communication settings of all types.

  1. Effective Leaders Lead Strategy and Tactics
  2. Lead with Logic & Emotion, Not Logic or Emotion
  3. Everyone Can Be a Change Agent
  4. Effective Alignment Requires A Common Message
  5. Keep the Strategy Message Bite-Sized & Repeatable
  6. Reach Them Through Intrinsic Motivations
  7. Identify Motivations Through Open Discussion
  8. Connect Strategy to a Destination
  9. Measure the Business to Measure the Communication
  10. Always Reward Those Supporting the Strategy
  11. Expect of Yourself What You Expect of Others
  12. Your People Are Your Primary Communication Vehicle

You may entirely disagree with some of what is listed, and that is certainly for you to decide. There may be some (or many) which you find context-specific and of no use to you. Yet I still believe knowing what content matters and what does not is one of our first steps toward great communication as a leader, so thank you for reading.

Research in Business is Everyone’s Business

I am a firm believer that holding a greater number of college degrees does not necessarily mean you’re smarter than those with fewer. I am unapologetic in my stance, as I believe the role of the university is not to increase your IQ (arguably a number with little flux). The role of the university is instead to train you, largely in a particular discipline or process or both. Yes, some programs require a greater degree of raw intelligence, and the purpose of this post is not to draw those lines. The purpose instead is to understand how we can walk away from the misconception that only those with a research background can perform business research. What connects these two dots? In short, the conclusion that just because someone has a PhD does not mean they know more about your business than you do. In fact, the opposite is usually true. If they are trained in the process, and you are intimate with your business, I would like to make a suggestion: when seeking a greater understanding of your business’s process, program, or product performance, team up to form a symbiotic relationship between the business and a researcher, so you both can accomplish more and do the research together.

The Why – The Interpretive Approach

Among the approaches to organization studies is the interpretive approach. As described by Aldrich and Ruef (2009):

The interpretive approach focuses on the meaning social actions have for participants at the micro level of analysis. It emphasizes the socially constructed nature of organizational reality and the processes by which participants negotiate the meanings of their actions, rather than taking them as a given. Unlike institutional theorists, interpretive theorists posit a world in which actors build meaning with locally assembled materials through their interaction with socially autonomous others. (p. 43)

If this is true, then a lone researcher cannot simply be transplanted from one organization to the next, delivering revenue-trajectory-altering research in a vacuum. The research must be built on great questions, which may well come from the business, and the very meaning of the business and the data it generates is embedded within the interactions among the actors in the business itself.

The What – A Symbiotic Relationship

Picture a relationship where you – representing the business – provide the context, maybe even help gather some of the data, and are there to take part in the interpretation once the researcher has completed a substantive portion of his/her analysis. You’re a team; the researcher is not a gun for hire. Which also means that if you’re a team, you’re a researcher too. This approach is important for many reasons, not least your store of tacit knowledge. As we are reminded by Colquitt, Lepine, and Wesson (2013), “Tacit knowledge [is] what employees can typically learn only through experience. It’s not easily communicated but could very well be the most important aspect of what we learn in organizations. In fact, it’s been argued that up to 90 percent of the knowledge contained in organizations occurs in tacit form” (p. 239). That is a vast amount of available information the researcher simply will not have if you do not team up and start working together.

The How – A Cue from Empowerment Evaluation

We can draw a number of conclusions on how best to form this reciprocal relationship between business and researcher as one team, and many come from the literature on empowerment evaluation. As put by Fetterman and Wandersman (2005):

If the group does not adopt an inclusive and capacity-building orientation with some form of democratic participation, then it is not an empowerment evaluation. However, if the community takes charge of the goals of the evaluation, is emotionally and intellectually linked to the effort, but is not actively engaged in the various data collection and analysis steps, then it probably is either at the early developmental stages of empowerment evaluation or it represents a minimal level of commitment. (p. 9)

There is a final, critical subtext to all of the above. In essence, there must be a consistent flow of ideas between the researcher and the business. Research in business is everyone’s business, yet only in environments where the researcher can share his/her craft, and where a better-informed business can grant the researcher access to the knowledge only it possesses. For a final thought on the merits of this proposed team I defer to the literature on constructing grounded theory. Therein Charmaz (2014) reminds us that, “We need to think about the direction we aim to travel and the kinds of data our tools enable us to gather… Attending to how you gather data will ease your journey and bring you to your destination with a stronger product” (p. 22).

About the Author:

Senior decision support analyst for Healthways, and current adjunct faculty member for Allied American University, Grand Canyon University, South University, and Walden University, Dr. Barclay is a multi-method researcher, institutional assessor, and program evaluator. His work seeks to identify those insights among enterprise data which are critical to sustaining an organization’s ability to compete. That work spans the higher education, government, nonprofit, and corporate sectors. His current research is in the areas of employee engagement, faculty engagement, factors affecting self-efficacy, and teaching in higher education with a focus on online instruction.

Mitigating Hazards in Justified Conclusions & Sound Design

A1 and A6 of The Program Evaluation Standards regard Justified Conclusions and Decisions, and Sound Designs and Analyses, respectively. Where A1 asks that evaluation conclusions and decisions be explicitly justified in the cultures and contexts where they have consequences, A6 asks that evaluations employ technically adequate designs and analyses appropriate for the evaluation purposes (Yarbrough et al., 2011, pp. 165–167). In these instances, we regard standards which impact the potential accuracy of an evaluation. When discussing strategies for mitigating the hazards associated with these standards, previous coverage elucidated suitable actions ranging from integrating stakeholder knowledge frameworks, to clarifying roles within the evaluation team, to properly defining what is meant by accuracy in the context of a given evaluation. Strategies discussed also include selecting designs based on the evaluation’s purpose, while still including enough flexibility in the design that compromise and uncertainty can be permitted during this iterative process. Here we discuss strategies in addition to those previously mentioned, focusing on mitigating the hazards associated with the accuracy standards by exploring both the concept of triangulation and that of establishing validation in practice.

Triangulation is employed across quantitative, qualitative, and mixed methods research alike as a means of preventing such common errors as drawing conclusions from samples which are not representative of their stated population, and it permits the reduction of confirmation bias among findings. As per Patton (2002), “Triangulation is ideal. It can also be expensive. A study’s limited budget and time frame will affect the amount of triangulation that is practical, as will political constraints in an evaluation. Certainly, one important strategy for inquiry is to employ multiple methods, measures, researchers, and perspectives – but to do so reasonably and practically” (p. 247). This operates as a strategy for mitigating hazards among the accuracy standards, as the primary intent of those standards is to provide context-specific conclusions which are defensible and inclusive of the stakeholders involved in the process. Risks to accuracy are addressed by this strategy because, while a single data point or collection method may provide a pertinent picture of the evaluation’s efficacy, the ability to further substantiate those results via additional methods and additional data only serves to strengthen whatever conclusions are reached. In addition to ensuring data integrity via a multi-method approach to data collection, establishing validation in practice remains an additional strategy for mitigating the hazards of our accuracy standards.
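
To make triangulation a bit more concrete, here is a minimal Python sketch of one simple convergence check across methods. The three estimates of a single program outcome and their standard errors are entirely hypothetical, and the overlap test is only one crude way to ask whether independent methods point to the same conclusion.

```python
import math

# Hypothetical estimates of the same program outcome from three methods,
# each expressed as (estimate, standard_error)
sources = {
    "participant survey":     (0.42, 0.06),
    "administrative records": (0.47, 0.04),
    "coded interviews":       (0.38, 0.09),
}

def converge(a, b, z=1.96):
    """Crude check: is the difference within the joint 95% margin of error?"""
    (m1, se1), (m2, se2) = a, b
    return abs(m1 - m2) <= z * math.hypot(se1, se2)

names = list(sources)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        verdict = "converge" if converge(sources[first], sources[second]) else "diverge"
        print(f"{first} vs. {second}: {verdict}")
```

When a pair of methods diverges, the point is not to discard one of them but to investigate why, which is exactly the check on confirmation bias the accuracy standards intend.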

On establishing validation in practice, Goldstein and Behuniak (2011) comment, “In the Standards, validity is defined as the ‘degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests… The interrelationships among the interpretations and proposed uses of test scores and the sources of validity evidence define the validity argument for an assessment” (p. 180). What then becomes pertinent among the evaluation team’s efforts is to ensure all relevant validity evidence is collected, alongside the proposed uses of the methods selected and employed for a particular evaluation. There remains an onus upon the evaluation team not only to substantiate the conclusions borne of the team’s data collection and analysis of the immediate program and environment; there equally remains an onus upon this same team to first establish the validity of the methods and instruments chosen for that data collection and analysis. Simply requiring stakeholders to trust in the expert judgment behind an evaluation team’s selection of methods and instruments is cause for concern, as this does not permit the kind of inclusion of stakeholder knowledge frameworks mentioned above. Rather, to ensure that the accuracy standards are upheld, an inclusive process of iterative reviews of the proposed design, and of the execution of that design, with stakeholder groups provides the level of holism required to conclude a design and its instruments accurate. The evaluation team absolutely brings with it the knowledge, experience, and technical prowess necessary to perform a successful evaluation, yet doing so without consulting the knowledge and experience of stakeholders creates the opportunity for research to be performed which is not in alignment with what those employing such a team intend. Stakeholders, while not technical or content experts per se, have as much to contribute on the selection of data points, methods employed, and analysis performed as the team itself, based solely on their direct involvement with the program and the experience that interaction brings. They would not be the primary source of design suggestions, yet they can help discern which design might fit the current situation best. Said of this paradox, Booth, Colomb, and Williams (2008) note, “A responsible researcher supports a claim with reasons based on evidence. But unless your readers think exactly as you do, they may draw a different conclusion or even think of evidence you haven’t” (p. 112).

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Goldstein, J., & Behuniak, P. (2011). Assumptions in alternate assessment: An argument-based approach to validation. Assessment for Effective Intervention, 36(3), 179–191.

Patton, M. Q. (2002). Qualitative research & evaluation methods. Thousand Oaks, CA: Sage Publications, Inc.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Justified Conclusions & Sound Evaluation Design

Among the program evaluation standards are the two accuracy standards A1: Justified Conclusions and Decisions and A6: Sound Designs and Analyses. A1, regarding justified conclusions and decisions, is defined by Yarbrough, Shulha, Hopson, and Caruthers (2011) as, “Evaluation conclusions and decisions should be explicitly justified in the cultures and contexts where they have consequences” (p. 165). The hazards associated with this standard are many; they include assumptions of accuracy among evaluation teams, the ignoring of cultural cues and perspectives, assumptions of transferability, and a number of hazards concerning emphasis on technical accuracy at the expense of cultural inclusivity and immediate environmental context (Yarbrough et al., 2011, p. 167). Where this particular standard is most consequential concerns the sociological factors inherent in any assessment. Of such factors, Ennis (2010) notes, “It is the liking part – the emotional, aesthetic, or subjective decision to actively cooperate with an institution’s assessment regime – that suggests the difficulties inherent in coupling the success of an assessment program to the establishment of an assessment culture” (p. 2). Thus, an assessment culture cannot be established through rigor or display of technical prowess alone.

An institution’s absorptive capacity is not directly correlated with its rate of acceptance of new knowledge. Rather, hazards such as disregarding extant culture and subcultures, disregarding the needs of the immediate environment, and disregarding transferability all pose a direct threat to both acceptance and adoption of whatever findings an assessment produces. The recommendations for correcting for this therefore include (1) clarifying which stakeholders will form conclusions and permitting the integration of those stakeholders’ knowledge frameworks; (2) clarifying the roles and responsibilities of evaluation team members; (3) ensuring findings reflect the theoretical terminology as defined by those who will draw conclusions; (4) identifying the many definitions of accuracy held by assessment users; and (5) making effective choices regarding depth, breadth, and representation of the program (Yarbrough et al., 2011, p. 166).

A6, the standard of sound designs and analyses, is defined by Yarbrough et al. (2011) as, “Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes” (p. 201). The hazards associated with this standard include a number of considerations for responsiveness to the features, factors, and purpose(s) of a given program. Such hazards include choosing a design based on status or reputation rather than its ability to provide high-quality conclusions, a lack of preparation for potentially disappointing evaluation findings, a lack of consideration for the many feasibility, propriety, and utility standards, a lack of customization of the design to the current environment, and a lack of broad-based consultation with stakeholders at multiple levels (Yarbrough et al., 2011, p. 204). The effects of a lacking, misguided, or inappropriate design can be devastating to the overall efficacy of a given assessment. Of the need for sound design, Booth, Colomb, and Williams (2008) comment, “In a research report, you must switch the roles of student and teacher. When you do research, you learn something that others don’t know. So when you report it, you must think of your reader as someone who doesn’t know it but needs to and yourself as someone who will give her reason to want to know it” (p. 19).

Performing an assessment based solely on the popularity of the design employed misses the point of assessing the program at hand, which is to formulate a strategy for better understanding the unique program under study and to relate the gathered data in a way both digestible and actionable by those who hold a stake. One can employ a procedure which reliably gathers data, yet if that data is unrelated or unnecessary, the design lacks both utility and, in this instance, accuracy. So how can accuracy be increased and applicability restored? It is suggested to instead select designs based on the evaluation’s purpose, secure adequate expertise, closely evaluate any designs in contention, choose frameworks which provide justifiable conclusions, allow for compromise and uncertainty, and consider the possibility of ongoing, iterative modifications to the design over protracted periods to ensure currency (Yarbrough et al., 2011, p. 204). Doing so will not only ensure that your audience receives actionable results, but that same audience can also hold the design, and the collective opinion of the efficacy of the assessment, in greater confidence, as their understanding of it is equally elevated.

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Ennis, D. (2010). Contra assessment culture. Assessment Update, 22(2), 1–16.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: What Sets Empowerment Evaluation Apart

According to Middaugh (2010), “There is no shortage of frustration with the inability of the American higher education system to adequately explain how it operates… The Spellings Commission (2006) chastised higher education officials for lack of transparency and accountability in discussing the relationship between the cost of a college education and demonstrable student learning outcomes” (p. 109). As we continue to look into instigating and preserving positive change, I find Middaugh’s words to go beyond a burning platform and instead resonate as a call to action. Empowerment evaluation is not simply about urging faculty and administrators to perform better evaluations, improving analytical or reporting skills, or increasing skills in evaluative inquiry. Empowerment evaluation in this environment is about being able to better understand one’s self, in a highly regulated and highly monitored environment where the stakes remain quite high.

For three other views into this relevance, we begin with Fetterman and his article Empowerment evaluation: Building communities of practice and a culture of learning. Therein Fetterman (2002) describes, “Empowerment evaluation has an unambiguous value orientation – it is designed to help people help themselves and improve their programs using a form of self-evaluation and reflection” (p. 89). Empowerment evaluation is therefore neither strictly formative nor summative, particularly as it is not evaluation performed by evaluation personnel. Rather, it creates enhanced opportunities for sustainability, as empowerment evaluation permits stakeholders to conduct their own evaluations. What is greatly advantageous about this approach is its direct relationship to change processes. Just as with a guiding vision in most established change processes, empowerment evaluation begins with organizational mission. Fetterman (2002) notes, “An empowerment evaluator facilitates an open session with as many staff member and participants as possible. They are asked to generate key phrases that capture the mission of the program or project” (p. 91).

Party to this line of thinking is also the step of first determining present state before defining future state. As described by Worthington (1999), “First, it is highly collaborative, with input from program stakeholders at every stage of the evaluation process. The four steps or stages of empowerment evaluation are: (1) “taking stock,” during which stage program participants rate themselves on performance; …” (p. 2). This article is equally instructive for helping to make sense of how one takes an organization from present state to future state. As Worthington (1999) later describes, “Empowerment evaluation contains elements of all three forms of participatory research. It is a reciprocal, developmental process that aims to produce ‘illumination’ and ‘liberation’ from role constraints among participants; it shares with action research a commitment to providing tools for analysis to program participants; and the evaluator takes a less directive, collaborative role” (p. 7).

Finally among the supporting articles for this post we find A framework for characterizing the practice of evaluation, with application to empowerment evaluation. This article is meaningful because it looks to provide a line of demarcation separating empowerment evaluation from other forms of evaluation. As per Smith (1999), “A useful first step in clarifying the diversity of evaluation practice might be the development of a comprehensive framework with which to compare and contrast fundamental attributes of any evaluation approach” (p. 43). The characteristics of this framework include consideration for context, role, interest, rules, justification, and sanctions. Smith (1999) therefore continues, “This analysis of Empowerment Evaluation illustrates how the aspects of the framework (context, purpose and social role, phenomena of interest, procedural rules, methods of justification, and sanctions), are highly interrelated… The primary phenomena of interest in Empowerment Evaluation are participant self-determination, illumination, and liberation, and not the worth of programs” (p. 63). This becomes important as the line of demarcation appears not to be a single line, but one containing many interrelated facets. Yet if the core of empowerment evaluation is to focus on increasing the capability for evaluative inquiry among organizational stakeholders, then reporting becomes a differentiated form of evaluation output, one paramount to determining whether one actually succeeded in inciting such a transformation. I will thus continue my search not only for continued understanding of how empowerment evaluation differs from other forms of evaluation, but will equally focus on how communicating results and instigating change affect the very outputs of this process.

Fetterman, D. M. (2002). Empowerment evaluation: Building communities of practice and a culture of learning. American Journal of Community Psychology, 30(1), 89–102.

Fetterman, D. M., & Wandersman, A. (2005). Empowerment evaluation: Principles in practice. New York, NY: The Guilford Press.

Middaugh, M. F. (2010). Planning and assessment in higher education: Demonstrating institutional effectiveness. San Francisco, CA: Jossey-Bass.

Smith, N. L. (1999). A framework for characterizing the practice of evaluation, with application to empowerment evaluation. The Canadian Journal of Program Evaluation, 14, 39–68.

Worthington, C. (1999). Empowerment evaluation: Understanding the theory behind the framework. The Canadian Journal of Program Evaluation, 14(1), 1–28.

Post-Doc Blogpost: Machine Scoring & Student Performance

I believe one concept at the heart of considering the validity of a writing assessment via automated essay scoring (AES) as a measure of student performance in a graduate program is fidelity. Fidelity has to do with the degree to which the task, response mode, and what is actually scored match the requirements found in the real world (Zane, 2009, p. 87). This includes consideration for such elements as context, structure, and the item’s parameters. What lends validity to such an assessment is the fact that much of what is expected of a writing assessment mirrors the college experience. From considerations surrounding grammar and sentence structure, to critical thinking and conceptual integration, an automated writing assessment permits greater fidelity in an exercise which emulates the student experience in myriad ways. Where attention must be paid, however, is the validity of the objective scoring provided by an AES system.

Machine scores are based on a limited set of quantifiable features in an essay, while human holistic scores are based on a broader set of features, including many, such as the logical consistency of an argument, which cannot yet be evaluated by a machine (Bridgeman, Trapani, & Attali, 2012, p. 28). The assessment itself, then, is what provides much of the fidelity to the student experience. What has yet to be developed is a holistic way of interpreting and subsequently scoring essay responses with the same depth, and the same consideration for elements not contained within a designed algorithm, as human raters provide. Human raters can probe creativity, innovation, and integrative thinking. Human raters can identify off-topic content just as an AES system would, yet rather than reducing the score by default as an AES system would by design, the human rater can form an individual opinion regarding off-topic content and its relevance to the response. Where one is asked about the concept of truth in an environment such as Accuplacer testing, for example, the scoring mechanism might deduct for the use of vernacular proximal to art, to which the author refers because art and truth alike are subjective interpretations. What AES does well, and human raters do not, however, is score essay responses with an equal level of consistency across multiple raters and multiple ratings.

As AES models are often formed using more than two raters, studies that have evaluated interrater agreement have usually shown that the agreement coefficients between the computer and human raters are at least as high as, or higher than, those among human raters themselves (Shermis, Burstein, Higgins, & Zechner, 2010, p. 22). This gives pause to any doubt cast on an AES system’s ability to accurately and reliably score work. Yet this reliability does not necessarily denote validity, as we have previously discussed. Thus, an environment where at least one AES score and one human rater score are considered in conjunction presents the most promising synthesis of both approaches regarding a single assessment item and score. Further understanding of rater cognition is necessary to have a more thorough understanding of what is implied by the direct calibration and evaluation of AES against human scores and what, precisely, is represented in a human score for an essay (Ramineni & Williamson, 2013, p. 37). With this in mind, it is critical that those designing such assessments remain attentive to differences in scores among human raters, differences in scores between machine and human rater, and any differences in the AES model’s ability to measure student performance against a particular rubric. Should the rubric cover such concepts as creativity, an element on which AES is disadvantaged, this emphasis on content will be the driving force informing the decision to use AES. Ultimately, it is the context, the grading criteria as explained by the rubric, and the opportunity for joint assessment via human rater which drive a decision of whether AES is a valid source of objective writing assessment.
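
Because claims like these rest on specific agreement coefficients, it may help to see one computed. Below is a minimal Python sketch of quadratic weighted kappa, a coefficient commonly reported when comparing machine and human essay scores; the ten paired scores are hypothetical.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two sets of integer essay scores."""
    n = max_score - min_score + 1
    observed = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score, b - min_score] += 1
    observed /= observed.sum()
    # Expected agreement matrix from each rater's marginal distribution
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic penalty: zero on the diagonal, growing with score distance
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical scores for ten essays on a 1-6 rubric
human   = [4, 3, 5, 2, 4, 6, 3, 4, 5, 2]
machine = [4, 3, 4, 2, 5, 6, 3, 4, 5, 3]
print(quadratic_weighted_kappa(human, machine, 1, 6))  # ~0.90 here
```

A value near 1 indicates the machine rarely strays far from the human score; squaring the distance penalizes large disagreements far more heavily than adjacent ones, which is why this coefficient suits ordinal rubric scores.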

Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27–40.

Ramineni, C., & Williamson, D. M. (2013). Automated essay scoring: Psychometric guidelines and practices. Assessing Writing, 18(1), 25–39.

Shermis, M. D., Burstein, J., Higgins, D., & Zechner, K. (2010). Automated essay scoring: Writing assessment and instruction. In P. Peterson, E. Baker, and B. McGaw (Eds.), International encyclopedia of education (3rd ed.), 20–26. Oxford, UK: Elsevier.

Zane, T. W. (2009). Performance assessment design principles gleaned from constructivist learning theory (Part 2). TechTrends, 53(3), 86–94.

Post-Doc Blogpost: Validity & Reliability in Performing Assessments

Designing a question for an instrument is designing a measure; an answer given to a question is of no intrinsic interest, and the answer is valuable only to the extent that it can be shown to have a predictable relationship to facts or subjective states that are of interest (Fowler, 2009, p. 87). Sweeping satisfaction questions, such as those concerning one’s job or degree program, are inherently of little value to an assessment in their extant form. They contain limited value because there are a number of differing subjective states which a student, for example, might experience throughout his/her degree program. A student can find great value in his/her first courses, or potentially an entire year, yet subsequently find little value in the courses which remain. This alone would indicate the item’s inability to measure these changes in a student’s state, and it would therefore not exhibit a predictable relationship to the state of interest. An objective test item is defined as one for which the scoring rules are so exhaustive and specific that they do not allow scorers to make subjective inferences or judgments (Murayama, 2012, para. 1). Requiring students to infer the period in time to which the item refers introduces subjectivity, and negates the item’s ability to deliver anything other than a highly subjective, highly summative point of view.

Many teachers believe that they need strong measurement skills, and report that they are confident in their ability to produce valid and reliable tests (Frey, Petersen, Edwards, Pedrotti, & Peyton, 2005, p. 2). Yet this contention remains at issue, both because standards for establishing validity remain disparate and interspersed throughout the literature on item-writing, and because the research also shows limited assessment training in the required curricula of teaching certification programs. What is then needed to determine whether items possess the required validity are standards for the validity of each assessment item. Of an identified 40 different item-writing rules, each falls into one or more of a few categories, including potentially confusing wording or ambiguous requirements, guessing, rules addressing test-taking efficiency, and rules designed to control testwiseness (Frey et al., 2005, p. 4). Each category includes a number of item-writing rules, all intended to address differing concerns for validity. Potentially confusing wording or ambiguous requirements is a category which speaks to confidence that every respondent will understand a question the same way. Guessing in this instance refers to the exclusion of responses where respondents simply chose a correct answer by chance; the probability of this occurring must therefore be reduced. Rules addressing test-taking efficiency have to do with designing items in such a way that their structure does not impede, their form is simple, completing each is brief, and options are made clear. Finally, rules designed to control testwiseness refer to designing items so that (to the largest extent possible) items are answered using only knowledge, ability, or a combination of the two, rather than through identifying patterns or other unintended characteristics of an item which may lead respondents to accidentally identify a correct answer without knowing why it is correct. To infuse greater validity into the item discussed above, considerations from all four categories are prudent. Yet paramount among them is to alter the item in such a way that the ambiguous requirement is corrected, and an appropriate span of time is delineated within the context of the question. A small illustration of the guessing category appears below.
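
As a small illustration of why the guessing category matters, this Python sketch computes the binomial probability of reaching a given score by pure chance on a multiple-choice test; the item and option counts are hypothetical.

```python
from math import comb

def p_guess_at_least(n_items, n_options, k_correct):
    """Probability of answering k_correct or more of n_items correctly
    by pure guessing, with n_options choices per item (binomial model)."""
    p = 1 / n_options
    return sum(comb(n_items, i) * p**i * (1 - p)**(n_items - i)
               for i in range(k_correct, n_items + 1))

# Hypothetical: chance of reaching 60% (6 of 10) by guessing alone
print(p_guess_at_least(10, 4, 6))  # ~0.020 with four options per item
print(p_guess_at_least(10, 5, 6))  # ~0.006 with five options per item
```

Adding plausible options or items shrinks this probability, which is precisely what the guessing rules aim to accomplish.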

Where validity deals with the relationship between each item and an area of interest, reliability deals with the relationship between each item and the consistency of results each time a measurement is taken. In discussing whether scores resulting from an item demonstrate reliability, look for whether the item responses are consistent across constructs, whether scores are stable over time when the instrument is administered a second time, and whether there is consistency in test administration and scoring (Creswell, 2009, p. 149). While researchers often address validity and reliability as separate considerations, I feel their interrelationship cannot be stressed strongly enough. Returning to the example item on program satisfaction above, if the validity of the measurement is compromised, as in creating confusion among respondents about when, and how much of, the program it is intended to describe, this heightens the probability of inconsistent responses, which then directly threatens reliability. If one respondent can answer the same question multiple ways, and do so defensibly each time while regarding another aspect of the same context, we are now measuring the same condition multiple times and arriving at multiple, quite different results. This is especially problematic with either a true/false or multiple choice item, as either presents a very limited list of potential responses. Altered response patterns among respondents based on poorly worded items leave the reliability of the instrument in question, as with each subsequent administration it is entirely likely that different responses among those available are selected, and the percentage choosing each option (and therefore its description of the population assessed) is unreliable. It is only after the ambiguities inherent in the item’s wording are addressed, and consistent responses are collected across multiple administrations, that this item can begin to be described as valid, reliable, or both.
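
For readers who want to see these reliability checks operationalized, here is a minimal Python sketch computing internal consistency (Cronbach’s alpha) and test-retest stability; the Likert responses are hypothetical, and a real instrument would require far larger samples.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical 1-5 Likert responses: six students x four satisfaction items
admin1 = np.array([[4, 4, 5, 4],
                   [2, 3, 2, 2],
                   [5, 4, 4, 5],
                   [3, 3, 3, 2],
                   [4, 5, 4, 4],
                   [2, 2, 3, 3]])
print(cronbach_alpha(admin1))  # internal consistency across the four items

# Stability over time: correlate totals from a second administration
admin2_totals = [16, 10, 19, 11, 18, 10]  # hypothetical retest totals
print(np.corrcoef(admin1.sum(axis=1), admin2_totals)[0, 1])  # test-retest r
```

An ambiguous item would show up here as depressed alpha and an unstable retest correlation, which is the quantitative trace of the wording problems described above.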

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Thousand Oaks, CA: Sage.

Fowler, F. J. (2009). Survey research methods (4th ed.). Thousand Oaks, CA: Sage Publications, Inc.

Frey, B. B., Petersen, S., Edwards, L. M., Pedrotti, J. T., & Peyton, V. (2005). Item-writing rules: Collective wisdom. Teaching and Teacher Education, 21(4), 357–364.

Murayama, K. (2012). Objective test items. Retrieved December 24, 2013 from http://www.education.com/reference/article/objective-test-items/.

Post-Doc Blogpost: Reciprocal Recognition – Permitting Evaluation to Evoke Advocacy

I had the recent privilege of attending a presentation by Dr. Raymond Cheng, who spoke on the topic of degree equivalency across the world. The presentation covered such qualifiers as degree programs being quick, cheap, or recognized, yet never any combination of more than two of these. The purpose of this emphasis is for learners and potential graduates to conduct an informed review of how equivalency is regarded, particularly when looking at a given degree and its sister recognitions in other parts of the world. Where this becomes meaningful for program evaluation is when we look beyond the confines of evaluating either process outcomes or program outcomes, and look at how program evaluation is designed to either prevent, or permit, the evaluator acting as advocate for the learners served by these programs.

To describe our ultimate intent, Dane (2011) remarks, “Evaluation involves the use of behavioral research methods to assess the conceptualization, design, implementation, and utility of intervention programs. In order to be effectively evaluated, a program should have specific procedures and goals, although formative evaluation can be used to develop them. Summative evaluations deal with program outcomes” (p. 314). At this level Dane permits our understanding that while many interests can converge upon a program’s process and programmatic outcomes, it will be its goals which drive what of the program is evaluated, and to what end. Do we wish to evaluate whether equivalency is a primary goal of a given degree program? Are we instead concerned with the political factors affecting the equivalency evaluation process? These and other considerations are potential intervening variables in the process of evaluating a program’s efficacy. Yet it may just be that student advocacy is the goal, not an outgrowth of the goal. We return to Dane (2011) to summarize, “Because the researcher is probably the most fully informed about the results, the researcher may be called upon to make policy recommendations. Personal interests may lead a researcher to adopt an advocate role. Although policy recommendations and advocacy are not unethical themselves, care must be taken to separate them from the results” (p. 314).

Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes (Yarbrough et al., 2011, p. 201). While many of us understand this point anecdotally, combining this concept with advocacy allows us to understand that it may not simply be a program’s learning outcomes which are the greatest goal. What of the international student who wishes to complete an MBA in the US and have that degree recognized as an MBA in Hong Kong, such that he/she may make a global impact as either consultant or scholar? If the extant evaluation process does not take this goal into account, mere learning outcomes centered on financial analysis, strategic planning, or marketing management, as an MBA program would include, could potentially be for naught when considering the long-term prospects of this particular student. We, then, as program evaluation personnel are given a critical task in the design phase, as well as the formative evaluation phase: determining whether the stated goals of a program are in alignment with the goals of those who stand to benefit from that program.

Several factors can influence the role of an evaluator, including the purpose of the evaluation, stakeholders’ information needs, the evaluator’s epistemological preferences, and the evaluation approach used (Fleischer & Christie, 2009, p. 160). With this conclusion in mind, we see the design equation is only complicated by the involvement of the evaluator him/herself. How this person regards the creation of new knowledge is among the considerations weighed during the design process. Yet where concepts such as reciprocal recognition, evaluator epistemology, negotiated purposes, and defensible evaluation design converge is upon the goals established not by the program’s administrators alone, but the stated goals inclusive of those established by the people who stand to benefit most from a program’s existence. Thank you for this reminder, and for your apt coverage of this topic, Dr. Cheng.

Dane, F. C. (2011). Evaluating research: Methodology for people who need to read research. Thousand Oaks, CA: Sage Publications, Inc.

Fleischer, D. N., & Christie, C. A. (2009). Evaluation use: Results from a survey of U.S. American Evaluation Association members. American Journal of Evaluation, 30(2), 158–175.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: Issue Polarization & Evaluator Credibility

On the topic of ideology and polarization, Contandriopoulos and Brousselle (2012) note, “Converging theoretical and empirical data on knowledge use suggest that, when a user’s understanding of the implications of a given piece of information runs contrary to his or her opinions or preferences, this information will be ignored, contradicted, or, at the very least, subjected to strong skepticism and low use” (p. 63). Program evaluation, as with any other form of research and analysis, must be evaluated in context. Yet context is not simply defined as the setting of the evaluation, nor the intent of the evaluation alone. Context must also include consideration for the evaluation’s design, and for the credibility of the evaluator him/herself. Evaluations should be conducted by qualified people who establish and maintain credibility in the evaluation context (Yarbrough et al., 2011, p. 15). This points to the need not only to ensure an audience capable of receiving the ideas and findings brought forth by the evaluation, but equally to include an evaluator capable of preserving the credibility of the study by maintaining his/her own professional credibility as well.

An example of this in action came at a program evaluation session hosted by the Orange County Alliance for Community Health Research last year. This event, presented at UC Irvine, included a three-hour presentation on program evaluation delivered by Michelle Berelowitz, MSW (UC Irvine, 2012). Ms. Berelowitz spoke at length on the broader purpose of program evaluation, the process for designing and conducting program evaluation, and the potential applications of program evaluation. The event was attended by a multitude of program directors and other leaders of health and human services agencies in proximity to the university, intending both to learn of this process and to network with other agencies. Where polarization was introduced, and therefore the first instance of evaluator credibility being called into question, was during the introduction of Ms. Berelowitz’s presentation. She, in very plain language, asked the audience who among them was motivated when it came time to perform evaluations of their programs each year. The question was posed, none replied as being motivated, and a general consensus of disregard for the annualized process loomed instead. This calls the evaluator’s credibility into question, as the process itself is only as valuable as its audience perceives it to be, and program evaluation is only meaningful when it can impact decisions and effect change.

If, during a presentation intended to inform others of the very merits of the program evaluation process, the evaluator’s credibility is called into question, strategies must be enacted to counteract this stifling critique and inattention to the process’s value. To briefly return to the value of identifying and addressing polarization among stakeholders, Contandriopoulos and Brousselle (2012) remark, “as the level of consensus among participants drops, polarization increases and the potential for resolving differences through rational arguments diminishes as debates tend toward a political form wherein the goal is not so much to convince the other as to impose one’s opinion” (p. 63). Thus, in a room where a presentation on the merits of program evaluation is to be received with tepid acceptance, the evaluator holds the responsibility to convey the process in a way which fosters consensus and restores credibility to the process.

One means of establishing greater evaluator credibility is ensuring inclusion. This comes as no surprise, as much of the literature regarding program evaluation centers upon stakeholder inclusion. Yet to specifically address how this relates to evaluator credibility, Yarbrough et al. (2011) write, “Build good working relationships, and listen, observe, and clarify. Making better communication a priority during stakeholder interactions can reduce anxiety and make the evaluation processes and activities more cooperative” (p. 18). Ms. Berelowitz exercised this masterfully: throughout the presentation she was engaging, she drew insights from multiple attendees, she worked many of the attendees’ own issues into the presentation’s material, and she responded thoughtfully to attendee questions and further paradigm inquiry.

Another method by which evaluator credibility can be restored is ensuring the design of the research is one the audience can be receptive to. On this topic Creswell (2009) writes, “In planning a research project, researchers need to identify whether they will employ a qualitative, quantitative, or mixed methods design. This design is based on bringing together a worldview or assumptions about research, the specific strategies of inquiry, and research methods” (p. 20). These conclusions impact not only the design of the research itself – and by extension the design of the program evaluation – but are also considerations which impact an evaluator’s ability to design meaningful research which conveys information according to long-established assumptions about research. In this instance, Ms. Berelowitz delivered a presentation on program evaluation which was deeply supported by the extant literature, which conveyed her worldview and assumptions on research, and thus program evaluation, quite clearly, and which permitted attendees to witness both the technical and practical merits of navigating the program evaluation process in the way presented. In both ensuring the inclusion of stakeholders, and making clear her worldview, inherent assumptions, and defensible design, the presentation was ultimately a success, one where attendees left expressing motivation for the program evaluation process ahead.

Contandriopoulos, D., & Brousselle, A. (2012). Evaluation models and evaluation use. Evaluation, 18(1), 61–77.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. Thousand Oaks, CA: Sage Publications, Inc.

UC Irvine. (2012). Program evaluation. Retrieved October 22, 2013 from http://www.youtube.com/watch?v=XD-FVzeQ6NM.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: On Explicit Evaluation Reasoning

Evaluation reasoning leading from information and analyses to findings, interpretations, conclusions, and judgments should be clearly and completely documented (Yarbrough et al., 2011, p. 209). This standard arises not solely to ensure one’s conclusions are logical; it additionally functions as both a final filter and an ultimate synthesizer of the results of all other accuracy standards. A7 – Explicit Evaluation Reasoning, as per The Program Evaluation Standards, serves to make known the efficacy of the process by which conclusions are reached. Of this standard, Yarbrough et al. (2011) continue, “If the descriptions of the program from our stakeholders are adequately representative and truthful, and if we have collected adequate descriptions from all important subgroups (have sufficient scope), then we can conclude that our documentation is (more) likely to portray the program accurately” (p. 209). This level of holism leaves us with a critical imperative: to serve the program we are evaluating well, and to serve the negotiated purposes of the evaluation to their utmost.

Of the need to ensure clarity, logic, and transparency in one’s process, Booth, Colomb, and Williams (2008) elucidate, “[Research] is a profoundly social activity that connects you both to those who will use your research and to those who might benefit – or suffer – from that use” (p. 273). We then have a responsibility, as evaluators and as researchers, to conduct ourselves and to document our process explicitly. Doing so preserves attributes central to quality research: reproducibility, generalizability, and transferability. Yet there are also more specific considerations at play. On the topic of this standard’s importance to current and future professional practice, consider an extant job posting for a Program Evaluator with the State of Connecticut Department of Education. The description for this position includes the following: “A program evaluation, measurement, and assessment expert is sought to work with a team of professionals developing accountability measures for educator preparation program approval. Key responsibilities will include the development of quantitative and qualitative outcome measures, including performance-based assessments and feedback surveys, and the establishment and management of key databases for annual reporting purposes” (AEA Career, n.d., para. 2). This posting covers a wide range of AEA responsibilities, and makes clear from only its second paragraph the sheer scope of responsibility the position carries. And while the required qualifications include mention of expertise in program evaluation, qualitative and quantitative data analyses, and research methods, they more importantly conclude with mention of the need to ‘develop and maintain cooperative working relationships’ and to demonstrate skill in working ‘collaboratively and cooperatively with internal colleagues and external stakeholders’. What is required, then, is not solely a researcher with broad technical expertise, nor simply a methodologist with a program evaluation background, but a member of the research community who can deliver on the palpable need to produce defensible conclusions from explicit reasoning in a way which connects with a broad audience of users and stakeholders.

Explicit reasoning, expressed in a way digestible by readers, defensible to colleagues, and actionable by program participants, requires that the researcher be comfortable with where he/she is positioned in relation to the research itself when communicating both process and results. This is known in the literature as positionality. Andres (2012) speaks to this in saying, “This positionality usually involves identifying your many selves that are relevant to the research on dimensions such as gender, sexual orientation, race/ethnicity, education attainment, occupation, parental status, and work and life experience” (p. 18). And yet why so many admissions solely for the purpose of locating one’s self within the research? Because positionality has as much to do with the researcher as it does the researcher’s position and its impact on program evaluation outcomes. An example of this need for clarity comes to us from critical action research. Kemmis and McTaggart (2005) describe, “Critical action research is strongly represented in the literatures of educational action research, and there it emerges from dissatisfaction with classroom action research that typically does not take a broad view of the role of the relationship between education and social change… It has a strong commitment to participation as well as to the social analyses in the critical social science tradition that reveal the disempowerment and injustice created in industrialized societies” (p. 561). With this in mind, it stands to reason that one can only be successful in such a position if the researcher him/herself is made clear, his/her position relative to the research is clear, his/her stance on justice (as only one example) is considered, the process by which the research is conducted is clear, and it is clear how this person, in relation to this research, renders subsequent judgment on the data collected. For this Program Evaluator role, just as for many others like it, the researcher must be permitted to serve as both researcher and advocate, exercising objective candor throughout.

American Evaluation Association. (n.d.). Career. Retrieved October 9, 2013 from http://www.eval.org/p/cm/ld/fid=113.

Andres, L. (2012). Designing & doing survey research. London, England: Sage Publications Ltd.

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Kemmis, S., & McTaggart, R. (2005). Participatory action research. In N. K. Denzin & Y. S. Lincoln (Eds.), The Sage handbook of qualitative research (3rd ed., pp. 559–604). Thousand Oaks, CA: Sage Publications, Inc.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.