Speak Less, Say More, to Keep Your Strategy on Track


This did not begin well. For the past two months I have been teaching a group of senior executives and high-level technical professionals the finer points of communicating strategy to their organizations. Day 1 was a travesty: I was all but drawn and quartered by their blunt comments about how confusing they found my assumption that having a strategy does not necessarily equal being able to communicate that strategy.

To some extent they were right, yet to a larger extent we all still had quite a lot to learn. Though I have been teaching college students for the better part of the last 12 years, I went home that first night of this class entirely deflated. While confident in my ability to recall and relate the material, I had a larger lesson to learn about the inauthentic way in which I arrived that first night. This was not a class of first-year freshmen looking for strong leadership. These were seasoned leaders themselves, in their last class prior to graduating with a Master's degree in, of all things, leadership.

So what did I do differently to turn things around? I stopped trying to talk my way out of it. Instead, I recognized my role as they did: a bridge between theory and practice, nothing more. I created the boundaries for their communication and learning and set them free within those boundaries. My communication with them has also since changed. I toned down the sage on the stage. I listened more intently. I adjusted both style and content on the fly. I adapted my approach to leverage their experience and my own, tying in the content where I could through substantive yet bite-sized key takeaways to keep it memorable. In the process I realized that as I was teaching them to become more effective communicators, so that they might communicate their strategies more effectively, I too was learning to be more effective in my own communication. With that in mind, I want to share a dozen of my key takeaways from this class on communicating strategy, which I think you may find helpful across communication settings of all types.

  1. Effective Leaders Lead Strategy and Tactics
  2. Lead with Logic & Emotion, Not Logic or Emotion
  3. Everyone Can Be a Change Agent
  4. Effective Alignment Requires A Common Message
  5. Keep the Strategy Message Bite-Sized & Repeatable
  6. Reach Them Through Intrinsic Motivations
  7. Identify Motivations Through Open Discussion
  8. Connect Strategy to a Destination
  9. Measure the Business to Measure the Communication
  10. Always Reward Those Supporting the Strategy
  11. Expect of Yourself What You Expect of Others
  12. Your People Are Your Primary Communication Vehicle

You may entirely disagree with some of what is listed, and that is certainly for you to decide. There may be some (or many) that you find context-specific and of no use to you. Yet I still believe that knowing which content matters and which does not is one of the first steps toward communicating well as a leader, so thank you for reading.

Research in Business is Everyone’s Business


I am a firm believer that holding a greater number of college degrees does not necessarily mean you are smarter than those with fewer. I am unapologetic in this stance, as I believe the role of the university is not to increase your IQ (arguably a number with little flux). The role of the university is instead to train you, largely in a particular discipline, a process, or both. Yes, some programs require a greater degree of raw intelligence, but the purpose of this post is not to draw those lines. The purpose instead is to dismantle the misconception that only those with a research background can perform business research. What connects these two dots? In short, the conclusion that just because someone has a PhD does not mean they know more about your business than you do. In fact, the opposite is usually true. If they are trained in the process, and you are intimate with your business, I would like to make a suggestion: when seeking a greater understanding of the performance of your business's processes, programs, or products, team up to form a symbiotic relationship between the business and a researcher, so you can both accomplish more and do the research together.

The Why – The Interpretive Approach

Among the approaches to organization studies is the interpretive approach, described by Aldrich and Ruef (2009) as follows:

The interpretive approach focuses on the meaning social actions have for participants at the micro level of analysis. It emphasizes the socially constructed nature of organizational reality and the processes by which participants negotiate the meanings of their actions, rather than taking them as a given. Unlike institutional theorists, interpretive theorists posit a world in which actors build meaning with locally assembled materials through their interaction with socially autonomous others. (p. 43)

If this is true, then a lone researcher cannot simply be transplanted from one organization to the next, delivering revenue-trajectory-altering research in a vacuum. The research must be built on great questions, and those may well come from the business; the very meaning of the business, and of the data it generates, is embedded within the interactions and actors of the business itself.

The What – A Symbiotic Relationship

A relationship where you – representing the business – provide the context, perhaps even help gather some of the data, and take part in the interpretation once the researcher has completed a substantive portion of his/her analysis. You are a team; the researcher is not a gun for hire. Which also means that if you are a team, you are a researcher too. This approach is important for many reasons, not least of which is your store of tacit knowledge. As we are reminded by Colquitt, LePine, and Wesson (2013), “Tacit knowledge [is] what employees can typically learn only through experience. It’s not easily communicated but could very well be the most important aspect of what we learn in organizations. In fact, it’s been argued that up to 90 percent of the knowledge contained in organizations occurs in tacit form” (p. 239). That is a vast amount of information the researcher simply will not have if you do not team up and start working together.

The How – A Cue from Empowerment Evaluation

We can draw a number of conclusions on how best to form this reciprocal relationship between business and researcher as one team, and many come from the literature on empowerment evaluation. As put by Fetterman and Wandersman (2005):

If the group does not adopt an inclusive and capacity-building orientation with some form of democratic participation, then it is not an empowerment evaluation. However, if the community takes charge of the goals of the evaluation, is emotionally and intellectually linked to the effort, but is not actively engaged in the various data collection and analysis steps, then it probably is either at the early developmental stages of empowerment evaluation or it represents a minimal level of commitment. (p. 9)

There is a final, critical subtext to all of the above. In essence, there must be a consistent flow of ideas between the researcher and the business. Research in business is everyone’s business, yet only in environments where the researcher can share his/her craft, and where the business, better informed, can grant the researcher access to the knowledge only it possesses. For a final thought on the merits of this proposed team I defer to the literature on constructing grounded theory, where Charmaz (2014) reminds us that, “We need to think about the direction we aim to travel and the kinds of data our tools enable us to gather… Attending to how you gather data will ease your journey and bring you to your destination with a stronger product” (p. 22).

About the Author:

Senior decision support analyst for Healthways, and current adjunct faculty member for Allied American University, Grand Canyon University, South University, and Walden University, Dr. Barclay is a multi-method researcher, institutional assessor, and program evaluator. His work seeks to identify those insights among enterprise data which are critical to sustaining an organization’s ability to compete. That work spans the higher education, government, nonprofit, and corporate sectors. His current research is in the areas of employee engagement, faculty engagement, factors affecting self-efficacy, and teaching in higher education with a focus on online instruction.

Mitigating Hazards in Justified Conclusions & Sound Design

A1 and A6 of The Program Evaluation Standards concern Justified Conclusions and Decisions, and Sound Designs and Analyses, respectively. Where A1 asks that evaluation conclusions and decisions be explicitly justified in the cultures and contexts where they have consequences, A6 asks that evaluations employ technically adequate designs and analyses appropriate for the evaluation purposes (Yarbrough et al., 2011, pp. 165-167). Both standards bear on the potential accuracy of an evaluation. In discussing strategies for mitigating the hazards associated with these standards, previous coverage outlined actions ranging from integrating stakeholder knowledge frameworks, to clarifying roles within the evaluation team, to properly defining what is meant by accuracy in the context of a given evaluation. Strategies discussed also included selecting designs based on the evaluation’s purpose, while retaining enough flexibility in the design to permit compromise and uncertainty during this iterative process. Here we discuss additional strategies, focusing on mitigating the hazards associated with the accuracy standards by exploring both the concept of triangulation and that of establishing validation in practice.

Triangulation is employed across quantitative, qualitative, and mixed methods research alike as a means of preventing such common errors as drawing conclusions from samples that are not representative of their stated population, and of reducing confirmation bias among findings. As per Patton (2002), “Triangulation is ideal. It can also be expensive. A study’s limited budget and time frame will affect the amount of triangulation that is practical, as will political constraints in an evaluation. Certainly, one important strategy for inquiry is to employ multiple methods, measures, researchers, and perspectives – but to do so reasonably and practically” (p. 247). This operates as a strategy for mitigating hazards among the accuracy standards, as the primary intent of those standards is to provide context-specific conclusions which are defensible and inclusive of those stakeholders involved in the process. Risks to accuracy are addressed by this strategy because, while a single data point or collection method may provide a pertinent picture of the evaluation’s efficacy, the ability to further substantiate those results via additional methods and additional data only serves to strengthen the conclusions reached. In addition to ensuring data integrity via a multi-method approach to data collection, establishing validation in practice remains an additional strategy for mitigating the hazards of the accuracy standards.
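To make Patton’s point concrete, here is a minimal, purely hypothetical sketch of methodological triangulation: three invented data sources each produce an estimate of the same program outcome, and the conclusion is flagged as corroborated only when every method agrees with the others within a stated tolerance. The function name, source names, and tolerance are illustrative assumptions of mine, not anything prescribed by the accuracy standards.

```python
# Hypothetical illustration of methodological triangulation:
# several independent methods estimate the same program outcome,
# and we only call the conclusion "corroborated" when they agree.

def triangulate(estimates, tolerance=0.10):
    """Return (pooled_estimate, corroborated) for a dict of
    method name -> outcome estimate (e.g., the proportion of
    participants showing improvement).

    The conclusion is treated as corroborated only when every
    method falls within `tolerance` of the cross-method mean.
    """
    values = list(estimates.values())
    pooled = sum(values) / len(values)
    corroborated = all(abs(v - pooled) <= tolerance for v in values)
    return pooled, corroborated

# Three hypothetical data sources for one evaluation question.
estimates = {
    "survey": 0.72,
    "records_review": 0.68,
    "interviews_coded": 0.70,
}
pooled, corroborated = triangulate(estimates)
```

Here the three hypothetical estimates agree within the tolerance, so the pooled figure would be reported as corroborated; had one method diverged sharply, that disagreement, per Patton, is itself a finding worth pursuing rather than suppressing.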

On establishing validation in practice, Goldstein and Behuniak (2011) comment, “In the Standards, validity is defined as the ‘degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests… The interrelationships among the interpretations and proposed uses of test scores and the sources of validity evidence define the validity argument for an assessment” (p. 180). What then becomes pertinent to the evaluation team’s efforts is to ensure that all relevant validity evidence is collected, alongside the proposed uses of the methods selected and employed for a particular evaluation. There remains an onus upon the evaluation team not only to substantiate the conclusions borne of its data collection and analysis of the immediate program and environment, but equally to first establish the validity of the methods and instruments chosen for that data collection and analysis. Simply requiring stakeholders to trust the expert judgment behind an evaluation team’s selection of methods and instruments is cause for concern, as this does not permit the kind of inclusion of stakeholder knowledge frameworks mentioned above. Rather, to ensure that the accuracy standards are upheld, an inclusive process of iterative reviews of the proposed design, and of its execution, with stakeholder groups provides the level of holism required to conclude that a design and its instruments are accurate. The evaluation team absolutely brings with it the knowledge, experience, and technical prowess necessary to perform a successful evaluation, yet doing so without consulting the knowledge and experience of stakeholders opens the door to research that is not in alignment with what is intended by those employing such a team.
Stakeholders, while not technical or content experts of AEA per se, have as much to contribute to the selection of data points, methods employed, and analyses performed as the team itself, based solely on their direct involvement with the program and the experience that interaction brings. They would not serve as the primary source of design suggestions, yet would help discern which designs might serve the current situation best. Of this paradox, Booth, Colomb, and Williams (2008) note, “A responsible researcher supports a claim with reasons based on evidence. But unless your readers think exactly as you do, they may draw a different conclusion or even think of evidence you haven’t” (p. 112).

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Goldstein, J., & Behuniak, P. (2011). Assumptions in alternate assessment: An argument-based approach to validation. Assessment for Effective Intervention, 36(3), 179–191.

Patton, M. Q. (2002). Qualitative research & evaluation methods. Thousand Oaks, CA: Sage Publications, Inc.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Faculty Accountability through Individual Assessment Data

In what way(s) can we as faculty hold ourselves increasingly accountable for the learning outcomes of our students, evidenced through increasing means of individual assessment rooted in quantitative and qualitative measures alike?

This broad phraseology is used not because I could not think of a more specific question, but because it is written in a way that accounts for whatever context and existing levels of assessment each faculty member already employs. I have worked with organizations where the faculty member’s performance is judged using a triangulation of supervisor feedback on progress toward established goals, a series of measures of student scores in class, and feedback from an ongoing student survey process. Yet does this triangulation provide enough data to truly carry out individual assessment at a level which demonstrates sufficient accountability? By setting clear and ambitious goals, each institution can determine and communicate how it can best contribute to realizing the potential of all its students (Association of American Colleges & Universities, 2008, p. 2). With this in mind, our first consideration must be less about the process by which individual assessment is carried out, and more about whether the goals as currently established are sufficient for holding individuals accountable to a sufficient standard. Were the goal simply to ensure that a high proportion of students pass each class, that completion goal would carry low accountability for how it is met. Alternatively, a goal which references areas of assessment, professional development, and curricular review, all while targeting student success, holds faculty to a higher level of accountability for both the content and method(s) of individual assessment and performance.

Another strategy for holding faculty to a higher level of individual accountability in assessment concerns the data points collected. Outcomes, pedagogy, and measurement methods must all correspond, both for summative assessment, such as demonstrating that students have achieved certain levels, and for formative assessment, such as improving student learning, teaching, and programs (Banta, Griffin, Flateby, & Kahn, 2009, p. 6). In considering how such a dynamic process is implemented, we can build on such concepts as the community of practice and the community of learning, and consider implementing a community of assessment. Holding each other mutually accountable for formative and summative assessment alike is one way a faculty member can gain more in-depth data during his/her individual assessment process: by eliciting the feedback of supervisors, peers, other colleagues, and students collectively in order to form a community of assessment. One extant method of this today is the 360-degree feedback process, which seeks performance feedback regarding one individual from positions proximal to that individual in all directions, ranging from those the person works for, to those he/she works with, to those who serve him/her. Such a process can help instigate a community of assessment by sharing the individual assessment process among many, permitting both richer data for individual assessment and a subsequent means of theming data across individuals as well. It can combine feedback from students and fellow faculty to learn how a particular program is serving the community, while equally assessing, say, teaching style and whether or how it impacts a faculty member’s ability to teach.
The implications of such a process are promising, not only because a great number of tools are already available to implement such an evaluation process, but equally because the individual assessment process is then served by a multifaceted data collection procedure. One of the best ways of asserting the merits of the academy is to implement an assessment-of-learning system that simultaneously helps improve student learning and provides clear evidence of institutional efficacy that satisfies appropriate calls for accountability (Hersh, 2004, p. 3).
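As a purely hypothetical sketch of how a community of assessment might pool such multi-directional feedback, the snippet below averages ratings per dimension and per rater role. The roles, dimensions, 1-5 scale, and function name are all illustrative assumptions of mine rather than any established 360-degree instrument.

```python
# Hypothetical 360-degree feedback aggregation for one faculty member.
# Each rating is (rater_role, dimension, score on a 1-5 scale); the
# roles, dimensions, and scale are illustrative assumptions only.
from collections import defaultdict

def summarize_feedback(ratings):
    """Average scores per dimension, and per (dimension, role),
    so one instructor's formative picture draws on every direction
    of the 360 process rather than a single supervisor's view."""
    by_dimension = defaultdict(list)
    by_dim_role = defaultdict(list)
    for role, dimension, score in ratings:
        by_dimension[dimension].append(score)
        by_dim_role[(dimension, role)].append(score)
    mean = lambda xs: sum(xs) / len(xs)
    return (
        {d: mean(s) for d, s in by_dimension.items()},
        {k: mean(s) for k, s in by_dim_role.items()},
    )

ratings = [
    ("supervisor", "teaching_style", 4),
    ("peer", "teaching_style", 5),
    ("student", "teaching_style", 3),
    ("student", "responsiveness", 4),
    ("peer", "responsiveness", 4),
]
overall, by_role = summarize_feedback(ratings)
```

Averaging by dimension gives the individual a formative signal, while the per-role breakdown preserves where a perception gap lies (in this invented data, students rate teaching style lower than peers do), which is exactly the kind of richer theming across sources described above.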

– Justin

Association of American Colleges & Universities (AAC&U) & Council for Higher Education Accreditation (CHEA). (2008). New leadership for student learning and accountability: A statement of principles, commitments to action. Washington, DC. Retrieved from http://www.newleadershipalliance.org/images/uploads/new%20leadership%20principles.pdf

Banta, T. W., Griffin, M., Flateby, T. L., & Kahn, S. (2009). Three promising alternatives for assessing college students’ knowledge and skills (NILOA Occasional Paper No. 2). Urbana, IL: University of Illinois and Indiana University, National Institute for Learning Outcomes Assessment (NILOA). Retrieved from http://learningoutcomesassessment.org/documents/AlternativesforAssessment.pdf

Hersh, R. H. (2004). Assessment and accountability: Unveiling value added assessment in higher education. A paper presented at the AAHE National Assessment Conference, Denver, CO. Retrieved from http://www.aacu.org/resources/assessment/Hershpaper.pdf.

Justified Conclusions & Sound Evaluation Design

Among the program evaluation standards are the two accuracy standards A1: Justified Conclusions and Decisions, and A6: Sound Designs and Analyses. A1 is defined by Yarbrough, Shulha, Hopson, and Caruthers (2011) as, “Evaluation conclusions and decisions should be explicitly justified in the cultures and contexts where they have consequences” (p. 165). The hazards associated with this standard are many; they include assumptions of accuracy among evaluation teams, the ignoring of cultural cues and perspectives, assumptions of transferability, and a number of hazards concerning emphasis on technical accuracy at the expense of cultural inclusivity and immediate environmental context (Yarbrough et al., 2011, p. 167). Where this particular standard is most consequential concerns the sociological factors inherent in any assessment. Of such factors, Ennis (2010) notes, “It is the liking part – the emotional, aesthetic, or subjective decision to actively cooperate with an institution’s assessment regime – that suggests the difficulties inherent in coupling the success of an assessment program to the establishment of an assessment culture” (p. 2). Thus, an assessment culture cannot be established through rigor or a display of technical prowess alone.

An institution’s absorptive capacity is not directly correlated with its rate of acceptance of new knowledge. Rather, hazards such as disregarding the extant culture and subcultures, disregarding the needs of the immediate environment, and disregarding transferability all pose a direct threat to both acceptance and adoption of whatever findings an assessment produces. The recommendations for correcting for this therefore include: (1) clarifying which stakeholders will form conclusions and permitting the integration of those stakeholders’ knowledge frameworks; (2) clarifying the roles and responsibilities of evaluation team members; (3) ensuring findings reflect the theoretical terminology as defined by those who will draw conclusions; (4) identifying the many definitions of accuracy held by assessment users; and (5) making effective choices regarding depth, breadth, and representation of the program (Yarbrough et al., 2011, p. 166).

A6, the standard of sound designs and analyses, is defined by Yarbrough et al. (2011) as, “Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes” (p. 201). The hazards associated with this standard involve a number of considerations for responsiveness to the features, factors, and purpose(s) of a given program. Such hazards include choosing a design based on status or reputation rather than its ability to provide high-quality conclusions, a lack of preparation for potentially disappointing evaluation findings, a lack of consideration for the many feasibility, propriety, and utility standards, a lack of customization of the design to the current environment, and a lack of broad-based consultation with stakeholders at multiple levels (Yarbrough et al., 2011, p. 204). The effects of a lacking, misguided, or inappropriate design can be devastating to the overall efficacy of a given assessment. Of the need for sound design, Booth, Colomb, and Williams (2008) comment, “In a research report, you must switch the roles of student and teacher. When you do research, you learn something that others don’t know. So when you report it, you must think of your reader as someone who doesn’t know it but needs to and yourself as someone who will give her reason to want to know it” (p. 19).

Performing an assessment based solely on the popularity of the design employed misses the point of assessing the program at hand, which is to formulate a strategy for better understanding the unique program under study and to relate the gathered data in a way both digestible and actionable by those who hold a stake. One can employ a procedure which reliably gathers data, yet if the data are unrelated or unnecessary, the design lacks both utility and, in this instance, accuracy. So how can accuracy be increased and applicability restored? It is suggested instead to select designs based on the evaluation’s purpose, secure adequate expertise, closely evaluate any designs in contention, choose framework(s) which provide justifiable conclusions, allow for compromise and uncertainty, and consider ongoing, iterative modifications to the design over protracted periods to ensure currency (Yarbrough et al., 2011, p. 204). Doing so will not only ensure that your audience receives actionable results, but that same audience can also hold the design, and its collective opinion of the assessment’s efficacy, in greater confidence, as its understanding of the work is equally elevated.

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Ennis, D. (2010). Contra assessment culture. Assessment Update, 22(2), 1–16.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: What Sets Empowerment Evaluation Apart

According to Middaugh (2010), “There is no shortage of frustration with the inability of the American higher education system to adequately explain how it operates… The Spellings Commission (2006) chastised higher education officials for lack of transparency and accountability in discussing the relationship between the cost of a college education and demonstrable student learning outcomes” (p. 109). As we continue to look into instigating and preserving positive change, I find Middaugh’s words go beyond a burning platform and resonate instead as a call to action. Empowerment evaluation is not simply about urging faculty and administrators to perform better evaluations, improving analytical or reporting skills, or increasing skills in evaluative inquiry. Empowerment evaluation in this environment is about being able to better understand one’s self, in a highly regulated and highly monitored environment where the stakes remain quite high.

For three other views into this relevance, we begin with Fetterman’s article Empowerment evaluation: Building communities of practice and a culture of learning. Therein Fetterman (2002) describes, “Empowerment evaluation has an unambiguous value orientation – it is designed to help people help themselves and improve their programs using a form of self-evaluation and reflection” (p. 89). Empowerment evaluation is therefore neither strictly formative nor summative, particularly as it is not evaluation performed by evaluation personnel. Rather, it creates enhanced opportunities for sustainability because it permits stakeholders to conduct their own evaluations. What is greatly advantageous about this approach is its direct relationship to change processes. Just as with a guiding vision in most established change processes, empowerment evaluation begins with organizational mission. Fetterman (2002) notes, “An empowerment evaluator facilitates an open session with as many staff member and participants as possible. They are asked to generate key phrases that capture the mission of the program or project” (p. 91).

Also party to this line of thinking is the step of determining the present state before defining the future state. As described by Worthington (1999), “First, it is highly collaborative, with input from program stakeholders at every stage of the evaluation process. The four steps or stages of empowerment evaluation are: (1) “taking stock,” during which stage program participants rate themselves on performance; …” (p. 2). This article is equally instructive in helping to make sense of how one takes an organization from present state to future state. As Worthington (1999) later describes, “Empowerment evaluation contains elements of all three forms of participatory research. It is a reciprocal, developmental process that aims to produce ‘illumination’ and ‘liberation’ from role constraints among participants; it shares with action research a commitment to providing tools for analysis to program participants; and the evaluator takes a less directive, collaborative role” (p. 7).

Last among the supporting articles for this post is A framework for characterizing the practice of evaluation, with application to empowerment evaluation. This article is meaningful because it seeks to draw a line of demarcation separating empowerment evaluation from other forms of evaluation. As per Smith (1999), “A useful first step in clarifying the diversity of evaluation practice might be the development of a comprehensive framework with which to compare and contrast fundamental attributes of any evaluation approach” (p. 43). The characteristics of this framework include consideration for context, role, interest, rules, justification, and sanctions. Smith (1999) continues, “This analysis of Empowerment Evaluation illustrates how the aspects of the framework (context, purpose and social role, phenomena of interest, procedural rules, methods of justification, and sanctions), are highly interrelated… The primary phenomena of interest in Empowerment Evaluation are participant self-determination, illumination, and liberation, and not the worth of programs” (p. 63). This becomes important as the line of demarcation appears not to be a single line, but one containing many interrelated facets. Yet if the core of empowerment evaluation is to increase the capability for evaluative inquiry among organizational stakeholders, this becomes a differentiated form of evaluation reporting, one essential to determining whether such a transformation was in fact incited. I will thus continue my search not only for a deeper understanding of how empowerment evaluation differs from other forms of evaluation, but will equally focus on how communicating results and instigating change affect the very outputs of this process in specific.

Fetterman, D. M. (2002). Empowerment evaluation: Building communities of practice and a culture of learning. American Journal of Community Psychology, 30(1), 89-102.

Fetterman, D. M. & Wandersman, A. (2005). Empowerment evaluation: Principles in practice. New York, NY: The Guilford Press.

Middaugh, M. F. (2010). Planning and assessment in higher education: Demonstrating institutional effectiveness. San Francisco, CA: Jossey-Bass.

Smith, N. L. (1999). A framework for characterizing the practice of evaluation, with application to empowerment evaluation. The Canadian Journal of Program Evaluation, 14, 39-68.

Worthington, C. (1999). Empowerment evaluation: Understanding the theory behind the framework. The Canadian Journal of Program Evaluation, 14(1), 1-28.

Post-Doc Blogpost: Engaging Evaluation Stakeholders

When it comes to the potential challenges of engaging stakeholders in the evaluation process, I believe the limitations exist along a number of dimensions: the organization, the program, and the program’s evaluation. I begin with the organization because, regardless of the number of steps taken to factor in needs, include stakeholders, and design great evaluations, there remains the potential for lackluster results if the organization is structured in such a way that stakeholders are prevented from working with you or sharing critical input. We have asserted that organizational goals are established through coalition behavior, on the grounds that organizations are interdependent with task-environment elements and that organizational components are also interdependent with one another; unilateral action is not compatible with interdependence, and the pyramid headed by the single all-powerful individual has become a symbol of complex organization only through historical and misleading accident (Thompson, 2004, p. 132). I have mentioned work by Thompson in the past when writing on the task environment; this time I mention Thompson with regard to the structure of organizations themselves. At times the very structure of an organization can limit the opportunity for evaluators to reach all pertinent stakeholders. In highly structured, greatly hierarchical organizations, it may require nothing less than multi-level approvals to seek the input of one highly critical yet lower-level stakeholder. The potential inclusion strategy to correct for this: seek first to establish a matrix organization within the larger organization, formed for the sole purpose of the continued efforts of the program evaluation. Seeking the involvement of an interdepartmental, matrix team of members as a cross-section through many levels of the organization, representing as many functions and divisions as possible, negates much of the limitation brought on by normative command-and-control structures, while ensuring representation from nearly every corner of the organization.

The program also presents certain limitations to stakeholder inclusion and therefore engagement. A common difficulty with information from these sources is a lack of the specificity and concreteness necessary to clearly identify specific outcome measures; for the evaluator’s purposes, an outcome description must indicate the pertinent characteristic, behavior, or condition that the program is expected to change (Rossi, Lipsey, & Freeman, 2004, p. 209). The very design of the program, and the dissemination of its intent, can provide evaluators only a limited view of the program. This limits stakeholder inclusion, as an under-representative description of the program will deliver an under-representative description of required stakeholders. Where this is the case, the strategy to ensure stakeholder inclusion rests on garnering a better understanding of the program itself. The strategy has less to do with identifying individuals and more to do with identifying pertinent processes and impacts; only then can relevant individuals be identified and included relative to those processes.

A third and final consideration for stakeholder engagement is the design of the evaluation itself, as much of the limitation of research lies embedded in its design. When it comes to the complexities of interviewing, remember that the more you plan by determining exactly what you want to know, the more efficiently you will get what you need; you don’t always need to script an interview around a set list of questions, but prepare so that you don’t question your source aimlessly (Booth, Colomb, & Williams, 2008, p. 82). Just as specifying the program further permits the evaluator to know more about what is under evaluation, and therefore whom to ask about it, it is equally effective to consider exactly what you wish to know from stakeholders before engaging them amid the evaluation process. The difference between aimlessly questioning a stakeholder for an hour and purposively questioning a stakeholder for ten minutes is engagement. The latter interview begets an engaged stakeholder, where the former begets a rambling dialogue rife with detractors and partial information alike. With this in mind, the final strategy for engaging stakeholders more fully is to arrive better prepared for the evaluation, and for each interview. While this strategy also has little to do with something done to a stakeholder, or how the stakeholder is selected, the literature suggests that what is done with a stakeholder, once you have one, makes all the difference.

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks, CA: Sage Publications, Inc.

Thompson, J. D. (2004). Organizations in action: Social science bases of administrative theory. New Brunswick, NJ: Transaction Publishers.

Post-Doc Blogpost: Machine Scoring & Student Performance

I believe one concept at the heart of considering the validity of a writing assessment via automated essay scoring (AES) as a measure of student performance in a graduate program is fidelity. Fidelity has to do with the degree to which the task, response mode, and what is actually scored match the requirements found in the real world (Zane, 2009, p. 87). This includes consideration of such elements as context, structure, and the item’s parameters. What lends validity to such an assessment is the fact that much of what is expected of a writing assessment mirrors the college experience. From considerations surrounding grammar and sentence structure to critical thinking and conceptual integration, an automated writing assessment permits greater fidelity in an exercise that emulates the student experience in myriad ways. Where attention must be paid, however, is the validity of the objective scoring provided by an AES system.

Machine scores are based on a limited set of quantifiable features in an essay, while human holistic scores are based on a broader set of features, including many, such as the logical consistency of an argument, which cannot yet be evaluated by a machine (Bridgeman, Trapani, & Attali, 2012, p. 28). The assessment itself, then, is what provides much of the fidelity to the student experience. What has yet to be developed is a holistic way of interpreting and subsequently scoring essay responses with the same depth, and the same consideration for elements not contained within a designed algorithm, as that of human raters. Human raters can probe creativity, innovation, and integrative thinking. Human raters can identify off-topic content just as an AES system would, yet rather than reducing the score by default as AES is designed to do, the human rater can form an individual opinion regarding off-topic content and its relevance to the response. Where one is asked about the concept of truth in an environment such as Accuplacer testing, for example, the scoring mechanism might deduct for the use of vernacular proximal to art, to which the author refers because art and truth alike are subjective interpretations. What AES does well, and human raters do not, however, is score essay responses with an equal level of consistency across multiple raters and multiple ratings.

As AES models are often formed using more than two raters, studies that have evaluated interrater agreement have usually shown that the agreement coefficients between the computer and human raters are at least as high as, or higher than, those among human raters themselves (Shermis, Burstein, Higgins, & Zechner, 2010, p. 22). This gives pause to doubt cast on an AES system’s ability to accurately and reliably score work. Yet this reliability does not necessarily denote validity, as we have previously discussed. Thus, an environment where at least one AES score and one human rater score are considered in conjunction presents the most promising synthesis of both approaches regarding a single assessment item and score. Further understanding of rater cognition is necessary to have a more thorough understanding of what is implied by the direct calibration and evaluation of AES against human scores and what, precisely, is represented in a human score for an essay (Ramineni & Williamson, 2013, p. 37). With this in mind, it is critical that those designing such assessments remain attentive to differences in scores among human raters, the difference in scores between machine and human rater, and any differences which exist in the AES model’s ability to measure student performance against a particular rubric. Should the rubric cover such concepts as creativity, an element for which AES is at a disadvantage, this emphasis on content will be the driving force informing the decision to use AES. Ultimately, it is the context, the grading criteria as explained by the rubric, and the opportunity for joint assessment via human rater which will drive the decision of whether AES is a valid source of objective writing assessment.
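Since this discussion turns on agreement coefficients, it may help to see one computed. The sketch below uses quadratic-weighted kappa, a statistic commonly reported in AES validation work; the 1–5 scores here are entirely hypothetical and serve only to illustrate comparing a human–human pair against a human–machine pair, not to reproduce any cited result.

```python
def quadratic_weighted_kappa(a, b, min_r, max_r):
    """Quadratic-weighted kappa between two lists of integer ratings."""
    n = max_r - min_r + 1
    # Observed count matrix of rating pairs.
    obs = [[0] * n for _ in range(n)]
    for x, y in zip(a, b):
        obs[x - min_r][y - min_r] += 1
    total = len(a)
    # Marginal histograms for each rater.
    hist_a = [sum(row) for row in obs]
    hist_b = [sum(obs[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the corners.
            w = ((i - j) ** 2) / ((n - 1) ** 2)
            num += w * obs[i][j] / total                      # observed disagreement
            den += w * hist_a[i] * hist_b[j] / (total ** 2)   # chance disagreement
    return 1.0 - num / den

# Hypothetical essay scores on a 1-5 scale.
human_1 = [3, 4, 2, 5, 3, 4, 1, 4]
human_2 = [3, 3, 2, 5, 4, 4, 2, 4]
machine = [3, 4, 2, 5, 3, 4, 2, 4]

print(quadratic_weighted_kappa(human_1, human_2, 1, 5))  # human vs. human
print(quadratic_weighted_kappa(human_1, machine, 1, 5))  # human vs. machine
```

In a pattern like the one the literature describes, the machine–human coefficient can come out as high as or higher than the human–human coefficient, though with real essays the machine’s feature set, not the arithmetic above, is what limits what the score can capture.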

Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27–40.

Ramineni, C., & Williamson, D. M. (2013). Automated essay scoring: Psychometric guidelines and practices. Assessing Writing, 18(1), 25–39.

Shermis, M. D., Burstein, J., Higgins, D., & Zechner, K. (2010). Automated essay scoring: Writing assessment and instruction. In P. Peterson, E. Baker, & B. McGaw (Eds.), International encyclopedia of education (3rd ed., pp. 20–26). Oxford, UK: Elsevier.

Zane, T. W. (2009). Performance assessment design principles gleaned from constructivist learning theory (Part 2). TechTrends, 53(3), 86–94.

Post-Doc Blogpost: Validity & Reliability in Performing Assessments

Designing a question for an instrument is designing a measure; an answer given to a question is of no intrinsic interest, and the answer is valuable only to the extent that it can be shown to have a predictable relationship to facts or subjective states that are of interest (Fowler, 2009, p. 87). Sweeping satisfaction questions, such as satisfaction with one’s job or degree program, are inherently of little value to an assessment in their extant form. They contain limited value because, to take the degree program as an example, a student may experience a number of differing subjective states over its course. A student can find great value in his/her first courses, or potentially an entire year, yet subsequently find little value in the courses that remain. This alone would indicate the item’s inability to measure these changes in a student’s state, and it would therefore not exhibit a predictable relationship to the state of interest. An objective test item is defined as one for which the scoring rules are so exhaustive and specific that they do not allow scorers to make subjective inferences or judgments (Murayama, 2012, para. 1). Requiring students to infer what period in time the item refers to constitutes subjectivity, and negates the item’s ability to deliver anything other than a highly subjective, highly summative point of view.

Many teachers believe that they need strong measurement skills, and report that they are confident in their ability to produce valid and reliable tests (Frey, Petersen, Edwards, Pedrotti, & Peyton, 2005, p. 2). Yet this contention remains at issue, both because standards for establishing validity remain disparate and interspersed throughout the literature on item-writing, and because the research also shows limited assessment training is required curriculum among teaching certification programs. What is needed to determine whether items possess the required validity are standards for the validity of each assessment item. Of an identified 40 different item-writing rules, each falls into one or more of a few categories, including potentially confusing wording or ambiguous requirements, guessing, rules addressing test-taking efficiency, and rules designed to control testwiseness (Frey, Petersen, Edwards, Pedrotti, & Peyton, 2005, p. 4). Each category includes a number of item-writing rules, all intended to address differing concerns for validity. Potentially confusing wording or ambiguous requirements is a category which speaks to confidence that every respondent will understand a question the same way. Guessing in this instance refers to excluding responses where respondents simply chose a correct answer by chance; the probability of this occurring must therefore be reduced. Rules addressing test-taking efficiency have to do with designing items in such a way that their structure does not impede, their form is simple, completing each is brief, and options are made clear. Finally, rules designed to control testwiseness refer to designing items so that, to the largest extent possible, items are answered using only knowledge, ability, or a combination of the two, rather than through patterns or other unintended characteristics of an item which may lead respondents to accidentally identify a correct answer without knowing why it is correct. In order to infuse greater validity into the item discussed above, considerations across all four categories are prudent. Yet paramount among them is to alter the item in such a way that the ambiguous requirement is corrected, and an appropriate span of time is delineated within the context of the question.

Where validity deals with the relationship between each item and an area of interest, reliability deals with the relationship between each item and the consistency of results each time a measurement is taken. In discussing whether scores resulting from an item demonstrate reliability, look for whether the item’s responses are consistent across constructs, whether scores are stable over time when the instrument is administered a second time, and whether there is consistency in test administration and scoring (Creswell, 2009, p. 149). While researchers often address validity and reliability as separate considerations, I feel their interrelationship cannot be emphasized strongly enough. Returning to the example item on program satisfaction above: if the validity of the measurement is compromised, as in creating confusion among respondents about when, and how much of, the program the item is intended to describe, this heightens the probability of inconsistent responses, which then directly threatens reliability. If one respondent can answer the same question multiple ways, and do so defensibly each time while regarding another aspect of the same context, we are now measuring the same condition multiple times and arriving at multiple, quite different results. This is especially problematic with either a true/false or multiple-choice item, as either presents a very limited list of potential responses. Altered response patterns among respondents, driven by poorly worded items, leave the reliability of the instrument in question, as with each subsequent administration it is entirely likely that different responses among those available are selected, and the percentage choosing each option (and therefore its description of a percentage of the population assessed) is unreliable. It is only after the ambiguities inherent in the item’s wording are addressed, and consistent responses are collected across multiple administrations, that this item can begin to be described as valid, reliable, or both.
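As one concrete, admittedly simplified illustration of the test–retest stability Creswell describes, reliability across two administrations is often summarized with a correlation coefficient. The ratings below are hypothetical; a coefficient near 1 suggests stable responses, while the kind of ambiguous item discussed above would be expected to drive the value down.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical satisfaction ratings (1-5) from the same eight respondents,
# collected on two administrations of the same item.
first = [4, 3, 5, 2, 4, 3, 5, 1]
second = [4, 3, 4, 2, 5, 3, 5, 2]

print(round(pearson(first, second), 3))
```

Note that a high coefficient here only speaks to consistency: an item could be answered identically both times and still fail to measure the state of interest, which is exactly why reliability does not by itself establish validity.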

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Fowler, F. J. (2009). Survey research methods (4th ed.). Thousand Oaks, CA: Sage Publications, Inc.

Frey, B. B., Petersen, S., Edwards, L. M., Pedrotti, J. T., & Peyton, V. (2005). Item-writing rules: Collective wisdom. Teaching and Teacher Education, 21(4), 357–364.

Murayama, K. (2012). Objective test items. Retrieved December 24, 2013 from http://www.education.com/reference/article/objective-test-items/.

Post-Doc Blogpost: Reciprocal Recognition – Permitting Evaluation to Evoke Advocacy

I had the recent privilege of attending a presentation by Dr. Raymond Cheng, who spoke on the topic of degree equivalency across the world. The presentation covered such qualifiers as degree programs being quick, cheap, or recognized, yet never any combination of more than two of these. The purpose of this emphasis is for learners and potential graduates to conduct an informed review of how equivalency is regarded, particularly when looking at a given degree and its sister recognitions in other parts of the world. Where this becomes meaningful for program evaluation is when we look beyond the confines of evaluating either process outcomes or program outcomes, and look at how program evaluation is designed to either prevent, or permit, the evaluator acting as advocate for the learners served by these programs.

To describe our ultimate intent, Dane (2011) remarks, “Evaluation involves the use of behavioral research methods to assess the conceptualization, design, implementation, and utility of intervention programs. In order to be effectively evaluated, a program should have specific procedures and goals, although formative evaluation can be used to develop them. Summative evaluations deal with program outcomes” (p. 314). At this level Dane permits our understanding that while many interests can converge upon a program’s process and programmatic outcomes, it will be its goals that drive what of the program is evaluated, and to what end. Do we wish to evaluate whether equivalency is a primary goal of a given degree program? Are we instead concerned with the political factors affecting the equivalency evaluation process? These and other considerations are potential intervening variables amid the process of evaluating a program’s efficacy. Yet it may just be that student advocacy is the goal, not an outgrowth of the goal. We return to Dane (2011) to summarize: “Because the researcher is probably the most fully informed about the results, the researcher may be called upon to make policy recommendations. Personal interests may lead a researcher to adopt an advocate role. Although policy recommendations and advocacy are not unethical themselves, care must be taken to separate them from the results” (p. 314).

Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes (Yarbrough et al., 2011, p. 201). While many of us understand this point anecdotally, combining it with advocacy allows us to understand that a program’s learning outcomes may not be its greatest goal. What of the international student who wishes to complete an MBA in the US and have that degree recognized as an MBA in Hong Kong, such that he/she may make a global impact as either consultant or scholar? If the extant evaluation process does not take this goal into account, learning outcomes centered merely on financial analysis, strategic planning, or marketing management, as an MBA program would include, could potentially be for naught when considering the long-term prospects of this student in particular. We as program evaluation personnel are then given a critical task in the design phase, as well as the formative evaluation phase: determining whether the stated goals of a program are in alignment with the goals of those who stand to benefit from that program.

Several factors can influence the role of an evaluator, including the purpose of the evaluation, stakeholders’ information needs, the evaluator’s epistemological preferences, and the evaluation approach used (Fleischer & Christie, 2009, p. 160). With this conclusion in mind, we see the design equation is only complicated by the involvement of the evaluator him/herself. How this person regards the creation of new knowledge is among the considerations weighed during the design process. Yet where concepts such as reciprocal recognition, evaluator epistemology, negotiated purposes, and defensible evaluation design converge is upon goals established not by the program’s administrators alone, but inclusive of those established by the people who stand to benefit most from a program’s existence. Thank you, Dr. Cheng, for this reminder and for your apt coverage of this topic.

Dane, F. C. (2011). Evaluating research: Methodology for people who need to read research. Thousand Oaks, CA: Sage Publications, Inc.

Fleischer, D. N., & Christie, C. A. (2009). Evaluation use: Results from a survey of U.S. American Evaluation Association members. American Journal of Evaluation, 30(2), 158–175.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.