Post-Doc Blogpost: Validity & Reliability in Performing Assessments

Designing a question for an instrument is designing a measure. An answer given to a question is of no intrinsic interest; it is valuable only to the extent that it can be shown to have a predictable relationship to facts or subjective states that are of interest (Fowler, 2009, p. 87). Sweeping satisfaction questions, such as those about one’s job or degree program, are inherently of little value to an assessment in their extant form. Their value is limited because a student might experience a number of differing subjective states over the course of his/her degree program. A student can find great value in his/her first courses, or potentially an entire year, yet subsequently find little value in the courses which remain. A single sweeping item cannot measure these changes in a student’s state, and would therefore not exhibit a predictable relationship to the state of interest. An objective test item is defined as one for which the scoring rules are so exhaustive and specific that they do not allow scorers to make subjective inferences or judgments (Murayama, 2012, para. 1). Requiring students to infer the period of time to which the item refers introduces subjectivity, and negates the item’s ability to deliver anything other than a highly subjective, highly summative point of view.

Many teachers believe that they need strong measurement skills, and report that they are confident in their ability to produce valid and reliable tests (Frey, Petersen, Edwards, Pedrotti, & Peyton, 2005, p. 2). Yet this contention remains at issue, both because standards for establishing validity remain disparate and scattered throughout the literature on item-writing, and because the research shows that assessment training is only a limited part of the required curriculum in teaching certification programs. What is needed, then, to determine whether items possess the required validity are standards for the validity of each assessment item. Of 40 identified item-writing rules, each falls into one or more of four categories: potentially confusing wording or ambiguous requirements, guessing, rules addressing test-taking efficiency, and rules designed to control testwiseness (Frey et al., 2005, p. 4). Each category includes a number of item-writing rules, all intended to address differing concerns for validity. Potentially confusing wording or ambiguous requirements concerns whether every respondent will understand a question the same way. Guessing, in this instance, refers to respondents choosing a correct answer by chance; the probability of this occurring must be reduced. Rules addressing test-taking efficiency concern designing items so that their structure does not impede respondents, their form is simple, each can be completed briefly, and the options are clear.
Finally, rules designed to control testwiseness concern designing items so that, to the largest extent possible, they are answered using only knowledge, ability, or a combination of the two, rather than by identifying patterns or other unintended characteristics of an item which may lead respondents to identify a correct answer without knowing why it is correct. To infuse greater validity into the item discussed above, considerations for all four categories are prudent. Paramount among them is to alter the item so that the ambiguous requirement is corrected, and an appropriate span of time is delineated for the context of the question.
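
To make the guessing category a little more concrete, here is a minimal sketch of the arithmetic behind it. It assumes blind guessing, with every option on an item equally likely to be chosen and items independent of one another; the function names are my own for illustration and do not come from Frey et al. (2005).

```python
from math import comb


def chance_of_correct_guess(num_options: int) -> float:
    """Probability of picking the keyed answer by blind guessing.

    Assumes every option is equally attractive to a guesser.
    """
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    return 1.0 / num_options


def prob_at_least(num_items: int, num_options: int, min_correct: int) -> float:
    """Probability of getting at least `min_correct` of `num_items` right
    by guessing alone (binomial model, independent items)."""
    p = chance_of_correct_guess(num_options)
    return sum(
        comb(num_items, k) * p**k * (1 - p) ** (num_items - k)
        for k in range(min_correct, num_items + 1)
    )


# A true/false item gives a guesser a 1-in-2 chance; four options give 1-in-4.
print(chance_of_correct_guess(2), chance_of_correct_guess(4))
# Chance of passing (6 of 10) a four-option quiz by guessing alone.
print(prob_at_least(num_items=10, num_options=4, min_correct=6))
```

Under these assumptions, adding plausible options or adding items both shrink the chance that a score reflects luck rather than knowledge, which is the concern this category of rules is meant to address.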

Where validity deals with the relationship between each item and an area of interest, reliability deals with the consistency of results each time a measurement is taken. In discussing whether scores resulting from an item demonstrate reliability, look for whether the items’ responses are consistent across constructs, whether scores are stable over time when the instrument is administered a second time, and whether there is consistency in test administration and scoring (Creswell, 2009, p. 149). While researchers often address validity and reliability as separate considerations, I feel their interrelationship cannot be stressed strongly enough. Returning to the example item on program satisfaction above, if the validity of the measurement is compromised, as when respondents are confused about when and how much of the program the item is intended to describe, the probability of inconsistent responses rises, which directly threatens reliability. If one respondent can answer the same question multiple ways, and do so defensibly each time by considering a different aspect of the same context, we are measuring the same condition multiple times and arriving at multiple, quite different results. This is especially problematic with a true/false or multiple choice item, as either presents a very limited list of potential responses. When poorly worded items alter response patterns among respondents, the reliability of the instrument is left in question: with each subsequent administration it is entirely likely that different responses are selected, and the percentage of respondents choosing each option (and therefore its description of a percentage of the population assessed) is unreliable.
Only after the ambiguities inherent in the item’s wording are addressed, and consistent responses are collected across multiple administrations, can this item begin to be described as valid, reliable, or both.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Thousand Oaks, CA: Sage.

Fowler, F. J. (2009). Survey research methods (4th ed.). Thousand Oaks, CA: Sage Publications, Inc.

Frey, B. B., Petersen, S., Edwards, L. M., Pedrotti, J. T., & Peyton, V. (2005). Item-writing rules: Collective wisdom. Teaching and Teacher Education, 21(4), 357–364.

Murayama, K. (2012). Objective test items. Retrieved December 24, 2013 from http://www.education.com/reference/article/objective-test-items/.

Post-Doc Blogpost: Reciprocal Recognition – Permitting Evaluation to Evoke Advocacy

I recently had the privilege of attending a presentation by Dr. Raymond Cheng, who spoke on the topic of degree equivalency across the world. The presentation covered such qualifiers as degree programs being quick, cheap, or recognized, yet never more than two of these at once. The purpose of this emphasis is for learners and potential graduates to conduct an informed review of how equivalency is regarded, particularly when looking at a given degree and its sister recognitions in other parts of the world. This becomes meaningful for program evaluation when we look beyond the confines of evaluating process outcomes or program outcomes, and look at how program evaluation is designed to either prevent or permit the evaluator acting as advocate for the learners served by these programs.

To describe our ultimate intent Dane (2011) remarks, “Evaluation involves the use of behavioral research methods to assess the conceptualization, design, implementation, and utility of intervention programs. In order to be effectively evaluated, a program should have specific procedures and goals, although formative evaluation can be used to develop them. Summative evaluations deal with program outcomes” (p. 314). At this level Dane permits our understanding that while many interests can converge upon a program’s process and programmatic outcomes, it will be its goals which drive what of the program is evaluated, and to what end. Do we wish to evaluate whether equivalency is a primary goal of a given degree program? Are we instead concerned with the political factors affecting the equivalency evaluation process? These and other considerations are seen as potential intervening variables amid the process of evaluating a program’s efficacy. Yet it may just be that student advocacy is the goal, not an outgrowth of the goal. We return to Dane (2011) to summarize, “Because the researcher is probably the most fully informed about the results, the researcher may be called upon to make policy recommendations. Personal interests may lead a researcher to adopt an advocate role. Although policy recommendations and advocacy are not unethical themselves, care must be taken to separate them from the results” (p. 314).

Evaluations should employ technically adequate designs and analyses that are appropriate for the evaluation purposes (Yarbrough et al., 2011, p. 201). While many of us understand this point anecdotally, combining this concept with advocacy allows us to understand that a program’s learning outcomes may not be its greatest goal. What of the international student who wishes to complete an MBA in the US and have that degree recognized as an MBA in Hong Kong, such that he/she may make a global impact as either consultant or scholar? If the extant evaluation process does not take this goal into account, learning outcomes centered on financial analysis, strategic planning, or marketing management, as an MBA program would include, could be for naught when considering the long-term prospects of this particular student. We, then, as program evaluation personnel are given a critical task in the design phase, as well as the formative evaluation phase: determining whether the stated goals of a program are in alignment with the goals of those who stand to benefit from that program.

Several factors can influence the role of an evaluator, including the purpose of the evaluation, stakeholders’ information needs, the evaluator’s epistemological preferences, and the evaluation approach used (Fleischer & Christie, 2009, p. 160). With this conclusion in mind, we see that the design equation is only complicated by the involvement of the evaluator him/herself. How this person regards the creation of new knowledge is among the considerations in the design process. Yet where concepts such as reciprocal recognition, evaluator epistemology, negotiated purposes, and defensible evaluation design converge is upon the goals established not by the program’s administrators alone, but also by those who stand to benefit most from the program’s existence. Thank you, Dr. Cheng, for this reminder and for your apt coverage of this topic.

Dane, F. C. (2011). Evaluating research: Methodology for people who need to read research. Thousand Oaks, CA: Sage Publications, Inc.

Fleischer, D. N., & Christie, C. A. (2009). Evaluation use: Results from a survey of U.S. American Evaluation Association members. American Journal of Evaluation, 30(2), 158–175.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: Issue Polarization & Evaluator Credibility

On the topic of ideology and polarization Contandriopoulos & Brousselle (2012) note, “Converging theoretical and empirical data on knowledge use suggest that, when a user’s understanding of the implications of a given piece of information runs contrary to his or her opinions or preferences, this information will be ignored, contradicted, or, at the very least, subjected to strong skepticism and low use” (p. 63). Program evaluation, as with any other form of research and analysis, must be evaluated in context. Yet context is not simply the setting of the evaluation, nor the intent of the evaluation alone. Instead, context must also include consideration for the evaluation’s design, and the credibility of the evaluator him/herself. Evaluations should be conducted by qualified people who establish and maintain credibility in the evaluation context (Yarbrough et al., 2011, p. 15). This points to the need not only to ensure an audience capable of receiving the ideas and findings brought forth by the evaluation, but also to ensure the evaluator is capable of preserving the credibility of the study by demonstrating his/her own professional credibility.

An example of this in action came at a program evaluation session held as part of the Orange County Alliance for Community Health Research last year. This event, presented at UC Irvine, included a three-hour presentation on program evaluation delivered by Michelle Berelowitz, MSW (UC Irvine, 2012). Ms. Berelowitz spoke at length on the broader purpose of program evaluation, the process for designing and conducting program evaluation, and the potential applications of program evaluation. The event was attended by a multitude of program directors and other leaders of health and human services agencies in proximity to the university, intending both to learn of this process and to network with other agencies. Polarization was introduced, and with it the first instance of evaluator credibility being called into question, during the introduction of Ms. Berelowitz’s presentation. She asked the audience, in very plain language, who among them was motivated when it came time to perform evaluations of their programs each year. None replied as being motivated, and a general consensus of disregard for the annual process loomed instead. This calls the evaluator’s credibility into question, as the process is only as valuable as its audience perceives it to be, and program evaluation is only meaningful when it can impact decisions and effect change.

If, during a presentation intended to inform others of the very merits of the program evaluation process, the evaluator’s credibility is called into question, strategies must be enacted to counteract this stifling critique and inattention to the process’s value. To return briefly to the value of identifying and addressing polarization among stakeholders, Contandriopoulos & Brousselle (2012) remark, “as the level of consensus among participants drops, polarization increases and the potential for resolving differences through rational arguments diminishes as debates tend toward a political form wherein the goal is not so much to convince the other as to impose one’s opinion” (p. 63). Thus, in a room where a presentation on the merits of program evaluation is received with tepid acceptance, the evaluator holds the responsibility to convey the process in a way which fosters consensus and restores credibility to the process.

One means of establishing greater evaluator credibility is ensuring inclusion. This comes as no surprise, as much of the literature on program evaluation centers on stakeholder inclusion. To address specifically how this relates to evaluator credibility, Yarbrough et al. (2011) write, “Build good working relationships, and listen, observe, and clarify. Making better communication a priority during stakeholder interactions can reduce anxiety and make the evaluation processes and activities more cooperative” (p. 18). This was masterfully exercised by Ms. Berelowitz: throughout the presentation she was engaging, she drew insights from multiple attendees, she worked to incorporate many of the attendees’ own issues into the presentation’s material, and she responded thoughtfully to attendee questions and further paradigm inquiry.

Another method by which evaluator credibility can be restored is ensuring the design of the research is one where the audience can be receptive to the work performed. On this topic Creswell (2009) writes, “In planning a research project, researchers need to identify whether they will employ a qualitative, quantitative, or mixed methods design. This design is based on bringing together a worldview or assumptions about research, the specific strategies of inquiry, and research methods” (p. 20). These conclusions impact not only the design of the research itself (and, by extension, the design of program evaluation), but also an evaluator’s ability to design meaningful research which conveys information according to long-established assumptions about research. In this instance, Ms. Berelowitz delivered a presentation on program evaluation which was deeply supported by the extant literature, which set out her worldview and assumptions on research and thus program evaluation quite clearly, and which allowed attendees to witness both the technical and practical merits of navigating the program evaluation process in the way presented. By ensuring the inclusion of stakeholders, and by making her worldview, inherent assumptions, and defensible design clear, she made the presentation a success, one where attendees left expressing motivation for the program evaluation process ahead.

Contandriopoulos, D., & Brousselle, A. (2012). Evaluation models and evaluation use. Evaluation, 18(1), 61–77.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. Thousand Oaks, CA: Sage Publications, Inc.

UC Irvine. (2012). Program evaluation. Retrieved October 22, 2013 from http://www.youtube.com/watch?v=XD-FVzeQ6NM.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: On Explicit Evaluation Reasoning

Evaluation reasoning leading from information and analyses to findings, interpretations, conclusions, and judgments should be clearly and completely documented (Yarbrough et al., 2011, p. 209). This standard arises not solely to ensure one’s conclusions are logical; it also functions as both a final filter and an ultimate synthesizer of the results of all other accuracy standards. Standard A7, Explicit Evaluation Reasoning, as set out in The Program Evaluation Standards, serves to make known the efficacy of the process by which conclusions are reached. Of this standard Yarbrough et al. (2011) continue, “If the descriptions of the program from our stakeholders are adequately representative and truthful, and if we have collected adequate descriptions from all important subgroups (have sufficient scope), then we can conclude that our documentation is (more) likely to portray the program accurately” (p. 209). This level of holism leaves us with a critical imperative: to serve the program we are evaluating well, and to serve the negotiated purposes of the evaluation to their utmost.

On the need to ensure clarity, logic, and transparency in one’s process, Booth, Colomb, & Williams (2008) elucidate, “[Research] is a profoundly social activity that connects you both to those who will use your research and to those who might benefit – or suffer – from that use” (p. 273). We then have a responsibility, as evaluators and as researchers, to conduct ourselves, and to document our process, explicitly. Doing so preserves attributes essential to quality research, such as reproducibility, generalizability, and transferability. Yet there are also more specific considerations at play. On the topic of this standard’s importance to current and future professional practice, consider the example of an extant job posting for a Program Evaluator with the State of Connecticut Department of Education. The description for this position includes the following: “A program evaluation, measurement, and assessment expert is sought to work with a team of professionals developing accountability measures for educator preparation program approval. Key responsibilities will include the development of quantitative and qualitative outcome measures, including performance-based assessments and feedback surveys, and the establishment and management of key databases for annual reporting purposes” (American Evaluation Association, n.d., para. 2). This position covers a wide range of AEA responsibilities, and makes clear from only the second paragraph the sheer scope of responsibility it carries. And while the required qualifications include mention of expertise in program evaluation, qualitative and quantitative data analyses, and research methods, the posting more importantly concludes with mention of the need to ‘develop and maintain cooperative working relationships’ and demonstrate skill in working ‘collaboratively and cooperatively with internal colleagues and external stakeholders’.
What is required, then, is not solely a researcher with broad technical expertise, nor simply a methodologist with program evaluation background, but instead a member of the research community who can deliver on the palpable need to produce defensible conclusions from explicit reasoning in a way which connects with a broad audience of users and stakeholders.

Explicit reasoning, expressed in a way digestible by readers, defensible to colleagues, and actionable by program participants, requires that the researcher be comfortable with where he/she is positioned in relation to the research itself when communicating both process and results. This is known in the literature as positionality. Andres (2012) speaks of this in saying, “This positionality usually involves identifying your many selves that are relevant to the research on dimensions such as gender, sexual orientation, race/ethnicity, education attainment, occupation, parental status, and work and life experience” (p. 18). And yet why so many admissions solely for the purpose of locating one’s self within the research? Because positionality has as much to do with the researcher as it does with the researcher’s position and its impact on program evaluation outcomes. An example of this need for clarity comes to us from critical action research. Kemmis & McTaggart (2005) describe, “Critical action research is strongly represented in the literatures of educational action research, and there it emerges from dissatisfaction with classroom action research that typically does not take a broad view of the role of the relationship between education and social change… It has a strong commitment to participation as well as to the social analyses in the critical social science tradition that reveal the disempowerment and injustice created in industrialized societies” (p. 561). With this in mind, it stands to reason that one can only be successful in such a position if the researcher him/herself is made clear, his/her position relative to the research is clear, his/her stance on justice (as only one example) is considered, the process by which the research is conducted is clear, and it is clear how this person, in relation to this research, then renders judgment on the data collected.
The holder of this Program Evaluator role, like many others, must be permitted to serve as both researcher and advocate, exercising objective candor throughout.

American Evaluation Association. (n.d.). Career. Retrieved October 9, 2013 from http://www.eval.org/p/cm/ld/fid=113.

Andres, L. (2012). Designing & doing survey research. London, England: Sage Publications Ltd.

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

Kemmis, S., & McTaggart, R. (2005). Participatory action research. In N. K. Denzin & Y. S. Lincoln (Eds.), The Sage handbook of qualitative research (3rd ed., pp. 559–604). Thousand Oaks, CA: Sage Publications, Inc.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

Post-Doc Blogpost: Meaningful Products & Practical Procedures in Program Evaluation

Of the Program Evaluation Standards, standard U6 concerns meaningful processes and products, whereas standard F2 concerns practical procedures. To begin, the definition of U6 as described by Yarbrough et al. (2011) includes, “Evaluations should construct activities, descriptions, and judgments in ways that encourage participants to rediscover, reinterpret, or revise their understandings and behaviors” (p. 51). The authors continue when discussing F2, “Evaluation procedures should be practical and responsive to the way the program operates” (p. 87). While U6 is among the utility standards, and F2 among the feasibility standards, I very sincerely believe they share much of the same intent as it pertains to program evaluation. They can be viewed as facets of a singular whole, where U6, on meaningful processes and products, addresses the real need for evaluation audiences to be able not only to interpret the findings shared, but also to make positive change from those results. F2 speaks pointedly to respecting the program’s existing operations, to the point of asking evaluators to act in a way that is practical in comparison with what is already in place. In their combined essence, U6 asks for findings that mean something to audiences, and F2 asks that those findings take existing conditions into account. Yet is this not already a fundamental requirement of any successful change initiative?

A related position in the field in which I am quite interested is the work that institutional research (IR) teams perform. Working simultaneously and directly with members of the Office of Institutional Research and Assessment at one university, and the Office of Assessment at another, I have garnered a deep respect for the work they are collectively performing for their respective institutions. As Howard, McLaughlin, & Knight (2012) define the profession, “two of the most widely accepted definitions are Joe Saupe’s (1990) notion of IR as decision support – a set of activities that provide support for institutional planning, policy formation, and decision making – and Cameron Fincher’s (1978) description of IR as organizational intelligence” (p. 22). In both cases the focus is on data, and on the use of data so that an institution knows more about itself in the future than it knows in the present, whether in the form of performance metrics, operational efficiency analysis, forward-looking planning exercises, or simply the evaluation of data which has been better culled, processed, cleaned, and presented than in the past. Yet amid each of these steps is the very real need not only to provide a value-added process which is practical, but also to speak to the existing system if its findings are to have merit. This is how we return to both U6 and F2, the standards of meaningful processes and products and of practical procedures.

Forging a palpable relationship between the evaluation process and relevant stakeholders, including sponsors, evaluators, implementers, evaluation participants, and intended users, is key. This permits greater understanding of the processes employed and products sought, and permits greater buy-in when results are later shared. Kaufman et al. (2006) present a review of the evaluation plan used to assess the outcomes of a family violence initiative for the purpose of promoting positive social change. On meaningful products and practical procedures Kaufman et al. (2006) remark, “Evaluations are most likely to be utilized if they are theory driven, emphasize stakeholder participation, employ multiple methods and have scientific rigor… In our work, we also place a strong emphasis on building evaluation capacity” (p. 191). This evaluation focused first on creating a logic model to articulate a highly defined program concept, and data-driven decisions then cascaded from the model constructed. With the combined efforts of project management, the program’s staff, and the evaluation team, an evaluation plan was crafted. This plan permitted broad buy-in based on the involvement of many stakeholders. In the end, this also enabled the work of the evaluation team to continue in a less acrimonious environment and reemphasized for the team the importance of working in collaboration with key stakeholders from the beginning so that stakeholders bought into and supported the evaluation process (Kaufman et al., 2006, p. 195). The results of this collaborative, synthesizing process included greater stakeholder participation, a heightened level of rigor in addition to increased capacity, and, most importantly for the program, the use of ‘common measures’ across the program.

Yet increased stakeholder involvement is not the only benefit of ensuring meaningful processes and products as well as practical procedures. This heightened pragmatism among processes also lends itself to greater design efficacy. In a study of 209 PharmD students at the University of Arizona College of Pharmacy (UACOP), Plaza et al. (2007) explain, “Curriculum mapping is a consideration of when, how, and what is taught, as well as the assessment measures utilized to explain achievement of expected student learning outcomes” (p. 1). This curriculum mapping exercise was intended to compare the ‘designed curriculum’, the ‘delivered curriculum’, and the ‘experienced curriculum’. The results of this study show great concordance between student and faculty perceptions, reflecting not only a sound program evaluation design that permits such concordance, but also an effective program as measured by these graphical outcomes. Also on the design aspects of pragmatism in program evaluation, Berlowitz & Graco (2010) developed “a system-wide approach to the evaluation of existing programs… This evaluation demonstrates the feasibility of a highly coordinated “whole of system” evaluation. Such an approach may ultimately contribute to the development of evidence-based policy” (p. 148). This study, with its rigorous data collection among existing datasets, was not designed to put forward a new means of gathering and aggregating data; it offered a new method for taking advantage of data already largely available while ensuring a broader yet more actionable series of conclusions, strong enough to inform policy decisions.

Where the above considerations for the utility of meaningful processes and products, and the feasibility of practical procedures, come together is in their application to a position held in an office of institutional research. On the role IR has in ensuring pragmatism in program evaluation, Howard et al. (2012) note, “Driven by the winds of accountability, accreditation, quality assurance, and competition, institutions of higher education throughout the world are making large investments in their analytical and research capacities” (p. 25). These investments remain critical to the sustainability of the investing institutions, and require that what is discovered in these offices is then instituted through the very departments and programs evaluated. Institutional research does not exist solely for the purpose of performance evaluation, nor do IPEDS or accreditation reporting responsibilities overshadow the need for actionable data among the programs under review. Rather, we seek an environment where IR analysts and researchers are permitted to use practical processes to ensure greater stakeholder buy-in and effective design.

An environment where practical procedures are used is equally one where replicability, generalizability, and transferability all exist in a more stable ecosystem of program evaluation. On this need, in a two-year follow-up study of needs and attitudes related to peer evaluation, DiVall et al. (2012) write, “All faculty members reported receiving a balance of positive and constructive feedback; 78% agreed that peer observation and evaluation gave them concrete suggestions for improving their teaching; and 89% felt that the benefits of peer observation and evaluation outweighed the effort of participating” (p. 1). These are not results achieved from processes where faculty found no direct application of the peer review program; these data were gathered from a program where respect for the existing process was maintained, and the product was seen as having more value than the time and effort required to participate, expressed as opportunity cost. Next, using a portfolio evaluation tool which measured student achievement of a nursing program’s goals and objectives, Kear & Bear (2007) state, “faculty reported that although students found writing the comprehensive self-assessment sometimes daunting, in the end, it was a rewarding experience to affirm their personal accomplishments and professional growth” (p. 113). This only further affirms the very real need for evaluators to continue to discover means of collecting, aggregating, and analyzing data that speak to existing processes. It also reinforces the need for program evaluations to result in a series of conclusions or recommendations that make the greatest use of existing process, allowing for sweeping institutionalization as was seen with the Kear & Bear study. Finally, on this need for practicality among process and product within program evaluation and research as a whole, Booth, Colomb, & Williams (2008) remark, “When you do research, you learn something that others don’t know. So when you report it, you must think of your reader as someone who doesn’t know it but needs to and yourself as someone who will give her reason to want to know it” (p. 18).

Berlowitz, D. J. & Graco, M. (2010). The development of a streamlined, coordinated and sustainable evaluation methodology for a diverse chronic disease management program. Australian Health Review, 34(2), 148-51. Retrieved from http://search.proquest.com/docview/366860672?accountid=14872

Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). Chicago, IL: The University of Chicago Press.

DiVall, M., Barr, J., Gonyeau, M., Matthews, S. J., Van Amburgh, J., Qualters, D., & Trujillo, J. (2012). Follow-up assessment of a faculty peer observation and evaluation program. American Journal of Pharmaceutical Education, 76(4), 1-61. Retrieved from http://search.proquest.com/docview/1160465084?accountid=14872

Howard, R.D., McLaughlin, G.W., & Knight, W.E. (2012). The handbook of institutional research. San Francisco, CA: John Wiley & Sons, Inc.

Kaufman, J. S., Crusto, C. A., Quan, M., Ross, E., Friedman, S. R., O’Reilly, K., & Call, S. (2006). Utilizing program evaluation as a strategy to promote community change: Evaluation of a comprehensive, community-based, family violence initiative. American Journal of Community Psychology, 38(3-4), 191-200. doi:http://dx.doi.org/10.1007/s10464-006-9086-8

Kear, M. & Bear, M. (2007). Using portfolio evaluation for program outcome assessment. Journal of Nursing Education, 46(3), 109-14. Retrieved from http://search.proquest.com/docview/203971441?accountid=14872

Plaza, C., Draugalis, J. R., Slack, M. K., Skrepnek, G. H., & Sauer, K. A. (2007). Curriculum mapping in program assessment and evaluation. American Journal of Pharmaceutical Education, 71(2), 1-20. Retrieved from http://search.proquest.com/docview/211259301?accountid=14872

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

The Juxtaposition of Social/Ethical Responsibility across Disciplines

As I approach full speed in the post-doctoral program, I equally approach my first opportunity to share publicly insights derived from this study of assessment, evaluation, and accountability. The American Evaluation Association (AEA) sets out social and ethical responsibilities for evaluators. In juxtaposition, the social and ethical responsibilities of institutional research, as an education-based area of interest, are expressed by the Association for Institutional Research (AIR). Yet first, a personal introduction as requested by this assignment. I have chosen institutional research as my professional education-based area of interest, as research and analysis have been at the heart of much of what I’ve done for the past decade or more.

Over a period easily covering ten years, I have straddled industry and academe, not only to remain a lifelong learner but to take what I learn in each course and apply it as readily as possible to my working world, in industry and in the classroom, to the benefit of my employers and my students. As mentioned elsewhere in my ‘about’ page, my work includes a multitude of projects focused on distilling a clear view of institutional effectiveness and program performance. Roles have included senior outcomes analyst, management analyst, operations analyst, assessor, and faculty member for organizations in industries ranging from higher education to hardware manufacturing and business intelligence. Each position has been a new opportunity to assimilate new methods for assessing data. Each new industry has been a new opportunity to learn a new language, adhere to new practices, and synthesize the combined experience that is the sum of their parts. Yet in no instance do I come away feeling that I have learned more and therefore have less left to learn. In each instance I instead feel as though I know and have experienced even less of what the world has to offer. Focusing this thirst on assessment and evaluation specifically, the task becomes pursuing ever-greater growth and success across a wide range of applications, industries, and instances, while remaining true to guiding principles which serve those who benefit from any lesson I learn or analysis I perform.

The Program Evaluation Standards set out standards statements regarding propriety, which include responsive and inclusive orientation, formal agreements, human rights and respect, clarity and fairness, transparency and disclosure, conflicts of interest, and fiscal responsibility. At the heart of these, Yarbrough, Shulha, Hopson, & Caruthers (2011) remark, “Ethics encompasses concerns about the rights, responsibilities, and behaviors of evaluators and evaluation stakeholders… All people have innate rights that should be respected and recognized” (p. 106). This compares with a like-minded statement from the AEA directly: “Evaluators have the responsibility to understand and respect differences among participants, such as differences in their culture, religion, gender, disability, age, sexual orientation and ethnicity, and to account for potential implications of these differences when planning, conducting, analyzing, and reporting evaluations” (Guiding Principles for Evaluators, n.d., para. 40). Finally, in juxtaposition, Howard, McLaughlin, & Knight write in The Handbook of Institutional Research (2012), “All employees should be treated fairly, the institutional research office and its function should be regularly evaluated, and all information and reports should be secure, accurate, and properly reported… The craft of institutional research should be upheld by a responsibility to the integrity of the profession” (p. 42). Thus, in the end, while this work had intended to explore a juxtaposition, a word that implies at least a slight degree of paradox, the comparison in actuality shows nothing of the sort. Rather, we find congruence, and we find agreement.

It is important to uphold standards for the ethical behavior of evaluators, as the profession is one steeped in a hard focus on data and the answers data provide. We as human beings, however, tend to this profession while flawed. We make mistakes, we miscalculate, we deviate from design, and we inadvertently insert bias into our findings. None of this may be done on purpose, and certainly not all transgressions are present in every study. The implication remains, though, that we can make mistakes and are indeed fallible. At the same time, ours is a profession tasked with identifying what is data and what is noise, which programs work and which curricula do not, which surveys show desired outcomes and which employees are underperforming. These are questions which demand our best efforts, our most scientific of endeavors, and our most resolute of trajectories toward identifying only truths, however scarce, amid the many temptations to manufacture alternate (albeit perhaps more beneficial) realities for us as evaluators and for our stakeholders. All participants have rights, all evaluators have rights, and all sponsors have rights. It is our task to serve in the best collective interest, using the best methods available to ensure a properly informed future.

American Evaluation Association. (n.d.). Guiding principles for evaluators. Retrieved September 11, 2013 from http://www.eval.org/p/cm/ld/fid=51

Howard, R.D., McLaughlin, G.W., & Knight, W.E. (2012). The handbook of institutional research. San Francisco, CA: Jossey-Bass.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards (3rd ed.). Thousand Oaks, CA: Sage Publications, Inc.

How Management Styles are Changing

An organization, by definition, is a goal-directed, boundary-maintaining, and socially constructed system of human activity (Aldrich & Ruef, 1979). Management likewise has a clear definition, comprising the activities of planning, organizing, leading, and controlling. When considering these concepts in juxtaposition, the foundations of management do not waver, and the associated activities remain the same at a conceptual level. What has changed, however, is how each of these activities is carried out for the sake of the sustainability of each goal-directed, boundary-maintaining system of activity. How the activities of management are performed is a question of style, and style has indeed evolved. This evolution in style, and the proliferation of additional, nuanced styles, is the result of advances in technology and organizational form, as well as shifts in the prevailing workforce demographic of each organization.

To say that technology has changed what management is would be giving credit where it is not due. Technology has instead changed the way management is performed, while its functions remain stable. The prevailing literature, both academic and practitioner, now discusses advances in technology as shepherds of a global, diverse, tightly linked, and transparent collection of organizations, which are home to managers who use every affiliation necessary to further the progress of their guiding objectives. Those objectives are no longer driven by a desire simply to control aggregated activity, as was the focus both during and immediately following the industrial revolution. Objectives, and the style of today’s manager, are driven by purpose. This has given rise to such works as Hamel’s The Future of Management, Benko & Anderson’s The Corporate Lattice, Hallowell’s Shine, and Pascale, Sternin, & Sternin’s The Power of Positive Deviance, to name a few. While these texts describe very different facets of organizational life, they share the common thread of managers doing everything they can to identify what is working in an organization, to spread best practice throughout the organization, and to place the focus on the potential of a workforce rather than on controlling its activities. Management style has thus changed from choosing between varying levels of commanding and controlling resources to choosing between varying levels of interaction with the value chain of an organization and the resources associated with that value chain. In its essence, management style is now most impacted by considerations for epistasis, where the critical question is how the manager will choose to leverage his/her unique talents to influence the organization’s ecosystem. Rather than ask simply what part of an organization he/she is responsible for, the manager now instead seeks knowledge of influence networks, as both organizational knowledge and responsibility are interspersed.

– Justin

Amid the Tumult, the Purposive Manager

Management as a practice can be seen as a combination of art, craft, and science, which take place on an information plane, a people plane, and an action plane (Mintzberg, 2009). A good manager, then, is someone who moves beyond the traditional confines of seeing one’s function as planning/organizing/leading/controlling each in isolation at a specific point in time, and instead sees managing as using all aspects of one’s intuition, training, and talents at once and in perpetuity.

Managers are now described in the literature as operating in an environment where interruptions can be encountered as often as every 48 seconds, leaving the manager’s attention piecemeal and scattered across multiple tasks and decisions in a single hour. It therefore remains important that a “good manager” use this as a strength, not as what defines his/her work. Rather than serving as an excuse for surface-level consideration of every decision encountered, the bustle of today’s business environment is an opportunity to convey consistency in message and purpose with every new decision. A day can be filled with hundreds of isolated decisions made at a glance, or those decisions can all be guided by a single thread of focused purpose and attention to the direction in which the manager wishes to push the ecosystem within his/her sphere of influence. If the manager wishes to develop a team guided by thoughtful analysis, each decision can be treated as an interruption before returning to this task, or it can be a way to substantiate this wish by emphasizing thoughtful analysis in the decision itself. A good manager thus uses technical skills to facilitate the professional and technical aspects of daily work, while also using those same skills to determine how best to support and influence the technical aspects of others’ work. Soft skills are equally important, to ensure not only that influence is exerted but also that the purpose and direction the manager sees fit are communicated within his/her network in a way which delivers a lasting impression.

An Operations Manager uses his technical knowledge of the business to drive operations, while also using this knowledge to guide others’ work within the scope of process variation, selection, and retention. His soft skills are important because the potentially global, and very likely diverse, workforce with which he works must be influenced and led, not directed and controlled alone. The Finance Manager must use her technical skills to drive the fiduciary sustainability of an organization and her team, but must also use them to seek an efficacious value chain to sustain the organization’s competitive advantage. Her soft skills thus provide the conduit for this process, and her technical skills the information necessary for her network to later develop the tacit knowledge required for this to occur. A “good manager”, then, is aware of the organization’s ecosystem and his/her influence on it, and will put to use all intuition, training, and talents present to help others see how an organization’s value chain drives its purpose.

– Justin

When High Achievers and Low Achievers Work in the Same Group: An Article Review

Learning by Unlearning

In 2008, The British Psychological Society published an article entitled When High Achievers and Low Achievers Work in the Same Group: The Roles of Group Heterogeneity and Processes in Project-Based Learning. As put by the authors Cheng, Shui-fong, & Chan (2008), “the present research investigated the roles of group heterogeneity and processes in project-based learning.” This was done with the intent to further understand the effects of grouping in project-based work, in order to determine how best to optimize both self-efficacy and collective efficacy. At the crux was the hypothesis of an interaction effect between student achievement and group processes on efficacy. To test this hypothesis, variables ranging from group gender distribution, to group size, to group processes were reviewed in order to determine which had the most prominent positive relationship to self-efficacy and collective efficacy.

Linear modeling was used to investigate the relationships between variables, and group members were categorized as low, average, or high achievers. Yet it was the individuality of the learner, when taken into account in the interpersonal learning environment, which gave the research team their richest data. Many current understandings about learning provide strong support for classrooms that recognize, honor, and cultivate individuality (Tomlinson, 1999, p. 18). This was the case when the team discovered that their findings on groupings and project-based work led them to emphasize the interactions of learners, as opposed to the effects of grouping itself. On average, findings about the effects of homogeneous and heterogeneous groupings were varied and inconsistent across studies (Cheng, Shui-fong, & Chan, 2008, p. 3). This was because the assumption that optimal grouping is as heterogeneous as possible across skill levels was simply not the focus when looking to improve both self-efficacy and collective efficacy.

Grouping for Excellence

The specifics of what is meant by group processes are the keys to incorporating these findings into the concepts surrounding differentiated instruction. Group processes of high quality include at least four elements: positive interdependence, individual accountability, equal participation, and social skills (Cheng, Shui-fong, & Chan, 2008, p. 3). These elements lead to a foundation of solid interpersonal communication, group interaction, and, most notably, group cohesion. While varying levels of skills, gender, and size are all taken into account in the differentiated classroom, what must be emphasized as per the research are the dynamics which shape the interaction among group members. The brain learns best when it can come to understand by making its own sense out of information rather than when information is imposed on it (Tomlinson, 1999, p. 19). Both homogeneous and heterogeneous groupings come with their own strengths and weaknesses. Homogeneous groupings, on average, keep each level of achiever firmly planted in their preexisting classification, while heterogeneous groupings carry the consequence of potentially stifling high achievers for the sake of improving low achievers. Looking beyond the groupings alone allows for a differentiated classroom in which communities of learning can thrive.

The Rules of the Game

In order to bring this concept from the theoretical to the practical, the various elements discussed must be incorporated into the classroom. Heterogeneity usually is a one-size-fits-all endeavor where the learning plan swallows some learners and pinches others (Tomlinson, 1999, p. 22). So while the concept of heterogeneity is still favored over homogeneity, the aforementioned elements of high-quality group process must be integrated. Positive interdependence deals with ensuring that each group member is reliant on other members of the group to serve particular functions or roles in order to achieve the collective goal. This can be incorporated into the classroom by assigning particular roles to members of a team, or to members of a class. Individual accountability deals with maintaining a group grade, yet also ensuring that each team member is tasked with a particular portion of each project. Taken together, positive interdependence and individual accountability begin to allow the class to see beyond their differences, and instead focus on how they will work together to achieve individually assigned components of the work.

Equal participation deals with ensuring that the individually assigned components from the element of individual accountability are distributed equally among the group. Social skills concern the ability of each team member to interact with the rest of the group, and include traits such as communication, decision-making, trust-building, and conflict management (Cheng, Shui-fong, & Chan, 2008, p. 4). When these elements are emphasized in the classroom, so that work stems from equity among group members in an environment where collaboration reigns and conflict is kept to a minimum, the emphasis can then be on the content, process, and product by which the lesson is completed.

We need to begin our investigation of how to differentiate instruction for a diverse student population (Tomlinson, 1999, p. 24). While the Tomlinson text discusses additional assumptions around differences in student characteristics, the article under review adds that the characteristics of each student as a member of the group can contribute to the success of the instruction’s differentiation as well. When put into practice, group work that emphasizes positive interdependence, individual accountability, equal participation, and social skills has the potential to create a learning environment where not only is self-efficacy increased, but collective efficacy is both emphasized and increased as well.

– Justin

Cheng, R., Shui-fong, L., & Chan, J. (2008). When high achievers and low achievers work in the same group: The roles of group heterogeneity and processes in project-based learning. British Journal of Educational Psychology, 78(2), 205-221.

Tomlinson, C.A. (1999). The differentiated classroom: Responding to the needs of all learners. Alexandria, VA: ASCD.