Metropolitan State Faculty Federation
Student Ratings of Instruction and Faculty Evaluation: A Review of Recent Research
As part of our ongoing efforts to support MSU Denver faculty, MSFF has compiled the following material summarizing recent research on the use of SRIs for faculty evaluation.
Our position is that both quantitative and qualitative elements of the SRI process are biased, invalid, and unreliable and should not be used for summative evaluation of faculty. At best, SRIs have some formative value when used by faculty for reflection and course improvement. However, research on SRIs demonstrates that they fail to correlate with teaching outcomes and have an ongoing discriminatory impact on women, faculty of color, ESL faculty, faculty in STEM disciplines, and those who are perceived as gender non-conforming.
We encourage the entire MSU Denver community to reconsider the use of SRIs as an element of faculty evaluation for all levels of faculty and encourage you to forward this discussion to other faculty in your department.
Student Evaluations and Comments
MSFF has long held the position that this institution’s reliance on SRIs for summative evaluation is deeply problematic.
It is well established in the research literature that both the quantitative scores and the student comments in SRIs systematically discriminate against women and people of color. With regard to gender, research estimates that women suffer an SRI penalty of up to 0.5 points on a 5.0 scale compared to their male colleagues (MacNell, Driscoll, and Hunt 2015). Even well-designed SRIs create unfair outcomes: Esarey and Valdes (2020) recommend that SRIs be adjusted via regression analysis to correct for non-instructional factors before the scores are used for any purpose. Research also suggests that SRI scores reflect bias against women and student grade expectations more than any objective measure of teaching effectiveness, and that gender bias alone can explain why effective women instructors receive lower SRI scores than ineffective instructors (Boring, Ottoboni, and Stark 2016). Continued use of SRIs therefore constitutes a discriminatory practice.
Furthermore, the preponderance of research demonstrates that there is no correlation between teaching effectiveness and SRI scores (Uttl, White, and Gonzalez 2017), that SRIs fail to reliably identify the better instructor in pairwise comparisons as measured by student outcomes, and that they should never be used in isolation from other measures of teaching effectiveness, including teaching observations and peer reviews (Esarey and Valdes 2020). As reported by Inside Higher Ed (Flaherty 2020), Esarey states unequivocally that unless the correlation between student ratings and teacher quality is “far, far stronger than even the most optimistic empirical research can support,” common administrative uses of student evaluations of teaching “very frequently lead to incorrect decisions.” Uttl, White, and Gonzalez (2017) conclude “that institutions focused on student learning and career success may want to abandon [student evaluations of teaching] ratings as a measure of faculty’s teaching effectiveness” (22). At the very least, current research suggests that SRIs are a poor measure of teaching effectiveness.
Perhaps more disturbing is a trend identified in the research suggesting that faculty “teach to” the SRI when SRI scores are attached to performance evaluation, with serious implications for student achievement. When faculty see SRI scores as high stakes, as when promotion and tenure decisions turn on them, SRIs correlate with grades in a particular class. However, the same students who gave these instructors high evaluations performed worse in follow-on courses and demonstrated less “deep learning” (Carrell and West 2010), suggesting that the desire to maintain high SRI scores may actually hinder the development of student learning skills and damage student achievement in subsequent classes.
The practice of comparing any particular faculty member’s score to the mean score for a course prefix or department is similarly problematic. We have seen numerous cases where faculty have been evaluated as “needs improvement” based on a subset of SRIs falling below the department mean, with no indication that reviewers conducted even basic statistical analysis to determine the spread of the underlying data and to locate any particular score within that spread (Stark 2014).
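The statistical point above can be sketched with a minimal example using invented, purely hypothetical scores: a rating that falls “below the mean” may still sit well within one standard deviation of it, i.e., inside the ordinary spread of the department’s own data.

```python
import statistics

# Hypothetical department SRI scores (invented for illustration, not real data)
dept_scores = [3.6, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.8]

mean = statistics.mean(dept_scores)   # 4.24
sd = statistics.stdev(dept_scores)    # ≈ 0.36 (sample standard deviation)

# A faculty member's score that a reviewer might flag as "below the mean"
faculty_score = 4.0

# How far below the mean is it, measured in standard deviations?
z = (faculty_score - mean) / sd       # ≈ -0.67, well within one SD

print(f"mean={mean:.2f}, sd={sd:.2f}, z={z:.2f}")
```

On these made-up numbers, the “below average” score is less than one standard deviation from the mean, which is exactly the kind of ordinary variation that a comparison to the raw mean, with no measure of spread, cannot distinguish from a genuine outlier.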
Finally, the preponderance of research on the use of SRIs strongly recommends against using them as a summative evaluation tool and argues that they should never be used in isolation from other mechanisms for assessing teaching. In 2013-2014, MSU Denver assembled a task force to evaluate the use of SRIs, particularly student comments, for summative evaluation. Union members on that task force reviewed the literature available at the time and concluded that the use of SRIs for summative evaluation was not supported by research. The task force nevertheless rejected the consensus of peer-reviewed research and endorsed the use of SRIs, leading to the filing of a minority report against those conclusions. Since then, the case against using SRIs for summative evaluation has only grown stronger, leading some to conclude that the practice is not only ill-advised but may also have legal implications for the institution (Hornstein 2017).
References
Boring, Anne, Kellie Ottoboni, and Philip B. Stark. 2016. “Student evaluations of teaching (mostly) do not measure teaching effectiveness.” ScienceOpen Research. https://www.scienceopen.com/document?vid=818d8ec0-5908-47d8-86b4-5dc38f04b23e
Carrell, Scott E. and James E. West. 2010. “Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors.” Journal of Political Economy 118(3).
Esarey, Justin and Natalie Valdes. 2020. “Unbiased, reliable, and valid student evaluations can still be unfair.” Assessment & Evaluation in Higher Education. DOI: 10.1080/02602938.2020.1724875.
Flaherty, Colleen. 2016. “New study could be another nail in the coffin for the validity of student evaluations in teaching.” Inside Higher Ed., 21 September.
—–. 2018. “Study says students rate men more highly than women even when they are teaching identical courses.” Inside Higher Ed., 14 March.
—–. 2019. “Fighting gender bias in student evaluations of teaching, and tenure’s effect on instruction.” Inside Higher Ed., 20 May.
—–. 2020. “Study: Student evaluations of teaching are deeply flawed.” Inside Higher Ed., 27 February.
Hornstein, Henry A. 2017. “Student evaluations of teaching are an inadequate assessment tool for evaluating faculty performance.” Cogent Education 4(1).
MacNell, Lillian, Adam Driscoll, and Andrea N. Hunt. 2015. “What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching.” Innovative Higher Education 40: 291-303.
Uttl, Bob, Carmela A. White, and Daniela Wong Gonzalez. 2017. “Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related.” Studies in Educational Evaluation 54: 22-42.
Wallace, Sheri L., Angela K. Lewis and Marcus D. Allen. 2019. “The State of Literature on Student Evaluations of Teaching and an Exploratory Analysis of Written Comments: Who Benefits Most?” College Teaching 67(1): 1-14.