Managing internal nomination and peer review processes to reduce bias

Peer review in its many incarnations is essential to the functioning of the research enterprise, yet it is not without weaknesses. Various forms of bias, for example, threaten the integrity and effectiveness of peer review at journals, at funding agencies, and even in programs within institutions.

Recent innovations and scholarly research on peer review can help program administrators, reviewers, and award committees improve practices and mitigate bias. The U-M Office of the Vice President for Research has prepared this guidance on strategies to reduce bias in peer review, with a focus on internal award nominations and funding programs. The intention of this information is not to make peer review an unnecessarily onerous task for any one particular group, but to help educate the research community and ensure internal review practices are equitable and transparent.

Designing calls for proposals or nominations

Publish review criteria at the time of the call, and design them explicitly for the opportunity. Avoid vague criteria like “merit” or “excellence” and identify key qualification metrics if they exist.
Review calls for proposals for stereotypes or other biased language (see below) that may implicitly discourage applicants from historically underrepresented or marginalized groups.
Ensure communications and materials are accessible and make no assumptions about the receiver of communications. See OVPR’s guidance for marketing and communications materials, and visit U-M’s Accessibility Quick Tips regarding ADA compliance. For general digital/IT accessibility inquiries, contact [email protected]. For print accessibility inquiries, contact the brand team ([email protected]).
Periodically review program descriptions to consider other potential application/nomination attributes or restrictions that may serve as deterrents to underrepresented group members (e.g., time-since-PhD requirements that may disproportionately affect women, who are more likely to take time off for family care).

Avoiding biased language

Whether reviewing an award application or writing a nomination letter, it is important to avoid using biased language. Biased language can reinforce other implicit or explicit biases, and ultimately influence decisions around research awards and funding. This overview, based on other guidance documents (e.g., University of Arizona, Western Michigan University), provides tips for how to avoid biased language, common examples, and alternatives to consider in research-specific contexts.

Avoid unnecessary pronouns

Using gender-identifying information when discussing a proposal or applicant does not typically add substantive information during peer review. For women or non-binary applicants in particular, this could highlight that they are in the minority of applicants and could reinforce biases held by other reviewers, committee chairs, or program administrators.

Example: Her record of past scholarship is impressive.
Alternative: The applicant’s record of past scholarship is impressive.
Alternative: Dr. Robinson’s record of past scholarship is impressive.

Use preferred pronouns when known and/or necessary

When necessary to discuss or evaluate an applicant, and when known to the evaluator and/or committee, use a person’s preferred personal pronoun of he, she, they, or ze. If unknown, avoid making an assumption about preferred gender or pronouns and use an inclusive substitute such as their instead of guessing or intentionally indicating uncertainty.

Example: The PI has published extensively on this topic and his/her record of impact is outstanding.
Alternative: The PI has published extensively on this topic and their record of impact is outstanding.
Alternative: The PI has published extensively on this topic and has an outstanding record of impact.

Don’t inject unnecessary personal opinions

Including one’s personal opinions about the applicant, field or study, or other unrelated criteria can introduce or reinforce certain biases. This is especially true when they are personal opinions that are unsupported by useful information for the reader.

Example: The applicant proposes using xxx method, which they’ve used in the past but personally I’ve always had a hard time trusting.
Alternative: The applicant proposes using xxx method, which they have successfully used in the past; however, a new paper by Jackson et al highlights some serious drawbacks.

Be specific when warranted

When writing about certain groups, be as specific as possible. Without such detail or justification, readers can make inaccurate assumptions that may lead to biases.

Example: The study should be sure to include Asian students in the survey population.
Alternative: The study should be sure to include students who identify as Asian American in the survey population.
Alternative: Given the local demographics, the study should be sure to include students of Korean descent in the survey population.

Avoid gendered phrases

Although there are many commonly used phrases that are gendered (e.g., those that contain variants of man), suitable non-gendered alternatives almost always exist. Use this phrasing to be more inclusive and help avoid biases that may arise, whether conscious or unconscious.

Example: The chairman of the review committee recommended against funding the project.
Alternative: The chair of the review committee recommended against funding the project.

Stick to what’s relevant

Avoid commenting on aspects of an applicant’s C.V. or proposal that are irrelevant to the actual review criteria. For applicants or nominees who are underrepresented minorities, this can lead to biases that disproportionately favor majority groups.

Example: Both applicants are strong, but the first trained under a member of the National Academies and therefore has more potential in this field.
Alternative: Both applicants are strong, but the first has a stronger track record of publishing in areas in which the funding agency has a stated interest.

Avoid words or phrases with racist roots

Some commonly used phrases have racist roots, yet are so ingrained in the lexicon that a writer may not have recognized their origin or original meaning (e.g., “master” research agreements, uppity, peanut gallery, “black” mark). In nearly all cases, there are suitable alternatives to use that don’t have the potential to elicit or perpetuate bias. Federal agencies such as the National Institute of Standards and Technology have recently initiated similar efforts to avoid such terminology.

Example: I worry that the applicant has published in several blacklisted journals, which will affect how the foundation evaluates the impact of their work.
Alternative: I worry that the applicant has published in several predatory journals, which will affect how the foundation evaluates the impact of their work.

Actively soliciting proposals or nominations

Proactively seek a diverse applicant/nominee pool. Applicants from a given group are evaluated and judged most fairly when that group makes up a critical mass (at least 30%) of the pool. For example, if the pool has just a single woman out of 10 applicants, there will be a tendency for reviewers to unconsciously pay more attention to her gender. If that same pool had at least 3 women, there would be less tendency to consciously or unconsciously focus on applicants’ gender.
To diversify the pool of applicants/nominees, broadly advertise opportunities to groups of underrepresented faculty [e.g., Association of Black Professionals, Faculty, Administrators and Staff (ABPAFS); Professional Latinos at U-M Alliance; INDIGO (LSA Asian and Asian American Faculty Alliance); ADVANCE Faculty Networks], share with any potentially relevant centers/institutes/departments, and use university-wide tools such as Research Commons.
Consider the roles that gender and racial socialization may play in self-promotion and self-nominations. Women and those in communities of color may be socialized to not engage in self-promotion, and there tend to be double standards such that members of underrepresented groups are more negatively evaluated for self-promotion and self-nomination than their majority counterparts.
Regularly review and discuss practices for building a diverse pool of applicants/nominees. Consider contributing factors that could influence nomination and application patterns, such as the history of the program with regard to gender and racial diversity in award outcomes, pipeline challenges that are affecting eligible scholars in the field, prior review process patterns (whether those from underrepresented groups tend to be eliminated at earlier or later review stages), and how the percentage of nominees/applicants from each group relates to the proportion of awardees over the past 5-10 years, among other considerations. If data are not available, begin to collect data that will allow for examination of potential patterns and trends.
In cases where the call may limit the number of applicants/nominees per department/program/unit, consider allowing at least two applicants/nominees. Often this slight expansion encourages programs to put forward more applicants/nominees from non-majority groups.
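
The critical-mass guideline above can be checked with simple arithmetic. The following is a minimal sketch, not an official tool: the function name, the group labels, and the sample pool are hypothetical, and it assumes applicants have voluntarily self-reported the relevant information. It flags any group whose share of the pool falls below the 30% threshold mentioned above.

```python
from collections import Counter

# Threshold taken from the guidance above; adjust if your program uses another value.
CRITICAL_MASS = 0.30


def groups_below_critical_mass(pool, threshold=CRITICAL_MASS):
    """Return {group: share} for each group whose share of the pool is below threshold.

    `pool` is a list with one self-reported group label per applicant/nominee.
    """
    if not pool:
        return {}
    total = len(pool)
    counts = Counter(pool)
    return {
        group: count / total
        for group, count in counts.items()
        if count / total < threshold
    }


# Hypothetical pool matching the example above: one woman among ten applicants.
pool = ["woman"] + ["man"] * 9
print(groups_below_critical_mass(pool))  # {'woman': 0.1}
```

A result like this is only a prompt to broaden outreach before the call closes; it says nothing about the merit of any individual applicant.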

Review panel structure and composition

Structure

The choice between standing panels (regular use of the same reviewers, sometimes on a rotating basis) and ad hoc panels (new reviewers for each competition/award) can influence the outcome of review processes in undesired ways. The benefits and challenges of each approach are listed here:

Benefits of ad hoc panels

Can identify expert reviewers who can respond to the specific requirements of the call
Distributes opportunities for service across a greater number/diversity of faculty
Supports a changing diversity of perspectives and experiences that can enrich the evaluation process

Benefits of standing panels

Less administrative burden and training of reviewers
Consistency in evaluations
Committee members establish rapport over time that can support group engagement
Potentially more prestige for reviewers associated with serving on a standing panel, which could be especially important for junior faculty seeking promotion/tenure

Challenges of ad hoc panels

Additional administrative burden to find and train reviewers
May require thoughtful efforts to establish group rapport quickly to support group engagement

Challenges of standing panels

Limits opportunities for higher prestige service to a small network
Prone to establishing dominating roles on committee over time
Lack of turnover can lead to long-standing biases
Vulnerable to assumption that one-time training is enough to address/counter biases emerging over time

Ad hoc and standing panels can both operate effectively remotely, and don’t necessarily require the panel to meet synchronously at all. Enabling virtual panels and/or not requiring specific meeting times can encourage participation of a broader and more diverse review panel (e.g., those with more restrictive schedules due to caregiving responsibilities).
Take active steps to avoid conflicts of interest. (See the OVPR COI policy.) It is difficult to avoid conflicts entirely when reviewing internally, where many relationships exist, but reviewers should recuse themselves when there is a factor (e.g., personal relationship, active collaboration, position of authority/management) that may prevent them from making a fair assessment.
Regardless of whether an ad hoc or standing panel is used, ensure whoever has final decision-making authority (e.g., committee chair, program director) is known to all reviewers ahead of time (and to applicants/nominees whenever possible).

Composition

Strive to create diverse review panels so that they incorporate a breadth of individuals representing different genders, races/ethnicities, disciplinary or methodological expertise, and career stage. Reviewers should represent as many aspects of constituent units involved in the funding/award program as possible.
Due to their relatively small numbers on campus, underrepresented group members often receive numerous service requests at the same time, so they may decline your invitation. Do not use this as justification for focusing only on majority group reviewers; make a deliberate effort to recruit additional diverse reviewers. Consult with programs/caucuses of underrepresented groups (see examples above) to help identify appropriate and willing reviewers.
Including underrepresented groups in service activities such as review panels provides opportunities for forms of service often seen as prestigious. Such service can also be educative (e.g., committee members learn about the awards selection process) in ways that can support successful future attainment of these awards for them and/or among their networks.
Collect regular demographic information on reviewers such as gender, race/ethnicity, and unit. Programs need to know where their baseline is before they can identify areas to improve.
Publishing the membership of review panels, when possible, ensures transparency and helps mitigate unconscious bias, because reviewers know their identities will be known to applicants/nominees and the broader community. However, in other cases anonymizing reviewer identities (traditionally referred to as “blind” review, a term which can be considered ableist) may also be appropriate where power differentials exist and/or revealing reviewer identities may itself result in retaliation or other unintended consequences.
Although some journals follow processes with multiple layers of anonymization in which neither author nor reviewer identities are revealed (sometimes called “double blind” review), this does not eliminate all forms of bias and is not an adequate replacement for training reviewers on implicit and explicit biases. Funders like NIH are also experimenting with this model, although it is unclear whether the practice actually helps reduce bias for funding awards, and it may be prohibitively difficult for many internal programs where identifiable information is required (e.g., CVs).

Reviewer training and preparation

Provide explicit guidance and training for reviewers about bias and evaluation systems. Acknowledgment of the fact that all researchers carry biases can help normalize and set expectations, and help support people in thinking about biases in productive ways (e.g., bias awareness is a good thing as it can help one be an active part of improving the research community).
Reviewers should receive training materials (including but not limited to this document) before reviewing their assigned materials. Chairs/directors should explicitly have a discussion about bias and related issues if the panel is meeting synchronously. For standing panels, this should occur at regular intervals (e.g., annually).
In addition to training materials and discussion, other structures such as those described elsewhere in this document must be in place to raise awareness of behaviors and practices.
Advise reviewers to allow sufficient time to complete the review, and not agree to participate in a review if they cannot do so. Likewise, avoid multitasking when reviewing.
Reviewers should be familiar with award criteria prior to evaluating materials. Reflect on your own views and interpretations of the criteria to help ensure that what you are assessing is how the content and features of the applications/nominations align with the call, and not an applicant/nominee’s prior unrelated achievements, mentors, or academic pedigree.
Panel chairs should take steps to avoid “criteria shifting” after viewing and discussing nominees.
Reviewers should be instructed to focus on actual evidence of achievement as compared to potential. Reviewers can often unfairly expect underrepresented groups to have already done a task to show they can do it, whereas majority groups are often given the benefit of the doubt.
Reviewers should seek to avoid gendered or biased language (see above), and should be alert to such language in support letters and to how it may influence their own assessment.
Reviewers should be alerted to implicit bias, and reminded that they should aim to mitigate its influences when making recommendations and providing feedback. For reviewers unfamiliar with the concept, provide a working definition, and also suggest tools to explore the concept, such as Project Implicit.
Reviewers should also be advised about explicit biases (e.g., reviewers may openly think that there are certain institutions that produce the best researchers while others do not, therefore biasing their assessment of an applicant/nominee).
In some cases (e.g., for limited submissions), reviewers need to be aware that the goals of the process are not to award funding or a prize, but to identify the most competitive nominee(s) from the University, while also supporting colleagues in providing constructive feedback for improvement of the work and projects under review.

Designing effective review criteria

Update scoring systems for reviewers as needed to be more equitable, prioritizing the funding opportunity instead of a blanket rubric. Some ranking systems can have a strong influence on the outcomes of decision-making, and may favor more conservative or more controversial ideas depending on how they are designed.
Improper weighting (or burying of criteria in larger buckets) can skew results. Avoid basing recommendations on summary or impact scores created by adding scores across rated criteria. Instead, use holistic and qualitative approaches that score each criterion separately, so that reviewers can discuss the areas they evaluated as stronger or weaker and a range of score profiles across those criteria can be viewed as meritorious. Refrain from asking for overall rankings, which can be subject to strong implicit biases, and instead assess the most important criteria as identified in the call.
Consider only asking for relevant information from applicants/nominees. For example, should including the academic pedigree of an applicant/nominee be considered as part of the assessment process? If not, would it be possible to only ask for relevant sections of a CV (e.g., publications, grants) without creating too much administrative burden for applicants/nominees?
Use standardized processes and tools to manage peer review processes. At U-M, we currently use InfoReady Review as one such tool, but there are several others in use across the institution.
Actively counter biases around types of topics or forms of knowledge that are viewed as high impact. For example, include (and do not systematically exclude) accomplishments related to DEI that are appropriate for specific awards. Awards for public impact of research should include value for diverse publics (e.g., from national and state policy/legislation to impactful engagement in marginalized communities).
Consider implementing “cross-review,” when time permits, so that reviewers have the opportunity to see and/or comment on other reviews before a final decision is made (see Science’s policy for example). Reviewers should be encouraged to flag comments in other reviews that may be out of scope or inappropriate.

Final decisions and assessment

Build in adequate time for decision-making. Biased decision-making is most likely under conditions when individuals have to make snap judgements. In contrast, implicit biases are more likely to be mitigated when there is thoughtful time for discussion and/or sharing of reviews. Also build in time to make sure all reviewers’ voices/inputs are heard, acknowledged, and considered.
Processes should avoid having a single decision maker. At least two individuals should be involved in decisions to ensure checks and balances.
Consider having reviewers list top nominees before viewing/hearing others’ recommendations. This can help mitigate undue influences of individual members and help ensure the full committee’s list of top candidates is as broad as possible.
When numerical rankings are used, do not base decisions solely on absolute rankings, especially when the differences between them are within the noise of the ranking scale. Consider binning, weighting criteria, or other types of standardized questions (e.g., yes/maybe/no) instead of or in addition to numerical scores.
Decisions should be made with consideration of which explicit biases may affect judgement (e.g., expectations of quid pro quo, or bias towards certain disciplines, fields, institutions, etc.).
Administrators of the program should not edit reviews unless they are of an overly personal or ad hominem nature. In such cases, reviewers should not be asked to serve in the future.
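
The binning approach suggested above can be sketched in a few lines. This is an illustration only: the 1-5 scale, the cutoff values, and the nominee labels are assumptions made for the example and are not prescribed by this guidance. The idea is that nominees whose mean scores differ by less than the noise of the scale land in the same bin and are discussed together, rather than being separated by a small numerical gap.

```python
def bin_scores(mean_scores, yes_cutoff=4.0, no_cutoff=2.5):
    """Sort nominees into yes/maybe/no bins by mean panel score.

    Assumes a hypothetical 1-5 scale; both cutoffs are illustrative and
    should be set by the program before reviews are read.
    """
    bins = {"yes": [], "maybe": [], "no": []}
    for nominee, score in mean_scores.items():
        if score >= yes_cutoff:
            bins["yes"].append(nominee)
        elif score < no_cutoff:
            bins["no"].append(nominee)
        else:
            bins["maybe"].append(nominee)
    return bins


# Hypothetical mean scores; A and B differ by only 0.1.
scores = {"A": 4.4, "B": 4.3, "C": 3.1, "D": 2.0}
print(bin_scores(scores))
# {'yes': ['A', 'B'], 'maybe': ['C'], 'no': ['D']}
```

Fixing the cutoffs before scores are seen also helps guard against the “criteria shifting” described earlier.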

Post-peer review

Administrators should share written comments of reviewers directly with applicants/nominees when possible. In cases where that is not possible, the chair or administrator should provide a summary of reviewer feedback. This ensures transparency and also gives applicants feedback they can learn from and apply to future applications or funding opportunities.
Collect metrics on winners/awardees to help evaluate whether there may be systemic biases that need to be confronted and addressed.
Publicly posting selections of award winners or nominees ensures accountability and transparency.
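
One simple way to examine the metrics described above is to compare each group’s share of the applicant pool with its share of awardees. The sketch below is a starting point under stated assumptions: the function names, the group labels “X” and “Y”, and the sample data are all hypothetical, and the result is a descriptive comparison, not a statistical test.

```python
from collections import Counter


def share_by_group(labels):
    """Return each group's fraction of the list (empty dict for an empty list)."""
    total = len(labels)
    if total == 0:
        return {}
    return {group: count / total for group, count in Counter(labels).items()}


def representation_gaps(applicants, awardees):
    """Awardee share minus applicant share per group, rounded to 3 decimals.

    A negative value means the group is a smaller fraction of awardees
    than of applicants.
    """
    applicant_share = share_by_group(applicants)
    awardee_share = share_by_group(awardees)
    return {
        group: round(awardee_share.get(group, 0.0) - share, 3)
        for group, share in applicant_share.items()
    }


# Hypothetical data pooled over several award cycles.
applicants = ["X"] * 4 + ["Y"] * 6
awardees = ["X"] * 1 + ["Y"] * 4
print(representation_gaps(applicants, awardees))  # {'X': -0.2, 'Y': 0.2}
```

With the small numbers typical of internal programs, a single cycle rarely shows a reliable pattern, so pooling several years of data before drawing conclusions is usually more informative.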

Additional Resources

Best Practices for Peer Review from the Association of American University Presses
Diversity in Peer Review: Survey Results from the Council on Publication Ethics
Innovating in the Research Funding Process: Peer Review Alternatives and Adaptations from AcademyHealth
Reducing the Impact of Bias in the STEM Workforce from the National Science Foundation
Implicit Bias Resources from the National Institutes of Health
Resources and literature around faculty hiring processes from U-M’s ADVANCE Program

Further Scholarly Reading

Some of these publications may require a subscription to access. For members of the U-M community, refer to the U-M Library for assistance on accessing resources off-campus.

Andersson et al (2019) Implicit bias is strongest when assessing top candidates
Barnett et al (2017) Using democracy to award research funding: an observational study
Brezis and Birukou (2020) Arbitrariness in the peer review process
Bromham et al (2016) Interdisciplinary research has consistently lower funding success
Carter et al (2020) Developing & delivering effective anti-bias training: Challenges & recommendations
Day (2015) The big consequences of small biases: A simulation of peer review
Fang et al (2016) Research: NIH peer review percentile scores are poorly predictive of grant productivity
Ginther et al (2011) Race, ethnicity, and NIH research awards
Ginther and Heggeness (2020) Administrative discretion in scientific funding: Evidence from a prestigious postdoctoral training program
Guthrie et al (2018) What do we know about grant peer review in the health sciences?
Hoppe et al (2019) Topic choice contributes to the lower rate of NIH awards to African-American/black scientists
Kaatz et al (2015) Threats to objectivity in peer review: the case of gender
Lee et al (2012) Bias in peer review
Li and Agha (2015) Big names or big ideas: Do peer-review panels select the best science proposals?
MacKay et al (2017) Calibration with confidence: a principled method for panel assessment
Pier et al (2018) Low agreement among reviewers evaluating the same NIH grant applications
Pier et al (2019) Laughter and the chair: Social pressures influencing scoring during grant peer review meetings
Severin et al (2019) Gender and other potential biases in peer review: Analysis of 38,250 external peer review reports
Teplitskiy et al (2019) Do experts listen to other experts? Field experimental evidence from scientific peer review
Tricco et al (2017) Strategies to prevent or reduce gender bias in peer review of research grants: A rapid scoping review
Viner et al (2004) Institutionalized biases in the award of research grants: a preliminary analysis revisiting the principle of accumulative advantage
Witteman et al (2019) Are gender gaps due to evaluations of the applicant or the science? A natural experiment at a national funding agency

Questions?
If you have any comments or questions, please contact U-M Research ([email protected]).