Some research-assessment frameworks rank individual departments or staff members. Credit: Getty
Steve Goodman had just completed a three-page referee’s letter for an academic colleague who was going for promotion in a law department, when he received a bundle of papers that surprised him. The package contained the recommendation letters of other referees. “They were eight to ten pages long,” he says, adding that he had never seen letters like these before.
Goodman, an epidemiologist at Stanford University in California, says the letters read as though their authors had digested the colleague’s books, engaged with their arguments and produced a commentary similar to something you would publish. His letter, by contrast, took half a day to write. His colleagues in legal academia told him it was standard for law researchers to set aside a week to write referee letters, and that they received credit towards their own career development for doing so.
Goodman was stunned. The idea that you would do a deep dive into someone’s scholarship to write a recommendation letter is “inconceivable in biomedicine”, he says, where it is standard to focus on a researcher’s publications and assess the importance and value of their contributions. Goodman did not know the scholar in question, but was asked to write the letter because their work bridged law and biomedicine.
Unbeknown to him, Goodman had stumbled across the patchwork nature of research assessment in the United States. Even within an institution, the process by which the work of researchers in different disciplines is evaluated can vary widely. It is a similar picture elsewhere.
Some countries run huge nationwide programmes to assess the quality of research, often drilling down to measure the success of individual departments or staff members so that they can be compared.
The exercises cost millions of dollars and require countless hours that, some argue, could otherwise be dedicated to research. But governments champion them because they help to ensure accountability in the distribution of taxpayers’ money. In other countries, the United States among them, there are no formal mechanisms to assess research nationwide, and evaluations are reserved for individuals’ grant applications or hiring and promotion decisions.
Researchers have mixed views on assessment. Some say it drives improvements in quality, but others argue that it has a negative impact on research culture and morale. And in some countries, such as Argentina, many scientists have earned themselves a promotion but are yet to see it because of budgetary constraints, fostering distrust of the evaluation process.
But across the diverse ecosystem of research assessment, some things remain constant. Assessors often use shortcuts, such as quantitative metrics based on citation data, to judge research quality. Although quicker, such metrics lack the nuance of more time-consuming qualitative peer review, causing tension for those being evaluated.
Many systems are now changing in recognition of the negative effects that assessment can have on research culture, and some are grappling with how artificial intelligence could support decision-making.
History
Forty years ago, the United Kingdom began assessing the quality of its universities’ research nationwide, seeking a better way to distribute research funding. The current incarnation of that exercise, the Research Excellence Framework (REF), runs every six or seven years. The stakes are high because its results dictate how £2 billion (US$2.7 billion) in public research funding is distributed annually among the country’s more than 150 universities. The REF aims to capture a wide picture of research, including its social, economic and political impact and how the public engages with it. The next REF, expected to run in 2029, is likely to place greater emphasis on these elements and on the environment in which research takes place, in a bid to improve research culture.
The UK model inspired a wave of countries and territories to follow suit. In the early 2000s, for example, Hong Kong, New Zealand and Australia adopted regular nationwide exercises.
Many of these schemes focus on researchers’ outputs, which can include journal articles, data sets and contributions to conferences, and which are set before an expert panel for critical appraisal. At the nationwide level, these panels might convene world-leading authorities with specific criteria to consider; for other assessments, such as hiring decisions, they might be more informal gatherings of departmental members who discuss applicants.

Research-policy specialist James Wilsdon cautions against using journal impact factors when recruiting researchers. Credit: Layton Thompson
These deliberations take time, so assessors use shortcuts to gauge research quality, shortcuts that often rely on bibliometric data to describe the significance of researchers’ publications. Metrics such as paper or citation counts can feed into discussions or be used alone to determine research quality.
As research assessments rose in prominence, scientists became increasingly concerned about the limitations of the evaluation methods, and in particular about one metric: the journal impact factor, a measure of the average number of citations received in the current year by articles that a journal published in the previous two years. It was originally designed to help librarians to decide which journals to subscribe to, but it is often used as a proxy for the quality of individual journal papers, or even of their authors. Critics say that such a proxy loses all the nuance of how research contributes to advancing knowledge in a field, to innovation or to benefits for society. “It’s when you are using the journal impact factor to determine whether someone should get a job that you are in very dodgy terrain,” says James Wilsdon, who studies research policy at University College London (UCL).
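As a rough sketch of that calculation (the precise counting rules and what counts as a ‘citable item’ vary by citation database, and the year used below is purely illustrative), a journal’s impact factor for a given year can be written as:

\[
\text{JIF}_{2025} \;=\; \frac{\text{citations received in 2025 by items the journal published in 2023 and 2024}}{\text{number of citable items the journal published in 2023 and 2024}}
\]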
During the 2010s, a wave of initiatives began to flag concerns about journal impact factors and other metrics in evaluations, and to suggest better ways to use these measures (see ‘Four initiatives that champion responsible research assessment’). These ideas have received widespread support, but metrics continue to be widely used. A global survey of almost 4,000 researchers by Springer Nature, Nature’s publisher, published in April, found that 55% of respondents were mostly or entirely assessed using metrics, with just 12% saying they were mostly or entirely assessed by qualitative means. (Nature is editorially independent of its publisher.) The full anonymized survey data are available online.
The survey found that metrics-based evaluations are often focused narrowly on journal articles, with little consideration of other research outputs, such as data sets. Most researchers also wanted a balanced weighting of quantitative and qualitative factors, but had concerns about the subjectivity, bias and workload of the latter.
In some countries, metrics, although imperfect, can be helpful, argues Wilsdon. In systems in which nepotism or corruption is rife, using metrics to compare researchers’ citation counts, for example, can be a useful corrective, he says.
But he adds that there has been a “growing clamour” to approach research assessment in a more sensitive and holistic way. Many research-assessment programmes look at past work. However, a report1 published in May on behalf of the Global Research Council (GRC), which represents more than 50 government research funders worldwide, found that evaluations over the past five years show a gradual shift towards more forward-looking elements. These components, known as formative assessments, can give weight to more than just the research output: the REF scores UK institutions on the impact of their work in society, for example, and researchers know that they can use these examples of impact in their CVs. “That’s quite an important axis of change in assessment, because it’s using assessment in a much more deliberate way to try and shape research systems,” says Wilsdon, who heads the Research on Research Institute (RORI), a non-profit organization based at UCL, which led the work on the report for the GRC.
Promotion policies
Funders are also experimenting with fresh ways to assess researchers. One format that is gaining traction is the narrative CV, which offers a structured, written description of a scientist’s broad contributions and achievements, in contrast to the conventional CV, which typically lists publication and employment histories with little context.
Narrative CVs really tell you about the researcher, says Yensi Flores Bueso, a cancer researcher at University College Cork, Ireland. The format helps to level the playing field for researchers from different countries and backgrounds, who might not have access to the same resources.
In January, Bueso and her colleagues published an analysis2 of more than 300 promotion policies in more than 120 countries, covering how institutions and government agencies promote people to full professorships. The analysis broadly suggests that countries in the global south tend to use quantitative metrics, whereas high-income countries tend to prioritize qualitative aspects such as a researcher’s visibility and engagement.
Much of how research is assessed is cultural, says Bueso, adding that in some African countries, “the emphasis is on social commitments”. Promotion documents ask for more details about how a researcher has served civil society, which committees they sit on and their voluntary work, rather than their publication history, she adds. By contrast, some southeast Asian countries rely heavily on metrics and point scoring, with researchers being awarded points according to where in a paper’s author list their name appears or the journal’s impact factor, for example.

Yensi Flores Bueso advocates the use of narrative CVs. Credit: Young Academy of Ireland
Bueso knows from her own experience about the flaws of relying on metrics such as paper counts or the prestige of journals in which work is published. Originally from Honduras, she started her career there by establishing a laboratory where her research fulfilled a social need: determining the rates of disease in certain populations and which diagnostic assays worked best. But the facility was poorly resourced. In 2020, she moved to the United States, where she worked in a series of large, well-funded labs and made contributions to big projects, which are likely to be published in high-impact journals.
“There is not a chance that my contributions from my work in Honduras will ever be seen in any peer-reviewed journal” owing to lack of funds, she says. “What matters at the end is the quality of the research and how you adapt to a team,” she adds.
In some countries, huge nationwide assessment programmes have driven changes to the research landscape. In the United Kingdom, for example, research assessment previously helped to concentrate research funding in a number of elite institutions. And in Australia, where administrators have been evaluating research nationwide since 2010, a 2023 review found that the exercise had negative effects on research culture. Policymakers were left questioning the benefit of the system, and the programme, called Excellence in Research for Australia (ERA), was halted, with researchers wondering what would come next.
Regional reform
The most recent ERA exercise, in 2018, saw institutions submit research outputs and data on funding and citations across eight subject areas. Expert committees then evaluated the work in various ways. The extent to which panels relied on bibliometrics depended on the discipline, with science and engineering subjects relying on metrics more heavily than did the humanities and social sciences, which leant more on peer review.
The results of the exercise — a score on a five-point scale, ranging from below to well above world standard — had only limited impact on institutional funding in some years, and were mostly used by policymakers to compare institutions across disciplines and internationally.
The original idea for ERA was prompted by concerns about quality and value for money in Australian research, says Jill Blackmore, an education researcher at Deakin University in Melbourne, Australia. It was introduced after a period in which many institutions amalgamated and policymakers felt that research capacity was lacking. “It was very much a focus of quantity, not quality,” she adds.
The 2023 ERA review by the Australian government found that the exercise pitted universities against each other and saw them poaching staff and duplicating expertise, rather than working together. The process was also onerous and costly. Evidence provided to the review by the University of Sydney, for example, showed that its submission, due every three years in line with the assessment cycle, required more than 40,000 hours of staff time, costing in excess of Aus$2 million (US$1.3 million) in salaries alone.
Although researchers are now breathing a sigh of relief, they are awaiting what might come next, says Blackmore. Institutions have been doing pre-emptive work around assessment, just in case, she says, adding, “There is no magic bullet to this, because you’re trying to measure value for money.” She also argues that ERA has served its purpose. Now, in education, she says, “we have got high-quality research, and we didn’t for a while”. She doesn’t think ERA is necessary any longer. “Other countries produce quality research without it.”

Research-policy specialist Takayuki Hayashi says that Japanese researchers are often indifferent about assessment exercises. Credit: Takayuki Hayashi
As in Australia, policymakers in Japan began assessing research on a nationwide scale as part of broader higher-education reforms. Since 2004, Japan has run the National University Corporation Education and Research Evaluation programme every six years. However, the results have little bearing on funding; instead, they inform university planning and higher-level government policy.
But this has downsides, with many universities “feeling the process lacks tangible incentives or impact”, says Takayuki Hayashi, a research-policy scholar at the National Graduate Institute for Policy Studies in Tokyo. Unlike in the United Kingdom, where the national assessment programme carries huge weight, in Japan “many researchers view the current evaluation system with a sense of detachment or fatigue, largely because the results have minimal influence on funding or career advancement”.