DevPubMetric provides an objective and quantitative method to measure and compare the academic impact of development or global challenge research, whilst also estimating the levels of engagement of developing country researchers and research institutions.
In September 2000, the United Nations (UN) adopted the Millennium Declaration and, in the following year, the Millennium Development Goals (MDGs). The MDGs set out an ambitious agenda for global development and poverty reduction up to 2015, when the current set of Sustainable Development Goals (SDGs) was adopted by the UN.
The resulting focus on sustainable development and addressing the world’s shared global challenges has led to very significant growth in research that addresses these challenges.
A number of major research initiatives addressing global challenges and sustainable development have been launched by governments, philanthropic and multilateral organisations along with a number of major charities.
Most of these new programmes have shared objectives that include:
- Contributing to global knowledge (publications);
- Driving both academic and development impact;
- Engaging with and empowering developing country researchers and research organisations.
There are well-established methodologies for measuring and documenting development impact, although in many cases these are challenged by the complex, non-linear pathways that link research and impact, the time lag before impact builds, and the difficulty of evidencing attribution.
There are many fewer examples of systematic processes to measure the academic impact of development research, and virtually none that map the global engagement and partnerships that deliver the research and its impact.
One of the first examples of such an approach was developed by the Ecosystem Services for Poverty Alleviation Programme (ESPA), a global interdisciplinary research programme funded by the Government of the United Kingdom, through the Department for International Development, Natural Environment Research Council and Economic and Social Research Council.
In 2013 the ESPA Directorate developed a quantitative monitoring framework for the programme that included measures of both academic impact (numbers of publications and citations) and engagement with developing country researchers (proxy indicators based on the proportion of academic publications with a developing country first author and the proportion with at least one developing country author). These data formed part of ESPA’s monitoring and evaluation, feeding into standard reporting against its Theory of Change and Logical Framework. An example of regular reporting that used these data is presented below.
Monthly reporting was collated by the ESPA Directorate up to the point that the programme closed in July 2018. ESPA’s final highlight document included the last set of data collated by the programme at which time the programme had captured information about 419 journal articles which had been cited 8,516 times.
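Once each author of a paper has been flagged as developing country or not, the two ESPA engagement proxies reduce to simple proportions. The sketch below assumes a hypothetical record shape (one boolean per author, in author order); it is an illustration of the indicators, not the Directorate's original code.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    # Hypothetical record shape: one flag per author, in author order,
    # True when that author's affiliation is in a developing country.
    author_is_developing: list[bool]


def engagement_proxies(pubs: list[Publication]) -> dict[str, float]:
    """Return the two ESPA-style engagement proxies as proportions."""
    # Publications with no author information cannot be classified,
    # so they are excluded from the denominator.
    usable = [p for p in pubs if p.author_is_developing]
    if not usable:
        return {"first_author": 0.0, "any_author": 0.0}
    n = len(usable)
    first = sum(p.author_is_developing[0] for p in usable)
    at_least_one = sum(any(p.author_is_developing) for p in usable)
    return {"first_author": first / n, "any_author": at_least_one / n}


pubs = [
    Publication([True, False]),
    Publication([False, True, False]),
    Publication([False, False]),
    Publication([True]),
]
print(engagement_proxies(pubs))  # {'first_author': 0.5, 'any_author': 0.75}
```

Excluding unclassifiable papers from the denominator is a design choice made here; counting them as "no engagement" would lower both proportions.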
The original system developed by the ESPA Directorate utilised data and systems that were available in 2013-15, with a high level of reliance on internal data and extensive manual processing that required every paper to be assessed by a team member. This meant that data collection and reporting could not continue automatically once the programme had closed. It also meant that ESPA’s approach could not be extended to the much larger research programmes that followed, including the UK’s Newton Fund and Global Challenges Research Fund.
Defining the challenge:
Capturing the opportunities of “open” big data.
During 2020, work commenced to implement a new system, DevPubMetric, that could update the data previously collected by ESPA in an essentially automatic way and could easily be extended to other research programmes and research funders. The motivation for this work was to update ESPA’s statistics in order to capture and quantify the very significant academic impact of research that continues after projects and programmes have closed. It is often said that up to half of a programme’s academic impact may be invisible after funding ceases.
Whilst this work was originally intended to update data for ESPA, it was realised that the same approach could be used for other programmes, providing the opportunity to do quantitative comparative analysis between and within programmes.
In order to deliver these objectives, DevPubMetric was designed to address a set of challenges and opportunities as detailed below:
Challenges and Opportunities | Responses |
---|---|
Automate data collection and analysis. | This has become possible through the evolution of data products and systems: • Extensive access to downloadable data through web-based APIs (Application Programming Interfaces) • Enhanced data availability for UK-funded research through the Gateway to Research (GTR) • Enhanced data content available from commercial bibliometric databases (including open access metrics and information on authors and their institutional affiliations). |
Remove the requirement for manual processing. | The development and application of a standardised system for data processing and analysis has removed the requirement for routine manual interventions. It has not been possible to remove this requirement entirely because of some issues relating to data compatibility, for example in the names of some countries and institutions in different systems. Where possible these issues linked to integrating data from different sources have been addressed through the application of look-up tables. These tables need to be checked and updated at intervals. |
Develop new approaches to measure the levels of open access publication and developing country engagement. | The original ESPA system linked measures of developing country engagement to data stored about funded researchers, which are not available in the public domain and could not be applied to other research programmes. For this reason the new system tracks engagement through the institutional affiliations of authors, data that are readily available from the current generation of bibliometric systems, including the Elsevier Scopus system used in the new analysis. |
Enhance the range of publications used for analysis | The system developed for ESPA used data downloaded from the Web of Science bibliometric system delivered by Clarivate Analytics. The updated system has moved to data from Scopus, provided by Elsevier. Both systems are constantly evolving and, whilst they have many common functions, there are aspects where each can be considered to have an advantage. One of the perceived strengths of Scopus is its broader coverage of current academic journals, especially for the social sciences, arts and humanities. This is the main reason that the choice of data provider has changed for the current system of analysis. Both Scopus and Web of Science have added information since ESPA’s original system was designed. Of most relevance are the ability to identify open access publications and the provision of detailed information about the institutional affiliations of all authors of a publication. |
Enable objective comparison with other programmes | The new data collection and analysis system is designed around a set of standard data that should be readily available for most major research programmes. This requires the ability to link publications to programmes using unique identifiers, either programme names or project reference codes. These in turn can be used as search terms in bibliometric systems. In addition, if the programme publishes a downloadable record of publications, this can also be used to supplement the analysis in case authors have not acknowledged links to funding programmes. It is also possible to undertake analysis at institutional level for research institutions and funders that are dedicated to global development activities, for example the United Kingdom’s Department for International Development or Canada’s International Development Research Centre. |
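Linking publications to a programme through its identifiers amounts to combining every accepted programme-name variant and project code into one search. The sketch below assumes the Scopus advanced-search field codes `FUND-ALL` (any funding text) and `FUND-NO` (grant number); the query strings DevPubMetric actually uses are not given here, and the project code is hypothetical.

```python
def build_funding_query(programme_names: list[str], project_codes: list[str]) -> str:
    """Combine programme-name variants and project codes into one OR query.

    Assumes Scopus advanced-search field codes FUND-ALL and FUND-NO;
    DevPubMetric's real search strings may differ.
    """
    # Full names only: acronyms are deliberately left out because
    # they are not unique enough to identify a programme.
    clauses = [f'FUND-ALL("{name}")' for name in programme_names]
    clauses += [f'FUND-NO("{code}")' for code in project_codes]
    # Drop duplicate clauses while keeping their original order.
    unique = list(dict.fromkeys(clauses))
    return " OR ".join(unique)


query = build_funding_query(
    ["Newton Fund", "Newton-Bhabha Fund"],
    ["NE/X000000/1"],  # hypothetical project reference code
)
print(query)
# FUND-ALL("Newton Fund") OR FUND-ALL("Newton-Bhabha Fund") OR FUND-NO("NE/X000000/1")
```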
Sources of Information (data)
DevPubMetric draws on data from a variety of open access data sources and combines them with data extracted from the commercially provided Scopus bibliometric system. The results provided here are restricted to those available from open access sources, or to syntheses of results obtained from Scopus along with information that Scopus lists in the public domain.
Additional detail and links to the data sources utilised in the analysis.
The system then generates lists of projects associated with key research programmes and full lists of publications that have been captured through the core search protocols. These data have been collated in the data repository for the system: https://pvgglobal.uk/devpubmetric/repository. These data extracts are updated on the first day of each month with data generated from the system after new publications and projects have been extracted.
The DevPubMetric system requires the following information:
(Further information about search protocols)
Data requirement | Current Sources | Comments |
---|---|---|
Programme names | These are obtained from research funder websites. ESPA: www.espa.ac.uk; GCRF: ukri.org/research/global-challenges-research-fund/; Newton Fund: newtonfund.ac.uk/; USAID: usaid.gov/what-we-do/economic-growth-and-trade/research; UK FCDO (DFID): www.gov.uk/government/organisations/foreign-commonwealth-development-office/about/research; Canadian International Development Research Centre (IDRC/CRDI): idrc.ca | One of the search criteria used in the analysis is to extract abstracts of publications that acknowledge the programme. The programme name therefore needs to act as a unique identifier, and for this reason acronyms have been found to be unreliable. Scopus search terms have been developed for each programme to balance the capture of relevant publications against the rejection of those that should not be attributed at programme level. Some programmes, such as the Newton Fund, have multiple names in different contexts. The search process has been designed to cover all known variants of the programme name (currently restricted to English-language versions). |
Project identifier codes | Gateway to Research (for programmes administered by UKRI); directly from research funder websites (listed above). | These are used to search for publications that cite the project reference code. This is a requirement of nearly all funders, but it is not always respected by researchers. There are additional problems if project reference codes are not distinctive or if they have been entered incorrectly by authors. |
Publications | UKRI Gateway to Research; Scopus. Publications are downloaded from Scopus using an abstract search based on unique project identifiers (where available) or by querying funder information with a set of uniquely defined programme names. | All projects funded by UKRI are required to report outcomes, including publications, through the ResearchFish reporting system. These data are then checked by UKRI systems before being replicated on the Gateway to Research system. These data are in the public domain and can be downloaded using unique identifiers for a programme or project. One constraint of this approach is that it relies on accurate and timely reporting by projects. To partially address under-reporting, Scopus is used to capture additional publications by searching on unique programme and project identifiers where these are available. There are limitations to this approach, which are discussed in the sections below on Quality Assurance and Remaining Challenges. |
Descriptive data about publications: Authors, institutions and their location (country) and open access status of publications. | Scopus | Scopus was used to provide the descriptive information about publications for further analysis. This included a full list of authors, their institutional affiliations and the countries in which those institutions operate. Scopus also provided data on which publications were open access. The disadvantages of relying on Scopus are that some lesser-known journals are not covered (including many published in developing countries) and that the system does not capture open access publications provided through institutional repositories. An alternative approach using data collected through the ResearchFish / Gateway to Research portals was found to have insufficient detail to meet the needs of the analysis: for some types of publication, lists of authors were incomplete, and institutional details lacked the information required to map engagement with developing countries. |
Publication metadata: Corresponding Authors and their affiliation. | Scopus | The first phase of analysis using DevPubMetric used authorship as a proxy measure of the levels of engagement of developing country researchers. It was noted that first authorship is not always the most appropriate measure of scientific leadership in publications and that protocols for deciding who acts as first author vary between disciplines. The Scopus system stores information about the corresponding author for most publications. These data have been collected since May 2021 using the Scopus “Abstract” API (Application Programming Interface). |
Publication metadata: Funding organisations and project references. | Scopus | The Scopus system uses pattern matching and AI techniques to extract funding information from the acknowledgements in publications. Data describing funders and project reference codes have been collected since May 2021 using the Scopus “Abstract” API (Application Programming Interface). Earlier versions of the DevPubMetric system collated funding data from the Scopus “Search” API, which was found to be incomplete during quality assurance of the data extract. |
Development status of institutional affiliations for authors. | World Development Indicators (World Bank Group) | Current data on World Development Indicators, including the income status of all nations recognised by the United Nations, have been downloaded for integration into the analysis. The analysis is based on the institutional affiliation of each author as recorded in the publication. Some authors list more than one institutional affiliation; in these cases the author was linked to the country with the lowest income status (High income > Middle income > Low income). |
Geographic and SDG groups and additional population data | World Population Prospects (UN Population Division) | Regional groupings are documented as metadata for the World Population Prospects report: https://population.un.org/wpp/Download/Metadata/Documentation/ Full population data and future projections are available at: https://population.un.org/wpp/Download/Standard/CSV/ |
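The lowest-income rule for authors with several affiliations can be expressed as a minimum over ranked income groups. This sketch assumes the World Bank's four income-group labels and uses a small, illustrative country mapping rather than live World Development Indicators data; the function name is hypothetical.

```python
# Rank World Bank income groups from lowest to highest income
# (assumed labels; the source text collapses these to three tiers).
INCOME_RANK = {
    "Low income": 0,
    "Lower middle income": 1,
    "Upper middle income": 2,
    "High income": 3,
}


def author_income_group(affiliation_countries, country_income):
    """Assign an author to the lowest-income group among their affiliations.

    affiliation_countries: countries of the author's listed institutions.
    country_income: mapping of country name -> income-group label.
    Returns None when no affiliation has a recognised income group.
    """
    groups = [country_income.get(c) for c in affiliation_countries]
    ranked = [g for g in groups if g in INCOME_RANK]  # skip unknown/unclassified
    if not ranked:
        return None
    return min(ranked, key=INCOME_RANK.__getitem__)


# Illustrative mapping only, not a live WDI extract.
country_income = {
    "United Kingdom": "High income",
    "Kenya": "Lower middle income",
    "Malawi": "Low income",
}
print(author_income_group(["United Kingdom", "Kenya"], country_income))
# Lower middle income
```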
Integrating data: Generic approaches, challenges, complexity and solutions
DevPubMetric uses data from up to four distinct data sources. These data need to be integrated before the analysis can be completed. There are three stages to this process:
- Quality assurance (see below)
- Capturing unique identifiers for core data
- Creating links between data sources
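Creating links between sources depends on a shared key, and the DOI is the one carried by both Gateway to Research and Scopus records. A minimal sketch, assuming records arrive as hypothetical `(identifier, doi)` pairs: DOIs are stripped of resolver prefixes and lower-cased (DOI names are case-insensitive) before being used as the look-up key.

```python
import re
from typing import Optional

# Resolver prefixes that commonly appear in front of a bare DOI.
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalise_doi(raw: Optional[str]) -> Optional[str]:
    """Return a bare, lower-cased DOI, or None if the value is unusable."""
    if not raw:
        return None
    doi = DOI_PREFIX.sub("", raw.strip()).lower()
    return doi if doi.startswith("10.") else None


def build_doi_lookup(gtr_records, scopus_records):
    """Link Gateway to Research UIDs and Scopus eids through a shared DOI.

    Both inputs are iterables of (identifier, doi) pairs, a hypothetical
    shape chosen for this sketch. Returns {doi: {"gtr": set, "scopus": set}}.
    """
    lookup = {}
    for source, records in (("gtr", gtr_records), ("scopus", scopus_records)):
        for uid, raw_doi in records:
            doi = normalise_doi(raw_doi)
            if doi is None:
                continue  # records without a usable DOI need another matching route
            entry = lookup.setdefault(doi, {"gtr": set(), "scopus": set()})
            entry[source].add(uid)
    return lookup


gtr = [("GTR-1", "https://doi.org/10.1000/ABC"), ("GTR-2", None)]  # hypothetical IDs
scopus = [("2-s2.0-111", "10.1000/abc"), ("2-s2.0-222", "doi:10.1000/xyz")]
print(build_doi_lookup(gtr, scopus)["10.1000/abc"])
```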
Unique identifiers
Most of the data sources used in DevPubMetric have pre-existing Unique Identifiers (UIDs) for core data. These have been retained for the analysis and where necessary look-up tables have been created to link data sources. These are outlined below:
Data type | Unique Identifier(s) |
---|---|
Programme Names | As published by funders. The Scopus search terms needed to be adjusted to maximise the probability that publications were captured whilst also minimising the likelihood of false positives. An additional look-up table has been implemented to capture publications linked to the regional variants of the UK’s Newton Fund. |
Project references | Derived from Gateway to Research (GTR) ∙ Published UKRI project reference ∙ GTR Project UID. From May 2021, DevPubMetric has also extracted project references via Scopus which are derived from the acknowledgement sections of most publications. |
Publications | · Scopus Electronic Identifier (eid) · Gateway to Research UID · Digital Object Identifiers (DOI) included in bibliographic information · The ESPA programme produced a downloadable database of publications produced up to July 2018. DOIs are used to create a look-up table linking systems. |
Institutions | · Scopus Electronic Identifier (eid). Gateway to Research data do not currently include comprehensive data for non-UK institutions, so it is currently not possible to cross-link these data sources. |
Countries and Income Status | The World Bank’s World Development Indicators include a comprehensive list of countries recognised by the United Nations. This list includes the current income status of each territory. Scopus does not publish a list of countries, but one can be extracted from data downloaded from Scopus. These data include a small number of territories not currently recognised by the United Nations, or with names that do not map directly onto those used by the United Nations. A look-up table has been created to link country names derived from Scopus to those published in the World Development Indicators. This table is updated manually when a new country is listed in Scopus, and once a year when the annual revision of the World Development Indicators is published (normally August). |
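The country look-up table can be applied as a pass-through for names that already match, a mapping for known variants, and a review list for anything new. This is a sketch under stated assumptions: the table entries are illustrative examples of Scopus-style and World Bank-style names, "Example Territory" is a placeholder, and the function name is hypothetical.

```python
# Illustrative look-up entries: Scopus-style name -> WDI-style name.
COUNTRY_LOOKUP = {
    "Egypt": "Egypt, Arab Rep.",
    "South Korea": "Korea, Rep.",
}


def map_countries(scopus_names, wdi_names, lookup):
    """Map Scopus country names onto WDI names.

    Names already used by WDI pass through unchanged; others go through
    the look-up table. Anything still unmatched is returned separately so
    the table can be updated manually.
    """
    mapped, unmatched = {}, set()
    for name in scopus_names:
        if name in wdi_names:
            mapped[name] = name
        elif name in lookup:
            mapped[name] = lookup[name]
        else:
            unmatched.add(name)
    return mapped, unmatched


wdi_names = {"Kenya", "Egypt, Arab Rep.", "Korea, Rep."}
mapped, unmatched = map_countries(
    ["Kenya", "Egypt", "South Korea", "Example Territory"], wdi_names, COUNTRY_LOOKUP
)
print(unmatched)  # {'Example Territory'}
```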
Regular Data downloads and updates
The data used by DevPubMetric are updated at regular intervals as follows:
Detail on search protocols and data extraction used by the DevPubMetric analysis
Data Type | Source | Interval |
---|---|---|
Programmes | Funder website | As required (manual) |
Project codes and identifiers | Gateway to Research | Monthly (automatic) |
Publication outcomes | Gateway to Research (extract new DOIs) | Monthly (automatic) |
Publication details (metadata) | Scopus | Monthly (automatic) |
Publication citation counts | Scopus (counts are stored for every publication at monthly intervals, permitting tracking of the time course of academic impact for individual publications, projects and programmes) | Monthly (automatic) |
Countries and Development Status | World Development Indicators (bulk downloads of the current data are available as an Excel spreadsheet or a CSV file, with additional information provided through the World Bank’s Data Catalogue; customised data extracts are available from the World Bank’s Data Catalogue API) | Annual (manual extract) |
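Storing a citation count per publication at each monthly snapshot is what makes the time course of academic impact recoverable. The class below is a minimal in-memory sketch of that idea, assuming hypothetical eids and snapshot dates; a production system would persist these to a database.

```python
from datetime import date


class CitationHistory:
    """Keep one citation count per publication per monthly snapshot."""

    def __init__(self) -> None:
        self._counts: dict[str, dict[date, int]] = {}  # eid -> {snapshot: count}

    def record(self, snapshot: date, counts: dict[str, int]) -> None:
        for eid, n in counts.items():
            self._counts.setdefault(eid, {})[snapshot] = n

    def monthly_gains(self, eid: str) -> list[tuple[date, int]]:
        """Citations gained between consecutive snapshots for one publication."""
        series = sorted(self._counts.get(eid, {}).items())
        return [(d2, n2 - n1) for (d1, n1), (d2, n2) in zip(series, series[1:])]

    def programme_total(self, eids, snapshot: date) -> int:
        """Total citations for a set of publications at one snapshot.

        Publications with no count recorded at that snapshot contribute 0.
        """
        return sum(self._counts.get(e, {}).get(snapshot, 0) for e in eids)


h = CitationHistory()
h.record(date(2021, 9, 1), {"eid-A": 10, "eid-B": 3})  # hypothetical eids
h.record(date(2021, 10, 1), {"eid-A": 14, "eid-B": 3})
h.record(date(2021, 11, 1), {"eid-A": 15, "eid-B": 5})
print(h.programme_total(["eid-A", "eid-B"], date(2021, 11, 1)))  # 20
```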
Quality Assurance
Each of the systems publishing data used in DevPubMetric has implemented a form of internal quality assurance, with details published along with the data. The development of the analysis system has identified a number of areas where existing quality assurance currently acts as a constraint.
Publication Coverage (Scopus)
The Scopus system has comprehensive coverage of academic publications, but this does not include all publications likely to be used for reporting results from development or global challenge research. Known gaps include:
- Lower impact publications, including many national or regional publications originating from developing countries;
- Relatively low coverage of books, book chapters and monographs;
- Lower coverage of publications in languages other than English;
- Poor coverage of pre-prints.
Whilst these issues are a constraint, there are currently no better alternatives that can provide the systematic analysis used in the current system. As with all commercial bibliometric systems, Scopus (and possible alternatives) is under constant development, and some of the current constraints may be mitigated in future releases.
The advantage of using a comprehensive commercial bibliometric system is that there are very high levels of internal quality assurance and internal consistency. One area of key importance is the data model, which has four core components: Publications,
Reporting Gaps and Over-reporting
(Gateway To Research & Scopus)
DevPubMetric makes full use of UKRI project reporting from the Gateway to Research system, which in turn relies on reporting by projects and researchers. A subset of publications reported through ResearchFish was tested as an additional quality assurance step during the development of the current system. The following issues emerged from the QA process:
- Reporting Gaps: These result from researchers failing (or being unable) to report publications linked to a project or programme. The gaps are identified when results are compared with those from a direct Scopus search by programme name or project identifiers.
- Over-reporting: There are also examples of potential over-reporting, which represent publications that should realistically not be considered to be attributable to the programme or project that they are linked to.
This can be seen in Gateway to Research records, but also can apply to records derived from Scopus if the authors have provided the relevant programme name or project codes.
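The comparison that reveals reporting gaps is, at its core, a set comparison between the DOIs reported through Gateway to Research and those returned by the direct Scopus search. A sketch, assuming both sets already hold normalised DOIs; the category names are illustrative.

```python
def reconcile(gtr_dois: set[str], scopus_dois: set[str]) -> dict[str, set[str]]:
    """Split DOIs by the source(s) that captured them.

    Assumes both inputs already contain normalised (bare, lower-case) DOIs.
    """
    return {
        "both": gtr_dois & scopus_dois,
        # Found by the Scopus programme/project search but never reported
        # through ResearchFish / Gateway to Research: possible reporting gaps.
        "possible_reporting_gap": scopus_dois - gtr_dois,
        # Reported but not found by the Scopus search: candidates for review
        # (over-reporting, missing acknowledgement, or outside Scopus coverage).
        "gtr_only": gtr_dois - scopus_dois,
    }


result = reconcile({"10.1/a", "10.1/b"}, {"10.1/b", "10.1/c"})
print(sorted(result["possible_reporting_gap"]))  # ['10.1/c']
```

Neither one-sided category is proof of an error on its own; each flags records that need checking.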
Issues relating to under- and over-reporting are largely linked to the behaviour of researchers and authors. Research funders provide guidance and encouragement to facilitate the adoption of good practice, but it is clear that both issues persist to some extent. One possible application of the analytical system would be to review a random subset of publications within a programme to assess the scale of the issue.
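A random-subset review of this kind needs a reproducible sample and an uncertainty range for the observed over-reporting rate. The sketch below draws a seeded sample and applies the Wilson score interval, a standard choice for proportions from small samples; the function names, the seed and the review counts are all assumptions for illustration.

```python
import math
import random


def sample_for_review(pub_ids, k, seed=2021):
    """Draw a reproducible random subset of publications for manual checking."""
    rng = random.Random(seed)
    # Sorting first makes the draw independent of set iteration order.
    return rng.sample(sorted(pub_ids), min(k, len(pub_ids)))


def wilson_interval(flagged, reviewed, z=1.96):
    """Approximate 95% Wilson score interval for the over-reported proportion."""
    if reviewed == 0:
        raise ValueError("no publications reviewed")
    p = flagged / reviewed
    denom = 1 + z**2 / reviewed
    centre = (p + z**2 / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2)) / denom
    return centre - half, centre + half


ids = {f"pub-{i}" for i in range(500)}  # hypothetical publication IDs
subset = sample_for_review(ids, 100)
# Suppose manual review judged 6 of the 100 sampled papers not attributable.
print([round(x, 3) for x in wilson_interval(6, 100)])  # [0.028, 0.125]
```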
Lack of Institutional Affiliation in some Publications
The analysis and reporting in DevPubMetric uses institutional affiliations of authors to link publications with countries. There are a small number of publications where the publisher does not include this information and hence it is not collected by abstracting services and bibliometric analysis.
Analysis of the scope of this issue is ongoing. Initial results suggest that it is most prevalent in papers with large numbers of authors in specific academic disciplines.
Institutions and Countries Linked to Research
(Gateway to Research)
Data extracted from Gateway to Research were examined to evaluate how these could be used to enable additional analysis of the levels of engagement with researchers and institutions in developing countries. This was not possible for two reasons:
- Gateway to Research does not currently record links between researchers and institutions listed on a grant, with the exception of the single Principal Investigator listed on the grant application. (Data are available on the current institution linked to individuals, but this is known to change over time and may not reflect the institution at the time of a given grant.)
- Gateway to Research records details of the main institutions linked to the grant, including those outside the UK. Details recorded for non-UK institutions are inconsistent, with the majority of records not listing the country in which those institutions are located.
There are also instances of duplication of institutional records which means that there may not be a unique institutional identifier for non-UK institutions. This issue also applies to some non-traditional UK institutions.
Current Applications for DevPubMetric
Current applications of the DevPubMetric system are provided on the website at pvgglobal.uk/devpubmetric. More detailed analysis of results is provided on the following topics:
- Renewing the analysis of results from the ESPA programme: pvgglobal.uk/espa-ten-years-on/
- Application of DevPubMetric to the Global Challenges Research Fund and the Newton Fund: comparison of GCRF, the Newton Fund and ESPA.
- Application of DevPubMetric to other development research programmes, now including FCDO/DFID, USAID and IDRC.
- SDG mapping of research publications captured by the system
- Analysis of academic outputs from the Nairobi Alliance (A research alliance joining the Universities of Nairobi, Rwanda, Malawi, Witwatersrand and Leicester)
Potential future developments and applications for DevPubMetric
The data collected for DevPubMetric analysis can be extended to build additional applications. The following potential applications are being considered for the next phase:
- Institutional performance for development research, either at programme or full institutional level.
- Gender analysis of publications.
- Linking the analysis at programme or institutional level with emerging processes that map academic outputs to their relevance to the Sustainable Development Goals.
Change/update log
Date | Change | Comments |
---|---|---|
10 October 2021 | Added links to full publication lists extracted by DevPubMetric for the ESPA, GCRF and Newton Fund programmes | These extracts are updated on a monthly basis Data are available at: https://www.pvgglobal.uk/devpubmetric/repository |
28 September 2021 | Added documentation on links to additional population data from the UN’s World Population Prospects data | Data are available at: https://population.un.org/wpp/Download/Standard/CSV/ |
28 September 2021 | Added documentation on links to additional data describing UN regional groups | These data are presented on the country summary pages of the website. They are derived from the UN World Population Prospects system at: https://population.un.org/wpp/Download/Metadata/Documentation/ |
16 September 2021 | Updated data derived from the World Bank’s World Development Indicators. This update includes changes in the income status of 10 countries. Two additional fields of data were added to country summaries (pvgglobal.uk/country-list/): ● National Gini coefficient, a measure of income inequality; ● Proportion (%) of the youth population (15-24) not in employment, education or training (NEET). | |
8 June 2021 | Published information about a wider range of development research programmes | Published data on FCDO/DFID Research (UK), USAID (United States) and IDRC (Canada); information about the programme names used to identify publications; comparison of DevPubMetric KPIs for all programmes currently captured by the system. |
18 May 2021 | Identification of publications that contribute to Covid-19 research | The methodology used to generate SDG mapping was extended to identify publications relevant to Covid-19. These data, where available, are currently included in the diagrammatic presentations of SDG mapping. pvgglobal.uk/activity/sdg-mapping/ |
18 May 2021 | SDG Analysis provided in DevPubMetric. | SDG analysis and mapping has been included by linking publication lists generated by DevPubMetric with the 2021 SDG Mapping Protocol provided in SciVal. Full details of the methodology are provided at: pvgglobal.uk/activity/sdg-mapping/ |
1 May 2021 | Publication of a full list of projects that have been captured by the DevPubMetric system for the GCRF and Newton programmes derived from the UKRI Gateway To Research system. Data are provided as a CSV file that can be read directly by most spreadsheet and database systems | The most recent data extract is updated on the first day of each month. pvgglobal.uk/data/GCRF-projects.csv pvgglobal.uk/data/NEWTON-projects.csv |
1 May 2021 | Publication of a full list of publications that have been captured by the DevPubMetric system for the GCRF and Newton programmes. Data are provided as a CSV file that can be read directly by most spreadsheet and database systems | The most recent data extract is updated on the first day of each month. pvgglobal.uk/data/GCRF-publications.csv pvgglobal.uk/data/NEWTON-publications.csv Archives of these data are available at: pvgglobal.uk/archive/gcrf_publications/ pvgglobal.uk/archive/newton_publications/ |
1 May 2021 | Enhanced data capture of publications derived from the UKRI Gateway To Research (GTR) database | Data capture methodology modified to ensure that all publications listed in GTR are assessed for inclusion in the DevPubMetric analysis on a monthly basis. This change was required to ensure that data were synchronised as GTR reporting dates can be several months after publication and data capture by the Scopus system. |
1 Feb 2021 | Created archives of monthly extracts of recently captured or recorded awards and publications for GCRF and the Newton Fund | Archives of new records Publications pvgglobal.uk/archive/newton/ pvgglobal.uk/archive/gcrf/ Awards pvgglobal.uk/archive/newton_awards/ pvgglobal.uk/archive/gcrf_awards/ |
1 Jan 2021 | Publishing details of UKRI projects extracted from the Gateway to Research website | Data are currently provided for the GCRF and Newton Fund programmes pvgglobal.uk/devpubmetric/newton-recent-awards/ pvgglobal.uk/devpubmetric/gcrf-recent-awards/ |
14 October 2020 | Enhanced detail describing the SCOPUS search terms used to identify publications by programme name: – Newton Fund by local variants | Updated documentation |
14 October 2020 | Added text to this document outlining the need to adjust the SCOPUS search to enhance the accuracy of search results by programme name | This change covers the need to adjust the SCOPUS search terms used for ESPA and GCRF to remove false positives. The databases were also searched and purged of any entries that were considered to be erroneous after the new searches were defined. |
14 October 2020 | Created change log | |