DevPubMetric: Search Protocols

Data used in the DevPubMetric system have been extracted from a variety of data sources before being stored and integrated in a dedicated MySQL database system.

Technical Details

The data extraction steps used to collate the information are detailed below. These steps are coded into a web-based system (PHP 8.0 and Apache 2.4).

Where data are extracted through an Application Programming Interface (API), the extract is downloaded in JavaScript Object Notation (JSON) using cURL. The JSON object returned by the API is then converted into a PHP data structure (nested array) and processed for storage in relational MySQL (Version 8.0) tables.
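The decode-and-store step can be illustrated with a minimal sketch. It is written in Python rather than the PHP used by DevPubMetric, and it uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server. The payload, field names and table names are hypothetical and do not reflect the real GTR or Scopus schemas.

```python
import json
import sqlite3

# Hypothetical API payload: the field names are illustrative only.
SAMPLE_JSON = """
{"project": [
  {"id": "P1", "title": "Water security",
   "publications": [{"doi": "10.1000/a"}, {"doi": "10.1000/b"}]},
  {"id": "P2", "title": "Crop resilience", "publications": []}
]}
"""


def store_projects(conn, payload):
    """Decode a JSON payload and store it in two related tables."""
    # json.loads yields nested dicts and lists, analogous to a PHP nested array.
    data = json.loads(payload)
    conn.execute("CREATE TABLE IF NOT EXISTS project (id TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS publication (project_id TEXT, doi TEXT)")
    for project in data.get("project", []):
        conn.execute(
            "INSERT OR REPLACE INTO project VALUES (?, ?)",
            (project["id"], project["title"]),
        )
        # Each nested publication becomes one row linked to its parent project.
        for pub in project.get("publications", []):
            conn.execute(
                "INSERT INTO publication VALUES (?, ?)",
                (project["id"], pub["doi"]),
            )
    conn.commit()


# SQLite stands in for MySQL here (an assumption made for portability).
conn = sqlite3.connect(":memory:")
store_projects(conn, SAMPLE_JSON)
print(conn.execute("SELECT COUNT(*) FROM publication").fetchone()[0])
```

The nested list of publications is flattened into a child table keyed on the project identifier, which is the usual way to hold one-to-many API data in relational form.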

All subsequent data analysis is undertaken using MySQL queries and procedures.

Data Extraction Process

Data used in the DevPubMetric system are extracted in a series of steps from multiple data sources. The process is outlined below:

| Step | Process | Data source | Comments |
|---|---|---|---|
| 1 | Obtain programme names. | Funder websites | Manual extraction. |
| 2 | Obtain unique project references within each programme (UK Research and Innovation projects only). | Gateway to Research | The GTR API is used with a search term linked to the programme name (currently only GCRF or Newton Fund). |
| 3 | Download the list of publications for UK Research and Innovation projects. | Gateway to Research | Publications are extracted using the unique project references obtained in Step 2. |
| 4 | Locate publications from GTR publication records (Step 3) using a Scopus search, with records identified using the list of DOIs returned from GTR. | Scopus | |
| 5 | Locate additional publications from Scopus, using a search for publications that acknowledge funding for a project reference collated at Step 2. | Scopus | |
| 6 | Locate additional publications using the unique programme names. | Scopus | Care is needed to ensure that programme names are unique and to allow for any regional variants (e.g. Newton Fund variants). |
| 7 | Extract additional data about all publications collated in Steps 4-6. | Scopus | This includes full bibliographic data, and information about authors and their institutional affiliations. |
| 8 | Collate a monthly extract of citations for all publications. | Scopus | These data allow the system to track the growth of citations over time and to identify the publications that are gaining most interest at any time. |
| 9 | Update the open access status of recent publications on a quarterly basis. | Scopus | This process is required to capture publications which shift to open access through deposit in institutional repositories or after a publisher’s embargo period has expired. |
| 10 | SDG mapping. | SciVal | These data are collated by transferring a list of Scopus publication identifiers (EIDs) to SciVal to create a publication set, which is then used to extract a data file that maps each publication to any SDGs that it may have contributed to. |

The EID list transferred from the DevPubMetric system is filtered to exclude any publications included in Scopus after the most recent SciVal SDG analysis.
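The filtering rule for the EID list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DevPubMetric code: the record layout (an EID paired with the date the record entered Scopus), the sample EIDs and the analysis date are all hypothetical.

```python
from datetime import date


def eids_for_scival(publications, last_sdg_analysis):
    """Return EIDs of publications added to Scopus on or before the last SDG analysis."""
    return [eid for eid, loaded in publications if loaded <= last_sdg_analysis]


# Hypothetical records: (Scopus EID, date the record was added to Scopus).
records = [
    ("2-s2.0-0000000001", date(2022, 3, 1)),
    ("2-s2.0-0000000002", date(2023, 6, 15)),
]

# Only the first record predates the assumed analysis date, so only it is kept.
print(eids_for_scival(records, date(2022, 12, 31)))
```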

Gateway to Research (GTR) Data Extracts

GTR API programming instructions are available on the Gateway to Research website.
(Note: UID = Unique Identifier)

| Data items | Detail | Additional data |
|---|---|---|
| cURL options (required for all extracts) | CURLOPT_RETURNTRANSFER=true; CURLOPT_FOLLOWLOCATION=true | |
| cURL header options (required for all extracts) | Accept:application/vnd.rcuk.gtr.json-v7 | |
| Return GTR project record (from UKRI project code) | https://gtr.ukri.org:443/gtr/api/projects?q=XXXX&f=pro.gr | XXXX = UKRI project reference code |
| Return GTR project record (from GTR project UID) | https://gtr.ukri.org:443/gtr/api/projects?q=XXXX | XXXX = GTR project UID |
| Return GTR project outcomes (from GTR project UID) | https://gtr.ukri.org:443/gtr/api/projects/XXXX/outcomes/publications | XXXX = GTR project UID |
| Return a list of projects within a UKRI (RCUK) programme | https://gtr.ukri.org/gtr/api/projects?q="XXXX"&f=pro.rcukp | XXXX = UKRI/RCUK programme name. Currently only two are available: GCRF and Newton Fund |
| Return details of a person (from GTR person UID) | https://gtr.ukri.org/gtr/api/person/XXXX | XXXX = GTR person UID |
| Return details of an organisation (from GTR institution UID) | https://gtr.ukri.org/gtr/api/organisation/XXXX | XXXX = GTR organisation UID |
| Return details of funding for a project | https://gtr.ukri.org/gtr/api/funds/XXXX | XXXX = GTR funding UID |
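As an illustration of how the GTR calls listed above might be assembled, the following Python sketch builds two of the requests with the standard library. It is not the PHP/cURL code used by DevPubMetric. The function names are hypothetical, and wrapping the programme name in double quotes is an assumption about how multi-word names are passed. `urllib` follows redirects and returns the response body by default, which roughly corresponds to the two cURL options.

```python
from urllib.parse import urlencode
from urllib.request import Request

GTR_BASE = "https://gtr.ukri.org/gtr/api"
# Header required for all GTR extracts, as listed above.
GTR_HEADERS = {"Accept": "application/vnd.rcuk.gtr.json-v7"}


def gtr_programme_request(programme_name):
    """Build the request for projects within a programme (search field pro.rcukp)."""
    # Assumption: the programme name is sent as a quoted phrase.
    query = urlencode({"q": f'"{programme_name}"', "f": "pro.rcukp"})
    return Request(f"{GTR_BASE}/projects?{query}", headers=GTR_HEADERS)


def gtr_publications_request(project_uid):
    """Build the request for the publication outcomes of one project."""
    return Request(
        f"{GTR_BASE}/projects/{project_uid}/outcomes/publications",
        headers=GTR_HEADERS,
    )


req = gtr_programme_request("Newton Fund")
print(req.full_url)
# Sending the request would use urllib.request.urlopen(req); it is omitted
# here so that the sketch runs without network access.
```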

Scopus Data Extracts

Scopus API programming instructions are available on Elsevier’s development website. An institutional subscription to Scopus is required to use these search terms. Users will require a Scopus API key and, in some circumstances, an additional Institutional Token if data extraction takes place outside the standard range of institutional IP addresses.
(Note: UID = Unique Identifier)

| Data items | Detail | Additional data |
|---|---|---|
| cURL options (required for all extracts) | CURLOPT_RETURNTRANSFER=true; CURLOPT_FOLLOWLOCATION=true | |
| cURL header options (required for all extracts) | Accept:application/json; X-ELS-APIKey:XXXX; X-ELS-Insttoken:YYYY | XXXX = Scopus API key. YYYY = Scopus Institutional Token (may be required for some applications). |
| Scopus Abstract Search (by document identifier) | https://api.elsevier.com/content/search/scopus/?query=XXXX | XXXX = a standard Scopus search term, either DOI(YYYY) or EID(YYYY), where YYYY is a standard Digital Object Identifier (DOI) or a Scopus document Electronic Identifier (EID). See the Scopus Search Guide for information on search strategy. |
| Scopus Search (by programme name) | https://api.elsevier.com/content/search/scopus/?query=FUND-ALL(YYYY) | YYYY = a unique programme name. See the Scopus Search Guide for information on search strategy. Data were extracted using the COMPLETE view of the API. |
| Scopus Abstract Search (by project identifier) | https://api.elsevier.com/content/search/scopus/?query=FUND-NO(YYYY) | YYYY = a unique project reference code. See the Scopus Search Guide for information on search strategy. Data were extracted using the COMPLETE view of the API. |
| Scopus Institutional Affiliation Hierarchy, Part 1 (provides a full list of institutional affiliations for large/complex research institutions): search for the EID of the hierarchy document | https://api.elsevier.com/content/affiliation/affiliation_id/XXXX?view=FULL | XXXX represents the top-level affiliation ID. The JSON returned by this call includes a section [hierarchy-document] which lists a document electronic identifier [eid] used in Part 2 of the data extract. The FULL view option is required to provide the link to the hierarchy document. |
| Scopus Institutional Affiliation Hierarchy, Part 2: return the full institutional hierarchy for the institution (including parent institutions if relevant) | https://api.elsevier.com/content/affiliation/eid/YYYY?view=FULL | YYYY represents the [eid] returned from Part 1 of the data extraction. The FULL view option is required to provide the list of affiliated institutions. |
| Corresponding Authors (Scopus), extracted for each individual publication in Scopus | https://api.elsevier.com/content/abstract/eid/YYYY?view=full | YYYY represents the [eid] for each publication. The FULL view option is required to provide information about the corresponding author and their affiliation. These data are stored in the [bibrecord] node of the data returned from the API. |
| Funding Information (Scopus), extracted for each individual publication in Scopus | https://api.elsevier.com/content/abstract/eid/YYYY?view=full | YYYY represents the [eid] for each publication. The FULL view option is required to provide information about funding associated with publications. These data are stored in the [xocs:meta] node of the data returned from the API. |
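The Scopus calls listed above share the same headers and differ mainly in the search field. A minimal Python sketch of how they might be assembled is given below. It is not the PHP/cURL code used by DevPubMetric, the function names are hypothetical, and the project reference and API key shown are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

SCOPUS_SEARCH = "https://api.elsevier.com/content/search/scopus/"
SCOPUS_ABSTRACT = "https://api.elsevier.com/content/abstract/eid/"


def scopus_headers(api_key, inst_token=None):
    """Headers required for all Scopus extracts; the token is optional."""
    headers = {"Accept": "application/json", "X-ELS-APIKey": api_key}
    if inst_token:
        headers["X-ELS-Insttoken"] = inst_token
    return headers


def scopus_search_request(field, value, api_key, inst_token=None, view=None):
    """Build a search request; field is one of DOI, EID, FUND-NO or FUND-ALL."""
    params = {"query": f"{field}({value})"}
    if view:
        params["view"] = view  # e.g. COMPLETE for the funding searches
    return Request(
        f"{SCOPUS_SEARCH}?{urlencode(params)}",
        headers=scopus_headers(api_key, inst_token),
    )


def scopus_abstract_url(eid):
    """URL for the full abstract record of one publication."""
    return f"{SCOPUS_ABSTRACT}{eid}?view=FULL"


# "XX/000000/1" and "XXXX" are placeholders, not a real reference or key.
req = scopus_search_request("FUND-NO", "XX/000000/1", api_key="XXXX", view="COMPLETE")
print(parse_qs(urlparse(req.full_url).query))
```

Using `urlencode` percent-encodes the parentheses and slashes in the search term, which avoids malformed URLs when project references contain reserved characters.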

Data capture and processing have been designed to capture information from multiple funders and to recognise that more than one research grant per funder may have contributed to a publication.

The Scopus API limits both the number of calls within a set time period and the number of objects returned by each call. For this reason, the code used in DevPubMetric groups queries where possible to reduce the number of API calls. When the number of objects returned by a search exceeds the default maximum (e.g. 25 for the full view of abstracts returned by a Scopus search), the system uses either paging or cursors to process the full list. Additional information on this process and on usage limits is documented on Elsevier’s development website.
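The paging approach can be sketched as a simple loop. The Python below is an illustration, not the DevPubMetric implementation. It assumes that each response carries its records in `search-results.entry` and the overall count in `search-results.opensearch:totalResults`, and it takes a caller-supplied `fetch_page` function (hypothetical) so that the loop can be run against a stand-in instead of the live API.

```python
def fetch_all(fetch_page, page_size=25):
    """Collect every entry by requesting successive pages.

    fetch_page(start, count) is a caller-supplied function that performs
    one API call and returns the decoded JSON response as a dict.
    """
    entries = []
    start = 0
    while True:
        results = fetch_page(start, page_size)["search-results"]
        page = results.get("entry", [])
        entries.extend(page)
        start += len(page)
        total = int(results.get("opensearch:totalResults", 0))
        # Stop when the last page is reached or the API returns nothing.
        if not page or start >= total:
            break
    return entries


# Stand-in for the API: 60 fake records served at most page_size at a time.
FAKE = [{"eid": f"2-s2.0-{i:010d}"} for i in range(60)]


def fake_fetch(start, count):
    return {
        "search-results": {
            "opensearch:totalResults": str(len(FAKE)),
            "entry": FAKE[start:start + count],
        }
    }


print(len(fetch_all(fake_fetch)))
```

A cursor-based variant would follow the same pattern, passing the cursor value from each response into the next call instead of a numeric offset.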