DevPubMetric: Search Protocols

Data used in the DevPubMetric system have been extracted from a variety of data sources before being stored and integrated in a dedicated MySQL database.

Technical Details

The data extraction steps used to collate the information are detailed below. These steps are coded into a web-based system (PHP 8.0 and Apache 2.4).

Where data are extracted through an Application Programming Interface (API), the extract is downloaded in JavaScript Object Notation (JSON) using cURL. The JSON object returned by the API is then converted into a PHP data structure (a nested array) and processed for storage in relational MySQL (version 8.0) tables.
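The JSON-to-table step described above can be sketched as follows. This is only an illustration in Python, not the system's PHP code: the `publication` key, the `id`, `doi` and `title` fields, and the table layout are hypothetical, and an in-memory SQLite database stands in for MySQL so that the example is self-contained.

```python
import json
import sqlite3


def store_publications(raw_json: str, conn: sqlite3.Connection) -> int:
    """Parse an API response and insert one row per publication record."""
    payload = json.loads(raw_json)  # JSON text -> nested dicts and lists
    rows = [
        (item["id"], item.get("doi"), item.get("title"))
        for item in payload.get("publication", [])  # hypothetical key name
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publication ("
        "id TEXT PRIMARY KEY, doi TEXT, title TEXT)"
    )
    # INSERT OR REPLACE means re-running an extract does not duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO publication (id, doi, title) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


# Made-up sample response, for illustration only.
sample = '{"publication": [{"id": "A1", "doi": "10.1000/example", "title": "Example"}]}'
conn = sqlite3.connect(":memory:")
print(store_publications(sample, conn))
```

Keying the table on the source identifier is one way to make repeated extracts safe to re-run; the real schema may differ.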

All subsequent data analysis is undertaken using MySQL queries and procedures.

Data Extraction Process

Data used in the DevPubMetric system are extracted in a series of steps from multiple data sources. The process is outlined below:

Step	Process	Data source	Comments
1	Obtain programme names.	Funder websites	Manual extraction
2	Obtain unique project references within each programme. (UK Research and Innovation projects only)	Gateway to Research	GTR API is used with a search term linked to programme name. (Currently only GCRF or Newton Fund)
3	Download list of publications for UK Research and Innovation Projects.	Gateway to Research	Publications are extracted using the unique project references extracted in Step 2
4	Locate publications from GTR publication records (Step 3) using a Scopus search, with records identified using the list of DOIs returned from GTR.	Scopus
5	Locate additional publications from Scopus, using a search for publications that acknowledge funding for a project reference collated at Step 2.	Scopus
6	Locate additional publications using the unique programme names.	Scopus	Care is needed to ensure that programme names are unique and to allow for any regional variants (e.g. Newton Fund variants).
7	Extract additional data about all publications collated in steps 4-6.	Scopus	This includes full bibliographic data, and information about authors and their institutional affiliations
8	A monthly extract of publication citations for all publications is collated	Scopus	These data allow the system to track growth of publications with time and to identify those publications that are gaining most interest at any time
9	Open access status of recent publications is updated on a quarterly basis	Scopus	This process is required to capture publications which shift to open access through deposit in institutional repositories or those which shift to open access after a publisher’s embargo period has expired.
10	SDG mapping	SciVal	These data are collated by transferring a list of Scopus publication identifiers (EIDs) to SciVal to create a publication set, which is then used to extract a data file that maps each publication to any Sustainable Development Goals (SDGs) to which it may have contributed. The EID list transferred from the DevPubMetric system is filtered to exclude any publications added to Scopus after the most recent SciVal SDG analysis.
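The EID filtering described in Step 10 might look like the sketch below. The field names `eid` and `scopus_load_date`, the sample identifiers and the cutoff date are all hypothetical; the only logic taken from the description is that publications added to Scopus after the most recent SciVal SDG analysis are left out of the list sent to SciVal.

```python
from datetime import date


def eids_for_scival(publications, last_sdg_analysis: date):
    """Return EIDs of publications added to Scopus on or before the cutoff."""
    return [
        pub["eid"]
        for pub in publications
        if pub["scopus_load_date"] <= last_sdg_analysis
    ]


# Made-up records, for illustration only.
publications = [
    {"eid": "2-s2.0-0000000001", "scopus_load_date": date(2023, 1, 15)},
    {"eid": "2-s2.0-0000000002", "scopus_load_date": date(2023, 9, 1)},
]
print(eids_for_scival(publications, date(2023, 6, 30)))
```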

Gateway to Research (GTR) Data Extracts

GTR API programming instructions are available on the Gateway to Research website.
(Note: UID = Unique Identifier)

Data Items	Detail	Additional data
CURL Options (Required for all extracts)	CURLOPT_RETURNTRANSFER=true CURLOPT_FOLLOWLOCATION=true
CURL Header Options (Required for all extracts)	Accept:application/vnd.rcuk.gtr.json-v7
Return GTR project record (UKRI project code)	https://gtr.ukri.org:443/gtr/api/projects?q="XXXX"&f=pro.gr	XXXX = UKRI project reference code
Return GTR project record (from GTR project IDs)	https://gtr.ukri.org:443/gtr/api/projects?q=XXXX	XXXX = GTR Project UID
Return GTR project outcomes (GTR Project UID)	https://gtr.ukri.org:443/gtr/api/projects/XXXX/outcomes/publications	XXXX = GTR Project UID
Return a list of projects within a UKRI (RCUK) programme	https://gtr.ukri.org/gtr/api/projects?q="XXXX"&f=pro.rcukp	XXXX = UKRI/RCUK programme name. Currently only two are available: GCRF and Newton Fund.
Return details of a person (GTR Person UID)	https://gtr.ukri.org/gtr/api/person/XXXX	XXXX = GTR Person UID
Return details of an organisation (GTR Organisation UID)	https://gtr.ukri.org/gtr/api/organisation/XXXX	XXXX = GTR Organisation UID
Return details of funding for a project	https://gtr.ukri.org/gtr/api/funds/XXXX	XXXX = GTR funding UID
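As a rough illustration of how the GTR request URLs in the table could be assembled, the Python sketch below builds three of them and shows the matching Accept header. The function names are hypothetical, wrapping the search term in double quotes is an assumption based on the programme search row, and the project reference in the usage lines is made up. In `fetch_json`, reading the response body mirrors CURLOPT_RETURNTRANSFER, and `urlopen` follows redirects by default, which mirrors CURLOPT_FOLLOWLOCATION.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

GTR_BASE = "https://gtr.ukri.org/gtr/api"
GTR_HEADERS = {"Accept": "application/vnd.rcuk.gtr.json-v7"}


def project_by_reference_url(reference: str) -> str:
    """Search projects by UKRI project reference code (field pro.gr)."""
    # Assumption: the term is quoted, as in the programme search row.
    return f"{GTR_BASE}/projects?" + urlencode({"q": f'"{reference}"', "f": "pro.gr"})


def programme_projects_url(programme: str) -> str:
    """Search projects by UKRI/RCUK programme name (field pro.rcukp)."""
    return f"{GTR_BASE}/projects?" + urlencode({"q": f'"{programme}"', "f": "pro.rcukp"})


def project_publications_url(project_uid: str) -> str:
    """Publication outcomes for one GTR project UID."""
    return f"{GTR_BASE}/projects/{quote(project_uid, safe='')}/outcomes/publications"


def fetch_json(url: str):
    """Download a URL with the GTR Accept header and decode the JSON body."""
    request = Request(url, headers=GTR_HEADERS)
    with urlopen(request, timeout=30) as response:
        return json.load(response)


print(programme_projects_url("Newton Fund"))
print(project_by_reference_url("EP/X000000/1"))  # made-up reference
```

Percent-encoding the query keeps spaces, slashes and quotes in programme names and project references from breaking the URL.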

Scopus Data Extracts

Scopus API programming instructions are available on Elsevier’s development website. An institutional subscription to Scopus is required to use these search terms. Users will require a Scopus API key and, in some circumstances, an additional Institutional Token if data extraction takes place outside the standard range of institutional IP addresses.
(Note: UID = Unique Identifier)

Data Items	Detail	Additional Data
CURL Options (Required for all extracts)	CURLOPT_RETURNTRANSFER=true CURLOPT_FOLLOWLOCATION=true
CURL Header Options (Required for all extracts)	Accept:application/json X-ELS-APIKey:XXXX X-ELS-Insttoken:YYYY	XXXX = Scopus API key. YYYY = Scopus Institutional Token (may be required for some applications).
Scopus Abstract Search (Document identifiers)	https://api.elsevier.com/content/search/scopus/?query=XXXX where XXXX = a standard Scopus search term: DOI(YYYY) or EID(YYYY)	YYYY = either a standard Digital Object Identifier (DOI) or a Scopus document Electronic Identifier (EID). See the Scopus Search Guide for information on search strategy.
Scopus Search (by Programme Name)	https://api.elsevier.com/content/search/scopus/?query=FUND-ALL(YYYY)	YYYY = a unique programme name. See the Scopus Search Guide for information on search strategy. Data were extracted using the COMPLETE view of the API.
Scopus Abstract Search (by Project Identifier)	https://api.elsevier.com/content/search/scopus/?query=FUND-NO(YYYY)	YYYY = a unique project reference code. See the Scopus Search Guide for information on search strategy. Data were extracted using the COMPLETE view of the API.
Scopus Institutional Affiliation Hierarchy (Provides a full list of institutional affiliations for large/complex research institutions)	Part 1 of extract: Search for the EID of the hierarchy document. https://api.elsevier.com/content/affiliation/affiliation_id/XXXX?view=FULL	Part 1: XXXX represents the top-level affiliation ID. The JSON returned by this call includes a section [hierarchy-document], which lists a document electronic identifier [eid] used in Part 2 of the data extract. The Full View option is required to provide the link to the hierarchy document.
	Part 2 of extract: Returns the full institutional hierarchy for the institution (including parent institutions if relevant). https://api.elsevier.com/content/affiliation/eid/YYYY?view=FULL	Part 2: YYYY represents the [eid] returned from Part 1 of the data extraction. The Full View option is required to provide the list of affiliated institutions.
Corresponding Authors (Scopus) Extracted for each individual publication in Scopus	https://api.elsevier.com/content/abstract/eid/YYYY?view=full	YYYY represents the [eid] for each publication. The Full View option is required to provide information about the corresponding author and their affiliation. These data are stored in the [bibrecord] node of the data returned from the API.
Funding Information (Scopus) Extracted for each individual publication in Scopus	https://api.elsevier.com/content/abstract/eid/YYYY?view=full	YYYY represents the [eid] for each publication. The Full View option is required to provide information about funding associated with publications. These data are stored in the [xocs:meta] node of the data returned from the API. Data capture and processing have been designed to capture information from multiple funders and to recognise that more than one research grant per funder may have contributed to a publication.

The Scopus API limits both the number of calls within a set time period and the number of objects returned by each call. For this reason, the code used in DevPubMetric groups queries where possible to reduce the number of API calls. When the number of objects returned by a search exceeds the default maximum (e.g. 25 for the full view of abstracts returned by a Scopus search), the system uses either paging or cursors to process the full list. Additional information on this process and on usage limits is documented on Elsevier’s development website.
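A minimal Python sketch of the grouping and paging ideas follows; it is not the DevPubMetric code. `fetch_page` is a hypothetical callable that performs one search request for a given start offset and page size and returns the decoded JSON. The response keys `search-results`, `opensearch:totalResults` and `entry` are assumptions about the shape of a Scopus Search response and should be checked against Elsevier's documentation. Cursor-based retrieval is not shown.

```python
def batched(items, size):
    """Split a list of identifiers into groups, one group per API call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_all(fetch_page, page_size=25):
    """Collect every entry of one search by stepping the start offset."""
    entries, start = [], 0
    while True:
        results = fetch_page(start, page_size)["search-results"]
        page = results.get("entry", [])
        entries.extend(page)
        start += len(page)
        total = int(results["opensearch:totalResults"])
        # Stop when everything has been read or the API returns an empty page.
        if not page or start >= total:
            return entries


def fake_fetch_page(start, count, total=60):
    """Stand-in for a real HTTP call: serves 60 made-up records."""
    stop = min(start + count, total)
    return {
        "search-results": {
            "opensearch:totalResults": str(total),
            "entry": [{"eid": str(i)} for i in range(start, stop)],
        }
    }


print(len(fetch_all(fake_fetch_page)))
print(list(batched(["a", "b", "c"], 2)))
```

Passing the request function in as a parameter keeps the paging logic testable without network access or an API key.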