SDG Mapping of Publications using SciVal

Project Summary
Status: Active
Project Reference:
Start date: 1 May 2021
End date:
Activities: Monitoring and Evaluation, Research
Countries: Global
Funding: PVG Global

This is a new feature, added to the DevPubMetric system in May 2021, that links the publication lists generated by DevPubMetric with the 2021 SDG Mapping methodology released as part of the SciVal system in April 2021.

Further information about SciVal’s SDG mapping is available at: https://data.mendeley.com/datasets/9sxdykm8s4/2
(doi: 10.17632/9sxdykm8s4.2)

The first SciVal analysis for SDG mapping was released in April 2021 using a data extract from the Scopus database in January 2021. From June 2021 it is expected that new data extracts will be available through SciVal once every two months until late 2021 when a bi-weekly data extract is expected.

The methodology currently being applied in DevPubMetric requires a set of manual steps outlined below. It is expected that this process will be automated once regular bi-weekly data extracts are available from SciVal in 2021.

Step 1. Create list of publications to be analysed

This step is carried out on a monthly basis for relevant programmes within the DevPubMetric system. Data for the Newton Fund and GCRF programmes are currently available on this website at: pvgglobal.uk/data.

The list of publications needs to include either the publication's digital object identifier (doi) or the electronic identifier (eid) used by the Scopus system. Where possible, it is advisable to use the eid value, as some publications will not have a doi value.

Step 2. Create a Publication Set in SciVal

A list of publication identifiers (eid or doi), with one id per line, is pasted into SciVal using the “Import a Publication Set” command. In most cases, SciVal will need to process this list offline before the set is available for additional analysis. SciVal sends an email when the new Publication Set is available.
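As a minimal sketch of how the one-id-per-line list might be prepared before pasting it into SciVal, the following assumes each publication is held as a dictionary with optional `eid` and `doi` keys. The field names and the helpers `pick_identifier` and `build_identifier_list` are hypothetical and are not part of DevPubMetric. The sketch prefers the eid and falls back to the doi, in line with the advice in Step 1; whether SciVal accepts a list that mixes both identifier types is not confirmed here.

```python
from typing import Iterable, List, Mapping, Optional


def pick_identifier(pub: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the eid if present, otherwise the doi, otherwise None."""
    for key in ("eid", "doi"):  # eid is tried first, as recommended in Step 1
        value = (pub.get(key) or "").strip()
        if value:
            return value
    return None


def build_identifier_list(pubs: Iterable[Mapping[str, Optional[str]]]) -> str:
    """Build one-identifier-per-line text, skipping blanks and duplicates."""
    seen = set()
    lines: List[str] = []
    for pub in pubs:
        ident = pick_identifier(pub)
        if ident and ident not in seen:
            seen.add(ident)
            lines.append(ident)
    return "\n".join(lines)
```

The returned text can be copied directly into the import dialog, or written to a file first for checking.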

Step 3. Extract the new SDG Mapping from SciVal

The new Publication Set is opened in SciVal and an appropriate date range is selected (the DevPubMetric analysis currently works with publication dates of 2015 or later).

The resulting publication subset is then viewed so that the results can be exported as a CSV file. Only two data items are exported: the publication’s eid and the field “Sustainable Development Goals (2021)”.
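The exported file could be read along the following lines. This is a sketch under stated assumptions: the eid column is assumed to be headed `EID`, the header row is assumed to be the first line of the file, and the SDG cell is assumed to contain the goal numbers as digits (for example `SDG 3 | SDG 13`). None of these details are specified above, so the real export should be checked before relying on this. The function name `parse_sdg_export` is hypothetical.

```python
import csv
import io
import re
from typing import Dict, List

SDG_COLUMN = "Sustainable Development Goals (2021)"
EID_COLUMN = "EID"  # assumption: actual header name in the export may differ


def parse_sdg_export(csv_text: str) -> Dict[str, List[int]]:
    """Map each eid to the sorted SDG numbers found in its SDG cell.

    A publication with no digits in its SDG cell gets an empty list.
    """
    result: Dict[str, List[int]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        eid = (row.get(EID_COLUMN) or "").strip()
        if not eid:
            continue
        # Assumption: every run of digits in the cell is an SDG number.
        numbers = {int(n) for n in re.findall(r"\d+", row.get(SDG_COLUMN) or "")}
        result[eid] = sorted(numbers)
    return result
```

An empty list here does not by itself mean "no SDG": as Step 4 explains, the publication may simply be too recent to have been included in the SDG analysis.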

Step 4. Exclude publications in SciVal that were added after the most recent SDG analysis.

The data export produced by SciVal (Step 3, above) is likely to include a small number of publications that have been added to the system since the date of the most recent data extract used for the SDG mapping. Within the export itself, it is currently not possible to differentiate between publications that do not map to any SDGs and recent publications that were not included in the most recent SciVal SDG analysis.

The approach used by the DevPubMetric system is to use the original publication list generated in Step 1 as part of a query to the Scopus system to determine when the publication was first added to Scopus and hence available to SciVal for the SDG analysis. The reason for this is that SciVal utilises data derived from Scopus for analysis.

This process uses the Scopus Application Programming Interface (API), documented at dev.elsevier.com/scopus.html, with the “Scopus Search” option to return a list of publications that were added to Scopus before the most recent SciVal SDG analysis.

The search term combines a list of Scopus eid values with a condition that only returns publications added to Scopus before a set date.

The Scopus query used in this process is constructed as:
EID(comma separated list of eids) AND ORIG-LOAD-DATE BEF date-in-unix-format

where:
comma separated list of eids is generated from the list of publications.
date-in-unix-format is a numeric value representing the number of seconds since 1 January 1970.
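The query string described above could be assembled as in the sketch below. The helper names `to_unix_seconds` and `build_load_date_query` are hypothetical, and the cut-off is taken as midnight UTC on the chosen date, which is an assumption: the exact date and time of each SciVal data extract is not stated here.

```python
from datetime import datetime, timezone
from typing import Iterable


def to_unix_seconds(year: int, month: int, day: int) -> int:
    """Seconds since 1 January 1970 (UTC) for midnight on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_load_date_query(eids: Iterable[str], cutoff_unix: int) -> str:
    """Combine the eid list with the ORIG-LOAD-DATE cut-off described above."""
    eid_list = ",".join(e.strip() for e in eids if e.strip())
    return f"EID({eid_list}) AND ORIG-LOAD-DATE BEF {cutoff_unix}"


# Illustrative cut-off only; the real value should match the SciVal extract date.
cutoff = to_unix_seconds(2021, 1, 1)  # 1609459200
query = build_load_date_query(["2-s2.0-85100000001", "2-s2.0-85100000002"], cutoff)
```

Large publication lists may need to be split into several queries, since the whole list of eids is embedded in the search term.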

A list of publication identifiers (eids) is then generated from the data returned by the API call.
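A hedged sketch of the API call and of pulling the eids out of the response follows. It assumes the Scopus Search endpoint at `https://api.elsevier.com/content/search/scopus`, an API key sent in the `X-ELS-APIKey` header, and a JSON response in which results appear under `search-results` and `entry`. These details should be verified against the current Elsevier developer documentation. The function names are hypothetical, and the sketch fetches a single page of results only.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Assumed endpoint for the Scopus Search API; verify before use.
SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_request(query: str, api_key: str, start: int = 0, count: int = 25) -> urllib.request.Request:
    """Prepare a GET request asking only for the eid field of each match."""
    params = urllib.parse.urlencode(
        {"query": query, "field": "eid", "start": start, "count": count}
    )
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )


def extract_eids(payload: Dict[str, Any]) -> List[str]:
    """Collect eid values from a decoded response; entries without an eid are skipped."""
    entries = payload.get("search-results", {}).get("entry", [])
    return [entry["eid"] for entry in entries if "eid" in entry]


def fetch_eids(query: str, api_key: str) -> List[str]:
    """Send one request and return the eids on the first page of results."""
    with urllib.request.urlopen(build_request(query, api_key)) as response:
        return extract_eids(json.load(response))
```

A complete run would repeat the request with an increasing `start` value until all matching publications have been retrieved.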

Step 5. Identify publications that have contributed to Covid-19 research areas.

The Covid-19 pandemic has generated a very significant surge of publications, which could be argued to be relevant to the SDGs or at least to the theme of global challenge research.

A list of these publications is generated by running the Covid-19 query published in SciVal through the Scopus Search API (see Step 4, above).

The Scopus query used in this process is constructed as:
EID(comma separated list of eids) AND (TITLE-ABS-KEY("coronavirus disease 2019" OR covid19 OR covid OR ncov OR sars-cov-2 OR {novel coronavirus}) AND (PUBYEAR > 2018))

where:
comma separated list of eids is generated from the list of publications.
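The Covid-19 query can be assembled in the same way as the Step 4 query. The sketch below reproduces the search terms quoted above, using straight double quotes for the phrase; the helper name `build_covid_query` is hypothetical.

```python
from typing import Iterable

# Search terms copied from the query quoted above.
COVID_TERMS = (
    'TITLE-ABS-KEY("coronavirus disease 2019" OR covid19 OR covid OR ncov '
    "OR sars-cov-2 OR {novel coronavirus})"
)


def build_covid_query(eids: Iterable[str]) -> str:
    """Restrict the Covid-19 search terms to the supplied list of eids."""
    eid_list = ",".join(e.strip() for e in eids if e.strip())
    return f"EID({eid_list}) AND ({COVID_TERMS} AND (PUBYEAR > 2018))"
```

The resulting string is passed to the Scopus Search API in the same way as the query in Step 4.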

Step 6. Generate list of publications and the SDGs that they have been mapped against.

This task is processed by reading the data returned from Steps 3, 4 and 5 into a MySQL database, where the resulting data tables are joined using a condition that will only return data if the eid is contained in the table generated in Step 4. A standard SQL INNER JOIN condition is used for this step. The resulting query contains two columns, eid and SDG, with one row of data for each SDG that a publication has been mapped against. If a publication has contributed to Covid-19 research, the DevPubMetric system adds a row with a dedicated unique value (SDG=20). There is also a dedicated value (SDG=100) used to identify publications that did not map against an SDG and did not contribute to Covid-19 research.
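The join logic can be illustrated with the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that the example is self-contained. The table names (`sdg_export`, `eligible`, `covid`), the function name and the exact SQL are assumptions for illustration and are not the DevPubMetric schema. It also assumes that the Step 3 table holds one row per mapped SDG, so that a publication with no SDG has no rows in it.

```python
import sqlite3
from typing import Iterable, List, Tuple

COVID_CODE = 20    # dedicated value for Covid-19 research
NO_SDG_CODE = 100  # dedicated value for no SDG and no Covid-19 contribution


def build_sdg_table(
    sdg_rows: Iterable[Tuple[str, int]],  # Step 3: (eid, sdg), one row per SDG
    eligible_eids: Iterable[str],         # Step 4: eids loaded before the cut-off
    covid_eids: Iterable[str],            # Step 5: eids matching the Covid-19 query
) -> List[Tuple[str, int]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE sdg_export (eid TEXT NOT NULL, sdg INTEGER NOT NULL);"
        "CREATE TABLE eligible (eid TEXT NOT NULL);"
        "CREATE TABLE covid (eid TEXT NOT NULL);"
    )
    conn.executemany("INSERT INTO sdg_export VALUES (?, ?)", list(sdg_rows))
    conn.executemany("INSERT INTO eligible VALUES (?)", [(e,) for e in eligible_eids])
    conn.executemany("INSERT INTO covid VALUES (?)", [(e,) for e in covid_eids])
    sql = """
        -- SDG rows, kept only when the eid is in the Step 4 table
        SELECT s.eid, s.sdg FROM sdg_export s
        INNER JOIN eligible e ON e.eid = s.eid
        UNION ALL
        -- one extra row per eligible Covid-19 publication
        SELECT c.eid, ? FROM covid c
        INNER JOIN eligible e ON e.eid = c.eid
        UNION ALL
        -- eligible publications with neither an SDG nor a Covid-19 match
        SELECT e.eid, ? FROM eligible e
        WHERE e.eid NOT IN (SELECT eid FROM sdg_export)
          AND e.eid NOT IN (SELECT eid FROM covid)
    """
    rows = conn.execute(sql, (COVID_CODE, NO_SDG_CODE)).fetchall()
    conn.close()
    return sorted(rows)
```

In this sketch a publication that appears in the SciVal export but not in the Step 4 table is dropped entirely, which is the effect of the INNER JOIN described above.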

The resulting data table is then available for further analysis. The DevPubMetric system currently provides a diagrammatic analysis of the contributions of both the GCRF and Newton Fund programmes to the SDGs. Additional analysis will be provided in future releases of the DevPubMetric system.
