Scopus API in OpenRefine

Dear all

This is about fetching data from Scopus through its API.
I have both an API key and an insttoken (obtained from Scopus with the kind help of our librarian).

The instructions from Scopus read:

  • Submit APIKey in header: X-ELS-APIKey
  • Submit insttoken in header: X-ELS-Insttoken

My settings are as follows:

Authorization: X-ELS-APIKey: [my key here], X-ELS-Insttoken: [my key here]
User-Agent: OpenRefine 3.5.2 [e3efd4e]
Accept: */*

But I am receiving an authorization error:

org.apache.hc.client5.http.ClientProtocolException: HTTP error 401 : Unauthorized for URL

Could you please tell me the proper format for adding the keys in the header?
I tried different combinations in the Authorization field (space / no space, : / =, and so on), but the error is always the same.

Someone reported that the API key must be passed as a query parameter (401 Error · Issue #1 · ElsevierDev/apidemo · GitHub). I tried that too, with the same result.

Does this mean Scopus only supports POST requests?


I think we just do not support setting arbitrary HTTP headers yet.
It looks like the issue about this was actually opened by you two years ago!

I think it makes a lot of sense and hope that someone works on this soon. Perhaps an intern this summer?
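
Until arbitrary headers are supported, one possible workaround is to fetch the records with a small script outside OpenRefine and import the results afterwards. The sketch below sends the two credentials as separate headers, as the Scopus instructions quoted above describe, rather than combining them in a single Authorization value. The endpoint URL, the `query` parameter, the JSON `Accept` value, and the function names are assumptions for illustration; please check them against the Elsevier developer documentation.

```python
import urllib.parse
import urllib.request

# Assumed Scopus Search endpoint; verify against your Elsevier developer docs.
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def scopus_headers(api_key, insttoken):
    """Return each credential as its own header (hypothetical helper)."""
    return {
        "X-ELS-APIKey": api_key,
        "X-ELS-Insttoken": insttoken,
        "Accept": "application/json",  # assumption: JSON output is wanted
    }


def build_scopus_request(query, api_key, insttoken):
    """Build, but do not send, a GET request for a Scopus search query."""
    url = SCOPUS_SEARCH_URL + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers=scopus_headers(api_key, insttoken), method="GET"
    )


def fetch(request):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

To try it, build a request with something like `build_scopus_request("TITLE(openrefine)", key, token)` and pass it to `fetch`. Because the credentials travel only in headers, they never appear in the URL.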

Yes, I opened that issue in the context of a lesser-known service named Namsor (for name-to-gender inference), but I thought that perhaps someone here had already explored how to use the API of a globally well-known bibliographic data set like Scopus. I wonder why Scopus charges us so much for a subscription (really, we pay through the nose for Scopus in a developing country like India) when we get so little in terms of API-based services in return. :frowning:

Hi @psm You might already know this but…
Depending on what metadata you are interested in getting out of Scopus, you might find that the Crossref API or their Metadata Delivery service can serve as an alternative. Another option is the OpenCitations Corpus, which offers RDF if you need it, as well as both incoming and outgoing citations of bibliographic resources.
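
As a rough illustration of the Crossref route: its public REST API takes the query in the URL and needs no custom headers, so a URL like the one built below could also be generated per row for OpenRefine's fetch-by-URL feature. This is only a sketch; the helper name and the choice of the `query.bibliographic` and `rows` parameters are illustrative, so adjust them to the metadata you need.

```python
import urllib.parse

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def crossref_query_url(title, rows=1):
    """Build a Crossref works search URL for a title (hypothetical helper)."""
    params = {"query.bibliographic": title, "rows": rows}
    return CROSSREF_WORKS_URL + "?" + urllib.parse.urlencode(params)
```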

Furthermore, it might also be useful to connect with a few community members around I4OC - Initiative for Open Citations (founded by Wikimedia and others) and I4OA - Initiative for Open Abstracts. One of those members, Dario Taraborelli, is coincidentally instrumental in keeping OpenRefine funded through our CZI relationship.

My take on it is that Scopus charges based on the effort behind the additional metadata they provide, such as author disambiguation, crosslinks, and all their other record features that are made and maintained by staff and machines. That means there’s a hefty cost to providing the enriched database they maintain. It is what it is. Have you tried talking with them to negotiate or renegotiate the terms of, and issues with, their API services? Might be worth a try, right?

Thanks, Thad, for showing us the path as usual. We have already explored bibliographic metadata sources other than Scopus, such as Crossref, OpenAlex, Semantic Scholar, Dimensions, OCC and so on. We are in the process of preparing a training dataset in OpenRefine for automated indexing in a given domain, and that is why we attempted to fetch metadata from Scopus. After we exchanged a few emails, they allotted us an insttoken, but with a lot of conditions: for example, it can’t appear in any browser-side code, it can’t appear in the address bar, and so on.