I’m getting an error when accessing an API but only when fetching the URL in OpenRefine.
In Firefox and HTTPie, the following URL resolves properly: it returns data if anything is associated with the identifier (at the end of the URL), and an empty array otherwise.
https://v2.sherpa.ac.uk/cgi/retrieve_by_id?item-type=publication&api-key=&format=Json&identifier=2313-5786
In OpenRefine, I get an error:
org.apache.hc.client5.http.HttpHostConnectException: Connect to https://v2.sherpa.ac.uk:443 [2 IPv4 addresses and 2 IPv6 addresses] failed: No route to host (connect failed).
Here are some things I tried:
First, I created the URL in a new column, using the ISSN I already have in my project. Then I ran ‘Add column by fetching URLs’ on the URL column and supplied value as the URL to fetch. When I click on the link in the URL column I created, Firefox opens the API result and shows me some JSON. When I fetch the same URL in OpenRefine, I get the error above.
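For reference, the URL construction described above is roughly equivalent to the following Python sketch. The function name sherpa_url and the BASE constant are hypothetical names used only for illustration; the parameter names are copied from the URL shown earlier, and the API key is left empty as it is there.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in the post; the constant name is illustrative.
BASE = "https://v2.sherpa.ac.uk/cgi/retrieve_by_id"


def sherpa_url(issn, api_key=""):
    """Build the retrieve_by_id URL for one ISSN (hypothetical helper)."""
    params = {
        "item-type": "publication",
        "api-key": api_key,          # left blank here, as in the post
        "format": "Json",
        "identifier": issn.strip(),  # ISSN goes at the end of the URL
    }
    return BASE + "?" + urlencode(params)


# Example: sherpa_url("2313-5786") rebuilds the URL quoted in the post.
```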
I tried a different service (OpenAlex) that uses the same URL structure: an API URL with an ISSN appended. OpenRefine connected to that service and retrieved data without issue.
I tried changing the user-agent and other headers in the OpenRefine fetch URL modal window before running the command, and still got the same error.
I also tried using the exact URL in HTTPie and connected successfully.
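One further check that might help separate a network problem from an OpenRefine problem: the error mentions both IPv4 and IPv6 addresses, so it could be worth testing each resolved address on its own. Below is a minimal Python sketch of that idea, assuming only the standard library; probe_addresses is a hypothetical helper name, and this is not what OpenRefine itself does internally.

```python
import socket


def probe_addresses(host, port=443, timeout=5.0):
    """Attempt a plain TCP connect to every address the host resolves to.

    Returns a list of (family, ip, error) tuples; error is None on success.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Name resolution itself failed; report that instead of raising.
        return [("resolve", host, str(exc))]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        name = "IPv6" if family == socket.AF_INET6 else "IPv4"
        error = None
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
        except OSError as exc:
            # Covers "No route to host", timeouts, refused connections, etc.
            error = str(exc)
        results.append((name, sockaddr[0], error))
    return results


# Example usage (requires network access):
# for family, ip, error in probe_addresses("v2.sherpa.ac.uk"):
#     print(family, ip, "ok" if error is None else error)
```

If only one address family fails here, that would point toward routing on the local network rather than anything specific to the API or to OpenRefine.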
Has anyone encountered an issue like this before? Is it possible it’s a bug in OpenRefine? Could I be hitting some kind of network filter, since I’m on a campus network and a managed device? Could it be some fingerprinting method used by Cloudflare that I’m not aware of? I’m totally stumped.