"No route to host" - Add column by fetching URLs

I’m getting an error when accessing an API but only when fetching the URL in OpenRefine.

In Firefox and HTTPie, the following URL resolves properly and either supplies data if it has anything associated with the identifier (at the end of the URL) or returns an empty array.

https://v2.sherpa.ac.uk/cgi/retrieve_by_id?item-type=publication&api-key=&format=Json&identifier=2313-5786

In OpenRefine, I get an error:

org.apache.hc.client5.http.HttpHostConnectException: Connect to https://v2.sherpa.ac.uk:443 [2 IPv4 addresses and 2 IPv6 addresses] failed: No route to host (connect failed).

Here are some things I tried:
First I created the URL in a new column, using the ISSN I already have in my project. Then I ran ‘Add column by fetching URLs’ on the URL column, supplying value as the expression for the URL to fetch. When I click on the link in the URL column I created, Firefox opens the API result and shows me some JSON. When I fetch the same URL in OpenRefine, I get the error above.

I tried a different service (OpenAlex) that uses the same URL structure: an API URL with an ISSN appended. OpenRefine connected to that service and retrieved data without issue.

I tried different user-agents and other headers in the OpenRefine fetch URL modal window before running the command, and still got the same error.

I also tried using the exact URL in HTTPie and connected successfully.

Has anyone encountered an issue like this before? Is it possible it’s a bug in OpenRefine? Could I be hitting a weird filter issue since I’m on a campus network and a managed device? Could it be some fingerprinting method with Cloudflare that I’m not aware of? I’m totally stumped.

Hey @asnukal, what version of OpenRefine are you running? Based on "Get Redirected URL", it sounds like fetching the URL contents from Python might be an alternative. If that doesn't help, would you be able to share any more logs, either from your terminal output or the browser console?
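
To make the Python idea concrete, here is a minimal sketch, assuming Python 3 is available on the machine that runs OpenRefine. It is a standalone script, not an OpenRefine Jython expression, and the names build_url and fetch are made up for this example. It rebuilds the URL from the original post (with the API key left blank, as posted) and reports any connection error instead of raising it.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Endpoint taken from the URL in the original post.
BASE_URL = "https://v2.sherpa.ac.uk/cgi/retrieve_by_id"


def build_url(issn, api_key=""):
    """Build the query URL from the original post for one ISSN."""
    params = {
        "item-type": "publication",
        "api-key": api_key,  # left blank here, as in the post
        "format": "Json",
        "identifier": issn,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch(url, timeout=15):
    """Return (True, parsed JSON) on success, or (False, error text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Connection failures, timeouts and non-JSON bodies all land here.
        return False, repr(exc)
```

Running print(fetch(build_url("2313-5786"))) on the same machine would show whether a plain Python request hits the same "No route to host" failure. If it does, the problem is probably in the machine's networking rather than in OpenRefine itself.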

Is it possible it’s a bug in OpenRefine? Could I be hitting a weird filter issue since I’m on a campus network and a managed device?

My intuition would be something along the lines of your second guess, perhaps a proxy that your browser is configured to use, but OpenRefine isn't.
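
One rough way to check the proxy guess, assuming Python 3 on the same machine: urllib.request.getproxies() reads the usual proxy environment variables and, if none are set, falls back to the system proxy settings on Windows and macOS. The helper name proxy_report is made up for this sketch, and a proxy supplied through a browser auto-config (PAC) file may not show up here.

```python
import urllib.request


def proxy_report(proxies=None):
    """Summarise proxy settings; pass a dict to override auto-detection."""
    if proxies is None:
        # Environment variables first, then OS settings where supported.
        proxies = urllib.request.getproxies()
    lines = []
    for scheme in ("http", "https"):
        lines.append(f"{scheme}: {proxies.get(scheme, 'no proxy configured')}")
    return lines


for line in proxy_report():
    print(line)
```

If this prints a proxy for https, that would support the idea that the browser and HTTPie are going through a proxy that OpenRefine does not know about.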

Another possibility is that you have the OpenRefine server running on a different computer than your browser and that machine has something different about its networking setup. The HTTP requests are going to be sent from the OpenRefine server, not your local web client.
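
Since the error mentions 2 IPv4 and 2 IPv6 addresses, it may help to test each address separately from the machine that runs the OpenRefine server. The following is only a sketch, assuming Python 3 is installed there. The function names are made up for illustration, and a direct TCP connection like this bypasses any proxy.

```python
import socket


def resolve(host, port=443):
    """Return the distinct (family, ip) pairs that DNS gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen = []
    for family, _type, _proto, _canon, sockaddr in infos:
        entry = ("IPv6" if family == socket.AF_INET6 else "IPv4", sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen


def try_connect(ip, port=443, timeout=5):
    """Attempt a plain TCP connection; return 'ok' or the OS error text."""
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((ip, port))
        return "ok"
    except OSError as exc:
        return str(exc)


def check(host, port=443):
    """Resolve host and try every address, IPv4 and IPv6 alike."""
    return [(fam, ip, try_connect(ip, port)) for fam, ip in resolve(host, port)]
```

Calling check("v2.sherpa.ac.uk") on the server machine lists each address with either "ok" or the error text. If every address fails there but the site loads in a browser on the same machine, that points towards a proxy or filter rather than OpenRefine.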

Since this is a "managed" device, perhaps whoever is doing the "managing" can help, since they likely know how the network was configured, etc.

Tom