Authentication timeout - add column based on URL

christie · May 15, 2025, 5:45pm

I am using a service that expires its tokens every 20 minutes, which allows about 2,000 rows to be processed per token. Is there a way to automatically refresh a token from within OpenRefine after authentication expires? Or to refresh it manually and restart the lookup at the row where it started failing due to authentication errors? Our process right now is to break projects up into batches of 2,000 rows, but that is less than ideal. I am hoping that we can find a way to make OpenRefine work for this project.

b2m · May 16, 2025, 5:59am

Here are some ideas:

Use Jython in the "add column based on this column..." dialog to make the GET request, instead of using "add column based on URL", and add a routine that updates the token when it expires (advanced).
Split your project into batches using a Custom text facet with the expression rowIndex / 2000 (easy).
Use the setting "store error" in the "add column based on URL" dialog, filter on rows with the error and repeat the process on these rows with a new token (easy but not recommended).
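
To make the first idea a bit more concrete, here is a minimal sketch of the refresh logic in plain Python, written so that it should also run under Jython 2.7. Everything named here is hypothetical: `fetch_token` stands in for whatever call obtains a new token from the service, `http_get` for the function that performs the GET request and returns a `(status, body)` pair, and the 401 status is only an assumption about how the service reports an expired token. How to keep one cache alive across rows inside OpenRefine's Jython evaluator is not covered here.

```python
import time


class TokenCache(object):
    """Caches a token and fetches a new one shortly before it expires."""

    def __init__(self, fetch_token, lifetime=20 * 60, margin=60, clock=time.time):
        self._fetch_token = fetch_token  # hypothetical: asks the service for a new token
        self._lifetime = lifetime        # token lifetime in seconds (20 minutes in this thread)
        self._margin = margin            # refresh this many seconds before expiry
        self._clock = clock              # injectable so the logic can be tested without waiting
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Fetch a new token if there is none yet or the current one is about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = self._clock() + self._lifetime
        return self._token

    def invalidate(self):
        # Force the next get() to fetch a new token.
        self._token = None


def get_with_refresh(url, http_get, tokens):
    """Performs one lookup; on 401 (assumed to mean 'token expired') retries once."""
    status, body = http_get(url, tokens.get())
    if status == 401:
        tokens.invalidate()
        status, body = http_get(url, tokens.get())
    if status == 200:
        return body
    return None  # any other failure is left for the caller to handle
```

The cache refreshes on a timer and also reacts to a rejected request, so a token that the service revokes early costs one retry instead of failing every remaining row.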

Martin · May 16, 2025, 4:37pm

Another option is to use the OpenRefine CLI to script the process: split your file into subfiles of 2,000 rows each and submit each one in sequence to:

Load the file in OpenRefine
Apply the operations to fetch the API
Download the results

You can coalesce all the files received at the end of your script.
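
The splitting and coalescing parts of such a script could look roughly like the Python 3 sketch below. It only covers the file handling; the OpenRefine steps in between (load, apply operations, download) are left out because they depend on the CLI you use. The function names are made up for illustration, and `batch_of` mirrors the rowIndex / 2000 facet expression mentioned earlier in the thread.

```python
import csv
import io

BATCH_SIZE = 2000  # roughly the number of rows one token lasts for, per the original post


def batch_of(row_index, batch_size=BATCH_SIZE):
    """Batch number for a zero-based row index (integer division)."""
    return row_index // batch_size


def split_rows(rows, batch_size=BATCH_SIZE):
    """Splits a list of rows into consecutive batches of at most batch_size rows."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def to_csv(header, rows):
    """Serializes a header and rows to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def merge_csv(texts):
    """Concatenates CSV texts that share one header, keeping the header only once."""
    if not texts:
        return ""
    merged_header = None
    merged_rows = []
    for text in texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader)
        if merged_header is None:
            merged_header = header
        merged_rows.extend(reader)
    return to_csv(merged_header, merged_rows)
```

For example, 4,500 rows would be split into batches of 2,000, 2,000 and 500 rows, and merging the three result files in order restores the original row order.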

tfmorris · May 20, 2025, 2:29am

Unfortunately there isn't a good way to do this today. What is the refresh protocol for the authentication token? If it's something standard, we could look at supporting it.

Something that would make the segmented approach easier is a function to update an existing column rather than add a new one. Then you could either pre-chunk things or facet on errors to update the column piecewise. That would also be useful in other contexts, to work around intermittent problems with long-running operations.

Tom
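
As a rough illustration of the piecewise update idea, the sketch below re-runs a lookup only for rows whose earlier result was an error and leaves successful rows alone. This is plain Python outside OpenRefine, not an existing OpenRefine function; `fill_failed`, `lookup` and the convention that `None` marks a failed row are all assumptions made for the example.

```python
def fill_failed(results, urls, lookup, is_error=lambda value: value is None):
    """Returns a copy of results where failed rows are retried with lookup(url)."""
    updated = list(results)  # copy, so the earlier results stay untouched
    for i, value in enumerate(updated):
        if is_error(value):
            # lookup is hypothetical: it performs the request with a fresh token
            updated[i] = lookup(urls[i])
    return updated
```

Each pass with a new token fills in the rows that failed under the previous one, so repeating the call until no errors remain processes the whole column without splitting the project.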
