Authentication timeout - add column based on URL

I am using a service that expires its tokens every 20 minutes, which allows about 2,000 rows to be processed. Is there a way to refresh a token automatically from within OpenRefine once authentication expires? Or to refresh it manually and restart the lookup at the row where it began failing with authentication errors? Our current process is to break projects up into batches of 2,000 rows, but that is less than ideal. I am hoping we can find a way to make OpenRefine work for this project.

Here are some ideas:

  • Use Jython in the "Add column based on this column..." dialog to send the GET request, instead of "Add column by fetching URLs...", and add a routine that updates the token when it expires (advanced).
  • Split your project into batches using a custom text facet with the expression rowIndex / 2000 (easy).
  • In the "Add column by fetching URLs..." dialog, set "On error" to "store error", filter to the rows that contain an error, and repeat the process on those rows with a new token (easy, but not recommended).
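
The token-refresh routine from the first idea can be sketched in plain Python. This is a minimal sketch under stated assumptions and has not been tested inside OpenRefine: `get_new_token` and `http_get` are hypothetical callables you would write for your service, the 20-minute lifetime comes from the question, and an expired token is assumed to be reported as HTTP 401. OpenRefine's Jython is Python 2.7, so the code avoids Python-3-only syntax; how to keep the token object alive between cells inside the expression dialog is not shown and would need experimenting.

```python
import time


class TokenManager(object):
    """Caches an access token and refreshes it shortly before it expires.

    `get_new_token` is a hypothetical callable (an assumption, supplied by
    you) that contacts the service's auth endpoint and returns a new token.
    """

    def __init__(self, get_new_token, ttl_seconds=20 * 60, margin_seconds=60,
                 clock=time.time):
        self._get_new_token = get_new_token
        self._ttl = ttl_seconds          # token lifetime (20 minutes here)
        self._margin = margin_seconds    # refresh this long before expiry
        self._clock = clock              # injectable so it can be tested
        self._token = None
        self._expires_at = 0.0

    def token(self):
        # Refresh proactively when the token is missing or about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self.refresh()
        return self._token

    def refresh(self):
        self._token = self._get_new_token()
        self._expires_at = self._clock() + self._ttl


def fetch_with_refresh(url, tokens, http_get):
    """Fetch `url`; if the service answers 401, refresh the token and retry once.

    `http_get(url, token)` is a hypothetical callable returning a
    (status_code, body) tuple; plug in whatever HTTP client is available.
    """
    status, body = http_get(url, tokens.token())
    if status == 401:
        # Token was rejected despite the proactive check: refresh and retry once.
        tokens.refresh()
        status, body = http_get(url, tokens.token())
    return status, body
```

Refreshing a little before the stated lifetime avoids most failures, and the single retry on 401 covers the case where the service expires a token earlier than expected.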

Another option is to use the OpenRefine CLI to script the process: split your file into subfiles of 2,000 rows each, then for each subfile in sequence:

  • Load the file in OpenRefine
  • Apply the operations that query the API
  • Download the results

At the end of your script, you can concatenate all the result files into one.
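
The split and merge steps of such a script could look like the following Python sketch. It assumes the input is a CSV file with a header row. `process_chunk` is a hypothetical placeholder for the part that drives OpenRefine (load, apply the saved operations, export); it is left out because the exact commands depend on which CLI client you use.

```python
import csv
import os


def _write_chunk(out_dir, index, header, rows):
    chunk_path = os.path.join(out_dir, "chunk_%04d.csv" % index)
    with open(chunk_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return chunk_path


def split_csv(path, out_dir, rows_per_chunk=2000):
    """Write `path` as numbered chunk files, each holding the header plus up
    to `rows_per_chunk` data rows. Returns the chunk paths in order."""
    chunk_paths = []
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, index = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                chunk_paths.append(_write_chunk(out_dir, index, header, chunk))
                chunk, index = [], index + 1
        if chunk:  # leftover rows form a final, smaller chunk
            chunk_paths.append(_write_chunk(out_dir, index, header, chunk))
    return chunk_paths


def merge_csv(paths, out_path):
    """Concatenate result files that share a header, keeping one header."""
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for i, p in enumerate(paths):
            with open(p, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader)
                if i == 0:
                    writer.writerow(header)
                writer.writerows(reader)


def run_batches(path, work_dir, process_chunk, rows_per_chunk=2000):
    """Split, process each chunk in sequence, then merge the exported results.

    `process_chunk(chunk_path)` is hypothetical: it should run the OpenRefine
    steps on one chunk and return the path of the exported result file.
    """
    chunks = split_csv(path, work_dir, rows_per_chunk)
    results = [process_chunk(chunk) for chunk in chunks]
    merged = os.path.join(work_dir, "merged.csv")
    merge_csv(results, merged)
    return merged
```

Because chunks are processed one after another, each batch can start with a fresh token, which keeps every batch inside the 20-minute window.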

Unfortunately there isn't a good way to do this today. What is the refresh protocol for the authentication token? If it's something standard, we could look at supporting it.
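
If the service happens to use OAuth 2.0 (an assumption; the question does not say), the refresh step is the standard refresh-token grant from RFC 6749, section 6: a form-encoded POST of `grant_type=refresh_token` plus the refresh token to the token endpoint. The sketch below only builds that request with Python's standard library. `token_url`, the client ID, and the secret are placeholders, and HTTP Basic is just one of the client-authentication methods the RFC permits.

```python
import base64
import urllib.parse
import urllib.request


def build_refresh_request(token_url, refresh_token, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 refresh-token request.

    All arguments are placeholders for your service's real values.
    """
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }).encode("ascii")
    # HTTP Basic client authentication; RFC 6749 asks for the ID and secret
    # to be form-urlencoded before they are joined and base64-encoded.
    raw = "%s:%s" % (urllib.parse.quote_plus(client_id),
                     urllib.parse.quote_plus(client_secret))
    credentials = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": "Basic " + credentials,
        },
    )
```

Sending it with `urllib.request.urlopen(req)` should return a JSON body whose `access_token` field holds the new token, usually alongside an `expires_in` lifetime in seconds.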

Something that would make the segmented approach easier is a function to update an existing column rather than add a new one. You could then either pre-chunk the rows, or facet on errors and update the column piece by piece. That would also be useful in other contexts, as a workaround for intermittent problems with long-running operations.
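
OpenRefine has no such update-in-place operation today, but the idea can be illustrated outside it. In this hypothetical Python sketch, `results` plays the role of the column being updated, `FetchError` stands in for a stored error value, and each pass re-fetches only the rows that still hold an error (or have not been fetched yet), much like faceting on errors and re-running on that subset. `fetch` is a placeholder for the real lookup, for example one that uses a freshly refreshed token.

```python
class FetchError(object):
    """Hypothetical stand-in for an error value stored in a cell."""

    def __init__(self, message):
        self.message = message


def _needs_fetch(value):
    # Unfetched (None) cells and cells holding an error are retried.
    return value is None or isinstance(value, FetchError)


def fill_in_place(urls, results, fetch, max_passes=3):
    """Update `results` in place instead of building a new column.

    Each pass re-runs `fetch(url)` only for rows that still need it.
    Returns the indices of rows that still hold an error afterwards.
    """
    for _ in range(max_passes):
        pending = [i for i, r in enumerate(results) if _needs_fetch(r)]
        if not pending:
            break
        for i in pending:
            results[i] = fetch(urls[i])
    return [i for i, r in enumerate(results) if _needs_fetch(r)]
```

Usage: start with `results = [None] * len(urls)` and call `fill_in_place(urls, results, fetch)`; rows that succeeded on an earlier pass are never fetched again.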

Tom