Problems with VIAF Reconciliation

Hi, I tried the reconciliation service today and it worked fine. Can you check whether you are still facing the issue? If it persists, can you share sample data to help with further investigation?

Hi all,

I'm the author of the VIAF reconciliation service that's available at https://refine.codefork.com/

There was a change to the VIAF API that caused the service to be completely broken for a while; I fixed this back in February--thank you to Sunil for bringing my attention to it.

However, VIAF has recently been rate limiting their API more aggressively, and I am seeing a lot of 429 "too many requests" HTTP error responses even under relatively light usage (a few thousand requests over a few hours, in a single thread). This almost never happened in the past, even with tens of thousands of requests on four concurrent threads of execution.

I've emailed OCLC at bibchange@oclc.org twice, requesting that they loosen the rate limit for my IP address if possible, but they haven't responded.

Does anyone know anything about the status of the VIAF project in general at OCLC? I wonder if they are slowly sunsetting that project, perhaps due to a loss of resources under the current federal administration.

Jeff

Hi Jeff!

Thank you for maintaining this service for so many years. We use it regularly at my workplace. I don't think that US federal cuts would be affecting VIAF, but I'm guessing the general bot-pocalypse affecting open data sources (I think this is becoming a debilitating problem in my field--libraries / archives / museums / cultural heritage) is the main reason for these data limits. See this article on Wikimedia: https://www.pcmag.com/news/wikipedia-faces-flood-of-ai-bots-that-are-eating-bandwidth-raising-costs

I should also have noted that, while I think OCLC is committed to VIAF for the foreseeable future, in the past couple of years they have been developing WorldCat Entities (WorldCat Entities | OCLC) as they begin moving WorldCat in a more linked-data direction. I believe the plan is still to keep VIAF as a separate project that interacts with WorldCat Entities, but I'd have to dig around a bit. Jeff Mixter, one of the key folks working on WorldCat Entities at OCLC, has given a number of presentations on it in the past couple of years; they are pretty easy to find online.

I worked on the VIAF service, but can't shed any light on its current state. But, I was just manually searching WorldCat and got a 429 error, so I'm guessing something is broken.

Jeff Mixter would be a great resource.

Welcome to the forum, @codeforkjeff! Thank you for your contributions to the OpenRefine ecosystem over the years.

Good call re: bots, that hadn't occurred to me. WorldCat Entities looks interesting, definitely very VIAF-ish. I'll search for Jeff Mixter and see what comes up, thank you.

I suspect OCLC has more generous limits on their APIs if you use a WSKey, but unfortunately, I'm not affiliated with an institutional library anymore. Even if I were, I'm not sure OCLC would appreciate me using it for a publicly available service.

Thanks to the other folks who replied as well! I'll update this thread if I hear back from OCLC.

I echo everyone else here when I thank you for all of your hard work on this, Jeff. Is there any update? I have been unable to reconcile using VIAF for a couple of months now.

All best,
Sarah

After months of experiencing the same problems as TRM, I was finally able to use the VIAF and VIAF-LC reconciliation services two weeks ago. Alas, this success lasted only a week, and now it is back to returning no matches. Any update on this situation would be much appreciated.

I haven't heard back from OCLC, unfortunately. I'm continuing to see very high percentages of 429 responses from the VIAF API.

I modified my service running at http: to disable the thread pool and self rate limit by adding a 100ms delay after each request. Hopefully this will be enough to prevent the 429 responses, though it will be a lot slower. I think that's the best I can do at this point.
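
For anyone curious what that kind of self rate limiting looks like, here is a minimal sketch in Python. It is not the actual conciliator code (which is a separate codebase); it only illustrates the idea of making requests one at a time and pausing 100ms after each. The names `fetch_sequentially`, `fetch`, and `DELAY_SECONDS` are made up for illustration, and `fetch` stands in for whatever function issues the request and returns an HTTP status code.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Assumption: 0.1 s mirrors the 100ms delay mentioned above.
DELAY_SECONDS = 0.1


def fetch_sequentially(
    queries: Iterable[str],
    fetch: Callable[[str], int],
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int]]:
    """Run fetch(query) one query at a time, pausing after every call.

    `fetch` is a hypothetical placeholder that performs the request and
    returns the HTTP status code. There is no thread pool: each request
    finishes, then we wait `delay` seconds before starting the next one.
    """
    results: List[Tuple[str, int]] = []
    for query in queries:
        status = fetch(query)
        results.append((query, status))
        sleep(delay)  # pause after each request, whether it succeeded or not
    return results


if __name__ == "__main__":
    # Stand-in fetch that always "succeeds", just to show the call shape.
    print(fetch_sequentially(["query one", "query two"], lambda q: 200))
```

With a 100ms pause after every call and no concurrency, throughput can never exceed 10 requests per second, and in practice it will be lower because each request also takes time to complete.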

I'll continue to monitor it when I can find the time.

Jeff

@codeforkjeff, thanks for your ongoing work in keeping the service available. I’m trying to understand whether the recent issues stem from your hosted instance or from the OCLC/VIAF side.

If the HTTP 429 rate-limit errors are coming from the OCLC/VIAF API itself, would running the reconciliation service locally, via conciliator or a similar setup, help avoid those issues?

Put another way: would users experience fewer interruptions or better reliability by hosting the service themselves?

I appreciate your insights, and thank you again for supporting the community.

Hi Martin,

If VIAF isn't treating the host my service runs on differently from any other, running it locally shouldn't make any difference in reliability. But that is a big assumption. Many services set looser or stricter policies based on IP address, so it's possible they set a stricter limit on mine. There's no way to know without information from someone at the VIAF project.

There's certainly no harm in trying it--at the very least, you can confirm for yourself whether any problems you are experiencing are due to the 429 response codes from VIAF. These would be printed in the log.
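
For anyone who would rather put a number on it than read through logs, a rough sketch along these lines tallies the status codes from a batch of requests and reports what share were 429s. This is only an illustration under assumptions: `tally_statuses`, `share_of_429`, and `fetch` are hypothetical names, and `fetch` is a placeholder for whatever code sends one request from your machine and returns the HTTP status code.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_statuses(queries: Iterable[str], fetch: Callable[[str], int]) -> Counter:
    """Count how many times each HTTP status code comes back.

    `fetch` is a hypothetical placeholder: it should send one request
    for the given query and return the response's status code.
    """
    counts: Counter = Counter()
    for query in queries:
        counts[fetch(query)] += 1
    return counts


def share_of_429(counts: Counter) -> float:
    """Fraction of responses that were 429, or 0.0 if nothing was sent."""
    total = sum(counts.values())
    return counts[429] / total if total else 0.0


if __name__ == "__main__":
    # Fake fetch for demonstration only: every second request is "rate limited".
    calls = iter(range(10))
    fake_fetch = lambda q: 429 if next(calls) % 2 else 200
    counts = tally_statuses([f"name {i}" for i in range(10)], fake_fetch)
    print(counts, share_of_429(counts))
```

Running the same batch against a local instance and against the hosted one would give a like-for-like comparison of how often each is being rate limited.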

This branch has the features for setting the thread pool size to 1 and adding a 100ms delay after making requests to VIAF: "add properties for VIAF to set thread pool size and delay for rate limiting" (Pull Request #37, codeforkjeff/conciliator on GitHub). That's the code running now on my hosted instance; if everything looks good after a few days, I'll make a proper release.

Hope that helps!
Jeff