Can anyone recommend a service that can pull country names from address or affiliation strings? Either an API where you can feed it the address/affiliation string via Fetch URL and parse the response JSON, or a reconciliation service?
Also, one that pulls lat/long, and a map service that is easy to use for visualizing the results?
I haven’t tried it recently, but there’s a wiki page on using Google Maps to get lat/lon for a street address: Geocoding · OpenRefine/OpenRefine Wiki · GitHub
1 Like
Hi Chris,
Can anyone recommend a service that can pull country names from address or affiliation strings? Either an API where you can feed it the address/affiliation string via Fetch URL and parse the response JSON, or a reconciliation service?
If the address actually contains the country, you might try Named Entity Recognition. For affiliation strings (organization names), I'd probably try to reconcile them against a data source like Wikidata and then fetch the country name from there.
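For example, once a column is reconciled you have Wikidata QIDs, and the country (property P17) can be read out of each item's entity JSON. A rough Python sketch of that lookup (the same Special:EntityData URLs also work with OpenRefine's "Add column by fetching URLs"; the helper name is just illustrative):

import requests

def country_of(qid):
    # Fetch the full entity JSON for a Wikidata item and return the English
    # label of its country (property P17), if it has one.
    entity = requests.get(
        f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    ).json()["entities"][qid]
    claims = entity.get("claims", {}).get("P17")
    if not claims:
        return None
    country_qid = claims[0]["mainsnak"]["datavalue"]["value"]["id"]
    country = requests.get(
        f"https://www.wikidata.org/wiki/Special:EntityData/{country_qid}.json"
    ).json()["entities"][country_qid]
    return country["labels"]["en"]["value"]

# Usage: country_of(qid), with the QID taken from your reconciled column.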
Also, one that pulls lat/long, and a map service that is easy to use for visualizing the results?
The lookup piece is called "geocoding" and is commercially valuable, so the APIs tend to have relatively low free tiers (if they have them at all). For visualization, Google Maps, Tableau Public, and Leaflet are possibilities, depending on what you're trying to do.
Tom
3 Likes
Thanks @ostephens @tfmorris. I tested the Wikidata service and had only 8% matches on the data I was working with, though I chose research institute as the entity type to reconcile against and these organizations can also be businesses, university departments, etc. This OpenStreetMap API seemed promising (Overview - Nominatim 4.2.0), but I also had difficulty getting matches with it. There is Google Maps, but it requires a credit card and, as you say @tfmorris, commercial services are available.
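For anyone who wants to try it, the Nominatim search call looks roughly like this (their usage policy asks for a descriptive User-Agent header and low request rates):

import requests

def nominatim_lookup(address):
    # Forward-geocode an address/affiliation string with OSM Nominatim and
    # return lat/lon plus the country it reports, if anything matched.
    resp = requests.get(
        "https://nominatim.openstreetmap.org/search",
        params={"q": address, "format": "jsonv2", "addressdetails": 1, "limit": 1},
        headers={"User-Agent": "openrefine-country-lookup"},  # placeholder identifier
    )
    results = resp.json()
    if not results:
        return None
    hit = results[0]
    return {"lat": hit["lat"], "lon": hit["lon"],
            "country": hit.get("address", {}).get("country")}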
The GeoNames web services may be worth looking at. There are a lot of options, but this may be one to try: GeoNames Addresses GeoCoding and Reverse GeoCoding
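If you register a (free) GeoNames account with web services enabled, the searchJSON endpoint is an easy one to test first; a rough sketch (the address-geocoding service linked above takes its own parameters, which I haven't tried):

import requests

def geonames_search(query, username):
    # Look up a place/organization name with the GeoNames searchJSON web service
    # and return the country plus coordinates of the top hit.
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": query, "maxRows": 1, "username": username},
    )
    hits = resp.json().get("geonames", [])
    if not hits:
        return None
    top = hits[0]
    return {"country": top.get("countryName"),
            "lat": top.get("lat"), "lng": top.get("lng")}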
@tfmorris also mentioned NER and I agree that’s worth a look. Dandelion has a free allowance (see the thread on the Dandelion API in this forum), and you can also run NER locally for completely free use (I’ve used Stanford NLP in the past: Download - CoreNLP). I don’t think the NER extension works with the latest OpenRefine release, but if you download the 3.4.1 release you could use that: GitHub - stkenny/Refine-NER-Extension: Named-Entity Recognition extension for OpenRefine
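For Dandelion, the entity-extraction call can be tested with a short script as well; a sketch, assuming the datatxt-nex endpoint and a token from a free Dandelion account:

import requests

def dandelion_entities(text, token):
    # Run named-entity extraction on a string with Dandelion's datatxt-nex API
    # and return the titles of the entities it found.
    resp = requests.get(
        "https://api.dandelion.eu/datatxt/nex/v1/",
        params={"text": text, "token": token},
    )
    return [a.get("title") for a in resp.json().get("annotations", [])]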
1 Like
Some years ago I needed something similar for academic affiliations. I put together a small NER system for Wikidata, OpenTapioca, specifically trained on a dataset of affiliations manually annotated with Wikidata. It’s far from a production-grade service but because it’s trained on this exact domain, perhaps that can help?
It can be used with the development version of the NER extension, as a NIF service.
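If you want to try it outside OpenRefine first, the public instance can also be queried directly; a minimal sketch, assuming the opentapioca.org instance and its /api/annotate endpoint (see the docs for the full parameters and response format):

import requests

def opentapioca_annotate(text):
    # Send a string to an OpenTapioca instance and return the raw JSON response,
    # which links spans of the text to Wikidata items.
    resp = requests.post(
        "https://opentapioca.org/api/annotate",  # assumed public instance; adjust if self-hosting
        data={"query": text},
    )
    return resp.json()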
By the way, the development version of the NER extension targets OpenRefine 3.6, and I assume it should also work with 3.7 (and likely some earlier versions too).
2 Likes
@Chris_Erdmann are the affiliations in one column and the addresses in another column? You could use a bit of Python to handle some of the look-ups. I’ve used pycountry for something like that before, so you have your own country database on hand to search values against, using Python functions in the expression. For example, this returns a list of matching countries:
import pycountry
results = pycountry.countries.search_fuzzy('New')
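A rough sketch of how that could be wired up for a combined affiliation/address column (just one approach; abbreviations like "UK" or "USA" would need extra handling):

import pycountry

def find_country(text):
    # Return the name of the first country whose (common) name appears in the
    # given affiliation/address string, or None if nothing matches.
    for country in pycountry.countries:
        names = {country.name}
        if hasattr(country, "common_name"):
            names.add(country.common_name)
        if any(name in text for name in names):
            return country.name
    return None

# find_country("Department of Physics, University of Oxford, Oxford, United Kingdom")
# returns "United Kingdom"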
1 Like
@thadguidry in this case affiliations/addresses are in the same column. The example from @antonin_d might be the best way to go.
I’ve been using Mapbox’s geocoding for a project with address strings: Geocoding | API | Mapbox
You just use it through the API fetch URL in OpenRefine. I’ve heard that Google Maps geocoding is actually a bit better, but there’s no longer any free option, which is a bummer.
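Roughly, the request looks like this; the same URL pattern works in OpenRefine's fetch-URL step with your access token filled in (a sketch against the v5 "places" endpoint):

import requests
from urllib.parse import quote

def mapbox_geocode(address, token):
    # Forward-geocode an address with the Mapbox Geocoding API and return (lat, lon).
    url = f"https://api.mapbox.com/geocoding/v5/mapbox.places/{quote(address)}.json"
    resp = requests.get(url, params={"access_token": token, "limit": 1})
    features = resp.json().get("features", [])
    if not features:
        return None
    lon, lat = features[0]["center"]  # Mapbox returns [longitude, latitude]
    return lat, lon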
1 Like
I’ve been using Google’s Geocoding API for several years. I’d say the accuracy is >99%.
Yes, it requires an API key (and a credit card), but the first US$200 every month are free, which translates into 40,000 geocoding requests. So I get a monthly $0 bill, which seems a very fair deal.
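The request itself is a single GET; sketched in Python here (the same URL works with OpenRefine's fetch-URL step, with your own API key):

import requests

def google_geocode(address, api_key):
    # Forward-geocode an address with the Google Geocoding API and return
    # coordinates plus the country name from the address components.
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": address, "key": api_key},
    )
    data = resp.json()
    if data.get("status") != "OK":
        return None
    result = data["results"][0]
    location = result["geometry"]["location"]
    country = next((c["long_name"] for c in result["address_components"]
                    if "country" in c["types"]), None)
    return {"lat": location["lat"], "lng": location["lng"], "country": country}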
2 Likes
I’ve been using Google’s Geocoding API for several years. I’d say the accuracy is >99%.
Yes, it requires an API key (and a credit card), but the first US$200 every month are free, which translates into 40,000 geocoding requests. So I get a monthly $0 bill, which seems a very fair deal.
The promotional $200 credit is nice (while it lasts), but the biggest problem with the Google service isn't the price, but the terms of service since you're only allowed to use the geocoding results on Google Maps and not save them (short term caching allowed for use on Google Maps only). If your use case fits these terms of service, it's a great option, but otherwise a non-starter. A lot of people want to save the geocode results for later use or use them to calculate distances or plot them on an OpenStreetMap map, etc., none of which is permitted.
Tom
Perhaps of interest to those following this thread is the video “Strategies for Matching Affiliation Strings to ROR IDs” from the 2023 ROR Annual Meeting, especially the presentation by the US Department of Energy (DOE) Office of Scientific and Technical Information (OSTI) describing their OpenRefine reconciliation service for organizations which is currently in testing and scheduled to be available for public use in Q1 2023 (“hopefully by March”).
More generally, the continued trend of increasing adoption of persistent identifiers is a good thing for users with reconciliation needs.
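In the meantime, the ROR API itself can already match raw affiliation strings; a rough sketch, assuming the public api.ror.org instance and its affiliation parameter:

import requests

def ror_affiliation_match(affiliation):
    # Ask the ROR API to match a raw affiliation string and return the record
    # it marks as "chosen", if any, with its ROR ID and country.
    resp = requests.get("https://api.ror.org/organizations",
                        params={"affiliation": affiliation})
    items = resp.json().get("items", [])
    chosen = next((i for i in items if i.get("chosen")), None)
    if not chosen:
        return None
    org = chosen["organization"]
    return {"ror_id": org["id"], "name": org["name"],
            "country": org.get("country", {}).get("country_name")}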
3 Likes