Geolocation (country) from affiliation or address string

Can anyone recommend a service that can pull country names from address or affiliation strings? Either an API where you can feed it the address/affiliation string via Fetch URL and parse the response JSON or via reconciliation service?

Also one that pulls lat/long and a map service that is easy to use for visualizing the results?

I haven’t tried it recently, but there’s a wiki page on using Google Maps to get lat/lon for a street address: Geocoding · OpenRefine/OpenRefine Wiki · GitHub

1 Like

Hi Chris,

Can anyone recommend a service that can pull country names from address or affiliation strings? Either an API where you can feed it the address/affiliation string via Fetch URL and parse the response JSON or via reconciliation service?

If the address actually contains the country, you might try Named Entity Recognition. For affiliation strings (organization names), I'd probably try to reconcile them against a data source like Wikidata and then fetch the country name from there.
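Once an affiliation has been reconciled to a Wikidata item, the country is normally recorded under property P17 ("country"). As a rough sketch, pulling that value out of an entity JSON document in Python might look like the following. The function name and the hand-trimmed sample are illustrative assumptions; real entity documents are far larger, and some items have no P17 statement at all.

```python
import json

def country_ids(entity_json, qid):
    """Return the item IDs found in the P17 ("country") statements of `qid`."""
    entity = json.loads(entity_json)["entities"][qid]
    ids = []
    for statement in entity.get("claims", {}).get("P17", []):
        snak = statement.get("mainsnak", {})
        # "novalue"/"somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids

# Hand-trimmed sample in the shape of a Wikidata entity document.
# The IDs are shown for shape only; treat them as illustrative.
SAMPLE = json.dumps({
    "entities": {
        "Q6373": {
            "claims": {
                "P17": [{
                    "mainsnak": {
                        "snaktype": "value",
                        "property": "P17",
                        "datavalue": {
                            "value": {"id": "Q145"},
                            "type": "wikibase-entityid",
                        },
                    }
                }]
            }
        }
    }
})

print(country_ids(SAMPLE, "Q6373"))
```

The returned ID still has to be resolved to a label in a second step. Inside OpenRefine, adding columns from the reconciled values should cover the same ground without any code.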

Also one that pulls lat/long and a map service that is easy to use for visualizing the results?

The lookup piece is called "geocoding" and is commercially valuable, so the APIs tend to have relatively low free tiers (if they have them at all). For visualization, Google Maps, Tableau Public, and Leaflet are possibilities, depending on what you're trying to do.
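For the visualization side, one tool-neutral option is to export the geocoded rows as GeoJSON, which Leaflet (and many other mapping tools) can load. A minimal sketch, assuming the rows are (name, lat, lon) tuples; the function name and the sample row with approximate coordinates are made up for illustration:

```python
import json

def rows_to_geojson(rows):
    """Convert (name, lat, lon) tuples into a GeoJSON FeatureCollection dict."""
    features = []
    for name, lat, lon in rows:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical geocoded row, e.g. exported from OpenRefine (approximate values).
rows = [("British Museum", "51.5194", "-0.1270")]
collection = rows_to_geojson(rows)
print(json.dumps(collection, indent=2))
```

The main pitfall is the coordinate order: geocoders usually report latitude first, while GeoJSON expects longitude first.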

Tom

4 Likes

Thanks @ostephens @tfmorris :smiley: I tested the Wikidata service and got matches for only 8% of the data I was working with. That said, I chose research institute as the entity type to reconcile against, and the entries can also be businesses, university departments, etc. This OpenStreetMap API seemed promising (Overview - Nominatim 4.2.0), but it also failed to match a number of entries. There is Google Maps, but it requires a credit card, and, as you say @tfmorris, other commercial services are available.

The GeoNames web services may be worth looking at. There are a lot of options, but this may be one to try: GeoNames Addresses GeoCoding and Reverse GeoCoding

@tfmorris also mentioned NER and I agree that’s worth a look. Dandelion has a free allowance (see the thread on the Dandelion API in this forum), and you can also run NER locally for completely free use (I’ve used Stanford NLP in the past: Download - CoreNLP). I don’t think the NER extension works with the latest OpenRefine release, but if you download the 3.4.1 release you could use it: GitHub - stkenny/Refine-NER-Extension: Named-Entity Recognition extension for OpenRefine

1 Like

Some years ago I needed something similar for academic affiliations. I put together a small NER system for Wikidata, OpenTapioca, specifically trained on a dataset of affiliations manually annotated with Wikidata. It’s far from a production-grade service but because it’s trained on this exact domain, perhaps that can help?

It can be used with the development version of the NER extension, as a NIF service.
By the way, the development version of the NER extension is developed for OpenRefine 3.6, and I assume it should also work with 3.7 (and likely some earlier versions too):

2 Likes

@Chris_Erdmann are the affiliations in one column and addresses in another column? You could use a bit of Python to handle some of the lookups. I’ve used pycountry for something like that before: it gives you your own local database to search values against, using Python functions in the expression. For example, this returns a list of matches:
import pycountry
results = pycountry.countries.search_fuzzy(u'New')
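To show the general idea of searching a column against a local country database, here is a small sketch that uses a tiny hand-written table in place of pycountry's data, so it runs without third-party packages. The table, the function name and the sample strings are all assumptions for illustration, not part of pycountry.

```python
import re

# Tiny illustrative table; pycountry ships the full ISO 3166 list instead.
COUNTRY_NAMES = {
    "united kingdom": "GB",
    "uk": "GB",
    "germany": "DE",
    "united states": "US",
    "usa": "US",
    "france": "FR",
}

def find_country(text):
    """Return the ISO alpha-2 code of the last country name in `text`, or None."""
    # Split on commas/semicolons and scan from the end,
    # where the country usually sits in an address.
    parts = [p.strip().lower() for p in re.split(r"[,;]", text)]
    for part in reversed(parts):
        if part in COUNTRY_NAMES:
            return COUNTRY_NAMES[part]
    return None

print(find_country("Dept. of Physics, University of Bonn, 53115 Bonn, Germany"))
print(find_country("British Museum, Great Russell St, London"))
```

The second call finds nothing because the string names no country, which is the case where a geocoder or reconciliation service is still needed.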

1 Like

@thadguidry in this case affiliations/addresses are in the same column. The example from @antonin_d might be the best way to go.

I’ve been using mapbox’s geocoding for a project with address strings: Geocoding | API | Mapbox

You just call the API through OpenRefine's fetch-URL feature. I’ve heard that Google Maps geocoding is actually a bit better, but there’s no longer any free option, which is a bummer :frowning:

1 Like

I’ve been using Google’s Geocoding API for several years. I’d say the accuracy is >99%.
Yes, it requires an API key (and a credit card), but the first US$200 every month is free, which translates into 40,000 geocoding requests. So I get a monthly $0 bill, which seems a very fair deal.
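The two figures in this post imply a per-request price, which makes it easy to estimate a bill for larger jobs. A quick sketch of that arithmetic, using only the numbers quoted above (they come from this post, not from an official price list, and may have changed since):

```python
MONTHLY_CREDIT_USD = 200   # free credit quoted in the post
FREE_REQUESTS = 40000      # requests that credit is said to cover

# Implied price per 1,000 requests.
usd_per_1000 = MONTHLY_CREDIT_USD * 1000 / FREE_REQUESTS

def monthly_bill(requests):
    """Estimated bill in USD after the credit, under the post's figures."""
    cost = requests * MONTHLY_CREDIT_USD / FREE_REQUESTS
    return max(0.0, cost - MONTHLY_CREDIT_USD)

print(usd_per_1000)
print(monthly_bill(40000), monthly_bill(50000))
```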

3 Likes

ahagstro, December 31:

I’ve been using Google’s Geocoding API for several years. I’d say the accuracy is >99%.
Yes, it requires an API key (and a credit card), but the first US$200 every month is free, which translates into 40,000 geocoding requests. So I get a monthly $0 bill, which seems a very fair deal.

The promotional $200 credit is nice (while it lasts), but the biggest problem with the Google service isn't the price but the terms of service: you're only allowed to use the geocoding results on Google Maps and may not save them (short-term caching is allowed, for use on Google Maps only). If your use case fits these terms, it's a great option; otherwise it's a non-starter. A lot of people want to save the geocoding results for later use, use them to calculate distances, plot them on an OpenStreetMap map, etc., none of which is permitted.

Tom

Perhaps of interest to those following this thread is the video “Strategies for Matching Affiliation Strings to ROR IDs” from the 2023 ROR Annual Meeting, especially the presentation by the US Department of Energy (DOE) Office of Scientific and Technical Information (OSTI). It describes their OpenRefine reconciliation service for organizations, which is currently in testing and scheduled to be available for public use in Q1 2023 (“hopefully by March”).

More generally, the continued trend of increasing adoption of persistent identifiers is a good thing for users with reconciliation needs.

3 Likes

Many people continue to visit this thread over time, making it one of the most viewed topics on the forum. So for those still searching for the right option, I found a set of recipes from HERE and Nominatim developed by Peter Aldhous. I haven't tried them myself, but I'm pleased to see something that leverages OpenStreetMap data.

One could use the free OpenStreetMap Nominatim service (but be aware of the usage policy!).

For affiliations/addresses like
British Museum, Great Russell St, London
in a column, you could add a column by fetching URLs, with a throttle delay of 1000 ms (because of the policy), using
"https://nominatim.openstreetmap.org/search.php?q=" + escape(value, "url") + "&format=jsonv2&limit=1&addressdetails=1&email=YOUR_MAIL_ADDRESS"
and then add another column by parsing the JSON response for the country value with
value.parseJson()[0]["address"]["country"]
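For anyone scripting this outside OpenRefine, the two GREL steps above translate fairly directly to Python. This is only a sketch: the function names are made up, the sample response is hand-written in the shape of a jsonv2 result rather than real service output, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://nominatim.openstreetmap.org/search.php"

def build_search_url(query, email):
    """Build the same search URL as the GREL expression above."""
    params = {
        "q": query,
        "format": "jsonv2",
        "limit": 1,
        "addressdetails": 1,
        "email": email,
    }
    return BASE_URL + "?" + urlencode(params)

def extract_country(response_text):
    """Return the country from a jsonv2 response, or None when nothing matched."""
    results = json.loads(response_text)
    if not results:
        return None
    return results[0].get("address", {}).get("country")

# Hand-written sample in the shape of a jsonv2 response (not real output).
SAMPLE = (
    '[{"lat": "51.5194", "lon": "-0.1270", '
    '"address": {"country": "United Kingdom", "country_code": "gb"}}]'
)

print(build_search_url("British Museum, Great Russell St, London", "YOUR_MAIL_ADDRESS"))
print(extract_country(SAMPLE))
print(extract_country("[]"))
```

Unlike the one-line GREL expression, `extract_country` checks for an empty result list, which is what comes back when the string cannot be matched. The same result object also carries `lat` and `lon` if coordinates are needed.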

Last year I wrote a tutorial :de: that discusses several approaches for loading geo coordinates with OpenRefine, using Wikipedia, GeoNames, OpenStreetMap/Nominatim, OpenStreetMap/Photon, GND, Wikidata and Getty TGN.

Note that it is written in German and also focuses on several issues specific to municipal structures in Germany.

2 Likes

There is also a policy to add your email address to the request. Otherwise you might just get blocked if you make large numbers of requests (the docs don't define what "large numbers" means).
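The same courtesy rules (identify yourself, pace the requests) apply when scripting against the service outside OpenRefine. A minimal sketch of a paced loop follows, using the 1000 ms delay mentioned earlier in the thread. The `fetch_all` helper and the stand-in fetcher are hypothetical; a real fetcher would perform the HTTP request with the email parameter included in each URL.

```python
import time

def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call `fetch(url)` for each URL, pausing `delay_seconds` between calls."""
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        responses.append(fetch(url))
    return responses

# Demo with a stand-in fetcher so nothing is sent over the network.
def fake_fetch(url):
    return "[]"

pauses = []  # records the requested pauses instead of actually sleeping
results = fetch_all(["u1", "u2", "u3"], fake_fetch, sleep=pauses.append)
print(results, pauses)
```

Passing `sleep` as a parameter is just a convenience so the pacing can be checked without waiting; in normal use the default `time.sleep` applies.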

2 Likes

Thanks for the tip! I added the email address to my demo query.

1 Like