Finding addresses from business names

Hello all!
I have a large database of business names which I know are all in one state. I'd like to find a way to get their addresses and, if possible, the county they're in, as new columns.
I hope that makes sense and thank you for all your help!

Many states (in the United States, if that's what you mean?) have a Data Act or Data Transparency Act which allows you to get or request business data.
An example: https://www.ilsos.gov/data/bus_serv_home.html

You might write to the state that you're working with and ask whether you can get a bulk data download, or where to find it online.

You'll then need to set up a reconciliation server (if you want to use OpenRefine for this) that has the state data loaded and available for querying via the reconciliation API.

Alternatively, you might be able to use fuzzy search (string similarity functions using algorithms like Jaro-Winkler or, my favorite for Latin languages, Sørensen-Dice) to crosswalk between your database's names and their names. You can do a bit of this in OpenRefine, and there are tutorials out there, but you'll probably be better off with other tools that have direct support for joins or lookups using fuzzy search algorithms, like qsv's apply command or Apache Hop's Fuzzy Match pipeline transform.
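To make the fuzzy search idea a bit more concrete, here is a minimal sketch of Sørensen-Dice similarity on character bigrams in plain Python. This is not how qsv or Apache Hop implement it; the function names, the normalization (lowercasing and collapsing whitespace), and the 0.6 threshold are all assumptions chosen for illustration, and the business names are made up.

```python
from collections import Counter


def bigrams(text):
    """Return the character bigrams of a lowercased, whitespace-collapsed string."""
    s = " ".join(text.lower().split())
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a, b):
    """Sørensen-Dice coefficient on bigram multisets: 2 * |X ∩ Y| / (|X| + |Y|)."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Strings shorter than two characters have no bigrams to compare.
        return 0.0
    overlap = sum((x & y).values())
    return 2.0 * overlap / total


def best_match(name, candidates, threshold=0.6):
    """Return (candidate, score) for the highest-scoring candidate,
    or (None, score) if nothing reaches the (assumed) threshold."""
    scored = [(dice(name, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return (match, score) if score >= threshold else (None, score)


# Hypothetical names: one from "your" list, two from the state's list.
state_names = ["ACME HARDWARE, INC.", "Apex Hair Salon"]
match, score = best_match("Acme Hardware Inc", state_names)
print(match, round(score, 3))
```

Comparing every name against every candidate like this is quadratic, so for a large database a dedicated tool will likely be much faster. The threshold also needs tuning against names you have checked by hand.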

Instead of setting up your own reconciliation service for that, you could have a look at OpenCorporates. They offer a reconciliation service which might do the job:
https://api.opencorporates.com/documentation/Open-Refine-Reconciliation-API

I second the recommendation for OpenCorporates. It won't give you county names, but they should be easy to look up based on the city/town which is returned.
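As a rough illustration of the city-to-county lookup, assuming you have already reconciled your rows and have a city for each one, something like the following could add a county column. The `city` and `county` field names, the helper names, and the three-row lookup table are hypothetical; a real table would come from a gazetteer or a state/Census place list.

```python
def normalize(city):
    """Uppercase and collapse whitespace so 'springfield ' matches 'SPRINGFIELD'."""
    return " ".join(city.upper().split())


def add_county(rows, city_to_county):
    """Return copies of rows with a 'county' key; None when the city is unknown."""
    out = []
    for row in rows:
        county = city_to_county.get(normalize(row.get("city", "")))
        out.append({**row, "county": county})
    return out


# Tiny illustrative lookup table (Illinois, matching the example link earlier in the thread).
city_to_county = {
    "SPRINGFIELD": "Sangamon",
    "PEORIA": "Peoria",
    "CHICAGO": "Cook",
}

rows = [
    {"name": "Acme Hardware Inc", "city": "springfield"},
    {"name": "Apex Hair Salon", "city": "Nowhereville"},
]
with_county = add_county(rows, city_to_county)
print(with_county)
```

Some cities straddle county lines, so a lookup by city alone will not always be right. Matching on ZIP code or geocoding the full address is more reliable if you need precision.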

They have an interactive search box on their home page that you can use to test coverage, but I think they aggregate from the official Secretary of State incorporation databases, so it should be pretty complete.

Tom

Over the last 3 years I've had difficulty getting reconciliation results back from OpenCorporates because they switched to using HAProxy Edge. So depending on where you live in the world, or where your reconcile client lives, you might get a 403 Forbidden response.

Beyond that, even when it was working better, it still wasn't ideal: results would just truncate, as if some invisible limit applied, with no indication of an error.
I sent them an email, but never got a response.
I've just opened a JIRA ticket with them to see if the service is still operating correctly (I have to use a VPN to reach it, but that might be a separate issue).

Can anyone confirm that they get a response from the OpenCorporates endpoint listed on our Reconciliation testbench?

Ah, I think it's changed to https://api.opencorporates.com/reconcile and indeed it now requires an API key (whereas their OpenRefine Reconciliation API docs say it does not, but that this might change in the future).

And indeed they did change that policy:

An API token is required to access the OpenCorporates Reconciliation API.

So they now require an account to be set up, and then an API token (key) will be given depending on your application, but with only X requests per day unless you pay for higher usage with a paid-for API account. Makes sense.
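For anyone calling the endpoint outside OpenRefine, here is a hedged sketch of how a batch request could be put together. The `queries` form field and the `q0`, `q1`, ... keys follow the generic Reconciliation Service API batch format; passing the token as an `api_token` query parameter is an assumption based on how the rest of the OpenCorporates API works, so check their docs. The batch size of 10 is arbitrary, picked only to keep requests small given the daily limit and the truncation mentioned above. The code only builds the requests and does not send anything.

```python
import json
from urllib.parse import urlencode

# Endpoint as given earlier in this thread.
RECONCILE_URL = "https://api.opencorporates.com/reconcile"


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_queries(names, limit=3):
    """Build a Reconciliation API batch: {'q0': {'query': ..., 'limit': ...}, ...}."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def build_request(names, api_token, limit=3):
    """Return (url, body) for one batch POST.
    The 'api_token' parameter name is an assumption, not confirmed by this thread."""
    body = urlencode({"queries": json.dumps(build_queries(names, limit))}).encode()
    url = f"{RECONCILE_URL}?{urlencode({'api_token': api_token})}"
    return url, body


# Hypothetical names and placeholder token; batch size 10 is an arbitrary choice.
names = ["Acme Hardware Inc", "Apex Hair Salon"]
requests_to_send = [build_request(batch, "YOUR_TOKEN") for batch in chunked(names, 10)]
print(requests_to_send[0][0])
```

Each `(url, body)` pair could then be sent as a POST, for example with `urllib.request.urlopen(url, data=body)`, ideally with a pause between batches to stay under the daily request allowance.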

Looks like one can apply for a free-access API key after providing documentation for a "Public Benefit project":

OpenCorporates exists to make company data more useful, usable and understandable. As part of our public benefit mission, academics, NGOs, registered journalists, media organisations and registered nonprofits can apply for free access to our bulk data under an open public license. To apply please answer the questions below so we consider your request.

Thank you so much for these replies!
I'm a local journalist so I think I'm going to go the OpenCorporates route.
I'll also look into fuzzy search algorithms, but I know nothing about how to use GitHub at the moment. I really want to learn how to use it though!
