I am fiddling around with the OpenAI API in Python for named entity recognition and just had the idea to use it directly within OpenRefine. You just need to create an account at http://openai.com, generate an API key, and copy it to “YOUR-API-KEY” in the code below.
Now you can formulate a question on your data in the “prompt” and apply it to the values in the respective column. You even can define a format for your response in the query, here it is JSON(ish).
The code, by the way, for the most part was generated via ChatGPT.
import urllib2
import json
# Set the API endpoint and your API key
endpoint = "https://api.openai.com/v1/completions"
api_key = "YOUR-API-KEY"
# Set the request parameters
model = "text-davinci-003"
prompt = "extract all entities from the following string and provide them with their type as a JSON object:" + value
max_tokens = 200
temperature = 0
# Set the Authorization header
auth_header = "Bearer " + api_key
headers = { "Authorization": auth_header, 'Content-Type':'application/json', }
# Set the POST data
data = {
"model": model,
"prompt": prompt,
"max_tokens": max_tokens,
"temperature": temperature
}
# Create the request object
request = urllib2.Request(endpoint, headers=headers, data=json.dumps(data))
request.get_method = lambda: "POST"
# Send the request and get the response
response = urllib2.urlopen(request)
# Read the response and parse it as JSON
json_response = response.read()
data = json.loads(json_response)
# Print the response
return (data)["choices"][0]["text"].replace("\n","")
Here is a screenshot of it being at work: