Using the OpenAI API to apply natural language queries to cells/data

I have been experimenting with the OpenAI API in Python for named entity recognition and had the idea to use it directly within OpenRefine. You just need to create an account at http://openai.com, generate an API key, and paste it in place of “YOUR-API-KEY” in the code below.
Now you can formulate a question about your data in the “prompt” and apply it to the values in the respective column. You can even define a format for the response in the query; here it is JSON(ish).
By the way, most of the code was generated via ChatGPT.

import urllib2
import json

# Set the API endpoint and your API key
endpoint = "https://api.openai.com/v1/completions"
api_key = "YOUR-API-KEY"

# Set the request parameters
model = "text-davinci-003"
prompt = "Extract all entities from the following string and provide them with their type as a JSON object: " + value
max_tokens = 200
temperature = 0

# Set the Authorization header
auth_header = "Bearer " + api_key
headers = {"Authorization": auth_header, "Content-Type": "application/json"}

# Set the POST data
data = {
    "model": model,
    "prompt": prompt,
    "max_tokens": max_tokens,
    "temperature": temperature
}

# Create the request object
request = urllib2.Request(endpoint, headers=headers, data=json.dumps(data))
request.get_method = lambda: "POST"

# Send the request and get the response
response = urllib2.urlopen(request)

# Read the response and parse it as JSON
json_response = response.read()
data = json.loads(json_response)

# Return the completion text with newlines removed
return (data)["choices"][0]["text"].replace("\n","")
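The last two steps of the script (parsing the response and pulling out the completion text) can be sketched as a small helper that also copes with an error body or an empty result. This is only an illustration: `extract_text` and the `sample` body are hypothetical names and data, and the sketch assumes the response carries either a "choices" list, as used in the script above, or an "error" object with a "message".

```python
import json

def extract_text(raw_json):
    """Return the completion text from a raw response body, or an error note."""
    data = json.loads(raw_json)
    # Assumption: error bodies carry an "error" object instead of "choices".
    if "error" in data:
        return "ERROR: " + data["error"].get("message", "unknown error")
    choices = data.get("choices") or []
    if not choices:
        return ""
    # Same clean-up as the script above: drop the newlines around the answer.
    return choices[0].get("text", "").replace("\n", "")

# Hypothetical response body, trimmed to the fields used here.
sample = '{"choices": [{"text": "\\n\\n{\\"Berlin\\": \\"city\\"}", "index": 0}]}'
print(extract_text(sample))
```

In the OpenRefine expression, the function definition would go above the request code and the last line would become `return extract_text(response.read())`.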

Here is a screenshot of it at work: [screenshot]


This is very interesting, thanks for sharing, @Michael_Markert!
I moved the thread to the support hints and tricks section. The Training section is for those teaching OpenRefine or building curriculum.

Thanks for moving the topic to the right place!

Thanks for this post.

Today I ran this code in OpenRefine, and it works nicely with different prompts. However, when I select more than five records (from a list of concepts), I get a “too many requests” error.

Is there any way to execute the script for one value at a time?

Regards

Hello,

try the code with

import time

at the top and

time.sleep(1)

near the bottom, just before the final return line (anything placed after return is never executed). This adds a delay of 1 second between requests. It takes some time, but I just got the responses for 100 items with the demo code.
Best
Michael
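As an alternative to a fixed pause, the request could be retried only when the server actually answers "too many requests" (HTTP status 429). Below is a minimal sketch of that idea, not part of the original script: `call_with_retry`, `FakeTooManyRequests` and `flaky_send` are hypothetical names, and the sketch assumes the raised exception exposes the status as `.code`, as `urllib2.HTTPError` does.

```python
import time

def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(); on HTTP 429 wait and retry, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception as exc:
            # Assumption: the HTTP status is available as exc.code.
            status = getattr(exc, "code", None)
            if status != 429 or attempt == max_attempts:
                raise  # other errors, or retries exhausted
            sleep(delay)
            delay *= 2

# Stand-in for the real request: fails twice with 429, then succeeds.
class FakeTooManyRequests(Exception):
    code = 429

attempts = []

def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeTooManyRequests()
    return "ok"

waits = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky_send, sleep=waits.append)
print(result)
```

In the script above, `send` would be something like `lambda: urllib2.urlopen(request)`. The fixed `time.sleep(1)` is simpler and was enough for 100 items, so this only pays off on larger columns.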

Thanks. It is rocking now.