Using local ChatGPT-like LLMs in OpenRefine for data wrangling

@Sunil_Natraj It is working as expected. Congratulations!


I studied it a bit. Here is what I'm using: GitHub - harvard-lil/warc-gpt: WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.

In fact, I don't know how to include the results retrieved for a query (document vectors matched against the query vector). The inference endpoint needs the following parameters (a sketch of the request body follows the list):

  • model: One of the models /api/models lists (required)
  • message: User prompt (required)
  • temperature: Defaults to 0.0
  • max_tokens: If provided, caps the number of tokens generated in the response.
  • search_results: Array, the output of /api/search.
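
For illustration, a /api/complete request body with these parameters might look like the following sketch (the model name and the contents of search_results are placeholders, not values from a real instance):

```python
# Hypothetical /api/complete request body, assembled from the
# parameters listed above. Model name and search_results contents
# are placeholders.
payload = {
    "model": "mistral-7b",  # one of the names /api/models returns (placeholder)
    "message": "what is Pragyan in the context of Chandrayaan-3?",
    "temperature": 0.0,     # optional, defaults to 0.0
    "max_tokens": 512,      # optional cap on generated tokens
    "search_results": [],   # the output of /api/search goes here
}
```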

In my request this search_results part is missing; it is the product of another API call.

Can you guide me here?

Regards

Thanks for sharing the details. WARC-GPT is a RAG implementation with two steps: first it searches for content based on the user query, then it generates an inference using an LLM with the user prompt and the search results. You can try it out with cURL:

1. Call the search endpoint, /api/search, passing the parameter message with the user query as its value, e.g. "message": "what is Pragyan in the context of Chandrayaan-3?". Copy the response of the API.
2. Call the inference endpoint, /api/complete. The message parameter takes the same user query you passed to the search, and the search_results parameter takes the response of the search API. The rest of the params are standard, and you can ignore the history param.
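
A minimal Python sketch of those two steps, assuming a locally running warc-gpt instance and JSON POSTs to both endpoints (the base URL, port, and model name are assumptions to adapt to your setup):

```python
import requests

BASE_URL = "http://localhost:5000"  # adjust to wherever your warc-gpt instance runs (assumed)
query = "what is Pragyan in the context of Chandrayaan-3?"

# Step 1: retrieve matching content for the user query.
search_results = requests.post(
    f"{BASE_URL}/api/search",
    json={"message": query},
).json()

# Step 2: generate an inference, passing the same query plus the
# search response verbatim as search_results.
completion = requests.post(
    f"{BASE_URL}/api/complete",
    json={
        "model": "mistral-7b",  # pick one of the models /api/models lists (placeholder)
        "message": query,
        "temperature": 0.0,
        "search_results": search_results,
        # the history param can be ignored for a single-turn request
    },
).json()

print(completion)
```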

How do you plan to use the WARC RAG in OpenRefine flow?

@antonin_d @Martin The extension is ready for wider release. Shall I create a post for this and send it for review?

Thank you, @Sunil_Natraj. This is a great addition to the ecosystem! Yes, please go ahead and create a PR to add it to the extension page.


Thanks to everyone who helped make this extension happen!

If you find a bug or want to suggest a new feature, you can create an issue directly in the new repository:


+1


Quick preview of support for column update in the AI extraction extension. Feedback much appreciated.
Demo video

Is it available in version 0.1.1? I don't see an option to update an existing column; it only shows "New Column name". I downloaded the plugin from here: Releases · sunilnatraj/llm-extension · GitHub.

Best

Partha

Hi, the feature is still under development. I will notify you when it is released.

The prompt history and reuse flow is nearing completion. Here is a quick demo; please share any feedback.
Demo video

I am planning to make a release with these two additions this week. Please share any feedback on them.

Great idea @Sunil_Natraj.

I haven't had the time to follow your work over the last few weeks and put the published extension to the test, but I can't wait to do so!

Thank you @archilecteur

@Sunil_Natraj Presently this plugin cannot store prompts (some prompts require lots of preview generations to get results in the desired format), and there is no trace of the prompt for later operations. The only option is to store prompts locally in a separate file.

Would it be possible in a future release to store a few prompts (like starring them in History)? It would really help many of us.

Thanks and regards

-partha

@psm this is something we are discussing with Sunil. You can join the conversation here:

@Martin Thanks, joining ASAP.

Regards


@Martin @psm Here is the video of the prompt history - reuse flow. Demo video

AI Extension V 0.1.2 is available for preview. Please try it out & share feedback.

AI Extension V0.1.2

OpenRefine couldn't start.
B.