Dear all,
(This post follows a previous post and subsequent exchanges here.)
I finally found the time to adapt @Michael_Markert’s code to call a language model deployed on Hugging Face remotely. It wasn’t rocket science in the end: the problem I was having was with the URL provided with the Python code on the model card. I had to use the one given for cURL instead. (I did not rely on the framework I was referring to in my last message. It could still be useful and an interesting solution to explore, and would offer more control and options, but it’s also a more complicated route.)
Note that I have a pro account with Hugging Face; this may not be necessary. I did notice that making requests without the user access token results in a “too many requests” error message.
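For readers who just want the gist, here is a minimal sketch of the call, assuming the `requests` library; the model ID and the token are placeholders to substitute with your own, and the URL is the cURL-style one taken from the model card:

```python
import requests

# Placeholders: substitute your own model ID and Hugging Face user access token.
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"
HEADERS = {"Authorization": "Bearer hf_xxx"}

def query(payload):
    """Send a JSON payload to the Inference API and return the parsed response."""
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

# Example: a simple translation task.
result = query({
    "inputs": "Translate to French: The cat sits on the mat.",
    "parameters": {"max_new_tokens": 100},
})
print(result)
```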
I’ve put it here, for the benefit of other OpenRefine users who are curious to follow this route. Regardless of the hardware of the local machine, this gives access to NLP functions such as classification, summarization, reformulation, translation, or simply correction, extraction of named entities or features, and so on, all with open models.
The only drawback is that the “json_object” parameter is not recognized, and I can’t explain why.
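In case it helps anyone reproduce the issue, this is roughly the shape of request I would expect to work, assuming the OpenAI-style chat completions route; the endpoint path, model ID, and prompt are illustrative, not a confirmed API:

```python
import requests

# Illustrative only: endpoint path, model ID, and token are placeholders.
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2/v1/chat/completions"
HEADERS = {"Authorization": "Bearer hf_xxx"}

payload = {
    "messages": [
        {
            "role": "user",
            "content": "Return a JSON object with keys 'author' and 'year' for: Hugo, Les Misérables, 1862.",
        }
    ],
    # The parameter that, in my tests, is not recognized:
    "response_format": {"type": "json_object"},
    "max_tokens": 200,
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.status_code, response.json())
```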
Which leads me to add a complement to the answer I gave @TadGuidry: a sensible integration of language models with OpenRefine should offer a small dashboard, similar to the one LM Studio offers, for manipulating model parameters, including the “json_object” parameter, with the possibility of providing a schema. Alternatively, one could use a less “machine-like”, more intuitive interface: the user would set the number of columns to be added and their keys, and the query would be “translated” under the hood. This would be like combining templating, “create new column”, “split column”, and “split row” in a single operation.
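To make that last idea concrete, here is a hypothetical sketch of the “translation” step: the user lists the keys of the columns to be created, and the integration derives a JSON schema to constrain the model’s output. The function name and the all-strings typing are my own assumptions, not an existing API:

```python
def columns_to_schema(column_keys):
    """Turn a list of new OpenRefine column names into a minimal JSON schema
    that could be passed to the model (all values typed as strings here)."""
    return {
        "type": "object",
        "properties": {key: {"type": "string"} for key in column_keys},
        "required": list(column_keys),
    }

# e.g. the user asks for three new columns:
print(columns_to_schema(["title", "author", "year"]))
```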