Dear all,
(This post follows a previous post and subsequent exchanges here.)
I finally found the time to adapt @Michael_Markert’s code to call a language model deployed on Hugging Face remotely. It wasn’t rocket science in the end: the problem I was having was with the URL provided with the Python code on the model card. I had to use the one given for cURL instead. (I did not rely on the framework I was referring to in my last message. It could still be useful and an interesting solution to explore, and would offer more control and options, but it’s also a more complicated route.)
Note that I have a pro account with Hugging Face; this may not be necessary. I did notice that making requests without the user access token results in a “too many requests” error message.
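For readers who just want the gist, here is a minimal sketch of the call, assuming the `requests` library; the model ID and the token are placeholders to substitute with your own, and the URL is the cURL-style one taken from the model card:

```python
import requests

# Placeholders: substitute your own model ID and Hugging Face user access token.
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"
HEADERS = {"Authorization": "Bearer hf_xxx"}

def query(payload):
    """Send a JSON payload to the Inference API and return the parsed response."""
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

# Example: a simple translation task.
result = query({
    "inputs": "Translate to French: The cat sits on the mat.",
    "parameters": {"max_new_tokens": 100},
})
print(result)
```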
I’ve put it here, for the benefit of other OpenRefine users who are curious to follow this route. Regardless of the hardware of the local machine, this gives access to NLP functions such as classification, summarization, reformulation, translation, or simply correction, extraction of named entities or features, and so on, all with open models.
The only drawback is that the “json_object” parameter is not recognized, and I can’t explain why.
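In case it helps anyone reproduce the issue, this is roughly the shape of request I would expect to work, assuming the OpenAI-style chat completions route; the endpoint path, model ID, and prompt are illustrative, not a confirmed API:

```python
import requests

# Illustrative only: endpoint path, model ID, and token are placeholders.
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2/v1/chat/completions"
HEADERS = {"Authorization": "Bearer hf_xxx"}

payload = {
    "messages": [
        {
            "role": "user",
            "content": "Return a JSON object with keys 'author' and 'year' for: Hugo, Les Misérables, 1862.",
        }
    ],
    # The parameter that, in my tests, is not recognized:
    "response_format": {"type": "json_object"},
    "max_tokens": 200,
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.status_code, response.json())
```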
Which leads me to add a complement to the answer I gave @TadGuidry: a sensible integration of language models with OpenRefine should offer a small dashboard, similar to the one LM Studio offers, for manipulating model parameters, including the “json_object” parameter, with the possibility of providing a schema. Alternatively, one could use a less “machine-like”, more intuitive interface: the user would set the number of columns to be added and their keys, and the query would be “translated” under the hood. This would be like combining templating, “create new column”, “split column”, and “split row” in a single operation.
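To make that last idea concrete, here is a hypothetical sketch of the “translation” step: the user lists the keys of the columns to be created, and the integration derives a JSON schema to constrain the model’s output. The function name and the all-strings typing are my own assumptions, not an existing API:

```python
def columns_to_schema(column_keys):
    """Turn a list of new OpenRefine column names into a minimal JSON schema
    that could be passed to the model (all values typed as strings here)."""
    return {
        "type": "object",
        "properties": {key: {"type": "string"} for key in column_keys},
        "required": list(column_keys),
    }

# e.g. the user asks for three new columns:
print(columns_to_schema(["title", "author", "year"]))
```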