Hello, I am a very new OpenRefine user, thanks for your patience.
I'm trying to clean multiple sets of data and to apply my saved (long and time-consuming) cleaning history to another dataset. This worked fine yesterday. But today, when I try to apply the saved history, nothing happens. Like, nothing at all. I saved a copy of the JSON file as a txt file and tried importing that, and OpenRefine flashed the "working" window for a split second and then didn't do anything. I tried different saved histories and those didn't work either.
What is going on?? I am about to cry. How do I make it work again?
It's also worth saying that in order for the history to apply, the project you are applying it to has to have identical column names to your original project - the transformations are applied to named columns, so if a column named in the transformation doesn't exist in your project, that transformation won't run.
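If you want to check which columns a saved history expects before applying it, you can peek at the extracted JSON outside OpenRefine. Here's a rough sketch in Python - it assumes the common case where an operation stores its target column under a "columnName" key (true for things like text transforms, but not for every operation type), and the filename is just a placeholder:

```python
import json

# Quick check of which columns a saved operation history expects.
# Assumes the common case where an operation stores its target column
# under a "columnName" key; some operation types (e.g. column renames)
# use other keys, so treat this as a rough sanity check, not a validator.
# "operation-history.json" is a placeholder filename.
with open("operation-history.json", encoding="utf-8") as f:
    operations = json.load(f)

referenced = sorted({op["columnName"] for op in operations if "columnName" in op})
print("Columns this history expects:", referenced)
```

Comparing that list against the columns in your new project is a quick way to spot operations that would have nothing to act on.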
The more information you can share about the data you are working with, the structure of the projects, the transformations you are applying, and which version of OpenRefine you are using, the more likely it is that someone will be able to figure out what's going on.
I wanted to reopen this discussion, as I have noticed that this difference in behavior appears to have begun with version 3.9. I have long used earlier versions at work, and applying a saved JSON operation history to a project, even if the project did not have every column named in the JSON, seems to work in OpenRefine 3.8.2. However, I just replaced the laptop I use when working from home and installed OpenRefine 3.9.5, and I am now experiencing this problem as well. This is a bit frustrating, as my team built workflows around reusing JSON histories in OpenRefine for somewhat messy datasets that we get from multiple providers, and these often have inconsistencies in the column names. So I may revert to using older versions of OpenRefine while we consider what we should do moving forward so that our workflows are not disrupted. It would be helpful to have a setting that allowed you to override this behavior; that is, when applying an operations history via JSON, if OpenRefine encounters an invalid instruction, it could just skip it and move on to the next one.
I noticed the same thing: one of my colleagues is using a previous version of OpenRefine because the column names sometimes change in the JSON data we're cleaning, and our operations stopped working in 3.9+ because of it. I would second an override feature for sure; we don't really have an option at this point other than to use an older version or redo the operations every time the JSON data changes.
Thanks for the report! Are you able to share an example recipe that works in 3.8 but fails in 3.9? There's been a lot of work done to make this part of the application more robust, but I wonder if in doing so there was a breaking change. Having a concrete example to highlight how and where this fails would be a huge help.
@Rory the biggest change I've noticed is that now (3.9.x) I can't apply a set of operations unless they are all valid. It used to be the case that invalid operations were just ignored and the rest of the set was applied, but now the presence of an invalid transform leads the whole set to fail. Specifically, the issue I've noticed is that you can no longer include transformations in the history that operate on columns that are not present in the project.
I understand why this might be a good thing, but it's also sometimes a pain. I have to be a lot more careful that I have everything just right before I apply a transformation.
To give a specific example, if I have a set of data transforms that are useful over a set of files, but those files sometimes contain a column and sometimes don't, then I have to make sure that when I apply the transformations to those lacking the column, I first remove all the operations that would act on that column. It used to be the case that I could apply all the transformations to all the files and it wouldn't matter, because the operations on absent columns would just be skipped.
I understand that I'm being saved from potentially doing something stupid here to some extent, but it's made certain workflows more painful.
It will be interesting to hear if there are other scenarios where the stricter parsing of the operations JSON is causing an issue, but for this particular case, potentially a "Skip operations on columns that are not present" option (or similar) would allow me to make that decision in cases where I know it's what I want to do.
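In the meantime, one possible stopgap is to pre-filter the extracted JSON before applying it, dropping the operations that target columns the project doesn't have. A minimal sketch, assuming the operations in question store their target column under a "columnName" key (true for common operations like text transforms, but not all operation types), with placeholder filenames and column names:

```python
import json

# Stopgap for a "skip operations on missing columns" option: drop any
# operation whose target column isn't in the current project, then apply
# the filtered history instead. Only handles operations that store their
# target under "columnName"; anything else is kept unchanged.
# The filenames and the column list below are placeholders.
project_columns = {"Title", "Creator", "Date"}

with open("operation-history.json", encoding="utf-8") as f:
    operations = json.load(f)

filtered = [
    op for op in operations
    if "columnName" not in op or op["columnName"] in project_columns
]

with open("operation-history-filtered.json", "w", encoding="utf-8") as f:
    json.dump(filtered, f, indent=2)
```

It's obviously no substitute for a proper option in the Apply dialog, but it keeps the rest of the history usable.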
@Rory I have the exact same issue as @ostephens. I have an order of operations I want to perform on a JSON file to transform it into a working CSV, but because the same columns aren't always present, it has failed since the 3.9 update (error message: "Invalid JSON format: java.lang.Exception: No column named 'xyz'"). My colleague who regularly uses this OpenRefine script to create a working CSV from the JSON file is using an older version of OpenRefine because we haven't come up with an alternative solution. I can share the OR operations script and a sample JSON file we perform them on if that's helpful? I can't upload a .json or a .txt file here, so if you'd like I can email you the files.
I completely agree with Owen and Julia on this. In my case, my organization has complex application profiles for some of our systems, in many cases with hundreds of possible metadata fields. If all metadata fields were used (which never occurs in a single dataset), we could theoretically get a .csv from a submitting repository with a completely unmanageable number of columns; I would make an analogy to requiring a submitter to include all possible Wikidata properties, or every field / subfield in the MARC bibliographic standard, in their spreadsheet, even if they are just using a limited profile of properties / metadata fields within these standards. It would be madness to require partners across my institution to submit .csv files containing hundreds or thousands of columns if they only need to use 20 of them to support their project.

For about a decade, I had happily maintained a set of transformation operations that work with this reality in OpenRefine, but with the strict validation requirements implemented in OpenRefine 3.9+, I can no longer use these JSON operation histories to process metadata, and would be forced to create a new set of JSON operations for each project under the new validation routine. A simple solution would be to allow an option to ignore invalid operations, as Owen suggests. I don't think I can attach a .json file to this message, so I have bundled a sample JSON operation history and a sample .csv with relevant data into a zip file, attached here.
@Rory the biggest change I've noticed is that now (3.9.x) I can't apply a set of operations unless they are all valid.
I wasn't involved in these changes, but that sounds like exactly what one would want to prevent silent data corruption.
It used to be the case that invalid operations were just ignored and the rest of the set was applied, but now the presence of an invalid transform leads the whole set to fail.
The problem with this scenario is that each operation depends on its predecessors, so once one fails, the project is no longer in the assumed state.
The ability to extract and reapply an Undo History was a quick hack that was thrown together to allow a set of operations to be reapplied to the same file or a file of identical shape. Once you deviate from this, you are in uncharted, and more importantly, untested territory. Unfortunately, checks to make sure that these constraints are being followed were missing, along with all the other error checking. The constraints exist only as verbal warnings passed down through the community.
I haven't investigated this yet, but my suspicion is that, if the new error checking is causing you pain, you've probably been unknowingly corrupting your data.
We could probably introduce an option to treat errors as warnings, but removing the error checking altogether and going back to the previous situation seems like a bad idea to me.
As an aside, I'm surprised that automation always rates so low on Martin's surveys if it's being so heavily used.
[...] I have an order of operations I want to perform on a JSON file to transform it into a working CSV, but because the same columns aren't always present, it has failed since the 3.9 update (error message: "Invalid JSON format: java.lang.Exception: No column named 'xyz'").
My understanding is that this is intended to work by presenting you with a column mapping dialog that allows you to map the original column names to the column names in the new file.
Snip: you've probably been unknowingly corrupting your data.
This is definitely not the case for me. We run extensive QC after transformations are run, and the systems that we are loading the data into also run validation.
No, I have never seen this dialogue. I also don't think it would be sufficient for the use cases described, where the requirement is not "apply these to a differently named column" but "skip these operations because that column doesn't exist".
I understand this. I also (and I think I was clear about this in my reply) understand why the changes were introduced and I'm not asking for this checking to be removed.
However, given this is how OpenRefine has operated for as long as I've used it, and I've found it occasionally useful (as others clearly have, based on this thread), I think it's reasonable to ask for some way to recreate the previous behaviour - whether that's via an "override" for particular errors in the Apply history, or by some other mechanism.
I completely support Owen here. I would also push back against the idea that this feature of OpenRefine should be performing some kind of shape or schema validation (against what, exactly?). It is perfectly standard practice when creating a metadata schema to designate some elements as optional / not required. In these instances, datasets that do not contain these elements would still be completely valid. However, IF optional elements (or columns) are present, you would like the relevant operations to be performed.
Nope. I've never seen this dialog. But the issue isn't that the column has a new name; it's that it doesn't exist. The JSON file being generated only includes that metadata field if it exists, and there's no field if it doesn't.
Certainly not. 3.8 still produces a file that correctly performs all the operations on the columns that do exist and ignores the ones that don't. We use these generated CSV files to manually check our authority data, and we most certainly would have discovered corrupted data in the past 3+ years we've been doing this if that were the case. A human is reviewing these files on a regular basis.
@timothy-mendenhall, thank you very much for that sample project and history (coincidentally, I think it also uncovered a UI bug with the column mapping dialog @tfmorris mentioned).
I think it's worth investigating what a resolution to this looks like. The ideal scenario is likely complicated and would take time to properly design, but I think a reasonable first step would be to provide users with a warning that not all operations are valid. Said warning could have "cancel" and "continue" options so users have the option of getting something close to the old behavior while still offering some protection. If that sounds like an acceptable way forward, I can write up an issue to track this.