OpenRefine erasing data and not letting me go forward or backward in history

Hi,

I hope this problem has a solution, because otherwise I will have lost a lot of work. I was working on a project and went back in its history to look at an earlier point. When I tried to return to the latest step, OpenRefine told me it couldn't because of a java.lang.IndexOutOfBoundsException. I had only gone back a few steps, so I decided to live with it and redo those last few actions. However, several columns in the project had been duplicated, leaving many columns with the same name (which shouldn't be possible) that didn't belong at that point in the project's history. When I deleted them, my records were split into multiple records and some of my data was erased. I then tried to go back even further in the history, to before any of this happened, but now it won't let me go back either; the error says java.lang.NullPointerException.

I just want to return to a point in the project before all of this confusion occurred. Is there a way to do that? I have run into these error messages when moving through history before, but the results have never been as damaging as this.

Thanks,

Ella

The error message has changed and now gives a slightly better picture of the problem. It says: java.lang.IndexOutOfBoundsException: Needed to remove row 7157, but only 7084 rows were available.

Hi Ella. Sorry to hear that you're having trouble. First, before you do anything else, make a backup of your project (export it) and, ideally, your entire OpenRefine workspace. In the future, do this at the first hint that anything might be wrong. Data security is our absolute top priority, but sometimes bugs still creep in.
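For a single project, the usual route is OpenRefine's Export menu ("OpenRefine project archive to file"). For the whole workspace, the idea is simply to copy the workspace directory while OpenRefine is shut down (on Windows it is typically %APPDATA%\OpenRefine). A minimal sketch of that copy using only the Java standard library, zipping a directory tree; the class and method names are illustrative assumptions, not part of OpenRefine:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: copies a project or workspace directory into one zip file. */
final class WorkspaceBackup {
    private WorkspaceBackup() {}

    /**
     * Zips every regular file under sourceDir into zipFile, storing paths relative to
     * sourceDir. Intended to run while OpenRefine is shut down so nothing changes
     * mid-copy. Returns the number of files written.
     */
    static int zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        // List the files first, so a zip created during this run is never included.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use '/' as the separator on every OS.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }
}
```

For example, `WorkspaceBackup.zipDirectory(Path.of(System.getenv("APPDATA"), "OpenRefine"), Path.of("openrefine-backup.zip"))` would archive the whole workspace. Writing the zip outside the directory being archived avoids packing an old backup into the next one.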

Is this different from the problem that you reported back in July with a similar error message? It looks like Rory attempted to follow up with you in that thread.

Some additional information would be useful in helping to understand the problem better, especially the version of OpenRefine that you are using and the complete text of the error messages (including all lines for multiline messages).

If the data is not sensitive and you'd like me to take a look at recovering the project, feel free to send it to me. Please send the earliest / least damaged version of the project that you have.

Best,
Tom

Hello! Thanks so much for the response. Yes, this error is very similar, if not identical, to the issue I reported in July; sorry for the duplicate post. I had forgotten posting that first one and never noticed Rory's response. This is happening on OpenRefine version 3.9.3, and the full error is:

Exception caught (30ms)
java.lang.IndexOutOfBoundsException: Needed to remove row 7157, but only 7084 rows were available.
at org.freeyourmetadata.ner.operations.NERChange.deleteRows(NERChange.java:316)
at org.freeyourmetadata.ner.operations.NERChange.revert(NERChange.java:83)
at com.google.refine.history.HistoryEntry.revert(HistoryEntry.java:175)
at com.google.refine.history.History.undo(History.java:243)
at com.google.refine.history.History.undoRedo(History.java:183)
at com.google.refine.history.HistoryProcess.performImmediate(HistoryProcess.java:86)
at com.google.refine.process.ProcessManager.queueProcess(ProcessManager.java:96)
at com.google.refine.commands.history.UndoRedoCommand.doPost(UndoRedoCommand.java:73)
at com.google.refine.RefineServlet.service(RefineServlet.java:187)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.Server.handle(Server.java:563)
at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)

I am using the NER extension; could that be part of the issue? The problem seems to be related to the number of rows in the project. Perhaps the new rows created within records when extracting named entities threw it off somehow? The data is not sensitive, so I would very much appreciate you looking at the project. I will send it to you and see if you can get anywhere. Thanks so much for all of your help.

Best,

Ella
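Ella's hypothesis fits the shape of the error. As a hedged illustration only: suppose an undo record stores the absolute row numbers of the rows an operation inserted and deletes them again on revert. If the table no longer has the layout it had when that record was written (for example, because another step added or removed rows and was not undone first), the stored numbers can point past the end of the table. The `InsertRowsChange` class below is hypothetical and only mimics the message text; it is not how the NER extension's `NERChange` is actually written.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical undo record for an operation that inserted rows. It remembers the
 * absolute indices of the rows it added so that revert() can delete them again.
 * Illustration only: not the NER extension's or OpenRefine's real code.
 */
final class InsertRowsChange {
    private final List<Integer> insertedRowIndices; // ascending, recorded at apply time

    InsertRowsChange(List<Integer> insertedRowIndices) {
        this.insertedRowIndices = new ArrayList<>(insertedRowIndices);
    }

    /** Removes the recorded rows, highest index first so lower indices stay valid. */
    void revert(List<String> rows) {
        for (int i = insertedRowIndices.size() - 1; i >= 0; i--) {
            int index = insertedRowIndices.get(i);
            if (index >= rows.size()) {
                // The table no longer matches the state in which this change was recorded.
                throw new IndexOutOfBoundsException("Needed to remove row " + index
                        + ", but only " + rows.size() + " rows were available.");
            }
            rows.remove(index); // int argument, so this is List.remove(int)
        }
    }
}
```

Reverting against the table the change was recorded on removes exactly the inserted rows; reverting against a table that has since shrunk fails with the same kind of exception as in the stack trace above.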

Hello, all! I wanted to post an update, as I believe I have hit the same bug again on another project. I'm not particularly concerned this time, since I hadn't done much on the project before it was corrupted, but I thought the details might help diagnose the issue.

This time I had a project with 705 rows, and I used "Add column by fetching URLs" to create a new column holding a great deal of data (for reference, each row's cell contained a JSON response of about 500 lines). This slowed OpenRefine down massively: every action took around 20 seconds, with "Not Responding" messages until it finished.

My goal was to use "Columnize by key/value columns" to create one new column for each key in the JSON responses. To get there, I used GREL to transform each JSON response, adding a delimiter between items, and then used "Split multi-valued cells", which worked but took a long time. Next I used "Split into several columns" to separate the keys and values into two distinct columns. The operation started and a waiting message appeared. At that point I realized there was something else I wanted to do before splitting, so I pressed Esc to halt the operation.

After doing that and reloading the page, I found that all of the content in the "Actor" column (the column containing the JSON responses) was gone, and every cell in the column contained only null. I went to the Undo/Redo tab to restore an earlier stage, but I could not undo past the step where I added the delimiters between the JSON items.

When I went to that earlier stage, the Actor column had been replaced by two columns named Actor 1 and Actor 2, as I would expect from the column-splitting operation. But this was anachronistic, since I performed that operation later in the project's history, and both columns were also empty. Trying to go back further than this stage produces the following error message:

 Exception caught (7385ms)
java.lang.NullPointerException
        at com.google.refine.model.changes.MassCellChange.revert(MassCellChange.java:118)
        at com.google.refine.history.HistoryEntry.revert(HistoryEntry.java:175)
        at com.google.refine.history.History.undo(History.java:243)
        at com.google.refine.history.History.undoRedo(History.java:183)
        at com.google.refine.history.HistoryProcess.performImmediate(HistoryProcess.java:86)
        at com.google.refine.process.ProcessManager.queueProcess(ProcessManager.java:96)
        at com.google.refine.commands.history.UndoRedoCommand.doPost(UndoRedoCommand.java:73)
        at com.google.refine.RefineServlet.service(RefineServlet.java:187)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
        at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.Server.handle(Server.java:563)
        at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
        at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

Trying to go forward again to the latest stage of the project now produces a different error:

 Exception caught (4518ms)
java.lang.RuntimeException: Failed to load change file C:\Users\ert779\AppData\Roaming\OpenRefine\2556982847395.project\history\1768417562630.change.zip
        at com.google.refine.io.FileHistoryEntryManager.loadChange(FileHistoryEntryManager.java:85)
        at com.google.refine.history.HistoryEntry.apply(HistoryEntry.java:150)
        at com.google.refine.history.History.redo(History.java:259)
        at com.google.refine.history.History.undoRedo(History.java:190)
        at com.google.refine.history.HistoryProcess.performImmediate(HistoryProcess.java:86)
        at com.google.refine.process.ProcessManager.queueProcess(ProcessManager.java:96)
        at com.google.refine.commands.history.UndoRedoCommand.doPost(UndoRedoCommand.java:73)
        at com.google.refine.RefineServlet.service(RefineServlet.java:187)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
        at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.Server.handle(Server.java:563)
        at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
        at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at com.google.refine.history.History.readOneChange(History.java:82)
        at com.google.refine.history.History.readOneChange(History.java:68)
        at com.google.refine.io.FileHistoryEntryManager.loadChange(FileHistoryEntryManager.java:99)
        at com.google.refine.io.FileHistoryEntryManager.loadChange(FileHistoryEntryManager.java:83)
        ... 42 more
Caused by: java.lang.OutOfMemoryError: Java heap space

All of the times I have run into an issue like this seem to have some elements in common: they tend to occur when an operation is creating new columns; they leave columns present or absent anachronistically, i.e. at points in the project's history where they should not (or should) be; and afterwards I cannot move backward, and sometimes forward, from a given point in the history. As I mentioned, I am not particularly concerned with recovering this specific project, but since I keep running into this issue, I thought it was worth mentioning. Thanks so much for your help and for reading this long message!

Best,

Ella

Oh, and for reference, this is on version 3.9.5.
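The workflow in the second report (fetch JSON, add delimiters, split, then columnize) ends with a pivot: every distinct key becomes a column, and each record gets its values placed in those columns. A simplified Java sketch of that pivot, assuming each item has already been turned into a "key: value" string; it folds the "Split into several columns" and "Columnize by key/value columns" steps into one pass and is not OpenRefine's implementation. It also hints at the memory cost: every item becomes a separate string, and after "Split multi-valued cells" a separate row, so a 705-row project can grow very large.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified illustration of pivoting key/value items into columns. */
final class KeyValuePivot {
    private KeyValuePivot() {}

    /**
     * Each inner list holds one record's items as "key<separator>value" strings.
     * Returns one map per record; keys keep first-seen order, and a repeated key
     * keeps its last value (a simplification).
     */
    static List<Map<String, String>> pivot(List<List<String>> records, String separator) {
        List<Map<String, String>> result = new ArrayList<>();
        for (List<String> items : records) {
            Map<String, String> row = new LinkedHashMap<>();
            for (String item : items) {
                int cut = item.indexOf(separator);
                if (cut < 0) {
                    continue; // no separator: key and value cannot be told apart, so skip
                }
                String key = item.substring(0, cut).trim();
                String value = item.substring(cut + separator.length()).trim();
                row.put(key, value);
            }
            result.add(row);
        }
        return result;
    }

    /** Union of all keys in first-seen order: these would become the new column headers. */
    static Set<String> columns(List<Map<String, String>> rows) {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, String> row : rows) {
            names.addAll(row.keySet());
        }
        return names;
    }
}
```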

Thanks for following up with so much detail! Have you tried running OpenRefine with more available memory? That seems like it should help with the second error.

As for the first error, the stack trace you shared does reference the NER extension so that might be related. Are you able to share more about that case? I'd like to take a look.


Rory - I looked at the NER case before the holidays and can share the data and analysis. The NER extension uses a somewhat unconventional "composite" undo structure, which I suspect is related to the problem.
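A "composite" undo bundles several sub-changes into one history entry. A hedged sketch of why the order of reverting matters: each sub-change records positions based on the table as it stood when that sub-change ran, so the parts must be undone in reverse order. `Step`, `AppendRow` and `CompositeStep` are invented names for illustration; they are not OpenRefine's `Change` interface or the NER extension's classes.

```java
import java.util.ArrayList;
import java.util.List;

/** One reversible step, in the spirit of an undo/redo history entry (hypothetical). */
interface Step {
    void apply(List<String> rows);
    void revert(List<String> rows);
}

/** Appends a row on apply and removes the row at the recorded position on revert. */
final class AppendRow implements Step {
    private final String value;
    private int index = -1; // position recorded when apply() ran

    AppendRow(String value) { this.value = value; }

    public void apply(List<String> rows) {
        index = rows.size();
        rows.add(value);
    }

    public void revert(List<String> rows) {
        // List.remove(int) throws IndexOutOfBoundsException if that row no longer exists.
        rows.remove(index);
    }
}

/** A composite step: applies its parts in order and reverts them in reverse order. */
final class CompositeStep implements Step {
    private final List<Step> parts;

    CompositeStep(List<? extends Step> parts) { this.parts = new ArrayList<>(parts); }

    public void apply(List<String> rows) {
        for (Step part : parts) {
            part.apply(rows);
        }
    }

    public void revert(List<String> rows) {
        // Each part's recorded positions assume the later parts were already undone.
        for (int i = parts.size() - 1; i >= 0; i--) {
            parts.get(i).revert(rows);
        }
    }
}
```

Reverting the parts in forward order instead (first part, then second) leaves the second part's recorded position past the end of the list, producing an IndexOutOfBoundsException like the one in the NER stack trace.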

Ella - Thanks for the report. I wouldn't read too much into the fact that both cases superficially involve adding or removing columns. I suspect the second problem was caused simply by running out of memory (although we strive to make sure that OpenRefine never corrupts data in such cases). If you'd like to share the project, we can take a look and try to figure out what happened.

For folks in general: if OpenRefine becomes excessively sluggish, it's best to stop what you're doing, make a backup of your project, then increase the memory allocated to OpenRefine and restart it. Once things start to go downhill, they rarely, if ever, recover.
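The "Java heap space" OutOfMemoryError in the second report refers to the JVM's maximum heap, which is what OpenRefine's memory setting controls (per the OpenRefine installation docs, the -Xmx value in openrefine.l4j.ini when starting openrefine.exe on Windows, or REFINE_MEMORY in refine.ini; check the documentation for your version). A small sketch using the standard `Runtime` API to report the heap limit and current usage of the JVM it runs in. It is a generic JVM check, not an OpenRefine feature, so it only describes OpenRefine's heap when called from inside that process.

```java
/** Reports how much heap the running JVM may use and is using, in mebibytes. */
final class HeapReport {
    private HeapReport() {}

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use (set by -Xmx). */
    static long maxHeapMib() {
        // If the JVM has no inherent limit, maxMemory() returns Long.MAX_VALUE.
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use: memory allocated so far minus the free part of it. */
    static long usedHeapMib() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    static String summary() {
        return "heap used " + usedHeapMib() + " MiB of max " + maxHeapMib() + " MiB";
    }
}
```

When used heap stays close to the maximum during ordinary operations, the JVM spends most of its time in garbage collection, which matches the long "Not Responding" pauses described above; raising the limit before continuing is the safer move.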

Tom


Rory and Tom, thanks so much for your responses. I will be sure to try increasing the memory allocation for OpenRefine. The latest project contains some sensitive information, so unfortunately I am unable to share it, but I appreciate all the help you have both given with these issues, and I will keep you updated if I run into the same error again.

Best,

Ella
