Hello, all! I wanted to leave an update on this issue, as I believe I have run into the same bug again on another project. I am not particularly concerned this time, as I hadn’t done much on the project before it was corrupted, but I thought the details might help diagnose the issue further.
In this instance, I had a project with 705 rows, and I used the “Add column by fetching URLs” functionality to create a new column that ended up holding a great deal of data (for reference, each row had a cell containing a JSON response of about 500 lines). This massively slowed down OpenRefine: every time I tried to do anything, I had to wait around 20 seconds and deal with “Not Responding” messages until it resolved.
My ultimate goal was to use the “Columnize by key/value columns” functionality to create new columns, one for each key in the JSON responses, so I used GREL to transform each JSON response and add a delimiter between each item. I then used the “Split multi-valued cells” functionality, which worked fine, although it took a long time. Next, I used the “Split into several columns” functionality in an attempt to separate the keys and values into two distinct columns. The operation began, and a waiting message appeared on the screen. At this point, I realized there was something else I wanted to do before splitting the columns, so I pressed ESC to halt the operation.
After doing so and reloading the page, I discovered that all of the content in the “Actor” column (the column containing the JSON responses) was gone, and each cell in the column contained only a null. I went to the Undo/Redo tab to try to restore an earlier stage, but I was unable to go back any further than the step where I added the delimiter between each JSON item.
When I went to that earlier stage, the Actor column had been replaced by two columns named Actor 1 and Actor 2, both empty. This is what I would have expected from the column split, but it was anachronistic, since I performed the split at a later stage in the project’s history. Trying to go back further than this stage results in the following error message:
Exception caught (7385ms)
java.lang.NullPointerException
at com.google.refine.model.changes.MassCellChange.revert(MassCellChange.java:118)
at com.google.refine.history.HistoryEntry.revert(HistoryEntry.java:175)
at com.google.refine.history.History.undo(History.java:243)
at com.google.refine.history.History.undoRedo(History.java:183)
at com.google.refine.history.HistoryProcess.performImmediate(HistoryProcess.java:86)
at com.google.refine.process.ProcessManager.queueProcess(ProcessManager.java:96)
at com.google.refine.commands.history.UndoRedoCommand.doPost(UndoRedoCommand.java:73)
at com.google.refine.RefineServlet.service(RefineServlet.java:187)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.Server.handle(Server.java:563)
at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Trying to go forward again to the latest stage in the project now results in its own error message:
Exception caught (4518ms)
java.lang.RuntimeException: Failed to load change file C:\Users\ert779\AppData\Roaming\OpenRefine\2556982847395.project\history\1768417562630.change.zip
at com.google.refine.io.FileHistoryEntryManager.loadChange(FileHistoryEntryManager.java:85)
at com.google.refine.history.HistoryEntry.apply(HistoryEntry.java:150)
at com.google.refine.history.History.redo(History.java:259)
at com.google.refine.history.History.undoRedo(History.java:190)
at com.google.refine.history.HistoryProcess.performImmediate(HistoryProcess.java:86)
at com.google.refine.process.ProcessManager.queueProcess(ProcessManager.java:96)
at com.google.refine.commands.history.UndoRedoCommand.doPost(UndoRedoCommand.java:73)
at com.google.refine.RefineServlet.service(RefineServlet.java:187)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.Server.handle(Server.java:563)
at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at com.google.refine.history.History.readOneChange(History.java:82)
at com.google.refine.history.History.readOneChange(History.java:68)
at com.google.refine.io.FileHistoryEntryManager.loadChange(FileHistoryEntryManager.java:99)
at com.google.refine.io.FileHistoryEntryManager.loadChange(FileHistoryEntryManager.java:83)
... 42 more
Caused by: java.lang.OutOfMemoryError: Java heap space
All of the times that I have dealt with an issue like this seem to have some elements in common. They tend to occur when an operation is creating new columns. They result in columns being present or absent anachronistically, i.e. at stages of the project’s history where they ought not to be there or not there. And I am unable to move to earlier stages, and sometimes to later ones, from a certain point in the history. As I mentioned, I am not particularly concerned with recovering this specific project, but as I keep running into this issue, I thought it was worth mentioning. Thanks so much for your help and for reading this long message!
Best,
Ella
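P.S. In case it helps anyone picture what I was going for, here is a rough Python sketch of the reshaping I was attempting inside OpenRefine: flatten each JSON response into key/value records, then pivot them into one column per key. This is only an illustration, not the GREL I ran. The sample keys and values (`name`, `born`) and the function names are made up, and it assumes each response is a flat JSON object.

```python
import json


def to_key_value_rows(row_id, response_text):
    """Flatten one JSON object into (row_id, key, value) records.

    Roughly what the delimiter + "Split multi-valued cells" +
    "Split into several columns" steps were meant to produce.
    """
    data = json.loads(response_text)
    return [(row_id, key, value) for key, value in data.items()]


def columnize(records):
    """Pivot (row_id, key, value) records into one dict per row.

    Roughly the intended result of columnizing by key and value.
    """
    table = {}
    for row_id, key, value in records:
        table.setdefault(row_id, {})[key] = value
    return table


# Hypothetical sample data; the real responses were about 500 lines each.
responses = {
    1: '{"name": "A", "born": 1950}',
    2: '{"name": "B", "born": 1972}',
}

records = []
for row_id, text in responses.items():
    records.extend(to_key_value_rows(row_id, text))

wide = columnize(records)
```

Here `records` plays the role of the intermediate key and value columns, and `wide` is the final one-column-per-key layout.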