Hi
I've loaded up a spreadsheet (xslx) to tidy up and on exporting back to a spreadsheet (xslx) the dates have been adjusted by what looks like the timezone. I've not had this before. The OpenRefine date values are presented in UTC format which marries the NZ format. How do I force OpenRefine to export back using my local time zone?
NB: Have tested exporting .XLS and CSV - same problem.
Am using a Mac M1 and am on the latest version of OpenRefine
I've loaded up a spreadsheet to tidy up and on exporting back to a spreadsheet the dates have been adjusted by what looks like the timezone. I've not had this before. Is it my original spreadsheet or is it within the OpenRefine settings? The OpenRefine date values are presented correctly in OpenRefine.
NB: Checking Excel it stores dates in the local timezone - so it's something to do with OpenRefine
Excel (and all other spreadsheet tools that I'm familiar with) doesn't have the concept of timezones and spreadsheet dates don't represent an instant in time. On the other hand, OpenRefine datetimes must represent an instant in time. There was an unannounced change to OpenRefine's date handling in May 2018 which switched timezoneless dates from being interpreted as local to being interpreted as at UTC. I suspect what you are seeing is related to that change and the associated havoc that was wreaked on the internal date handling, but it would be good to have a small example that can be used to reproduce the problem. If you could create a Github issue, I'd be happy to look into it further.
Simple roundtripping should work regardless, but the impedance mismatch between Excel dates and OpenRefine's current internal date representation will always cause some rough edges. This mismatch was pretty much unavoidable when OpenRefine was first written due to Java's limited date support, but the new (ie post-2014) Java date handling includes date formats which are much more closely aligned with Excel's internal format (LocalDate, LocalTime, LocalDateTime). I've got implementing support for these on my to-do list, which would also fix another gap - the ability to use dates without an associated time and vice versa.
I'm on the fence as to what to do about the incompatible date handling change. It was working for 8 years before being broken in mid-2018, but it's now been broken for 5 years, so it's not clear what the best solution is. I think the original handling is more logical, but I'm open to arguments for or against.
My take on this is that it’s fine within OpenRefine to handle dates in UTC form factor. If multiple parties are working on a project around the world as researchers do, that’s probably best.
What’s missing is when OpenRefine does a spreadsheet/CSV/Text export that there is an option to specify ‘Use Local Time For Dates’. Or OpenRefine could just assume for Excel Exports that the user wants the dates in the local time zone as they will be expecting the dates to be accurate.
I fixed my problem by incrementing the date by the time difference. However as we had daylight saving turn off on April 2nd at 3:00 AM this was not a fun process
The advantages I can see to the current behaviour is that it offers reproducibility and consistency between users - which is useful in training, support and in cases where users are working collaboratively but are geographically across time zones (or a single user working on the same data inside/outside daylight saving time? I'm not sure how the previous behaviour handled this situation?).
Outside this I don't have strong feelings on the issue - the most important thing is that we are clear about the behaviour
I do like the suggestion that we have more/better tools for converting date times between time zones - possibly on import, on export (as suggested by @Tim_Chaffe) and in transformations.