Hi. I noticed that a project is opened in records mode if the first column has at least one blank cell. Is that right? Can I force OR to open a project in rows mode? I started from a simple CSV file:
AA
REFERENCE
1
1
2
7
3
7
4
7
5
7
6
7
7
Thanks, Fabio
Hi @fabiolinus ,
This is a problem I have been working on, and I have a fix for it in the upcoming 4.x releases.
GitHub issue, opened 10:12AM - 02 Mar 23 UTC, closed 12:39PM - 10 Apr 23 UTC. Labels: Type: Feature Request, records
When creating a new project, OpenRefine turns on the records mode automatically … if the project "has records":
https://github.com/OpenRefine/OpenRefine/blob/d5c325652cc2cea732211c6345d2a43f0521490d/main/webapp/modules/core/scripts/project/browsing-engine.js#L36
Concretely, a project is deemed to "have records" when the number of rows and records differ:
https://github.com/OpenRefine/OpenRefine/blob/d5c325652cc2cea732211c6345d2a43f0521490d/main/src/com/google/refine/model/RecordModel.java#L108-L112
This essentially means that if the project has any blank cells in the first column, it will "have records" and so the records mode is turned on by default.
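To make the rule above concrete, here is a minimal Python sketch of the "rows and records differ" check. It is not OpenRefine's code (that lives in the Java `RecordModel` linked above) and it simplifies the real record model: it assumes a record starts at the first row and at every row whose first-column cell is non-blank. The names `is_blank`, `count_records` and `has_records` are made up for illustration.

```python
def is_blank(cell):
    """Treat None, empty strings and whitespace-only strings as blank."""
    return cell is None or str(cell).strip() == ""


def count_records(rows, key_index=0):
    """Count records under a simplified model (an assumption, not OpenRefine's
    exact logic): the first row always starts a record, and so does every row
    whose key-column cell is non-blank."""
    records = 0
    for i, row in enumerate(rows):
        key = row[key_index] if key_index < len(row) else None
        if i == 0 or not is_blank(key):
            records += 1
    return records


def has_records(rows):
    """A project "has records" when the row and record counts differ."""
    return count_records(rows) != len(rows)


# A CSV-like grid with one missing value in the first column.
grid = [
    ["a", "1"],
    ["", "2"],   # blank first cell: grouped under the record started by "a"
    ["b", "3"],
]
print(count_records(grid), len(grid), has_records(grid))
```

Under this simplified model, a single blank cell in the first column is enough for the counts to differ, which is why a plain CSV with missing values ends up in records mode.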
This is a problem for multiple reasons:
* The assumption that any spreadsheet with blank cells in the first column uses the same conventions as OpenRefine to represent structured data seems very bold to me. Blank cells can legitimately be used in many other ways, for instance to represent missing values. This is especially true because OpenRefine is typically used with dirty data.
* The records mode is notoriously confusing for newcomers and is poorly explained by the UI, and it is easy to run operations in records mode by mistake while thinking that we are in rows mode. This is especially true if the records in the view only span one row, in which case the only visual clue that we are in records mode is in the grid header (with the rows/records toggle and the number of records being displayed above).
### Proposed solution
The records mode should only be turned on automatically if both conditions are met:
* the project has blank cells in the first column
* the project was created by an importer which uses the records structure (in OpenRefine itself I think this is only the JSON and XML importers; extensions can define others)
This would ensure that projects created from a CSV file (for instance) are opened in rows mode by default. The user would of course still be able to switch to the records mode as they wish.
We still need the first condition because an importer using the records structure initially does not mean that the records structure is still relevant after some transformations. For instance, the user can decide to fill down the first column, removing the records structure. When reopening a project in such a state, we should use the rows mode by default.
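The two conditions of the proposed solution can be sketched as follows. This is only an illustration in Python under stated assumptions, not the actual fix: the importer identifiers (`"json"`, `"xml"`, `"csv"`) and the names `RECORD_BASED_IMPORTERS`, `first_column_has_blanks` and `default_mode` are hypothetical, and the set of record-based importers simply mirrors the "I think this is only the JSON and XML importers" remark above.

```python
# Hypothetical identifiers for importers that produce the records structure.
RECORD_BASED_IMPORTERS = {"json", "xml"}


def first_column_has_blanks(rows):
    """Condition 1: at least one blank (None/empty/whitespace) first-column cell."""
    for row in rows:
        cell = row[0] if row else None
        if cell is None or str(cell).strip() == "":
            return True
    return False


def default_mode(rows, importer):
    """Return "records" only if both conditions hold, otherwise "rows".

    The cheap importer check comes first, so the scan of the first column
    is skipped entirely for projects created by, say, a CSV importer.
    """
    if importer in RECORD_BASED_IMPORTERS and first_column_has_blanks(rows):
        return "records"
    return "rows"


csv_grid = [["a", "1"], ["", "2"]]        # missing value in a CSV
json_grid = [["a", "1"], ["", "2"]]       # nested structure from JSON
filled_down = [["a", "1"], ["a", "2"]]    # JSON project after "fill down"

print(default_mode(csv_grid, "csv"))       # rows
print(default_mode(json_grid, "json"))     # records
print(default_mode(filled_down, "json"))   # rows
```

The third case shows why the first condition is kept: once the first column has been filled down, a project that came from JSON no longer has a records structure and opens in rows mode.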
### Alternatives considered
One other solution could be to never turn on the records mode automatically, always use the rows mode by default and let the user turn on the records mode when they need it by themselves. That would not be great for projects created from JSON or XML though.
One could also imagine some sort of modal dialog or hint, asking the user if they want to turn on the records mode, when blank cells are present in the first column. This could be a bit intrusive and get in the way of people's workflow though.
### Additional context
Beyond the usability problems, this is also the source of a performance issue: OpenRefine has to count the records of a project to determine whether the records mode should be switched on by default. This counting happens frequently, even if the user is only working in rows mode, and is an expensive computation for larger datasets. The proposed solution would also let us optimize this away in many cases.
Also, this picture found on social media (sadly I cannot find the source anymore) suggests the extent to which this issue can be frustrating for users:
![FTuxX8jaMAAJHL1](https://user-images.githubusercontent.com/309908/222399401-c160205d-c9e7-472b-b207-12b3d0086af1.jpeg)