Row.record.cells is not an array?

RolfBly · November 1, 2025, 11:44am

In this record view, I want to label all records in which all values in IN_lref are identical. I’m not at all familiar with record operations in OR yet, so first, I’m trying to find out how I should address all the values in this column in the record.

So I do Add column based on this column and enter this formula.

forEach(row.record.cells, v, v).join(', ')

In the preview, that gets me

First argument to forEach is not an array or JSON object

Whereas the documentation states that row.record.cells is “An array of the cells in the given column of the record”.

What am I doing wrong?

Nicolas_VIGNERON · November 1, 2025, 12:10pm

Hmm, something is indeed strange.

But I’m not sure I see why you are using row.record.cells.

Naively (and if I understand you correctly), I would split the row in two (or more) records (see Cell editing | OpenRefine) and then use a duplicates filter.

colognella · November 1, 2025, 12:20pm

I was curious about records as I have even less experience with them. Playing around with

type(row.record.cells) (= string per row)

and things like

row.record.cells[1]

which outputs the second character of the first column row (!), I arrived at the expression

row.record.cells['c2'].value (my c2 is your IN_lref)

which outputs an array of the record column values. So this should work for you:

forEach(row.record.cells['IN_lref'].value, v, v).join(', ')

Hope it helps!

RolfBly · November 1, 2025, 3:03pm

The reason for using row.record.cells is that I need to do a few operations with it.
Marking whether or not all occurrences in the record are duplicates is one.
Finding the highest value in a set of cells is another.

Facet by duplicates is a good idea, but one has to be careful. OR identifies this as a duplicate

but it’s not - at least not for me in this case.

RolfBly · November 1, 2025, 3:10pm

This does help, thank you.

I did play around with row.record.cells[0] too. Index number 0 to 9 spells “com google”.

I’m using `row.record.cells.IN_lref.value` which does return the array.

b2m · November 3, 2025, 7:37am

So row.record.cells is a two dimensional structure called RecordCells.

The first dimension is the columns of the project and the second is "An array of the cells in the given column of the record".

As was already pointed out, you first need to specify the first dimension (the column name) to be able to access the second dimension (the cell array).

row.record.cells[columnName]

Note that columnName is not a placeholder but actually a variable containing the name of the current column.

To be able to show the values of the cells you then would also need to add .value, because OpenRefine does not have a concept of representing "(Cell)Objects" in the GUI (yet).

row.record.cells[columnName].value
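To make the two dimensions concrete, here is a minimal sketch in plain Python. It is only an illustrative model, not OpenRefine's actual classes: `Cell`, `record_cells` and `column_values` are hypothetical names, and the sample data is made up.

```python
# Illustrative model only: plain Python stand-ins, not OpenRefine's classes.
from collections import namedtuple

# Hypothetical stand-in for a cell object that carries a value.
Cell = namedtuple("Cell", ["value"])

# A record spanning three rows, modeled as: column name -> list of cells.
record_cells = {
    "IN_lref": [Cell("A1"), Cell("A1"), Cell("A1")],
    "title": [Cell("first"), Cell("second"), Cell("third")],
}


def column_values(cells_by_column, column_name):
    """Mimic row.record.cells[columnName].value:
    pick one column first, then unwrap the value of each cell."""
    return [cell.value for cell in cells_by_column[column_name]]


print(", ".join(column_values(record_cells, "IN_lref")))
```

In this model, iterating over `record_cells` itself would walk over column names rather than cells, which is roughly why the column has to be chosen before anything array-like is available.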

Let's wrap this knowledge into a more complex GREL expression, that you can use for example in a Custom text facet.

and(
  row.record.cells[columnName].value.uniques().length() == 1,
  row.record.rowCount > 1
)

This expression checks whether the current column of the record contains only one unique value by using uniques and length. We have to add a second condition to rule out records that only have one row, using record.rowCount, and combine the two conditions using and.

Let's also add some sugar to the output by using the conditional if:

if(
  and(
    row.record.cells[columnName].value.uniques().length() == 1,
    row.record.rowCount > 1
  ),
  "all identical",
  "different values"
)

This expression will also "translate" the meaning of true and false to human readable text labels.
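For readers who want to check the logic outside OpenRefine, the same decision can be sketched in plain Python. This is a model under assumptions, not GREL: `label_record` is a hypothetical helper, `values` stands for what `row.record.cells[columnName].value` returns, and `row_count` stands for `row.record.rowCount`.

```python
def label_record(values, row_count):
    """Return a label for one record.

    values    -- list of cell values for one column of the record (assumed input)
    row_count -- number of rows in the record (assumed input)
    """
    # Equivalent in spirit to uniques().length() == 1
    has_single_unique_value = len(set(values)) == 1
    # Single-row records are trivially "identical", so they are excluded
    spans_multiple_rows = row_count > 1
    if has_single_unique_value and spans_multiple_rows:
        return "all identical"
    return "different values"


print(label_record(["A1", "A1", "A1"], 3))
print(label_record(["A1", "B2"], 2))
print(label_record(["A1"], 1))
```

The third call shows why the row count check matters: without it, every single-row record would be labelled as having identical values.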

tfmorris · November 3, 2025, 4:32pm

Good detective work! Would you be willing to update/correct the documentation based on your research?

The current design seems a little odd / non-regular to me since normally the columnName lookup is done implicitly.
Perhaps we could support row.record.cell (or some other notation) as equivalent to row.record.cells[columnName].
This would be a parallel to the equivalence between cell and cells[columnName]

The current syntax certainly isn't very easy to discover.

Tom

p.s. You can also use the literal column name, e.g. row.record.cells['IN_lref'].value

tfmorris · November 3, 2025, 4:53pm

The current design seems a little odd / non-regular to me since normally the columnName lookup is done implicitly.
Perhaps we could support row.record.cell (or some other notation) as equivalent to row.record.cells[columnName].
This would be a parallel to the equivalence between cell and cells[columnName]

Actually, that's not a great idea, because the equivalence is actually between cell and row.cells[columnName]
and in the context of a row or record, we're dealing with all the columns and there is no implicit current column.

Tom

RolfBly · November 3, 2025, 9:52pm

since normally the columnName lookup is done implicitly.

[…]

in the context of a row or record, we're dealing with all the columns and there is no implicit current column.

If you get there via Add column based on this column, one could expect there to be an implicit (or pretty explicit, actually) current column, no?

Apart from that, I applaud the idea of adding @b2m’s explanation to the documentation.

b2m · November 4, 2025, 9:03am

Added it to my todo list. Winter is coming, so there will be some time to take care of it.

Not so sure because row.record.cells[columnName] basically returns a subset of the column (multiple cells), whereas row.cells[columnName] returns a single cell.

As far as I remember the code, columnName is usually available in the context for GREL expressions, but not in the context of Clojure and Jython expressions. Not sure how other language extensions build and manage their context.

ostephens · November 6, 2025, 3:01pm

I’ve written up an issue to correct the documentation. I added a little extra detail, which is that this only returns non-blank cells from the record, which may be worth mentioning in the docs.

github.com/OpenRefine/openrefine.org

Correct documentation for row.record.cells variable

opened 02:59PM - 06 Nov 25 UTC

ostephens

The documentation currently has the entry:

| Field| Meaning |
|--------|-------…-|
| `row.record.cells` | An array of the [cells](https://openrefine.org/docs/manual/expressions#cells) in the given column of the record |

This is both incorrect (this actually returns a RecordCells object) and isn't incredibly helpful to the user as it is not obvious how to extract cell objects or their properties (particularly values) from this object.

It is required to provide a column name to get an array of Cell objects, e.g. `row.record.cells.columnName` or `row.record.cells["column name"]`. This returns an array of the non-blank cells in that column in the record.

The properties of the cells in the array can be extracted by dot chaining the relevant cell property on the end:

- `row.record.cells["column name"].value`
- `row.record.cells["column name"].recon`
- `row.record.cells["column name"].errorMessage`

But note that unless the `value` is set, the cells won't be included in the array - so cells that are storing an `errorMessage` rather than a value won't be included. (I'm not sure if it is possible for a cell to store an error message and a value?)
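One practical consequence of the non-blank behaviour can be sketched in plain Python. This is a model under assumptions, not OpenRefine code: `non_blank`, `all_identical` and `all_identical_strict` are hypothetical helpers, and treating `None` and the empty string as "blank" is an assumption made for illustration.

```python
def non_blank(values):
    """Drop blanks, modeling the filtering described in the issue.
    Assumption: None and "" both count as blank."""
    return [v for v in values if v is not None and v != ""]


def all_identical(row_values):
    """Same check as the 'one unique value, more than one row' expression
    in the thread, applied after blanks have been dropped."""
    present = non_blank(row_values)
    return len(set(present)) == 1 and len(row_values) > 1


def all_identical_strict(row_values):
    """Variant that also requires every row of the record to hold a value."""
    present = non_blank(row_values)
    return all_identical(row_values) and len(present) == len(row_values)


# One value per row of the record; None marks a blank cell.
record_column = ["A1", None, "A1"]
print(all_identical(record_column))
print(all_identical_strict(record_column))
```

If blanks are dropped as described, a record with gaps in the column can still be reported as identical. Whether that is desirable depends on the data; the strict variant shows one possible guard, which compares the number of non-blank values with the row count.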
