Trying to copy data from one column to another with multiple conditions

Hi everyone!

I’m currently working on transforming name data in my dataset, and I'm encountering some challenges with my formulas. I need to rearrange names based on certain conditions while ensuring no data is lost or incorrectly formatted.

Context:

  • I have three columns: Vorname, Nachname, and Weitere Zugehörige Personen.
  • The Vorname column may sometimes contain full names or just first names.
  • The Nachname column can also contain full names (with spaces) or proper last names.
  • The Weitere Zugehörige Personen column may be empty or contain names.

Requirements:

  1. If Nachname contains a full name (two words separated by a space) and Weitere Zugehörige Personen is empty, I want to transfer the Nachname to Weitere Zugehörige Personen.
  2. If Weitere Zugehörige Personen already contains a name, I want to append the Nachname to it, separated by a comma.
  3. If the Nachname column contains a last name, it should remain unchanged unless it meets the criteria above.

Current Issue:

I attempted the following formula in the Weitere Zugehörige Personen column:

javascript

Code kopieren

if(
   value.trim() == "", 
   if(
      Nachname.trim().contains(" ") && 
      !Vorname.trim().contains(Nachname.trim()), 
      if(
         Weitere_Zugehörige_Personen.trim() == "", 
         Nachname.trim(), 
         Weitere_Zugehörige_Personen.trim() + ", " + Nachname.trim()
      ), 
      value
   ), 
   value
)

However, I encountered a parsing error, specifically "Parsing error at offset 60: Unknown function or control named".

Here is an example of what I am working with:

Shared with Zight

My previous transformations worked. The end result I want is that the first names are in Vorname, the last names are in Nachname, and the names of the other people are in Weitere Zugehörige Personen. i have tried multiple formulas and have not yet found a solution. Please advise!

The reason for the syntax error is that GREL does not support logical operators like &&, || and !. Instead you should use boolean functions like and, or and not.

So

becomes

Just a hint before writing too complex formulas in GREL: you can always use Facets and Filters to filter the data that fulfills a certain condition and then perform the necessary transformation. This gives you more control over what is happening and you can immediately see the results or manually fix edge cases.

Hi, and thank you for your response!

I am trying to work with facets now. Your suggestion was good. However, I found this in the Open Refine documentation

Operations that don't respect facets

Certain operations don't respect facet settings. If you perform any of the following operations while filtering your data with a facet, the operation will apply to all relevant data, columns, and/or rows in the table, not just those which you have filtered with the facet:

  • Moving, Removing, Renaming, or Reordering a column
  • Splitting or Joining multi-valued cells
  • Reordering of rows
  • Transposition (columns to rows, rows to columns, and columnization by key/value columns).

Second, I am not sure where to apply the formulas I tried to make for the facets.

My conditions are

  • Nachname contains a space.
  • Vorname does not contain Nachname.
  • Weitere_Zugehörige_Personen is empty or has values.

a. Facet for Nachname Contains a Space

Filter: Create a facet that checks if Nachname contains a space.
Expression: Nachname.trim().contains(" ")

b. Facet for Vorname Not Containing Nachname

Filter: Create a facet that checks if Vorname does not contain Nachname.
Expression: not(Vorname.trim().contains(Nachname.trim()))

c. Facet for Weitere_Zugehörige_Personen Being Empty

Filter: Create a facet that checks if Weitere_Zugehörige_Personen is empty.
Expression: Weitere_Zugehörige_Personen.trim() == " "

What I am trying to do is first set up a facet for the names in Nachname where there is a space between the two bits of text and the field for Weitere zugehörige Personen is empty. That space means the text is probably a full name. Having identified those fields, I then I transfer the contents of only those fields to Weitere zugehörige Personen.

Then I set up a facet like that, except where the field for Weitere zugehörige Personen is empty. Having identified those fields, I then I transfer the contents of only those fields to Weitere zugehörige Personen, and append the new contents to the old contents with a comma separating them.

After sorting out the Nachname, I set up a facet for the Vorname. Like so:
I want to see all the text in Vorname where there is a space between the two text bits. Then I transfer the part after the space to Nachname.

For that, I am not sure what to do if a person has a middle name, since then both middle and last name would be transferred.

I write my plan out in such a way because I want to understand the best way to coordinate all these changes.

Thank you for your help so far!

Yes, there are certain operations that do not respect facets (or filters).
However, what you are doing (copying and modifying data across columns) does respect your filtering.

Use a Textfilter on the column Nachname for that.

Use a custom text facet for that.
There you can input your own condition and receive a facet with True/False.

There is already a facet for that called Facet by null, empty, or blank.

Yes that's always a hassle. Sometimes again the custom text facet helps.
For example with the expression with(value.split(" "), a, a.get(1, a.length()) you will create a facet containing all the middle and last names. Which might be enough to quickly filter through the edge cases of middle names.

Ok, thank you for that!

I set up the Facet for Nachname Contains a Space with a custom text filter.

Now that I have isolated the names that I want, those that I can be sure are full names that should be in Weitere Zugehörige Personen, and I filtered for blank cells for Weitere Zugehörige Personen.

Shared with Zight

I have tried to copy it with
if(
and(not(W Weitere_Zugehörige_Personen.trim()), not(Nachname.trim() == "")),
Nachname.trim(),
Weitere_Zugehörige_Personen.trim() + ", " + Nachname.trim()
)
Keeping in mind what you said about Boolean operators.

I found this, but it did not really solve the problem.

@b2m thanks so much btw for helping out with support :busts_in_silhouette: ! Much appreciated :heart: !

1 Like

There is a tricky part now, because you will first modify the column "Weitere zugehörige Personen" by copying the data from "Nachname" and then probably want to clear the data from the column "Nachname". But your filter will change between these two steps as the column "Weitere zugehörige Personen" is no longer empty.

So I would first fix your current selection by either star or flag all the identified rows and use the corresponding star or flag facet.

Then use the transform dialog on column "Weitere zugehörige Personen" with the expression row.cells["Nachname"].value (no boolean functions and checks needed, as transformation respects your active filters).

Note that you could also use row.cells.Nachname.value but I prefer the "quoted" variant as in my experience it is more robust against weird or special characters in column names.

This will copy the data from "Nachname" to "Weitere zugehörige Personen". You then can remove the data from the column "Nachname" via "Nachname => Edit cells => Transform => Common transforms => To empty string".

1 Like

My tips for cases like this:

  • do things incrementally using facets rather than attempting to do them all at once
  • make copies of your columns rather than editing in place
  • don't assume anything about your data. There are likely to be all kinds of outliers.

My first facet is always a custom numeric facet with the GREL expression value.split(/\s+/).length() You might be surprised at what the maximum value is. I usually am!

Hi @b2m,

I followed your instructions and they worked.

Thank you!

Then for the last step of splitting up what is in the Vorname field.

The GREL code

if(cells["Vorname"].value.contains(" "),
cells["Vorname"].value.split(" ")[1],
value)

This took the part after the first space in Vorname and put it in Nachname.

Then to keep only the First Name in Vorname

if(value.contains(" "),
value.split(" ")[0],
value)

Thank you very much for your help!

1 Like