Filtering/Keep words based on list in column

I am trying to clean up a list of medications, so it shows only the list of 'cardiac' medications. Each patient is a row. I have added a list of cardiac medications in column "Meds List". What I am trying to do is remove any words that are present in any row of "Meds List" column. I've tried playing with filter and facets, but keep getting errors. Can anyone suggest a straightforward way to do this? -see image
Appreciate help :pray:

This reminds me of a question that was asked recently on Stackoverflow: OPenRefine GREL removing words if present in another column - Stack Overflow

As OpenRefine is handling operations in a row based manner, you first have to transform the column MEDs List, so that each row has the complete list of medications.

For that you can use the multi-valued cells feature in OpenRefine.

  1. Join multi-valued cells on a separator (e.g. ,).
  2. Fill down the the value of the first row to the other rows.
  3. Having the complete list in each row you can then use filters and string splitting to remove the medications:
with(
  row.cells["MEDS List"].value.split(","),
  meds,
  filter(value.split(" "), v, not(meds.inArray(v))).join(" ")
)

Thank you so much for your reply. I did not know that OpenRefine handles operations in a row based matter. I got the join cells to work by moving the ID column to the beginning and having the same ID. I am having some difficulty with the filter function to work. Any ideas on what I am doing wrong here?

I tried this expression and it seems to be working.
filter(value.split("\\s*,\\s*"), v, cells["MEDS List"].value.contains(v)).join(", ")

I think we do have contradictory requirements :smile:

Just do clarify:

Do you want to remove words in column "MEDS List" from e.g. "Column1.1" or do you want to only show words that are in "MEDS List"?

The snippet I posted earlier is for removing and according to your screenshot it is working as expected.

If you want to keep only values that are present in "MEDS List" you have to remove the not in the filter part:

with(
  row.cells["MEDS List"].value.split(","),
  meds,
  filter(value.split(" "), v, meds.inArray(v)).join(" ")
)
1 Like

This is perfect thank you! Appreciate all your help.