Is it possible to copy and paste several rows of data into an active project

I have a project I've been working on to cleanse a set of Ecommerce product data.
I have already performed several major changes and have left the most trickiest till last.
However I have now go to a point where I need to fix some incorrect text strings within a specific column, and it's actually going to be quicker to just find the accurate values from an Excel sheet and paste them in.
But I can't see any way to paste in multiple cells of data into a column.
Is this not possible and if not is there any other way to achieve it?

OpenRefine's datagrid is not an openly editable spreadsheet (although we have talked in the past about perhaps changing that behavior by having an Edit mode to make users lives easier for ad-hoc edits - there's an issue somewhere floating in our GitHub issues for that, if you wanted to give it a :+1: on it).

You can paste, but you have to be in an edit of a cell in order to do so (unlike Excel which basically is always open editable for any cell) OpenRefine's grid is not open to edits until you click on a cell's edit button, or transform column values in other ways like Facets, so: 2 commons ways to do this:

  1. If you have a small number of ad-hoc edits, you can just facet/text filter or scroll down the column to those cells and edit manually with the blue Edit link on each cell when you click on a cell.

  2. Alternatively, and the OpenRefine way, is to replace values, typically via a Facet or Transform with GREL like replace("sndwich","sandwich"), and then on the values showing in the Facet, you can click on the little Edit link next to any value in the Facet, which will update and replace all the values in your column with your new edited value. (You might also notice the change link along the top of any Facet, this let's you alter the Facet expression with another GREL expression you might want for filtering by some pattern)
    Exploring facets | OpenRefine

Searched of the issue to thumb it up, (searched on 'paste') but couldn't spot it anywhere.

I basically wanted to paste in some product descriptions which are about 3 sentences each per cell value and I had 29 of them. So doing it one by one was a faff and I thought there must be a quicker way I'm missing somewhere.

Not sure the 2nd options gives me what I need either as they aren't simple replaces.

But thanks for the response.

What about exporting you current OR to Excel, add your lines in there, save, open in OR & proceed?

You'll lose your edit history, yes.

1 Like

However I have now go to a point where I need to fix some incorrect text strings within a specific column, and it's actually going to be quicker to just find the accurate values from an Excel sheet and paste them in.
But I can't see any way to paste in multiple cells of data into a column.
Is this not possible and if not is there any other way to achieve it?

This actually might be one of the limited cases where the operation history replay could be useful. It has a number of limitations (e.g. it doesn't replay single cell edits), but if those are acceptable, you could export your operation history, start fresh with a repaired input dataset, and reapply the operations.

Tom

1 Like

I guess it might be more helpful if I explain what I'm trying to cleanse in a bit more detail as I might be going about it in the wrong way.

I'm cleansing product data and need the end cleansed data to be in the following format:
[Brand Name], [Product Name], [Main Keyword] + [Key Feature]

My existing data is a mess and has all sorts of inconsistencies and issues including: duplicate Brand Names, partial Brand Names, brand names not at the start, misspellings, different delimiters, different position of delimiters, unwanted Product numbers, missing translations, additional characters, wrong Product Names, wrong translations for Product Names and Keywords and Features.etc.

I've managed to run a ton of different find cleansing actions on a subset of data to try and speed up the cleansing process, but it's still took me a fair few hours (6-8approx), while I have been able to use some GREL and Python code to do some of the heavy lifting (thanks to ChatGPT, although it probably needs some improvement with it's GREL knowledge) I probably could have done it better and there was still a fair amount of basic edits.

Finally I have managed to get it to a point where I have clean delimiters in the correct locations and all of the Brand Names, Keywords + Features are all in place, accurate and correctly translated.

But I have around 80 records which have their product names in english but they need to be in either French, German, Dutch, Spanish, Italian or Swedish.

I have created a new file with the translations in to attempt to copy and paste over the top in OpenRefine but it's not possible.

In the end, I took the route of splitting the original title column in three by delimeter.
Then used the following cross formula to pull in the translated product names from the other project data set.
Then merged the split columns back together but this time using these columns
Brand + Translated Product Name + Keyword & Features

Seemed a bit long winded but got there in the end.
I'm new to OpenRefine and so I clearly need to practice to figure out how to perform these changes in a more methodical and efficient way as I have to recreate it on lots more data over the coming months and ideally I should be getting to the point where I can apply a set of steps and it will cut me time right down. But at the moment I have a deadline I had to hit and couldn't mess about, so did lots of find and replace that only impacted low number of records.

Here is my final Step by Step Process:

	1. Import both data sets as separate projects and make any data cleansing required
	2. Select the column which will be your common identifier column (SKU or ASIN) and choose Edit Column > Add column based on this column
	3. Include the following GREL Expression:
		if(value!='null',cell.cross('PROJECT NAME', 'REFERENCE COLUMN HEADER VALUE').cells['COLUMN HEADER YOU WANT TO PULL IN'].value[0],'')

I welcome any feedback on my approach, as I have only performed this clean up on 600 of 2000 records so far, so have plenty more to go at. Plus have lots of other data sets that need similar cleansing.

I've tried watching lots of the free Youtube videos and while they are very usful in some instance sthe data challenges are not the same, so my fall back is to ask ChatGPT and take photos of the errors and feed them back into Chatgpt and eventually after many itereations the coding files will generally work.