Which kind of reproducibility should we focus on?

Both options will improve OpenRefine, but developing the macro is more of a customization project than an enhancement for reproducibility.

The first scenario corresponds to what was proposed in the grant application. Many users have already hacked together their own version of this process, and official support would be welcome.

Looking closer, we should carefully define the scope and what we mean by "turning OpenRefine more into a pipeline runner".

  1. If we are considering the potential for an official headless mode, I'd appreciate hearing from @felixlohmeier, given his extensive experience with openrefine-client and openrefine-batch. Felix's approach, which allows developers to integrate OpenRefine into larger scripts handling data retrieval and publishing, is particularly advanced.

  2. Are we considering orchestration capabilities, with the option to run on a schedule and send alerts on failure? In that case, the workflow orchestration space is moving fast, and many great open-source solutions are already available. I would prefer that OpenRefine integrate cleanly with them rather than build our own version.

I like @tfmorris's approach of moving in that direction with smaller, more frequent releases. I would also include: