Funding opportunity: OS4LS proposal around OpenRefine for life-science data curation

Martin · May 14, 2026, 8:02pm

We are preparing a Letter of Intent for the Open Source for the Life Sciences (OS4LS) funding call. OS4LS is a new program that follows from the EOSS funding program.

The LOI is due on June 8, 2026. We are currently looking at a Track 1 proposal, which supports domain-specific tools with funding of up to $250,000 over two years. The proposal would focus on OpenRefine’s role as open infrastructure for life-science data curation, reuse, and interoperability.

The working draft is here.

Following discussion with the OpenRefine Advisory Committee, we are exploring three related angles. These angles are high-level and not yet tightly defined, as we are still scoping the application's direction.

We would like community input on which of these angles seems strongest for OpenRefine and for life-science users. We would also welcome suggestions for other angles to explore or specific points to consider within any of the directions below.

1. Federated repository curation workflows

OpenRefine could support repository-facing workflows where contributors or curators export metadata or tabular data from a repository, clean and validate it in OpenRefine, and publish the corrected data back. This is informed by early conversations with repository stakeholders, including COS/OSF, Dryad, Vivli, Digital Science, and the broader GREI context.

Possible directions include metadata round-trips, validation-report-guided cleaning, and reusable workflows based on formats and standards such as Frictionless Data (#778, which is used by Dryad), DataCite, Dublin Core, and OAI-PMH, as well as controlled vocabularies such as FAST.

This angle is related to Discussion: How OpenRefine Can Support Federated Data Repositories for Open Science?

2. Life-science reconciliation and validation workflows

OpenRefine is already used in life-science-related work to clean and normalize entities such as publications, authors, institutions, drugs, organisms, clinical variables, and controlled vocabularies. A proposal could strengthen reconciliation and validation workflows around these entities, especially where researchers need to review, correct, and document matching results.

The focus would not be to replace reconciliation services provided by current data providers. Instead, a local reconciliation endpoint could help researchers reconcile against internal authority files or domain-specific reference datasets.

This angle is related to the Native Reconciliation with arbitrary external datasets goalpost.
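
To make the local reconciliation idea above a bit more concrete, here is a minimal Python sketch, not a proposed design. It assumes a small in-memory authority file (the `AUTHORITY` dictionary and its identifiers are made up for illustration) and uses the standard library's `difflib` for fuzzy matching. The candidate fields (`id`, `name`, `score`, `match`) are loosely modeled on the candidate objects used by reconciliation services; the `reconcile` function and its thresholds are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical internal authority file: identifier -> preferred label.
AUTHORITY = {
    "ORG:0001": "Escherichia coli",
    "ORG:0002": "Saccharomyces cerevisiae",
    "ORG:0003": "Homo sapiens",
}


def reconcile(query, authority=AUTHORITY, limit=3, match_threshold=0.95):
    """Return up to `limit` candidates for one query string, best first."""
    normalized = query.strip().lower()
    candidates = []
    for identifier, label in authority.items():
        # Similarity between 0.0 and 1.0, based on matching character runs.
        similarity = SequenceMatcher(None, normalized, label.lower()).ratio()
        candidates.append({
            "id": identifier,
            "name": label,
            "score": round(similarity * 100, 1),
            # Flag near-exact hits; everything else is left for human review.
            "match": similarity >= match_threshold,
        })
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return candidates[:limit]


if __name__ == "__main__":
    for candidate in reconcile("E. coli"):
        print(candidate)
```

A real endpoint would also need to serve these candidates over HTTP and handle much larger reference datasets. The point of the sketch is only that uncertain candidates stay visible to the researcher, so they can be reviewed, corrected, and documented.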

3. AI-assisted curation and audit workflows

The existing llm-extension points toward AI-assisted cleaning, enrichment, classification, and transformation in OpenRefine. A grant could explore how to make these workflows safer and more useful for research settings through prompt history, transformation provenance, validation, repeatability, local/private model options, and domain-expert review.

The goal would not be to build new AI models, but to make AI-assisted tabular data transformations inspectable and auditable.
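
As a rough illustration of what "inspectable and auditable" could mean in practice, here is a minimal Python sketch of an audit record for one AI-proposed cell change. Everything in it is an assumption for discussion: the `AuditRecord` fields, the `review` helper, and the example model name are hypothetical and do not describe how the llm-extension works today.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One AI-proposed change to a single cell (hypothetical schema)."""
    column: str
    row_index: int
    original_value: str
    proposed_value: str
    model: str
    prompt: str
    reviewed_by: Optional[str] = None
    accepted: Optional[bool] = None  # None means "not reviewed yet"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def prompt_hash(self):
        # A stable fingerprint lets records that used the same prompt be grouped.
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()

    def to_json(self):
        record = asdict(self)
        record["prompt_sha256"] = self.prompt_hash()
        return json.dumps(record, sort_keys=True)


def review(record, reviewer, accepted):
    """Return a copy of the record with the domain expert's decision attached."""
    return replace(record, reviewed_by=reviewer, accepted=accepted)


if __name__ == "__main__":
    proposed = AuditRecord(
        column="organism",
        row_index=12,
        original_value="e coli",
        proposed_value="Escherichia coli",
        model="local-model-placeholder",
        prompt="Normalize organism names to their scientific name.",
    )
    print(review(proposed, reviewer="curator-1", accepted=True).to_json())
```

Keeping the original value, the prompt, the model, and the reviewer decision together in one record is what would let someone else later inspect, repeat, or reject a transformation.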

Feedback requested

Feedback is especially useful on:

Which of the three angles seems strongest for OpenRefine and for life-science users?
Are there other angles we should explore?
Are there specific risks, existing work, or use cases we should consider for any of these angles?
Are there life-science communities, repositories, or data-curation groups we should consider before submitting the LOI?

Martin · June 3, 2026, 5:17pm

I wanted to provide an update on the grant application. The metadata angle is actually out of scope, and since Monday we have been exploring the AI-LLM angle further.

You can read more, and review and comment on the early draft, here (see the tab with )
https://docs.google.com/document/d/1KlQ_38L0dRe-yCuv-iVlNloFhAEzzX6xVwKC6ojK7vE/edit?usp=drivesdk

Thank you

Martin · July 9, 2026, 9:49pm

We have been invited to submit the full application by July 21.
The biggest remaining task is defining the Work Plan (see details here) and the Budget.

As the main developer on the extension, @Sunil_Natraj is helping, and we also welcome insights from the community to finalize those. We started to aggregate ideas in the Internal Scoping tab. Note that the document is still at the brainstorming and idea-collection stage. Feel free to add your ideas and comments.
