We are preparing a Letter of Intent for the Open Source for the Life Sciences (OS4LS) funding call. OS4LS is a new program that follows from the EOSS funding program.
The LOI is due on June 8, 2026. We are currently looking at a Track 1 proposal, which supports domain-specific tools with funding of up to $250,000 over two years. The proposal would focus on OpenRefine’s role as open infrastructure for life-science data curation, reuse, and interoperability.
The working draft is here.
Following discussion with the OpenRefine Advisory Committee, we are exploring three related angles. These angles are high-level and not yet tightly defined, as we are still scoping the application's direction.
We would like community input on which of these angles seems strongest for OpenRefine and for life-science users. We would also welcome suggestions for other angles to explore or specific points to consider within any of the directions below.
1. Federated repository curation workflows
OpenRefine could support repository-facing workflows where contributors or curators export metadata or tabular data from a repository, clean and validate it in OpenRefine, and publish the corrected data back. This is informed by early conversations with repository stakeholders, including COS/OSF, Dryad, Vivli, Digital Science, and the broader GREI context.
Possible directions include metadata round-trips, validation-report-guided cleaning, and reusable workflows based on formats and standards such as Frictionless Data (used by Dryad; see #778), DataCite, Dublin Core, and OAI-PMH, as well as controlled vocabularies such as FAST.
This angle is related to Discussion: How OpenRefine Can Support Federated Data Repositories for Open Science?
2. Life-science reconciliation and validation workflows
OpenRefine is already used in life-science-related work to clean and normalize entities such as publications, authors, institutions, drugs, organisms, clinical variables, and controlled vocabularies. A proposal could strengthen reconciliation and validation workflows around these entities, especially where researchers need to review, correct, and document matching results.
The focus would not be to replace reconciliation services provided by current data providers. Instead, a local reconciliation endpoint could help researchers reconcile against internal authority files or domain-specific reference datasets.
This angle is related to the Native Reconciliation with arbitrary external datasets goalpost.
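To make the idea of reconciling against an internal authority file more concrete, here is a minimal sketch in Python. It is not how OpenRefine or any reconciliation service is implemented: the `AUTHORITY` table, the `reconcile` function, the identifiers, and the 0.9 threshold are all hypothetical, and plain string similarity from the standard library stands in for whatever matching a real endpoint would use.

```python
from difflib import SequenceMatcher

# Hypothetical internal authority file: identifier -> preferred label.
AUTHORITY = {
    "ORG:0001": "Escherichia coli",
    "ORG:0002": "Saccharomyces cerevisiae",
    "ORG:0003": "Mus musculus",
}


def reconcile(query, authority=AUTHORITY, limit=3, match_threshold=0.9):
    """Return up to `limit` candidates for `query`, best score first.

    Each candidate carries an id, a name, a 0-100 score, and a match flag,
    a shape loosely modelled on reconciliation results (an assumption,
    not the actual protocol payload).
    """
    normalized = query.strip().lower()
    candidates = []
    for identifier, label in authority.items():
        # Similarity ratio in [0, 1] between the query and the label.
        ratio = SequenceMatcher(None, normalized, label.lower()).ratio()
        candidates.append({
            "id": identifier,
            "name": label,
            "score": round(ratio * 100, 1),
            "match": ratio >= match_threshold,
        })
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return candidates[:limit]


if __name__ == "__main__":
    for candidate in reconcile("Mus musculs"):
        print(candidate)
```

The point of the sketch is the review step: candidates below the threshold are still returned with their scores, so a researcher can inspect, correct, and document the match rather than accept it blindly.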
3. AI-assisted curation and audit workflows
The existing llm-extension points toward AI-assisted cleaning, enrichment, classification, and transformation in OpenRefine. A grant could explore how to make these workflows safer and more useful for research settings through prompt history, transformation provenance, validation, repeatability, local/private model options, and domain-expert review.
The goal would not be to build new AI models, but to make AI-assisted tabular data transformations inspectable and auditable.
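As an illustration of what "inspectable and auditable" could mean in practice, the sketch below records one audit entry per changed cell. It does not reflect the llm-extension's actual design: `audit_record`, `apply_with_audit`, the field names, and the stand-in transform are all hypothetical, and no real model is called.

```python
import hashlib
from datetime import datetime, timezone


def audit_record(row_id, column, old_value, new_value, prompt, model_name):
    """Build one audit entry for a single changed cell (hypothetical schema)."""
    return {
        "row": row_id,
        "column": column,
        "old": old_value,
        "new": new_value,
        "model": model_name,
        # Hash of the prompt, so identical prompts can be grouped and compared.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Filled in later by a domain expert who accepts or rejects the change.
        "reviewed_by": None,
    }


def apply_with_audit(rows, column, transform, prompt, model_name):
    """Apply `transform` to one column in place and log every changed cell."""
    log = []
    for row_id, row in enumerate(rows):
        old_value = row[column]
        new_value = transform(old_value)
        if new_value != old_value:
            row[column] = new_value
            log.append(audit_record(row_id, column, old_value, new_value,
                                    prompt, model_name))
    return log


if __name__ == "__main__":
    rows = [{"species": "e. coli"}, {"species": "E. coli"}]
    # Stand-in for a model call; a real workflow would call a model here.
    normalize = lambda value: "E. coli" if value.lower() == "e. coli" else value
    for entry in apply_with_audit(rows, "species", normalize,
                                  "Normalize species names", "stand-in-model"):
        print(entry)
```

Keeping the old value, the new value, and a prompt fingerprint together is one possible way to support repeatability and expert review; other designs, such as storing the full prompt history, may suit research settings better.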
Feedback requested
Feedback is especially useful on:
- Which of the three angles seems strongest for OpenRefine and for life-science users?
- Are there other angles we should explore?
- Are there specific risks, existing work, or use cases we should consider for any of these angles?
- Are there life-science communities, repositories, or data-curation groups we should consider before submitting the LOI?