Discussion: How Can OpenRefine Support Federated Data Repositories for Open Science?

Exploring OpenRefine’s Role in Research Data Repositories

Research data repositories and platforms for sharing datasets are becoming increasingly important. We would like to open a discussion on how OpenRefine can add value by assisting with data curation, standardization, and the publication process across federated data repositories.

Recent papers have highlighted OpenRefine’s potential for improving data curation within federated data repositories:

  • "The Fast and the FRDR: Improving Metadata for Data Discovery in Canada" (Turp et al., 2020) demonstrated how OpenRefine was used to reconcile subject keywords within the Federated Research Data Repository (FRDR) in Canada, enhancing metadata consistency and discoverability.
  • "Can We Standardize Name Reconciliation via OpenRefine?" (Mozzherin et al., 2023) explored integrating the Global Names Verifier with OpenRefine as a reconciliation service, highlighting its role in improving interoperability across federated systems.
  • "Improving Open Science Using Linked Open Data: CONICET Digital Use Case" (Zarate, Buckle, Mazzanti, and Samec, 2019) explored the transformation and linking of scientific publication data from the CONICET Digital repository into the web of data. The study highlighted the use of OpenRefine to reconcile author names with external datasets such as Wikidata and ORCID, improving metadata quality and interoperability across repositories.

These examples underscore OpenRefine’s relevance to federated open science repositories and its role in metadata alignment, reconciliation, and automation within open science ecosystems. By adapting the reconciliation, data structuring, and metadata management capabilities we developed for Wikimedia platforms, we can bring similar functionality to research repositories. Building on the existing integration between OpenRefine and the Wikibase/Wikidata platforms, we envision OpenRefine integrating with existing repositories to assist researchers by:

  • Structuring data to meet specific schema requirements of repositories.
  • Standardizing metadata fields to align with repository standards.
  • Automating data uploads through repositories' programmatic submission interfaces.
  • Reconciling datasets with external databases to enhance metadata quality.

By integrating OpenRefine into open science repositories, we can help ensure that research data remains structured, accessible, and reusable across disciplines.

Potential Data Repository Integration Partners

We identified the following data repositories as potential integration partners. These integrations could be developed as OpenRefine extensions, enabling direct interaction with the repositories and improving metadata quality across the open science ecosystem.

Engagement & Collaboration Opportunities

We invite interested parties to engage with us to discuss potential implementations. Specifically, we welcome:

Developers & Contributors: We would love to hear from you if you are interested in co-developing an OpenRefine extension to integrate with a specific repository. Contributions could include developing new reconciliation services, automating repository uploads, or enhancing metadata transformation workflows.

Research Institutions & Data Repositories: We would like to hear from organizations interested in partnering to formalize OpenRefine’s integration with federated data repositories. This project will benefit significantly from collaboration with partners in the open science community.

Funding Organizations: If your organization supports open science infrastructure, we would love to discuss how OpenRefine can be part of your funded initiatives. We have identified the potential funders listed below who may be interested in such projects and welcome introductions to them.

Community Feedback: If you have experience working with any of the identified data repositories or funding organizations, we’d love to hear your insights and potential avenues for collaboration regarding:

  • How we can support similar initiatives.
  • Whether organizations would be interested in partnering to formalize this integration.
  • Connections with the identified data repositories or funding organizations who could give feedback on this project (or introductions to ones we haven't listed here).

This sounds like a great initiative! I'd love to help develop extensions to make it easier to work with these platforms. Do we have any information about the combination of services and partners OpenRefine's users would most benefit from (e.g. reconciliation with PDS or automated uploads to Dataverse)?

@Rory, I do not currently have this information. I plan to share this call for participation to see who is interested.