Developer and Community Engagement Specialist: onboarding plan

Rory · March 13, 2025, 8:08pm

Hi folks!

I recently joined as a Developer and Community Engagement Specialist. As I continue with my onboarding, I wanted to share some broad areas I'm thinking of focusing on and propose some methods for engaging with the community. Additionally, I'd really appreciate feedback on both of those points. If you have thoughts on where developer time is needed or how you'd like to be supported in the community, I'd love to hear them!

Here are some broad areas I've been thinking of working on over the next few months:

Reconciliation: Over the next few weeks I'd like to pick up a few bugs and then map out a path towards supporting the draft 1.0 specification.
Transformation languages: I'd like to investigate supporting Python 3 natively in OpenRefine. I'd also like to make improvements to the code editing environment more generally. Programming languages and their usage have always been topics of special interest to me, and I'd like to apply that interest to make it easier to transform data.
Maintainability: Every project has a set of tasks that would make a notable difference for contributors, but because they have no user-facing component they are often de-prioritized. I'd like to take this opportunity to address issues that fall into this category.

Additionally, while I work on these tasks, I'd like to do so in a way that encourages participation. One way I'd like to achieve this is by holding office hours on a regular cadence. The idea behind this would be to provide a space to ask questions, pair program, or simply sit in and observe. I'll follow up shortly with details for a session next week.

Please let me know if there are other areas you'd like to see receive developer attention or if there are other ways you'd like to be supported as a developer!

thadguidry · March 14, 2025, 12:44am

Hi @Rory, thanks for taking the time to think about areas to prioritize for OpenRefine. I think the future of data transformation will likely embrace more and more AI. OpenRefine itself started as an easier-to-use tool for transforming data, one that did not focus on programming but instead brought programming ABILITIES to users in easier-to-use forms (primarily Facets and GREL). GREL was one step toward a more macro-like domain-specific language. While all of us working on OpenRefine are capable programmers, our users oftentimes are not.

Having said that, I do think that Python 3 can help some users, but the majority of our users will likely benefit first from the following:

Editing feedback presented in a UX that is easier to see and understand.

Imagine that AI eventually could take over this area, but would still need human input and guidance. I think about this often: what would AI agents need in the underpinnings of OpenRefine to understand the data transformation steps and workflow rules that users often apply with GREL snippets or Python scripts? What do we need to change to make it easier for future AI agents to hook into OpenRefine's transform "engine" directly?
Errors for GREL are sometimes not helpful. Lookahead for GREL doesn't actually work, so incomplete syntax can even pass silently. Simple example: value.contains versus value.contains("Thad"), where the first erroneously reports "No syntax error". I think we already have an existing issue or two labeled GREL for these things.
Category or Macro/Workflow components where GREL snippets could be tagged with labels (our Projects view sort of has this feature, but it's lacking for Expressions). Currently we only have Starred, and it lacks a good UX for typing a snippet or script and then starring it directly. On the internet and the forum we often see useful Recipes, but OpenRefine users currently have no way to bookmark those recipes, or to type and save them, in the Transform editing UX. I can imagine a simple Star button above the Expression edit window that, when clicked, opens a dialog listing the existing tags and asking the user which ones to apply. Later, this experience could hook into a global Recipes registry: a new repository that we host, holding community-maintained JSON (or perhaps just TOML) files. With the user's opt-in, OpenRefine could read those files and present a more browsable "Recipes" interface in place of the limited, plain-Jane "Starred" interface.
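
To make the value.contains example concrete, here is a minimal Python sketch of the kind of check that could flag a function name written without its call parentheses. It is not OpenRefine's GREL parser: the `KNOWN_FUNCTIONS` set and the `find_uncalled_functions` helper are hypothetical names made up for illustration, and a real fix would live in the GREL parser itself.

```python
import re

# Hypothetical subset of GREL function names, for illustration only.
KNOWN_FUNCTIONS = {"contains", "replace", "trim", "toUppercase"}

# Quoted string literals, so that dots inside strings are ignored.
_STRING = re.compile(r'"[^"]*"|\'[^\']*\'')

# A dotted member name that is NOT followed by an opening parenthesis.
_BARE_MEMBER = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)\b(?!\s*\()")


def find_uncalled_functions(expression):
    """Return known function names that appear without a call, in order."""
    without_strings = _STRING.sub('""', expression)
    return [
        name
        for name in _BARE_MEMBER.findall(without_strings)
        if name in KNOWN_FUNCTIONS
    ]


if __name__ == "__main__":
    for expr in ('value.contains', 'value.contains("Thad")'):
        missing = find_uncalled_functions(expr)
        if missing:
            print(f"{expr}: '{missing[0]}' is a function; did you mean {missing[0]}(...)?")
        else:
            print(f"{expr}: no uncalled functions found")
```

Only names in the known-function set are flagged, so plain field access such as cells.Name.value would not trigger a warning under this approach.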

thadguidry · March 14, 2025, 12:53am

Also, the notion of a Recipe for users is often not a single transformation, but a series of transformations that ultimately change an aspect of their data. Do this GREL, then that GREL, then finally this last GREL snippet, where users would save those 3 snippets as steps in a single starred Recipe they created. Imagine that. It's lighter and less complex than our Undo/Redo and can be made portable and shared and hosted!

Anyways, continue to always think: how can I use my programming talents to make non-programmers' lives easier in OpenRefine, so they don't have to know much programming, if any at all? OpenRefine = The original NOCODE data cleaning tool.
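
As a rough illustration of a tagged, multi-step Recipe stored as JSON, here is a minimal Python sketch. The `Recipe` fields (`name`, `steps`, `tags`) and the helper functions are assumptions made up for this example; no such file format has been agreed on for OpenRefine.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Recipe:
    """Hypothetical recipe: an ordered list of GREL steps plus free-form tags."""
    name: str
    steps: list
    tags: list = field(default_factory=list)


def to_json(recipe):
    """Serialize a recipe so it can be stored in a shared repository."""
    return json.dumps(asdict(recipe), indent=2)


def from_json(text):
    """Load a recipe back from its JSON form."""
    return Recipe(**json.loads(text))


def with_tag(recipes, tag):
    """Return only the recipes that carry the given tag."""
    return [recipe for recipe in recipes if tag in recipe.tags]


if __name__ == "__main__":
    tidy_names = Recipe(
        name="Tidy person names",
        steps=['value.trim()', 'value.replace("  ", " ")', 'value.toTitlecase()'],
        tags=["names", "whitespace"],
    )
    print(to_json(tidy_names))
```

Keeping the steps as an ordered list is what separates a Recipe from a single starred expression, and the tags are what a browsable "Recipes" view could filter on.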

Rory · March 20, 2025, 2:53pm

Thanks Thad! Apologies for my delayed response. I think these are really interesting ideas. I'm particularly interested in making GREL feedback more helpful to users with less programming experience. In my opinion, OpenRefine provides a great opportunity to help people build up programming skills. Having better error messages and an easier way to share code would help everyone be more productive. I'd love to help out in this area.

Regarding the notion of sharing recipes, how much of that do you think is satisfied by Antonin's existing reproducibility work?

Martin · March 20, 2025, 3:26pm

@Rory you can refer to #109 regarding the notion of sharing recipes. It ranked in the middle of the list of features our users ranked last year (score of 58).

In the same survey, there is also the request for more in-context help for GREL or a wizard-like approach to writing GREL (like Excel). However, this issue would require additional design work to better scope and refine the feature.

thadguidry · March 21, 2025, 12:46am

The idea of sharing recipes needs more user feedback. We know users want to be able to share more easily, and I think most have public sharing in mind and are OK with that. It's trivial to set up a repo containing a recipes file that OpenRefine can append to on an opt-in basis. The harder part is the feature set users will want, like recipe tagging/categorizing and visualizing hierarchies, which calls for a very nuanced approach with feedback from users. A giant long list of recipes in some dialog in OpenRefine doesn't do much good for users, and they'll expect excellent organizing features.

Antonin's work for reproducible recipes could later utilize the categorizing and organizing features of the shared recipes idea. GREL snippets could also use the same categorizing and organizing features. Two different areas that have the same needs for sharing and organizing. Make sense?

Martin · March 25, 2025, 1:36pm

@Rory

It's great to see your proactive approach in outlining your focus areas and seeking community engagement. Building upon our recent discussions, I wanted to provide some additional thoughts and resources that might assist you in your onboarding journey:

Visibility on the Forum and GitHub

Being more present on the forum and GitHub will be invaluable. Engaging with new contributors on GitHub and sharing your intentions and thought process (see this post by Eliz Ayer on this topic) will foster collaboration and further strengthen our community.

Triaging and Grooming the Issue Log

Dedicating focused time to triaging the GitHub issue log will help you familiarize yourself with existing issues and our labeling practice. This 2023 forum post on label cleanup is helpful as it offers insights into our labeling conventions. Also, during the last Barcamp, we discussed approaches to managing timelines and expectations around bug and feature requests.

Feel free to reach out if you need help or ask questions directly on the relevant issue. You might also consider organizing a “triage party” with other contributors; it could be a great way to learn from the community and share insights collaboratively.

Prioritizing Work

Based on our recent conversations and the scope outlined in the job posting, here's a suggested prioritization for your efforts:

Community Engagement & Contributor Support. Focus on responding to forum posts and GitHub interactions. This should be your top priority initially, with the expectation that this workload will decrease as you become more familiar with the project and community dynamics.
Ticket Triage and Pull Request (PR) Review. Use this time to build a strong understanding of the existing issue landscape and gradually review PRs as you gain confidence.
Bug Fixes & Quality Improvements. Approach this work in the following order of priority:
- Reviewing and fixing bugs in OpenRefine.
- Undertaking maintenance tasks that support contributors, such as improvements to Continuous Integration/Continuous Deployment (CI/CD) pipelines, documentation, style guides, Integrated Development Environment (IDE) setup, and in-code comments.
- Exploring feature issues highlighted in recent surveys and community discussions.

Your initiative to hold regular office hours is appreciated. It will create a space for questions, pair programming, and collaborative learning, which aligns well with our goal of fostering an inclusive and supportive community.

Please let me know if you want to discuss these points further or need clarification. Your contributions so far have been greatly appreciated.

Rory · March 26, 2025, 1:00am

Thanks for the feedback everyone! I'm away this week and will have a more substantive update next week but I just wanted to say that I really appreciate the warm welcome. I'm looking forward to supporting the people who contribute to OpenRefine!

Rory · April 1, 2025, 6:39pm

Thanks for the feedback everyone! As I mentioned in my original post, I went ahead and scheduled some time for office hours this month: Contributor Office Hours: April 2025

I'd like to use this time as a shared resource to work on the tasks Martin mentioned above: contributor support, issue triaging, pull request review, and general technical discussion. Please feel free to drop by if you're available!

Rory · April 24, 2025, 4:52pm

Now that I've had some time to get acclimated to the project, I wanted to share an idea I had after reading through the project goal posts. I think the second item, adding new rows to an existing project, could build off of a solution to the first goal post, providing a native reconciliation service in OpenRefine. Here is how I think these features could work together:

A user creates a project that acts as the main OpenRefine project for a dataset
New data is imported as a separate project and transformed to conform to the schema of the main dataset
A user reconciles the new dataset against the main dataset and any rows that have no match can be imported into the main dataset

I realize this covers a rather large amount of work, but I believe that incremental progress can be possible and would be valuable before ultimately delivering on both goal posts.

Before I go further into details about how this might work, I wanted to get feedback on the overall process outlined here and see if folks have questions or comments that could guide my explanations of this idea.
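
To make the three steps above a little more concrete, here is a minimal Python sketch of the last one: matching a new dataset against the main dataset and appending only the rows with no match. It is an illustration under stated assumptions, not a design for the feature: rows are plain dictionaries, `append_unmatched` and `normalize` are hypothetical helpers, and exact matching on a normalized key stands in for real reconciliation, which would score candidate matches.

```python
def normalize(value):
    """Lowercase and collapse whitespace so near-identical keys compare equal."""
    return " ".join(value.lower().split())


def append_unmatched(main_rows, new_rows, key):
    """Return the main rows plus every new row whose key has no match.

    Matching here is exact equality on the normalized key column, a
    stand-in for a native reconciliation service.
    """
    known = {normalize(row[key]) for row in main_rows}
    merged = list(main_rows)
    for row in new_rows:
        candidate = normalize(row[key])
        if candidate not in known:
            merged.append(row)
            known.add(candidate)  # also skips duplicates within the new data
    return merged


if __name__ == "__main__":
    main = [{"name": "Ada Lovelace"}]
    new = [{"name": "ada  lovelace"}, {"name": "Grace Hopper"}]
    for row in append_unmatched(main, new, "name"):
        print(row["name"])
```

In this sketch the new rows are assumed to already conform to the main dataset's schema, which corresponds to the second step in the list.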
