Hi folks, I wanted to post an update on my work and what I'm thinking of working on next. Hopefully this gives the community an opportunity to check my work and comment on my plans for the next month. My main priority is to support the other OpenRefine contributors, so please don't hesitate to reach out if there's something you'd like to see improved that isn't being addressed.
April was largely focused on maintaining the codebase. I was more active in issue and pull request triage, released a new version of OpenRefine, and continued researching OpenRefine's internals. Each issue and pull request teaches me more about how OpenRefine works, so my reviews should take less time and, hopefully, be more helpful. On the technical side, I wanted to explore issue #7185 on OpenRefine/OpenRefine ("Proxy all (or most) reconciliation API calls through the backend") as a way of learning more about OpenRefine's architecture. However, I ended up spending much more time learning about Butterfly (the web framework OpenRefine uses) than about reconciliation, since Butterfly seems more in need of knowledge sharing than other parts of the application architecture.
Looking ahead to May, I'd like my research into Butterfly to be useful, so I'm planning to write some documentation on the framework. I'm also working towards the 3.10 release and hope to have a beta out towards the end of this week or early next week. Additionally, I'd like to spend time planning how to stabilize OpenRefine's APIs, both internally for extensions and externally for REST API clients.
While general feedback is always welcome, I'd especially like to hear about the following:
What do you find confusing about Butterfly? What kind of materials would be most useful for you (Javadocs, tutorials, etc.)?
What do you find most frustrating about extending OpenRefine, either through the REST API or through an extension? I know there is a request for more guides on building extensions and the REST API is generally described as "use at your own risk", but anything specific would be very helpful.
I held office hours every Thursday in April, though attendance was generally low. I would like to hold office hours again in May but would appreciate any feedback about when and how people would prefer to engage.
I hope this information is useful! Please feel free to comment on the format of this update itself. I'd like to be more transparent and accountable to the community and I hope updates such as these help with that effort.
What do you find confusing about Butterfly? What kind of materials would be most useful for you (Javadocs, tutorials, etc.)?
My biggest issue with Butterfly is that it's separate from OpenRefine. No one else uses it, yet it has its own release process, so whenever a change is needed there to unblock something in OpenRefine, it takes much longer to get things moving. At this point it could simply become part of the OpenRefine repository, which would also let it share the same build system, tests, etc.
What do you find most frustrating about extending OpenRefine, either through the REST API or through an extension? I know there is a request for more guides on building extensions and the REST API is generally described as "use at your own risk", but anything specific would be very helpful.
The API is in an odd place: we don't really make breaking changes to it, but we also tell people not to use it. One way forward would be to give it the same level of support as extension points and dependencies. That's effectively the practice anyway, and it would likely spark some interesting uses in the wild.
Extensions, I think, are in a better place. The main issue is a lack of documented practices, many of which would help avoid common problems (extensions interfering with core features, layout, etc.). CSS and JavaScript namespacing come to mind.
Discussions about improving both Butterfly and the API tend to run quickly into the question of breaking changes, which some core developers have been vocally opposed to in the past. Maybe it's time to revisit whether now is a good moment to rip the bandage off: improve the API, fix up some extension points, drop LESS support, and bump Jetty, Velocity, and other core dependencies. From my point of view that would be very welcome, as I think the improvements we could make greatly outweigh the burden of the breaking changes.
The following OpenAPI specification for OpenRefine is a few years old, incomplete, and was created by intercepting requests. I'm not sure it's of much use to anyone, but in case it is:
Thanks for the OpenAPI spec! I was looking into that so this is a huge help.
Maybe it's time to revisit whether now is a good moment to rip the bandage off: improve the API, fix up some extension points, drop LESS support, and bump Jetty, Velocity, and other core dependencies. From my point of view that would be very welcome, as I think the improvements we could make greatly outweigh the burden of the breaking changes.
I think this makes sense. We can't avoid breaking changes forever, and I think it would be helpful to have a plan around the frequency and nature of breaking changes (like bumping the minimum required Java version or upgrading a key dependency) so as to minimize the disruption when they inevitably happen.
One thing I'd like to examine is the Butterfly classloader issue: OpenRefine/simile-butterfly issue #15, "Butterfly classloader module isolation causes Jackson problems".
Decoupling OpenRefine's dependencies from those of the extensions seems like a worthwhile endeavor, as it would allow us to make some of these improvements without breaking extensions (or while minimizing the damage if they do break).
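To make the idea of decoupling a bit more concrete, here is a minimal sketch of one common technique: a "child-first" classloader with a list of shared package prefixes. This is not Butterfly's actual loader and not a proposed fix for issue #15; the class name `ChildFirstClassLoader` and the shared-prefix list are hypothetical. Classes under a shared prefix (for example Jackson, if objects of those types cross the core/extension boundary) use normal parent-first delegation so core and extension see the same `Class` objects, while everything else is looked up in the extension's own jars first, so an extension could in principle bundle a different library version than core.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first classloader for one extension's jars (a sketch only).
 * Packages listed as "shared" use normal parent-first delegation so that core and
 * the extension agree on those types; all other classes are looked up in the
 * extension's own URLs before falling back to the parent.
 */
class ChildFirstClassLoader extends URLClassLoader {
    private final String[] sharedPrefixes;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, String... sharedPrefixes) {
        super(urls, parent);
        this.sharedPrefixes = sharedPrefixes.clone();
    }

    /** java.*, javax.* and explicitly shared packages always use parent-first lookup. */
    private boolean isShared(String name) {
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return true;
        }
        for (String prefix : sharedPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isShared(name)) {
                    // Standard parent-first delegation for shared types.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child-first: prefer the extension's own copy of the class.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not bundled with the extension: fall back to the parent.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

A caller would create one such loader per extension, passing the extension's jar URLs, the core classloader as the parent, and prefixes such as `"com.fasterxml.jackson."`. The trade-off is that anything listed as shared effectively becomes part of the contract between core and extensions, so upgrading it could still break extensions, while unshared libraries could be upgraded independently.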