Hey folks, as I mentioned in my recent update, I'm interested in exploring ways to standardize the REST API for external clients. Inspired by the recent changes to the API suggested by @abbe98, I wanted to put forward a proposal to support multiple versions of the REST API.
The way I see this working is that we would add support for dispatching requests based on both URL path and a new HTTP header (e.g. X-OpenRefine-Version: 3.9). For existing endpoints, the default behavior (i.e. no version is specified via HTTP header) should be to fall through to the current responses. No existing clients should be broken by a version change this way.
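To make the fall-through behaviour concrete, here is a minimal sketch of header-based dispatch. It is not OpenRefine code: `VersionedEndpoint`, the handler functions, and the response shapes are all hypothetical. Rejecting an unknown version is also an assumption, since the proposal above only specifies what happens when the header is absent.

```python
from typing import Callable, Dict, Mapping

# Header name taken from the example in the proposal; not an existing OpenRefine header.
VERSION_HEADER = "X-OpenRefine-Version"

Handler = Callable[[], dict]


class VersionedEndpoint:
    """One legacy handler plus optional handlers keyed by API version (hypothetical)."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._by_version: Dict[str, Handler] = {}

    def register(self, version: str, handler: Handler) -> None:
        self._by_version[version] = handler

    def dispatch(self, headers: Mapping[str, str]) -> dict:
        # HTTP header names are case-insensitive, so normalise before lookup.
        normalised = {name.lower(): value for name, value in headers.items()}
        requested = normalised.get(VERSION_HEADER.lower())
        if requested is None:
            # No version header: fall through to the current response.
            return self._legacy()
        handler = self._by_version.get(requested.strip())
        if handler is None:
            # Assumption: an unknown version is an error rather than a silent fallback.
            raise LookupError(f"unsupported API version: {requested!r}")
        return handler()


# Illustrative response shapes only; they are not the real OpenRefine payloads.
def legacy_metadata() -> dict:
    return {"projects": {}}


def metadata_v3_9() -> dict:
    return {"data": {"projects": {}}, "errors": []}


endpoint = VersionedEndpoint(legacy_metadata)
endpoint.register("3.9", metadata_v3_9)
```

With this shape, an existing client that sends no header keeps getting `legacy_metadata()`, while a client sending `X-OpenRefine-Version: 3.9` opts in to the new response.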
While this creates more code to maintain in the repository, I'd argue that this is an acceptable price to pay for freedom to change API responses in a backwards-compatible way. I also think this change would help us with one of our goal posts: supported REST API for external use.
I'd be happy to put together a proof of concept on my fork of OpenRefine if folks think that would be helpful.
This would be a very welcome change and I would be happy to contribute both to a proposal for how to do this and to actual work on the API further down the road.
I think that before any work actually starts, it would be useful to align the practical aspects with the commitment and effort that we can sustain over time. The best way forward, I think, would be to write up both a (very light) technical proposal and something more policy-like.
That sounds great, and I'd be happy to put together a more specific technical proposal. Regarding something "policy like", can you elaborate on what you have in mind? Do you mean a kind of style guide for designing/documenting/testing API endpoints?
I'm thinking about all of the things that often end up outside typical technical proposals: how API versions relate to OpenRefine's main version scheme, whether we bump versions endpoint by endpoint or take another approach, what types of changes should be considered breaking, etc.
I agree with Albin's comments about the usefulness of policy in addition to mechanics.
For the mechanics, using both URL path versions and header versions seems redundant to me. I think it makes sense to choose one or the other. I think the Reconciliation API folks are also thinking about versioning, so coordinating with what they do may make sense for consistency.
I think that those discussions are focused on schema versioning rather than API versioning (e.g. if a client needs a different reconciliation version for a given endpoint). I don't think we should support something like that; instead we should have different endpoints for different URL versions (/v1/, /v2/, etc.).
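For comparison with the header approach, path-based versioning could be sketched as below. The `/vN/` prefix convention and the `split_version` helper are assumptions for illustration; OpenRefine has no version segment in its paths today, so unprefixed paths are treated as the legacy API.

```python
import re
from typing import Optional, Tuple

# Matches a leading "/v<digits>/" segment, e.g. "/v2/command/core/...".
_VERSION_PREFIX = re.compile(r"^/v(\d+)(/.*)$")


def split_version(path: str) -> Tuple[Optional[int], str]:
    """Return (version, remaining_path); version is None for unprefixed paths."""
    match = _VERSION_PREFIX.match(path)
    if match is None:
        # No version prefix: treat the request as targeting the legacy API.
        return None, path
    return int(match.group(1)), match.group(2)
```

For example, `split_version("/v2/command/core/get-all-project-metadata")` yields version `2` and the unversioned command path, which a router could then look up in a per-version table.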
Ah sorry for the confusion, when I said "dispatching requests based on both URL path and a new HTTP header" I only meant that the HTTP header would be different, the current paths would all remain the same (i.e. the URL would still be something like /command/core/get-all-project-metadata, not /v1/command/core/get-all-project-metadata). To give a concrete example, this is what a curl command from my local proof-of-concept looks like:
(the exact version string was just for testing, I don't imagine including -SNAPSHOT in any API version numbers)
I'm proposing headers as opposed to URL path version numbers since OpenRefine doesn't currently have versions in the path, so using headers felt more consistent. I don't have much of a preference for either one though, maybe just a slight preference towards headers.
I don't think we should use a header to negotiate API versions. It will get rather confusing when an endpoint supports only one API version, and it will be quite a pain to document given that the default will always need to be an old version.
That said, I think we should take a step back: should the current API even be considered? Shouldn't we just add a new set of endpoints, considering that we probably want to rework the endpoint names, clean up the HTTP statuses, unify errors, improve response formats, and remove a lot of unused endpoints?
I don't have strong feelings about header vs path, but a header of one type or another (whether it be Accept: with a custom media type or a private version header) seems to be one of the more popular options.
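As a rough illustration of the `Accept:` variant, a server could read the version out of a vendor media type. The media type name `application/vnd.openrefine.vN+json` is made up for this sketch and is not something OpenRefine defines; the parser also ignores `q` weights for brevity and simply returns the first match.

```python
import re
from typing import Optional

# Hypothetical vendor media type: application/vnd.openrefine.v<digits>+json
_VENDOR_TYPE = re.compile(r"^application/vnd\.openrefine\.v(\d+)\+json$")


def version_from_accept(accept: str) -> Optional[int]:
    """Return the first API version named in an Accept header, or None if absent."""
    for item in accept.split(","):
        # Drop parameters such as ";q=0.9" and compare case-insensitively.
        media_type = item.split(";", 1)[0].strip().lower()
        match = _VENDOR_TYPE.match(media_type)
        if match:
            return int(match.group(1))
    return None
```

A plain `Accept: application/json` returns `None`, which would map to the same "no version specified" default as a missing private header.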
There's a (rather verbose) description of all the different options here: https://daily.dev/blog/api-versioning-strategies-best-practices-guide
One of the things it highlights is just how much work is involved in the documentation, support, and so on.
I think taking a step back is a good idea. I would be uncomfortable embarking on this journey without knowing who the users are and what they want to do with the API. Is there a roadmap entry or issue that describes the goals? The closest I found is this: https://openrefine.org/docs/technical-reference/goal-posts#supported-rest-api-for-external-use but it links to a completely different discussion about using OpenAPI to drive URL Fetch (or something - it's kind of hard to follow the description).
I was able to capture an OpenAPI definition from our end-to-end tests which covers 63 commands (60 core, 2 wikidata, 1 database) which is about 70% of the total, but there are a number which are specifically focused on the UI as opposed to something that an automation tool might be interested in. If the goal is "just" to run a saved operation history (reliably, with error checking/reporting) against a project, that would need a much smaller API.
A useful first step would be for someone to write down a description of the problem they want solved.