Grant opportunity from Wikimedia Deutschland / Arcadia

There is a spontaneous funding opportunity from Wikimedia Deutschland:
https://meta.wikimedia.org/wiki/Software_Collaboration_for_Wikidata/Open_Call

This call was noticed in the Wikibase Stakeholder Group, which OpenRefine is part of, and we are coordinating there how to respond to this opportunity.
We had our monthly call yesterday and various project ideas were floated. One of them would be to apply to develop native support for reconciliation in Wikibase. It gathered some enthusiasm from the group, from the perspective of Wikibase hosting providers, Wikibase self-hosters and Wikidata users alike.

Here is a rough draft of an application in this direction:

Note that OpenRefine would likely not apply itself as a project (since we are not eligible) but would rather be mentioned as a supporter of the project - unless there are concerns about the initiative?

Since the open call is focused on the REST API that is being developed for Wikidata, I originally thought about another project idea: adding support for that REST API in Wikidata-Toolkit, the Java library we use to make Wikidata edits. This idea did not gather as much enthusiasm in the call as the reconciliation project (understandably, as its outcomes would be unnoticeable until the existing actions-based API is turned off).


Native support in Wikibase, but how much support? Custom scoring, multi-property filters, features, etc.? Was there talk of specific features, or just the basic concepts of entity name and property matching, as most folks understand the current recon features?

It would be an implementation of the reconciliation protocol, with the intention that OpenRefine can be used with it directly. So the API would be supporting some subset of the features that the protocol allows. I would push for implementing the current draft and not the 0.2 version, since it's a lot more polished and we still have the opportunity to tweak it if needed.
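To make the kind of features under discussion concrete, here is a minimal sketch of a reconciliation query batch with a type constraint and a property filter. It follows the shape used by version 0.2 of the protocol (a JSON object of queries sent as a `queries` form parameter); the current draft differs in some details, so treat this as an illustration rather than what the extension would accept. The helper names `build_query_batch` and `encode_form_body` are hypothetical, and the identifiers (`Q5`, `P27`, `Q145`) are only example values.

```python
import json
from urllib.parse import urlencode


def build_query_batch(names, type_id=None, property_filters=None, limit=5):
    """Build a batch of reconciliation queries keyed by arbitrary query ids.

    Hypothetical helper: the dict layout mirrors version 0.2 of the
    reconciliation protocol, not necessarily the current draft.
    """
    batch = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            # Restrict candidates to entities of this type.
            query["type"] = type_id
        if property_filters:
            # Each filter pairs a property id with the value to match on.
            query["properties"] = [
                {"pid": pid, "v": value}
                for pid, value in property_filters.items()
            ]
        batch[f"q{i}"] = query
    return batch


def encode_form_body(batch):
    """Serialize the batch as the form-encoded body of a POST request."""
    return urlencode({"queries": json.dumps(batch)})


# Example values only: one name, a type constraint and one property filter.
batch = build_query_batch(
    ["Douglas Adams"],
    type_id="Q5",
    property_filters={"P27": "Q145"},
)
body = encode_form_body(batch)
```

Which of these optional fields (type, property filters, limit) the extension would honour is exactly the "subset of features" question above.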


A supported Wikidata reconciliation service would be a HUGE win, but my (limited) understanding was that a number of the performance and quality issues with the current implementation are inherited from the underlying APIs. Is this something which is feasible to fix without the involvement of the Wikidata Search team? Would they take the end product and support it?

Tom

I am not completely sure I understood your question, so let me formulate it in a different way. This project wouldn't try to bring about any change in the existing web APIs exposed by Wikibase, on top of which the Python wrapper is currently based. By implementing the reconciliation service as a Wikibase extension, we wouldn't need to rely on those web APIs since the service would live in the same PHP process as Wikibase itself. So it would access Wikibase's data through much more direct means (essentially, SQL database and ElasticSearch access through the PHP layers in place there).

I would expect that this comes with significant performance improvements, but I wouldn't say those are the primary aim of the initiative. Having the endpoint implemented as a Wikibase extension would make it much easier to make reconciliation available out of the box on Wikibases (for instance, Wikibase Cloud does not offer it yet and Professional.Wiki only offers it as an option). The difficulty of configuring new Wikibase instances with OpenRefine is something we hear a lot about (whereas people have likely got used to the sluggish speed).

Also we would probably not aim to have the extension deployed on Wikidata in the scope of this project, as it would likely impose some more stringent architectural constraints on the extension (avoiding reliance on the SPARQL query service, primarily). If the extension gets broad adoption in third-party Wikibases, we could then re-evaluate what it would take to have Wikidata deploy it.

OK, I think I follow the implementation details. I consider performance important, but Wikidata support even more important, so not having this address improving Wikidata reconciliation greatly reduces the attractiveness to me.

Tom

I totally agree that getting such an extension deployed on Wikidata would be really, really great.

But my understanding is that this is also really, really hard. From what I remember from various chats with people at or close to WMF / WMDE, the major hurdles are:

  • getting the extension to pass a strict security review (and probably other sorts of reviews as well, such as auditing dependencies I guess? not sure about the specifics). Intuitively that shouldn't be impossible to pass, but it's something that needs resourcing on their side and we'd have no control over the timeline.
  • having people at WMF or WMDE be maintainers of said extension, so that they are able to keep it running. If I remember correctly, there is some sort of policy or strong preference against deploying extensions for which they don't have anyone on the payroll acting as a co-maintainer.

So it's literally impossible for a short-term funded project of a few months (which is the case here) to aim to develop a MediaWiki extension from scratch and have it deployed on Wikidata at the end. Our proposal would likely get rejected directly if we were aiming for that.

But if in some years we have a mature reconciliation extension that's well adopted in the Wikibase community, and still an unreliable and feature-poor Wikidata recon endpoint, then we'd be in a much better position to ask for WMDE to look into adopting it.