Just a quick question and/or feature request.
I wonder if it is possible to specify multiple types for an entity.
For example, I would like to search for entities that are either city (Q6256) or country (Q). The types could be specified as a list. An autosuggest service could then show only (or prioritize) entities of these types over other types.
If it is not possible, is there any plan to implement it? I think this functionality would be nice.
My question is not only about OpenRefine, but also about how to write such a query for the Reconciliation API, if this is possible.
Currently you would need to specify a common super type, such as political territorial entity or territory. There has been some discussion about supporting this in the reconciliation service protocol, but it's not currently part of the specification.
I'm curious to know more about your use case. In typical rectangular data grids, a column contains a single type of data which matches what the column name/head says. What's the column called in this case? Can you say more about why multiple types of entities end up in a single column in this case?
One case that I can think of is something like a "Name" column paired with a "Level" column nearby that contains values like "City," "Country," etc. In that case my approach would be to facet on Territory Level and reconcile each set of values against the more specific type separately. Does that describe your use case or are you dealing with something else?
Whether results can be restricted to certain types depends on the implementation of the reconciliation service. Some services may even let you use (a subset of) the Elasticsearch query language or SPARQL features.
Since you give QIDs as examples, I assume you are working with Wikidata or a Wikibase service?
Just try using OpenRefine's record structure for this. Here is an example:
| name | type |
| --- | --- |
| france | city |
| | country |
| paris | city |
| | country |
The column "name" is the one being reconciled, and the column "type" is passed as an additional column mapped to P31.
In other terms (simplified): multiple values within a record are combined with OR, whereas values in multiple columns are combined with AND.
You may still run into the problem that Wikidata has more levels, such as town or large village.
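As a rough sketch of what this record structure amounts to on the wire, the snippet below turns one record into a reconciliation query with several values for a single property. The helper `record_to_query` and the row layout are hypothetical, and whether a given service really treats the value array as an OR is an assumption that depends on that service.

```python
def record_to_query(rows, name_column, type_column, pid="P31"):
    """Turn one record (a list of row dicts) into a reconciliation query.

    Hypothetical helper: the first row carries the name, and every
    non-blank cell in `type_column` becomes one value of property `pid`.
    """
    name = rows[0][name_column]
    # Blank cells (None or "") belong to the record but carry no value
    values = [row[type_column] for row in rows if row.get(type_column)]
    return {"query": name, "properties": [{"pid": pid, "v": values}]}


# The "france" record from the table above: two rows, one name
record = [
    {"name": "france", "type": "city"},
    {"name": None, "type": "country"},
]
query = record_to_query(record, "name", "type")
```

Adding a second property to `properties` would correspond to using a second column, which is where the AND behaviour described above comes in.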
Thank you for your input. My question was a generalised one, which may have been confusing. My specific use case is as follows:
It is a web form where the user builds a pre-defined query from the information filled in the form. For instance, a user looking for people who were educated somewhere fills in the form as follows:
1. The user specifies an entity type (subject) with autosuggest, e.g. Q5 Human.
2. The user specifies a property with autosuggest, e.g. P69 educated at.
3. The user specifies an object entity with autosuggest, restricted by the constraints of P69 (i.e. the list of entity types allowed in the object position, the value-type constraint), e.g. Q2385804 educational institution.
4. The user gets the result of the query (e.g. 100 people were found, of whom 2 were educated at university X and 3 at research centre Y).
The point here is that the third autosuggest limits the candidate entities based on the second one. So, in step 3, the user sees only candidate entities that are allowed for the property. This is useful because otherwise you very often get no results.
Does this use case make sense?
Another use case may be an autosuggest that is narrowed down to several specified entity types rather than any entity type (e.g. the user selects an entity from a list of candidates which are either castles or palaces).
Thank you for your reply. Yes, I am focusing on Wikidata (for this question). I am using the Reconciliation API rather than OpenRefine. My new comment above (use cases) may clarify what I am looking for, but I am not sure if it is feasible right now. In the documentation of the Reconciliation API, I could not find whether multiple types can be specified for the autosuggest of an entity.
If there are workarounds, I would be more than happy to know them! Cheers
In version 0.2 of the protocol (which is the one OpenRefine 3.x uses so far), multiple types are allowed in a reconciliation query.
The specs say that the type field is an array, although the array structure can be omitted when it has a single element (which is the case in all the examples we give).
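A minimal sketch of such a query, assuming a version 0.2 service that accepts the batch as a JSON string in a `queries` form parameter. The helper name `build_reconciliation_query` is made up for illustration, and the two type IDs are simply the ones mentioned elsewhere in this thread. As noted in this thread, services may or may not honour more than one type.

```python
import json


def build_reconciliation_query(name, types, limit=5):
    """Build a single query; `types` may be one type ID or a list of them."""
    if isinstance(types, str):
        types = [types]
    query = {"query": name, "limit": limit}
    # A single type may be sent as a bare string; several go in an array
    query["type"] = types[0] if len(types) == 1 else list(types)
    return query


# One batch with one query, restricted to two types
batch = {"q0": build_reconciliation_query("Berlin", ["Q486972", "Q2385804"])}

# Form field to POST to the service endpoint (endpoint URL not shown here)
payload = {"queries": json.dumps(batch)}
```

Sending `payload` as form data to a service endpoint is left out on purpose, since the endpoint depends on which service you use.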
In the upcoming version of the protocol, we have been discussing what to do with this type field. For now, the current draft has removed the support for multiple types (#109 and #115), specifying that the type field cannot be an array anymore. I am not sure it's a move in the right direction though, given that there seems to be high demand for this feature.
By the way, I have no idea to what extent the reconciliation services that are out there actually support multiple types.
It looks like the Wikidata one I wrote does, but obviously since OpenRefine does not support that, it's not a feature that has been very extensively tested.
The main limitation of specifying a type as an additional property (with "instance of", P31) is that the comparison of values will not automatically follow the "subclass of" (P279) links to compare values, which is what happens when you specify it as a type in OpenRefine's UI.
For instance:
- If you reconcile "Berlin" against type "human settlement (Q486972)", you will successfully get "Berlin (Q64)" as first candidate.
- If you reconcile "Berlin" with additional property "instance of (P31)" "human settlement (Q486972)", then you will get "Berlin (Q614184)" (town in Maryland) as first candidate with score 100, and "Berlin (Q64)" (capital of Germany) will only appear further down in the list, with score 71.

This is because "Berlin (Q64)" does not have a "P31:Q486972" claim, but only a P31 claim pointing to a subclass of human settlement. If you wanted to use the "SPARQL property paths" to solve this, you would need to know how many "P279" links there are between the type that is added on Q64 and "human settlement".
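To make the difference concrete, here is a toy sketch comparing a direct "instance of" check with one that follows "subclass of" links. The two small dictionaries are invented for illustration and are not actual Wikidata statements, and the function names are hypothetical. This is not how the Wikidata reconciliation service is implemented.

```python
# Toy data, NOT actual Wikidata statements
INSTANCE_OF = {
    "Berlin (capital)": {"city"},
    "Berlin (Maryland)": {"human settlement"},
}
SUBCLASS_OF = {
    "city": {"urban settlement"},
    "urban settlement": {"human settlement"},
}


def superclasses(cls):
    """Return every class reachable from `cls` through subclass-of links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def matches_directly(entity, target):
    """True only if the entity has a direct instance-of claim for `target`."""
    return target in INSTANCE_OF.get(entity, set())


def matches_with_subclasses(entity, target):
    """True if any instance-of class is `target` or a subclass of it."""
    return any(
        cls == target or target in superclasses(cls)
        for cls in INSTANCE_OF.get(entity, set())
    )
```

With this toy data, the capital only matches "human settlement" through the second function, which mirrors why it scores lower when the type is passed as a plain property value.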
Is this behavior part of the Reconciliation Service Protocol specification, or does it depend on the implementation of the reconciliation service? Or on something else (triple store technology, ...)?
That's purely down to the reconciliation service: it would totally be possible to change the Wikidata service to support following many P279 links when comparing property values for P31.
In general however (for other reconciliation services), the protocol specifications do not mandate that the notion of "type" that the service exposes has a counterpart as a "property". In the case of Wikidata it turns out that P31 exists for that, but I'm pretty sure there are services out there where there is no equivalent. In fact many services don't have any properties at all.
To be honest, I don't really understand example 7 in the specs. Can you perhaps explain what it is trying to do and how it works?
I have now experimented with a few syntaxes against the API (not OpenRefine), with only partial success: I was able to put a list in the query without an error, but the result was not what I wanted to see. I tried to see if I can specify two distinct keywords (vienna and honda) for distinct types (educational institution and car maker). Note I only changed the keywords in the following syntax.
@GO5IT that is likely a bug in the reconciliation service.
Example 7 in the specs does not demonstrate passing multiple types; instead, it demonstrates how different sorts of property values can be passed in a reconciliation query. The query amounts to the following:
give up to 5 entities with a name similar to "Christel Hanewinckel", of type "DifferentiatedPerson" and whose "professionOrOccupation" matches either the string "Politik*" or the entity with id "wissenschaftler".
If you feel like it would be useful to add this sort of translation in the specs, I think it could make sense and could be suggested on the issue tracker for it.
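For reference, the query described above can be written out roughly as follows. This is reconstructed from the description in this post rather than copied from the specs, so the exact field layout (in particular the batch key `q0`) should be treated as an assumption.

```python
import json

# Reconstructed from the prose description above, not copied from the specs
query = {
    "query": "Christel Hanewinckel",
    "type": "DifferentiatedPerson",  # a single type, not a list of types
    "limit": 5,
    "properties": [
        {
            "pid": "professionOrOccupation",
            # Two alternative values for one property:
            # a plain string and a reference to an entity by id
            "v": ["Politik*", {"id": "wissenschaftler"}],
        }
    ],
}

print(json.dumps({"q0": query}, indent=2))
```

The list here sits under `v`, so it gives alternatives for a property value; the `type` field still holds a single type.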