I have a column with a lot of Mids (e.g. M141414969).
How can I get the File URL (e.g. https://commons.wikimedia.org/wiki/File:MdM_Micheline_Legendre_en_1975.jpg) of each image?
Regards, Antoine
Right now I'm not sure whether there is a direct way in OR, similar to getting captions, labels, descriptions and properties from Wikibase. But one way is to fetch the entity URL (e.g. https://commons.wikimedia.org/entity/M141414969); the data returned includes the title of the file, so we can construct the File URL from that.
I get a redirect (HTTP 301 or 302), but I'm not sure how I can extract that in OR…
Regards, Antoine
I think if you set the HTTP "Accept" header to application/json, it will return the JSON rather than a redirect.
Then maybe something like
value.parseJson().entities.get(cells["Img_WCID"].value).title
to extract the file name?
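For checking the logic outside OpenRefine, here is a minimal Python sketch of the same idea: read "title" from the EntityData JSON and build the File URL from it. The function name and the trimmed sample payload are made up for illustration, and it assumes the title comes back with spaces (e.g. "File:MdM Micheline Legendre en 1975.jpg"), which are turned into underscores.

```python
import json
from urllib.parse import quote

COMMONS_WIKI = "https://commons.wikimedia.org/wiki/"

def file_url_from_entity_data(payload: str, mid: str) -> str:
    """Build the File page URL from an EntityData JSON string (hypothetical helper)."""
    data = json.loads(payload)
    # Same lookup as the GREL expression: entities -> <Mid> -> title
    title = data["entities"][mid]["title"]
    # Wiki page URLs use underscores in place of spaces
    return COMMONS_WIKI + quote(title.replace(" ", "_"), safe=":/")

# Trimmed, hand-written sample payload (assumption: real responses carry more keys)
sample = '{"entities": {"M141414969": {"title": "File:MdM Micheline Legendre en 1975.jpg"}}}'
print(file_url_from_entity_data(sample, "M141414969"))
```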
Instead of using content negotiation headers, you can also just modify the URL:
https://commons.wikimedia.org/wiki/Special:EntityData/M141414969.json
although "title" seems like a very odd place to store the file name and I'm not sure how reliable it is.
Perhaps a better approach would be to look at the RDF/XML or Turtle
https://commons.wikimedia.org/wiki/Special:EntityData/M141414969.ttl
https://commons.wikimedia.org/wiki/Special:EntityData/M141414969.rdf
which has (in Turtle) all the attributes of the image object:
sdc:M141414969 a schema:MediaObject,
        schema:ImageObject ;
    schema:encodingFormat "image/jpeg" ;
    schema:contentUrl <https://upload.wikimedia.org/wikipedia/commons/2/2c/MdM_Micheline_Legendre_en_1975.jpg> ;
    schema:url <http://commons.wikimedia.org/wiki/Special:FilePath/MdM%20Micheline%20Legendre%20en%201975.jpg> ;
    schema:contentSize "18217"^^xsd:integer ;
    schema:height "140"^^xsd:integer ;
    schema:width "234"^^xsd:integer .
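If you go the Turtle route, a rough Python sketch for pulling out schema:contentUrl could look like the following. It uses a plain regular expression rather than a real Turtle parser, so it only suits output shaped like the snippet above. Both function names are hypothetical, and deriving the File page from the last path segment of the upload URL is an assumption that happens to hold for this example.

```python
import re
from typing import Optional

def content_url_from_turtle(ttl: str) -> Optional[str]:
    """Return the schema:contentUrl IRI, or None if the triple is absent."""
    match = re.search(r"schema:contentUrl\s+<([^>]+)>", ttl)
    return match.group(1) if match else None

def file_page_from_content_url(content_url: str) -> str:
    """Assumption: the last path segment of the upload URL is the file name."""
    name = content_url.rsplit("/", 1)[-1]
    return "https://commons.wikimedia.org/wiki/File:" + name

ttl = """
sdc:M141414969 a schema:MediaObject, schema:ImageObject ;
    schema:contentUrl <https://upload.wikimedia.org/wikipedia/commons/2/2c/MdM_Micheline_Legendre_en_1975.jpg> ;
    schema:width "234"^^xsd:integer .
"""
url = content_url_from_turtle(ttl)
if url is not None:
    print(file_page_from_content_url(url))
```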
"title" in the EntityData should be fairly reliable for getting the image page associated with an Mid.
Since the Mid is generated from the pageId, you could also just shave off the M
and query https://commons.wikimedia.org/w/api.php?action=query&format=json&pageids=141414969 to get the title value that way.
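As a sketch of that approach in Python: strip the leading M, build the query URL, and read the title from the response. The helper names are invented for illustration, and the sample response is hand-written and trimmed; it assumes the default JSON format, where pages are keyed by page ID under query.pages.

```python
import json
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def page_id_from_mid(mid: str) -> int:
    """'M141414969' -> 141414969; rejects anything that is not M + digits."""
    if not (mid.startswith("M") and mid[1:].isdigit()):
        raise ValueError("not an Mid: %r" % mid)
    return int(mid[1:])

def query_url(mid: str) -> str:
    """Build the action=query URL for a single Mid."""
    params = {"action": "query", "format": "json", "pageids": page_id_from_mid(mid)}
    return API + "?" + urlencode(params)

def title_from_response(payload: str, mid: str) -> str:
    """Read the page title for this Mid out of the API response."""
    pages = json.loads(payload)["query"]["pages"]
    return pages[str(page_id_from_mid(mid))]["title"]

print(query_url("M141414969"))
```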
@Andre_Costa: This seems fast. Do you know if I can query many on the same call?
Is not working.
Regards, Antoine
@Gnoeee: I dismissed your answer too fast. It was working, but it’s slow. Thanks.
Regards, Antoine
@tfmorris: the JSON seems easier to parse. It’s just slow.
Thanks for the alternative solutions.
Regards, Antoine
Multiple values are separated using the pipe character (|).
You are allowed a maximum of 50 per request (unless you are logged in with special permissions).
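To stay under that limit, the Mids can be split into groups of 50 before building each request. A small Python sketch, with a made-up function name, that returns one pipe-separated pageids value per request:

```python
from typing import List

def batched_pageids(mids: List[str], batch_size: int = 50) -> List[str]:
    """Strip the leading 'M' and join page IDs with '|', batch_size per group."""
    ids = [mid[1:] for mid in mids]
    return [
        "|".join(ids[start:start + batch_size])
        for start in range(0, len(ids), batch_size)
    ]

# 120 made-up Mids give three pageids values holding 50, 50 and 20 IDs
example = ["M%d" % n for n in range(1, 121)]
for value in batched_pageids(example):
    print(len(value.split("|")))
```

Each returned string is meant to go into the pageids parameter of one request; it still needs URL-encoding when the query string is built.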