I have a column of Mids, and I want to extract the File:URL…

I have a column with a lot of Mids (e.g. M141414969).
How can I get the file URL (e.g. https://commons.wikimedia.org/wiki/File:MdM_Micheline_Legendre_en_1975.jpg) for each of those images?

Regards, Antoine


I'm not sure whether OpenRefine (OR) currently has a direct way to do this, similar to how it fetches captions, labels, descriptions and properties from Wikibase. One option is to fetch the entity URL (e.g. https://commons.wikimedia.org/entity/M141414969). The data it returns includes the title of the file, and you can construct the file URL from that.
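As a minimal sketch of that last step, assuming the title comes back in the form "File:Name with spaces.jpg", building the file page URL is mostly a matter of replacing spaces with underscores and percent-encoding the rest. The function name file_page_url is hypothetical and not part of OpenRefine.

```python
from urllib.parse import quote

COMMONS_WIKI = "https://commons.wikimedia.org/wiki/"


def file_page_url(title: str) -> str:
    """Turn a Commons page title into its file page URL.

    Assumes the title already carries the "File:" prefix,
    e.g. "File:MdM Micheline Legendre en 1975.jpg".
    """
    # Page URLs use underscores where titles use spaces.
    normalized = title.strip().replace(" ", "_")
    # Percent-encode unsafe characters but keep ":" and "/" as-is.
    return COMMONS_WIKI + quote(normalized, safe=":/")


print(file_page_url("File:MdM Micheline Legendre en 1975.jpg"))
# https://commons.wikimedia.org/wiki/File:MdM_Micheline_Legendre_en_1975.jpg
```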


I get a redirect (HTTP 301 or 302), but I'm not sure how to extract the data from it in OR…

Regards, Antoine

I think if you set the HTTP "Accept" header to "application/json", it will return the JSON rather than a redirect.

Then maybe something like
to extract the file name?
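A rough Python equivalent of the Accept-header approach, outside OpenRefine, might look like this: build a request that asks for application/json and read "title" from the returned entity. It assumes the JSON follows the Wikibase EntityData layout entities → <Mid> → title; the helper names and the User-Agent string are placeholders. urllib follows the redirect on its own and carries the Accept header along to the new location.

```python
import json
import urllib.request

# Placeholder User-Agent (hypothetical); Wikimedia asks clients to identify themselves.
USER_AGENT = "MidTitleLookup/0.1 (replace with your contact details)"


def entity_request(mid: str) -> urllib.request.Request:
    """Request for the entity URL, asking for JSON via content negotiation."""
    url = f"https://commons.wikimedia.org/entity/{mid}"
    return urllib.request.Request(
        url, headers={"Accept": "application/json", "User-Agent": USER_AGENT}
    )


def title_from_entity_json(text: str, mid: str) -> str:
    """Read the file title, assuming the layout entities -> <Mid> -> title."""
    return json.loads(text)["entities"][mid]["title"]


def fetch_title(mid: str) -> str:
    """One network round trip per Mid; urlopen follows the redirect itself."""
    with urllib.request.urlopen(entity_request(mid), timeout=30) as resp:
        return title_from_entity_json(resp.read().decode("utf-8"), mid)
```

Calling fetch_title("M141414969") needs network access, and it costs one HTTP request per Mid, which is why this route gets slow on a large column.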


Instead of using content negotiation headers, you can also just modify the URL:


although "title" seems like a very odd place to store the file name and I'm not sure how reliable it is.
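A sketch of the URL-modification route, assuming the usual Wikibase pattern Special:EntityData/<Mid>.<ext>, where the extension (json, ttl, rdf) picks the serialization. The exact URL from the post is not shown here, so treat both the pattern and the helper name entity_data_url as assumptions.

```python
import re

# Assumed Wikibase EntityData base path on Commons.
BASE = "https://commons.wikimedia.org/wiki/Special:EntityData/"
FORMATS = {"json", "ttl", "rdf"}  # JSON, Turtle, RDF/XML


def entity_data_url(mid: str, fmt: str = "json") -> str:
    """Build an EntityData URL whose extension selects the serialization."""
    if not re.fullmatch(r"M\d+", mid):
        raise ValueError(f"not a MediaInfo id: {mid!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE}{mid}.{fmt}"
```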

Perhaps a better approach would be to look at the RDF/XML or Turtle



which has (in Turtle) all the attributes of the image object:

```turtle
sdc:M141414969 a schema:MediaObject,
        schema:ImageObject ;
    schema:encodingFormat "image/jpeg" ;
    schema:contentUrl <https://upload.wikimedia.org/wikipedia/commons/2/2c/MdM_Micheline_Legendre_en_1975.jpg> ;
    schema:url <http://commons.wikimedia.org/wiki/Special:FilePath/MdM%20Micheline%20Legendre%20en%201975.jpg> ;
    schema:contentSize "18217"^^xsd:integer ;
    schema:height "140"^^xsd:integer ;
    schema:width "234"^^xsd:integer .
```
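To pull the two URLs out of Turtle like the snippet above without an RDF library, a crude regular-expression sketch could look like this. It is not a real Turtle parser (it ignores prefix declarations, comments and escaping); for anything robust, a proper RDF parser would be the better choice. turtle_urls is a hypothetical name.

```python
import re


def turtle_urls(ttl: str) -> dict:
    """Extract schema:contentUrl and schema:url IRIs with regexes.

    Crude: assumes the prefixed names and <...> IRIs appear exactly as in
    the Commons output above; not a general Turtle parser.
    """
    found = {}
    for prop in ("contentUrl", "url"):
        # "schema:url" cannot match inside "schema:contentUrl",
        # because the prefix must be followed directly by the name.
        match = re.search(rf"schema:{prop}\s+<([^>]+)>", ttl)
        if match:
            found[prop] = match.group(1)
    return found
```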


"title" in the EntityData should be fairly reliable for getting the image page associated with an Mid.

Since the Mid is generated from the page ID, you could also just shave off the M and query https://commons.wikimedia.org/w/api.php?action=query&format=json&pageids=141414969 to get the title value that way.
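A minimal sketch of the page-ID route, assuming the default JSON layout query → pages → <pageid> → title. Pages that do not exist come back without a "title", so the lookup uses .get() and maps them to None. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def mid_to_pageid(mid: str) -> str:
    """Strip the leading "M"; the rest is the page ID."""
    if not (mid.startswith("M") and mid[1:].isdigit()):
        raise ValueError(f"not a Mid: {mid!r}")
    return mid[1:]


def pageid_query_url(mid: str) -> str:
    """Build the action=query URL for a single Mid."""
    params = {"action": "query", "format": "json", "pageids": mid_to_pageid(mid)}
    return f"{API}?{urlencode(params)}"


def titles_from_response(text: str) -> dict:
    """Map page ID -> title; missing pages have no "title", so they map to None."""
    pages = json.loads(text)["query"]["pages"]
    return {pageid: page.get("title") for pageid, page in pages.items()}
```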


@Andre_Costa: This seems fast. Do you know if I can query many in the same call?

This is not working: https://commons.wikimedia.org/w/api.php?action=query&format=json&pageids=141414969,141414970,141414968

Regards, Antoine

@Gnoeee: I dismissed your answer too fast. It was working, but it’s slow. Thanks.

Regards, Antoine

@tfmorris: The JSON seems easier to parse. It's just slow.
Thanks for the alternative solutions.

Regards, Antoine

Multiple IDs are separated with the pipe character (|),

e.g. https://commons.wikimedia.org/w/api.php?action=query&format=json&pageids=141414969|141414970|141414968

You are allowed at most 50 IDs per call (unless you are logged in with special permissions, such as the apihighlimits right).
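Putting the pipe separator and the 50-ID limit together, a sketch that turns a list of Mids into batched query URLs could look like this. The batch size of 50 follows the limit mentioned above; the helper name is hypothetical, the Mids are assumed to be well-formed, and each URL would still need to be fetched and parsed like a single-ID response.

```python
from typing import List
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"
MAX_IDS_PER_CALL = 50  # limit for ordinary clients without apihighlimits


def batched_query_urls(mids: List[str], batch_size: int = MAX_IDS_PER_CALL) -> List[str]:
    """Turn Mids into API URLs, each carrying at most batch_size pipe-separated page IDs."""
    pageids = [mid[1:] for mid in mids]  # "M141414969" -> "141414969"
    urls = []
    for start in range(0, len(pageids), batch_size):
        chunk = pageids[start:start + batch_size]
        params = {"action": "query", "format": "json", "pageids": "|".join(chunk)}
        # safe="|" keeps the pipe readable instead of encoding it as %7C.
        urls.append(f"{API}?{urlencode(params, safe='|')}")
    return urls
```

With 120 Mids, for example, this yields three URLs carrying 50, 50 and 20 page IDs.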