New items not uploading to Commons (error about not allowing text/html mime type)

Hi there,

I apologize if this is a basic question. I'm trying to upload new items to Commons, but they won't upload. When I run the upload, it looks like something is uploading, but when the progress reaches 100%, nothing has actually been uploaded.

The error I see in the terminal says "MediaWiki error while editing [verification-error]: Files of the Mime type "text/html" are not allowed to be uploaded." I'm guessing my files are being interpreted as HTML pages rather than as images? I've tried uploading just one file that I confirmed is a JPEG, and I get the same result and error message.
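The error suggests the type is decided from the bytes actually received, not from the ".jpg" at the end of the URL. A minimal sketch of that kind of content sniffing (a small, illustrative subset of signatures, not MediaWiki's actual detector, which is far more thorough) shows why an HTML page served in place of the image would be reported as text/html:

```python
# Leading "magic" bytes for a few common formats (illustrative subset only).
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(data: bytes) -> str:
    """Guess a MIME type from the first bytes of a file, ignoring its name."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # An HTML page (e.g. a redirect or landing page) usually starts like this.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return "application/octet-stream"
```

For a locally downloaded file, something like `sniff_mime(open(path, "rb").read(512))` would tell you whether you got the image or a web page.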

I've used OpenRefine to upload to Commons successfully in the past, although it's been a year since my last upload. I'll share some screenshots below in the hope that someone spots something I'm doing incorrectly. I tried to set this up like my previous successful uploads, but perhaps I'm missing a piece or something has changed? I'm using OpenRefine 3.7.6. Thank you for any help or advice you can give.

Screenshot of some of my data:


Hi there! Huh, that's frustrating. Could it be the dots in the filenames that are confusing things? Maybe try one file where the only dot is between the filename and the file extension - so say 1966-100.jpg instead of 1966.100.jpg..?
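If you want to test that idea on more than one file, the rename is easy to script. A minimal Python sketch, assuming the filename is a single string with the extension after the last dot (the function name is hypothetical; in OpenRefine itself you would apply an equivalent transform to the filename column):

```python
def single_dot_name(filename: str, replacement: str = "-") -> str:
    """Keep only the dot before the extension: '1966.100.jpg' -> '1966-100.jpg'."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all, leave the name unchanged
        return filename
    return stem.replace(".", replacement) + "." + ext
```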


Thank you so much for responding! I tried replacing the dots as you suggested, but I'm still getting the same result.

hmmm ok. Can you share a screenshot of your schema?

Yes, thank you! I wanted to upload a bunch of screenshots, but was limited to one.

Ok...
If your last upload was a year ago, I'm wondering whether you have the most recent version of the Commons extension installed (the latest release was in March 2023), in case that makes a difference?

I checked the URL you're uploading from and it's whitelisted, so it's not that. You're using OpenRefine 3.7.6, and OpenRefine supports bulk Commons uploading from 3.6 onwards, so it's not that either.

I'm not as experienced with uploading from URLs as from downloaded files, so I don't know whether this is an issue, but the file URL auto-downloads when I open it on my machine. Could it be something to do with whatever makes it auto-download? It's not my area of knowledge, but downloading one of the images and uploading it from a file path rather than a URL might help you troubleshoot.

I can also see that there's no space between the "|" and "credit line" in your wikitext. I couldn't tell you whether that would make a difference, but it might be worth a shot?

sorry not to be of more help!

Thanks so much for troubleshooting with me. I really appreciate you!

One good thing is that it does work when I upload the file from my computer. I have about 1200 files to upload in total, so I'd prefer not to have them all on my desktop, but I suppose it's not the end of the world. I can probably download them to a folder and delete it when I'm done. Thanks for suggesting it as a workaround!
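Downloading 1200 files by hand would be tedious, so here is a minimal Python sketch of that workaround. It assumes you can export (filename, url) pairs from your project; the function name and folder layout are hypothetical. `urllib.request.urlopen` follows HTTP redirects by default, so each saved file is whatever the final URL serves.

```python
import shutil
import urllib.request
from pathlib import Path


def download_batch(rows, dest_dir):
    """Download each (filename, url) pair into dest_dir and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for filename, url in rows:
        target = dest / filename
        # urlopen follows redirects, unlike a plain single request.
        with urllib.request.urlopen(url, timeout=60) as resp, open(target, "wb") as out:
            shutil.copyfileobj(resp, out)
        paths.append(target)
    return paths
```

You could then point the file path column in OpenRefine at the returned paths and, once the upload has finished, remove the whole folder with `shutil.rmtree(dest_dir)`.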

Unfortunately, trying later versions (3.7.9 and 3.8-beta) and adding a space between "|" and "credit line" in my wikitext didn't help. Even if I upload this batch from my computer, I'd like to figure out what's going on for the future. I can reach out to the file host to see if there's something unusual about the technical metadata (I'm not sure what method OpenRefine uses for file validation). I don't know whether anyone else has reported something like this, so I'm not sure if it could be an OpenRefine bug.

aaah ok, glad that worked! I've not come across this before, and I had a look around the forum and GitHub but couldn't see anything similar, I'm afraid.

If you do find out what causes it (esp if it's something to do with the file being an auto-download) please do post, would be very handy to know! :slight_smile:


It looks like the problem may stem from redirects in the URL. I don't think OpenRefine is handling the redirects: I think it goes to the URL I'm supplying, doesn't find the file there, and doesn't follow the redirect. That's probably why the files are detected as text/html. Perhaps I should submit an issue about this on GitHub?
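One way to check this theory is to inspect the redirect chain yourself, one hop at a time, and see which URL finally returns an image. A minimal Python sketch follows; it assumes the server answers HEAD requests the same way it answers GET (not guaranteed), and `head`/`trace_redirects` are hypothetical helper names. The `fetch` parameter lets you swap in another request function.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url):
    """Send a single HEAD request (no redirect following); return (status, headers)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        # Lower-case header names so lookups don't depend on server casing.
        return resp.status, {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()


def trace_redirects(url, fetch=head, max_hops=10):
    """Follow Location headers by hand; return [(url, status, content_type), ...]."""
    start, hops = url, []
    for _ in range(max_hops):
        status, headers = fetch(url)
        hops.append((url, status, headers.get("content-type", "")))
        if status in REDIRECT_CODES and "location" in headers:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["location"])
        else:
            return hops
    raise RuntimeError(f"gave up after {max_hops} redirects from {start}")
```

Running `trace_redirects(...)` on one of the URLs from the spreadsheet would list each hop with its status code and Content-Type, making it clear whether the first URL serves HTML and only a later hop serves the JPEG.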

yes, that may be a good plan - I do know that general advice for uploading from URLs is to have the link to the file itself, rather than a page with the file in it, but I'm not sure how redirects are dealt with. I'm assuming that there's no way to get URLs that link to the file itself?


Thanks both for debugging so deeply!

Looking up the first URL you shared in the screenshot above, "https://ids.si.edu/ids/download?id=SAAM-1966.100_1.jpg", I do get a redirect indeed, to "https://smithsonian-open-access.s3-us-west-2.amazonaws.com/media/saam/SAAM-1966.100_1.jpg".

It's unclear to me why this should cause any problem, because MediaWiki should be able to follow the redirect according to this Phabricator ticket:

https://phabricator.wikimedia.org/T31154

Perhaps one problem could be that the original domain (ids.si.edu) is whitelisted, but not the second one (smithsonian-open-access.s3-us-west-2.amazonaws.com)?
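To test that idea, you can check every hop of the redirect chain against the allow-list. A minimal Python sketch, assuming list entries are either exact host names or "*."-prefixed wildcards that match subdomains; this mirrors the general idea of the allow-list, not MediaWiki's exact matching code. The list used in any example is hypothetical: the real one is on the Commons page linked further down, and whether MediaWiki checks redirect targets against it at all is the open question here.

```python
from urllib.parse import urlsplit


def host_allowed(url, allowed_domains):
    """True if the URL's host equals an entry or matches a '*.domain' wildcard."""
    host = (urlsplit(url).hostname or "").lower()
    for entry in allowed_domains:
        entry = entry.lower()
        if entry.startswith("*."):
            # '*.example.org' matches 'a.example.org' but not 'example.org'.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False


def disallowed_hops(urls, allowed_domains):
    """Return the URLs in a redirect chain whose host is not on the list."""
    return [u for u in urls if not host_allowed(u, allowed_domains)]
```

With a hypothetical list containing only ids.si.edu, the chain from the first screenshot URL would flag the S3 hop as not allowed.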

If the problem lies in the redirect handling, then that's something that you could report to Wikimedia Commons rather than OpenRefine, because the redirect handling is done on their side. But I am not completely sure, of course it can well be that OpenRefine isn't playing by the book and is the cause of the problem. If there is any other upload tool that also lets you upload files by URLs (I don't know if Pattypan does), then it could be worth trying with such a tool to see if the same problem occurs.


Thank you for your response! I hadn't thought to check the domain of the redirect. That's a great point! I could look into getting the redirect domain whitelisted.

I've never used Pattypan, but I'll check to see if it does uploads by URL, and if so, if the same issue occurs. I also know someone who has written a tool to upload using these links directly to Commons, so I can check with him to see if he's experiencing this issue.

I think those resources were only recently moved, which would explain why I used to be able to do these uploads in OpenRefine but can't any longer.

Thanks to both you and @SaraThomas for your help. I'll update this if I learn more.

Pattypan doesn't upload from URL I'm afraid, just files from a folder on your computer: https://commons.wikimedia.org/wiki/Commons:Pattypan/Simple_manual. Whitelisting info is here for reference: https://commons.wikimedia.org/wiki/MediaWiki:Copyupload-allowed-domains :slight_smile:

Good luck!
