The Outreachy internship programme has started accepting applications from mentoring organizations for its Summer 2023 round. The Google Summer of Code (GSoC) will do so in a few days.
We have participated before and we could participate again for this round.
The main question is whether we could find people motivated to mentor potential interns. Candidate mentors can submit project proposals of their liking, but there are a few points to keep in mind:
for Outreachy, it does not have to be code contributions: for instance, documentation projects are in scope too. For GSoC, only programming projects are eligible.
it does not have to be focused on OpenRefine’s core: for instance, working on an extension or reconciliation service could be in scope;
internships run for 3 months in Outreachy, and can be longer in GSoC.
Mentors should be existing OpenRefine contributors. They should be able to carry out the projects they propose themselves, faster than the planned internship, so that they can guide the intern appropriately. They should obviously also have time to meet regularly with the intern (remotely) during the course of the internship.
Mentors also get a small compensation for their effort. I could not find the exact figure on the GSoC website, but if I remember correctly it was USD 750 in 2020. Outreachy does not give a compensation itself, but in the last round we decided to pay it from OpenRefine itself, and that was USD 1000.
In my experience, the most demanding period is the phase before the internships themselves, where applicants try to get some initial contributions into the project. Those are mandatory for both programmes and ensure that applicants are able to make small contributions independently. This generally results in a spike of activity on the repository, which is a good thing but requires some responsiveness from the team.
Would people be interested in mentoring this summer?
I think I should be able to mentor one intern this round in Outreachy, so I would at least have OpenRefine apply to that. @Sandra, could you add me back as a coordinator for OpenRefine, so I can submit the community application?
I see that there’s a suggested project on IIIF integration. This is indeed a clearly identified user request, and I am incredibly happy to see spontaneous follow up on the Wikimedia Commons integration from OpenRefine’s side! But on a personal note, I am extremely worried that the project may put even more requests and work on my plate, while I’m already overwhelmed, and moving on to other things.
When this integration is finished, who do you expect to be (and remain) responsible for IIIF community liaising? Did this person/organization agree on this and are they aware that a lot of requests (training, updated documentation, help on the forum here and in their own communities) will ensue? (I am currently seen as a major representative for Wikimedia Commons integration and cultural heritage use in OpenRefine, and I am overwhelmed with such requests. I personally can’t commit to taking on more, and, as I am moving my professional focus to other topics than software, would really like to transfer this to someone else).
IIIF has a proactive community and they have a Slack channel. I am not active there but am aware of its existence. I’d recommend reaching out there if you haven’t already, and only going ahead if this community buys in to the consequences of this integration.
Once the internship is finished, it is likely that the code base (in whichever form, probably an OpenRefine extension?) will be unmaintained, like the Commons Extension. I am doing my very best to change that situation for the Commons Extension, but it is in many ways swimming against the stream: there is a lack of skilled volunteer developers in the Wikimedia movement who would be capable and willing to work in a complex (and external) technical ecosystem like OpenRefine, and a lack of broad support for further paid investment (partly also because paid Java developers are hard to come by and, for small assignments, not an obvious addition to any existing Wikimedia product team, even if there were one ready to do this, which there isn’t). I see the same risk for this integration, which may result in frustration for end users as bug reports or new feature requests are not being followed up on (for the Commons extension, this frustration now falls on me, and I don’t want more of it, and also don’t want this to fall on other people’s plates if they are not prepared). Is there willingness in the OpenRefine developer community to keep an eye here and do some maintenance?
An underlying diplomatic consideration: many institutions that implement IIIF do this because they want to avoid duplication of upload of their content on many external platforms (including Wikimedia Commons). I’ve frequently heard that the IIIF community would prefer to see Wikipedia just directly integrate IIIF manifests to illustrate Wikipedia articles, not upload files to Commons. The Wikimedia community will not integrate IIIF manifests for a variety of reasons outside the scope of this topic, and this has already been communicated to the IIIF community in the past. Still, I predict that building an integration like this will bring some friction, which will need to be handled. It is likely that I will be one of the people addressed because of my visibility re: the Wikimedia Commons integration in OpenRefine but I can’t commit to being available to mediate. I want to make it clear that I flagged this here.
May I ask about underlying reasons for proposing this so that I can help with suggestions to direct these needs in other ways?
If a goal is to engage people from the IIIF community in OpenRefine, I’d suggest first building a relationship with them and only going ahead if there is broad buy-in and at least a few people who commit to following up. Organizationally I think it would be much preferable to have such an integration built by the IIIF community itself (and make them aware that this includes maintenance), not by an external Outreachy intern who will disappear after the fact.
If a goal is to build further on the Commons integration (which would make me very happy), I would personally prefer to see investments in improving the current extension, doing further bug fixes and feature additions, and steps to make the general Wikimedia Commons process in OpenRefine easier to use (i.e. reduce the need for training, which would relieve pressure from people like me, and free us to do actual work with the software rather than training others). See this draft list - not complete! - for ideas.
Sorry about this, I had no idea this would bring up such a reaction on your side. Let me first remove this internship proposal immediately, and then answer your points.
First, the way I pick internship topics is not based on project priorities, but rather on suitability of the task for an intern. The goal is that they have a great time working on an exciting topic, that they are able to make interesting design choices, that they get exposed to challenging but manageable technical difficulties, and so on. Outreachy has guidelines about how to pick such projects. They explain at length that interns are not employees and that the internship should rather be seen as a fellowship where they can develop their skills on tasks they find interesting. For Google Summer of Code, they can even propose projects that are not part of any initial list offered by the mentoring organization - I think that makes it really clear that the goal is not to tick boxes on a pre-established roadmap.
As an example: the SPARQL extension was made in an Outreachy internship last year. I do not consider it really usable as things stand, and would definitely oppose shipping it by default with OpenRefine. Unless Antoine finds the time to work on it himself, it is an unmaintained piece of software. That is fine. This was an internship: the intern has learned something and the mentor too. That is the point. We do not offer internships as a way to find cheap contractors to delegate strategic work to. Of course we want their work to be useful (because it’s much more rewarding as a contributor) but putting them on the spot for something critical is out of the question.
Second, I find it difficult to understand how the fact that you are overwhelmed with OpenRefine support requests from the Wikimedia community has anything to do with this. I hope you find the right words to turn those requests down and direct people to other channels. The fact that you have been involved on this topic before does not mean you would somehow become responsible for community liaison about this IIIF feature. There are other people knowledgeable on those topics. As a mentor for such an internship, I would primarily reach out to @abbe98 for design considerations, as requestor of the feature and author of the GitHub issue. Turning to a broader IIIF community is a great idea and that would be a great resource for the intern.
In any case, if you are planning to work on IIIF integration in the future and would prefer that it does not get worked on as an internship in the meantime, that is great and a very valid reason not to propose this internship topic.
Thanks Antonin, I’m glad that I managed to flag this and that you’ve removed the proposal for now.
I should have made myself clearer: I am overwhelmed with OpenRefine support requests from the cultural sector at large. The combination of me leading the Commons integration (which has sparked great interest in that community, way beyond Wikimedians) and being project director for a while, holding conversations to explore new governance structures, has made me very visible. I have been around in the sector for quite a long time, have met people from the IIIF community in the past, and through these existing connections I am a natural person for them to turn to at the moment.
Yes, I redirect people and say no a lot, but it’s painful, knowing that it slows people down in their work.
IIIF support is different from, say, SPARQL support, because it’s not just a data format; there is a very vocal and active (and opinionated) community behind it, with major museums and cultural institutions participating in it. Shipping an unfinished feature/extension as a temporary project by an external intern would politically not be a good move.
To say it positively: I think it will be a great gesture to implement IIIF support at some point, but as I said, preferably in collaboration with that influential community.
Thanks! I really appreciate you hearing my concerns and there’s certainly another way forward on this at a later point.
In a similar vein - would it be possible to consider support for import from Flickr as an Outreachy project? Or is that too small/easy? Flickr has an API and I have (with great effort) managed to use it myself, but a dedicated importer would be a true killer feature. It is widely requested by Wikimedians, but I know that for instance the Biodiversity Heritage Library would also put it to good use to work with its own uploads on Flickr outside Wikimedia projects entirely (e.g. retrieve and process user-generated tags). So perhaps the issue would be better at home in OpenRefine’s general issue tracker.
It’s politically uncontroversial because it’s a widely used platform, not connected to specific user groups. For this reason, it is also much less likely to create unplanned ‘community liaison’ workloads.
I am unable to mentor (both time-wise and re: lack of skills) but since you already volunteered for IIIF mentoring, it doesn’t hurt to ask.
Shipping an unfinished feature/extension as a temporary project by an external intern would politically not be a good move.
I don’t think that was being suggested. Indeed Antonin gives the SPARQL extension as an example of where an internship resulted in work that was not shipped.
I’m a bit concerned that we are excluding possibly interesting internship projects on the basis that they might not result in a usable product - this would pretty much exclude any project from being suitable for an internship.
IIIF support is different from, say, SPARQL support, because it’s not just a data format; there is a very vocal and active (and opinionated) community behind it, with major museums and cultural institutions participating in it.
My experience is that the linked data, RDF and SPARQL community are very much vocal, active and opinionated - and once again with major institutions and organisations involved.
I predict that building an integration like this will bring some friction, which will need to be handled
OpenRefine is a general tool, and I have to admit that I dislike the implication that we can’t develop new integrations in case we upset users of an existing integration.
I see the same risk for this integration, which may result in frustration for end users as bug reports or new feature requests are not being followed up on (for the Commons extension, this frustration now falls on me, and I don’t want more of it, and also don’t want this to fall on other people’s plates if they are not prepared). Is there willingness in the OpenRefine developer community to keep an eye here and do some maintenance?
I completely agree this is a risk with new integrations of this type, and continued support for particular functionality in OpenRefine is key to successful adoption of the tool by new user communities. However, I’m also aware that a project like OpenRefine is continually at risk here. It’s hard to make any guarantees and part of the solution is to experiment and try things out - and these internships are a great way of doing this.
My reading of both Outreachy’s and GSoC’s project guidelines (which I just reread to be sure things have not changed) is that:
Mentors help interns pick appropriate goals that can be attained within the X+ weeks of the program. There are no guarantees of “shippable code”, but there are guarantees of “evaluation of the code submitted” and of mentors’ final evaluations of the intern’s code and project summaries towards a pass/fail, i.e. “did they learn something and improve”. Indeed, @tfmorris set up a few of the criteria in the past for “Organization Project Criteria”, meaning the criteria for grading Project Submissions that an Organization determines at its sole discretion.
Both Outreachy and GSoC encourage code experiments that “might” be deemed useful to the project. Both programs highlight that the evaluations only cover the learning effort applied and the code submitted, in the hope that the interns stick around in the open source community long afterwards.
What is the goal of Google Summer of Code?
Google Summer of Code (GSoC) is a program designed to bring new, excited contributors into open source communities, with the hope that they will continue to contribute to open source communities long after their GSoC program ends.
That goal for GSoC, which is closely aligned with Outreachy’s, says nothing about shipping the code submitted. That’s merely a bonus for the org if it happens, and Outreachy actually states this a bit more clearly than GSoC, if you dig into their FAQs, rules and policies.
Update on this: we are unfortunately not participating in GSoC this year, as we did not get in as a mentor organization:
Thank you for applying to be a Google Summer of Code 2023 mentor organization. Sadly, we were unable to accept OpenRefine this year. We had many more applications than available slots. We hope you will apply again in the future!
Sadly, for the upcoming round I think it is already too late as the deadline for “initial applications” has passed. Those initial applications let prospective interns apply to the Outreachy programme as a whole, without applying for a specific project yet. I think it is a bit sad that this happens before the projects apply to the programme, because it means that by the time momentum builds up on our side for our participation in Outreachy, it is already too late to encourage people to apply as interns.