How do you deal with AI generated PRs?

I hope this is not a duplicate, I used the search functionality, but could not find any related discussion.

I'm interested in how this community views and deals with AI generated PRs,
or if there are guidelines around the topic.

The reason I'm bringing this up is that I recently opened issues within OpenRefine that received AI-generated PRs. If you compare the work that went into investigating and specifying the issue, and later reviewing the code, with copy-pasting content into a chat window, you'll notice that the workload lands mostly on my and the other reviewers' end, while the "credit" for the work goes mostly to the AI user. What I'm taking from this experience is that I will stop reviewing such PRs, or at least significantly reduce my efforts in that regard.

How do other devs see this?

I’m not aware of a similar discussion on here, so thank you for kicking this off.

At a high level, my feeling is that it depends on the contribution history of the person opening the pull request. GitHub activity is the most relevant, but I consider activity in the wider community as well. I guess it comes down to whether or not I think the contributor is acting in good faith. However, I also recognize that’s very subjective and is not a scalable process.

One thing I’d like to personally improve on is more consistent enforcement of the PR template. I’ve been inconsistent in asking for that guideline to be followed, and my feeling is that there’s a high correlation between low-effort PRs and not taking the time to fill out the template.

We don't have an established policy on the use of code-generation LLMs. At a minimum, we should probably add a question or declaration about them to the PR template and remind contributors that they are responsible for all code they contribute, whether it was generated by an LLM, copied from Stack Overflow, or written from scratch. LLMs vary widely in their quality on code-generation tasks, so knowing which models were used can be useful data.
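As a starting point, the declaration could look something like the sketch below. This is draft wording only, and the exact checklist items are just my assumption of what we'd want to ask:

```markdown
<!-- Draft addition to the PR template (wording open to discussion) -->
### Code-generation disclosure

- [ ] No LLM or other code-generation tool was used for this change, **or**
- [ ] The following tool(s)/model(s) were used: <!-- e.g. which model and version -->

I have personally reviewed and tested all code in this PR and take responsibility
for it, whether it was generated, copied, or written from scratch.
```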

I wouldn't be opposed to limiting the use of code-generation LLMs until they improve. I've experimented a little with using GitHub's Copilot for code reviews, which I think play more to the strengths of current LLMs, but even there I've seen Copilot suggest a change and then ask to have that same change modified in the next round of reviews. :angry: I wouldn't use a code-generation LLM for something important (e.g. OpenRefine) in a context where I wasn't familiar enough with the code to review its output in detail. That is where I think most of the danger comes from: LLMs being prompted by people who can't judge their output.

As for the asymmetry in "credit": unfortunately, there are a number of asymmetries built into how open source communities, including this one, operate. There are a number of contributors here who probably don't get the credit they deserve. Pull request (PR) review, in particular, is a thankless time sink, but an incredibly important task. I only did a cursory review of this PR because a) it was from an established contributor and b) it was a maintainability cleanup in test code in a remote corner of the product, so, for better or worse, I triaged it to receive less attention than I might otherwise have given it.

Thank you for creating the issue. I think we should have issues associated with all PRs (other than dependabot updates and trivial fixes), but I have received pushback on this stance in the past. In the context of this issue, if you created it because you had already investigated it and had a partial (or complete) solution, I'd recommend assigning it to yourself when you create it, to clearly indicate your intent to work on it.

Everyone needs to find their own balance between time devoted to coding vs. code review, but without code reviews the project will grind to a halt, so we all need to share the load (while minimizing that load to the extent possible). One way we might be able to minimize the load is to investigate configuring Copilot with our code review guidelines and having it automatically review PRs.

Tom

2 Likes

p.s. I think one of the most important things is transparency, so we definitely want developers to disclose their use of LLMs. I'm going to draft a change to the PR template for review.

Speaking of transparency, if you'd like to follow along as I chat with GitHub's Copilot about how it would fix this same issue, have a look at this draft PR.
Spoiler alert: we're up to 6 iterations so far (!) for this simple fix.
I posted the prompt that I used in a comment on the issue.

Although it made a number of easily avoidable mistakes, one of the things I like, from a transparency point of view, is that it outlined its work plan up front, which I was able to comment on, and then it checked off the tasks one by one and explicitly responded to all feedback it received.

We probably want to invest some time in configuring Copilot (and agents in general) to work better in our repositories:
https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions
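
Per the doc above, per-repository instructions live in .github/copilot-instructions.md; the content below is only an illustrative sketch of the kind of guidance we might put there, not something I've tested:

```markdown
<!-- .github/copilot-instructions.md (illustrative sketch only) -->
# Instructions for Copilot in this repository

- Keep changes small and focused; reference the associated issue in the PR description.
- Follow the existing code style of the files you touch and do not reformat unrelated code.
- Add or update tests for any behavior change, and make sure the full test suite passes.
- Fill out the PR template completely, including the code-generation disclosure.
```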

Tom

1 Like

Hi Sandra. I want to thank you first for the reviews and the detailed feedback on my PRs. Without that commentary, they would not have reached their current state.

The lack of transparency about the source of the contribution was a failure on my part. I regret not making the disclosure when I opened the PRs. I would have felt betrayed if I had put that much effort into reviewing someone else’s work, only to learn of its true origins later. I overlooked the fact that others would be drawn into my pursuit of driving my work forward. I apologise for the omission, which was completely avoidable.

-Srihari

1 Like

OK, there is quite a lot to unpack here, and I realize I should have added a lot more context in my first post.

re: AI use in development
First of all, let me say: I'm not the AI police!
I'm not against AI per se, and I use it myself occasionally.
Also, in this day and age I actually assume that every developer is using it to some degree,
so I'm not sure a general disclosure of using AI would be necessary.

This thread was in response to receiving PRs where the author commented that they did not do the work, the AI did. I didn't mean to include "AI-enhanced development" or letting an AI review first to get pointers for your own follow-up review, which I think are reasonable approaches for developers, because at some point you check the output, run the tests, try it out locally, etc.

re: Why did I open the issues and what is credit
When I started working on OpenRefine, I worked on "good first issues" and noticed there weren't too many and that some lacked specifics. That's why I have commented on some issues in the past to say they might be good starters, and why I also started specifying some myself; tests or refactoring in a "remote corner of the product" are actually perfect for this :slight_smile:

I put "credit" in quotes, and yes, a "thank you" or the little green squares on your GitHub profile would qualify, but in a broader sense I meant it as "every psychological reward that gets you motivated to work on open source in your free time". In this case that would be helping new devs or juniors get started with "good first issues", which is why I tried to be thorough in my review and explain concepts to improve code quality (I had no intention of working on the issues myself).

re: community and AI
I did use my personal experience as a vehicle to open this conversation, but I know that my frustration stems from my ideals/expectations clashing with the real world (which is my "problem" to deal with).

What I'm interested in, in the context of this community:

  • Is there a way we (I) can create and maintain "good first issues" and make them less attractive to AI-only users, so that the repo stays welcoming to beginners?
    I started experimenting with putting "secret" prompts into issues, but when I tested with ChatGPT it ignored them, so that was unsuccessful. I'm also in favor of enforcing the PR template more consistently (it helps with reviewing in general) and, for front-end issues, asking for before/after images where appropriate.

Also, if you have noticed other issues or benefits arising from AI use, feel free to share.


P.S.
Btw, I noticed that I came into the code reviews like Kramer into Seinfeld's apartment, which was kind of awkward, but I did consider alternatives and thought: if I don't say anything, then the only thing I could do afterwards would be to re-open the issue or do it myself, so that seemed like the best course of action for me. If you (@tfmorris or @Rory) disagree or have other preferences, please let me know!
Also, I'm not opposed to moving ahead with the PRs; they both LGTM, so if there are no other objections they can be accepted.

1 Like

I’m not really familiar with how people auto-generate pull requests, but I think trying to introduce a little more friction into that process would be a reasonable first step towards curbing AI-generated PRs. One idea I’ve been thinking about is requiring contributors to be assigned an issue before opening a PR (with the contribution guide updated to reflect this). My hypothesis is that asking to be assigned an issue won’t be a significant burden for someone genuinely interested in contributing to the project but would be just enough extra work to dissuade AI-only contributors.
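
As a concrete starting point, the contribution guide addition might read something like this (draft wording on my part, not an agreed policy):

```markdown
<!-- Possible addition to the contribution guide (draft wording) -->
### Before opening a pull request

1. Find or create an issue describing the change you want to make.
2. Comment on the issue and ask a maintainer to assign it to you.
3. Open your PR only after the issue has been assigned to you, and link it in the PR description.

Pull requests whose authors are not assigned to a corresponding issue may be closed without review.
```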


When it comes to the notion of “credit”, I’ve been thinking a lot about how this forum works: as someone engages more with the forum, Discourse (the forum software) tracks that engagement to approximate something like community standing. I’ve realized I try to do something similar when looking at contributions and contributors on GitHub. In my opinion, providing a thoughtful code review is a great way to increase one’s standing as a developer in the community. Other than inviting people to the GitHub org, I’m not sure how to make this “community standing” progression more tangible (like how Discourse notifies forum members of their increased standing). Maybe this is something we could look into with different teams within the org.

To use this as an example, I think adding unsolicited feedback to a pull request is not inherently bad. Depending on someone’s role in the organization, feedback can also be implicitly solicited (like a maintainer adding a review of a new PR). However, while providing thoughtful reviews increases one’s standing, unsolicited and unhelpful reviews can certainly decrease it. I personally thought your feedback was provided with the goal of making the codebase stronger. I’d expect PR authors to be able to respond to that kind of feedback.