Cleaning up our GitHub issue labels

In many FOSS projects, the labels that one can use to categorize issues are grouped by type. And each issue generally has at most one label of a given type.

Sometimes this categorization of labels is supported by features of the bug tracker itself. GitLab offers this, and the Inkscape project makes use of it:

They use it for their importance ranking (“Importance: Low”), to categorize which tool of Inkscape is affected by an issue (“Tool: Shape Builder”), or to record which operating system is affected (“OS: Windows”).

Gitea also offers this, and it is used for instance by Blender: Labels - blender - Blender Projects

As far as I can tell, there is no built-in support for this in GitHub, but some projects hosted there still adopt this structure. For instance Docusaurus: Labels · facebook/docusaurus · GitHub

I would be interested in introducing a similar structure in our labels. The goal would be to make our labeling system more principled, so that new contributors could start labeling things on their own more easily. We could also use this opportunity to give the same color to all labels of the same type, so that we can more easily identify issues where no label of a given type (or more than one) has been assigned. The task would first consist of grouping our existing labels into categories. After that, it could be useful to clean up labels that are rarely used and to introduce any missing labels identified by the classification effort.
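To make the "at most one label of a given type" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `Category: Name` naming convention (in the style of Inkscape's "Importance: Low"); the function names and sample labels are made up for illustration, and nothing here talks to GitHub.

```python
from collections import defaultdict


def group_by_category(labels):
    """Group label names by their 'Category: Name' prefix.

    Labels without a ': ' separator are collected under the key None.
    """
    groups = defaultdict(list)
    for label in labels:
        category, sep, _name = label.partition(": ")
        groups[category if sep else None].append(label)
    return dict(groups)


def check_issue(labels, required_categories):
    """Return (missing, duplicated) categories for one issue's labels."""
    groups = group_by_category(labels)
    missing = [c for c in required_categories if c not in groups]
    duplicated = [c for c, names in groups.items()
                  if c is not None and len(names) > 1]
    return missing, duplicated
```

For an issue labeled "Type: Bug", "Priority: High" and "Priority: Low", checking against the categories Type, Priority and Module would report Module as missing and Priority as duplicated.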

Do you think this would be a sensible move? Do you wish for any other improvements to our labels?

I worry that it narrows our filtering ability rather than broadening it. But ya know what? Let’s give it a try!

I thought I replied to this before, but I guess I imagined it. I think structured labels are useful. We currently use a Priority: prefix and no others, but it'd be straightforward to add Type:, Status:, Component: (or Module:), etc.

OpenLibrary uses the following (which I think is overkill):

  • Affects
  • Close (ie closure type)
  • Lead (for each of their 23 lead developers, which wouldn't be as useful for us)
  • Module
  • Needs (ie what's blocking this)
  • Priority
  • Theme
  • Type (Bug, Feature, etc)

At the same time we could make sure that all the labels have clear and accurate descriptions as well as rationalize any overlapping or obsolete labels (e.g. UX vs Usability, Help Wanted vs Good First Issue).

Next steps:

  • Agree on a set of categories
  • Propose & review mappings from current labels to new labels
  • Relabel issues
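
The mapping and relabeling steps could be prototyped offline before touching GitHub. The sketch below is only an illustration: `relabel` and `EXAMPLE_MAPPING` are hypothetical names, and the mapping entries are placeholders, not an agreed proposal.

```python
def relabel(issue_labels, mapping):
    """Apply an old-name -> new-name mapping to one issue's labels.

    A mapping value of None means the label is dropped; labels absent
    from the mapping are kept unchanged. Order is preserved and
    duplicates produced by merging two labels are removed.
    """
    result = []
    for label in issue_labels:
        new_label = mapping.get(label, label)
        if new_label is not None and new_label not in result:
            result.append(new_label)
    return result


# Hypothetical mapping, for illustration only.
EXAMPLE_MAPPING = {
    "UX": "Theme: Usability",
    "Usability": "Theme: Usability",
    "obsolete-label": None,
}
```

Running the mapping over an export of the issues first makes it possible to review the outcome (for instance, how many issues end up with a merged label) before anything is changed in the tracker.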

Tom

Please consider that renaming and deleting issue labels breaks existing links to filters, searches, etc. in non-obvious ways (the latter meaning that we won't get error messages).
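
One way to gauge that risk is to scan known saved links for the labels they mention. The sketch below assumes the links are GitHub issue search URLs that carry `label:` qualifiers in their `q` parameter; the function names and the example repository path are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit


def labels_in_filter_url(url):
    """Extract label names from the `q` parameter of an issue search URL.

    Handles both label:name and label:"name with spaces".
    """
    query = parse_qs(urlsplit(url).query).get("q", [""])[0]
    matches = re.findall(r'label:(?:"([^"]+)"|(\S+))', query)
    return [quoted or bare for quoted, bare in matches]


def broken_by_rename(url, renamed_or_deleted):
    """Return the labels in `url` that would no longer exist."""
    return [name for name in labels_in_filter_url(url)
            if name in renamed_or_deleted]


# Hypothetical saved filter link, for illustration only.
EXAMPLE_URL = ("https://github.com/example-org/example-repo/issues"
               "?q=is%3Aopen+label%3A%22good+first+issue%22+label%3Abug")
```

Links found in documentation, wikis or bookmarks could be run through `broken_by_rename` together with the set of labels slated for renaming or deletion, so they can be updated at the same time as the labels.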

I put together a spreadsheet containing the current issue labels here and made a quick pass through to try and sort them into categories. I also included issue counts to help identify labels which are unused or lightly used and which might be candidates for deletion or merging into other labels.
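
For anyone wanting to recompute such counts later, here is a minimal sketch. It assumes the issues have already been exported as lists of label names; the function names and the threshold are arbitrary choices for illustration.

```python
from collections import Counter


def label_usage(issues, all_labels):
    """Count how many issues carry each label, including unused ones."""
    counts = Counter({label: 0 for label in all_labels})
    for labels in issues:
        # set() so a label repeated on one issue is counted once
        counts.update(set(labels))
    return counts


def lightly_used(counts, threshold=5):
    """Return labels used on fewer than `threshold` issues, sorted by name."""
    return sorted(label for label, n in counts.items() if n < threshold)
```

Labels returned by `lightly_used` are only candidates: a rarely used label can still be worth keeping if it has a clear purpose.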

I don't feel a burning need to own this cleanup, but happy to do it if folks come to agreement on the approach.

Tom

I would like to see the Java and JavaScript labels removed. Designers and developers can filter better on other terms like front end or backend or server etc.

@tfmorris thanks that's really helpful!

@Lydiaofficial given that you have identified this as an issue of relevance for designers, I wonder if you plan to do this work (or some of it) as part of your internship, or if you rather plan to focus on other measures that you identified?

This cleanup is definitely an issue where I would expect people will have opinions, as categorizing things is always a pretty subjective exercise, but I am sure we can reach something more consensual than the current mess we have.

Hi @antonin_d, indeed yes, I plan to, as it is one of the deliverables of my internship.

Amazing! I think @tfmorris' spreadsheet is a great start for a possible classification of the existing labels, but that's just a first step: it can make sense to introduce new labels as well, merge some existing labels together, and so on.

We have some lightweight documentation about how our issues are currently organized (for instance some usage guidelines for the "good first issue" label):

It probably makes sense to take that into account to understand the current situation, and of course it would be useful to update and improve this documentation as we make changes to the system.

Let's keep using this thread to discuss changes before we implement them on GitHub.

@antonin_d - I think there are two separate issues being discussed in this thread - 1) new label structure and 2) cleaning up old labels - and I just want to clarify what you mean when you asked if Lydia will "work on this issue" and make sure we're all on the same page. Part of the deliverables in @Lydiaofficial's internship is proposing a new system for labels in Github - specifically concerning design contributions, but we had not discussed with her that cleaning up old labels will be her task.

Regarding design issues - sure cleanup makes sense, but I think it's out of scope for Lydia to do the whole cleanup. Too many other tasks already planned in the weekly schedule, which you have access to.

I think it's great to keep the discussion going in this thread though, so it's all in one place. And I hope that whatever Lydia proposes can be easily integrated in the rest of the label changes/merges that the dev team sees necessary. I also agree that the spreadsheet from @tfmorris is very useful!

I am not really sure how to keep your 1) and 2) separate.

By trying to introduce label categories (your 1) we are bound to also suggest some clean up in the same go, like @tfmorris did in the spreadsheet. When asking myself which category a label should be part of, I need to consider how the label is currently used and that is likely to suggest actions like deleting the label entirely if it is not deemed useful.

In the other direction, if I just try to start cleaning up existing labels without categorizing them, I will again ask myself how each of those labels is being used, and that's already most of the cognitive effort I need to suggest a fitting category for it. So since there seems to be a consensus that we want to do this categorization, I'd do it in the same go.

Part of the deliverables in @Lydiaofficial's internship is proposing a new system for labels in Github - specifically concerning design contributions, but we had not discussed with her that cleaning up old labels will be her task.

Ah right - well that's your call of course! Thanks for clearing this misunderstanding.

By the way, I think this cleanup task is quite elastic: it can take more or less time depending on the desired end state.
I think it's super valuable to have some insights from someone who is new to the issue tracker and will cast a more critical eye on our usage of labels, so that's why I was keen to see @Lydiaofficial's thoughts on those. But that being said it's also a task that's much easier to do for someone who has been using the labeling system for longer and already has an intuitive understanding of how most of the existing labels are being used.

I think the difference is quite literal: 1) is creating a text doc or spreadsheet with a proposed structure; and 2) is implementing the proposed changes in Github.

I imagine Lydia proposing something, but I didn't imagine her implementing it, or at least not in its entirety, largely because I think any proposal will lead to a lot of debate and discussion, and it is unclear when consensus will be reached. So I didn't plan for implementation in the weekly schedule. But if everyone agrees on parts of the proposal within the time frame of the internship, then those can also be implemented.

Ah ok, I had understood it completely differently, sorry :smiley:
Great, then we are on the same page!
Looking forward to the proposals and happy to do the concrete changes on GitHub once we have something consensual (which I do hope should be able to happen before the end of the internship).

Here is an update on my task on improving our Github label structure:

Github label structure for OpenRefine Targeting Designers

The issue categories are meant to classify and organize various items, such as bug reports, feature requests, or pull requests, within the project's repository on GitHub.

Each category serves a specific purpose in providing context and structure to the issues and pull requests to make it easier for project maintainers and contributors to manage and prioritize their work efficiently.

Type > Theme > Status > PR > Priority > Skill level
Platform > Module > Data format

Labels for Issue Type:

Bug: Indicates issues related to software defects or unexpected behavior. These issues require investigation, debugging, and resolution.
Feature Request: Identifies requests for new features or enhancements. These involve proposing and discussing new functionality or improvements.
Design discussion: Indicates the need for discussion on issues related to UI/UX enhancements, accessibility or other design refinements.
Documentation: Highlights issues related to improving project documentation or tutorials. These issues involve updating or creating documentation to provide better guidance to users.

Labels for Issue Themes with Special Focus on Design:

UI/visual design: Indicates issues that primarily involve visual design elements. These issues focus on improving the visual aesthetics, layout, and typography of the user interface. This also includes maintaining consistency in branding elements and visual representation.
UX/usability: Focuses on issues related to improving the overall user experience and interaction flow. These issues involve conducting user research, creating wireframes, and improving usability. Note: combine with existing Usability label.
Accessibility: Highlights issues related to making the application more accessible to users with disabilities. These issues involve ensuring compliance with accessibility standards and addressing accessibility barriers.
Information Architecture: Focuses on issues related to organizing and structuring information within the application. These issues involve improving data organization, navigation, and information hierarchy.

Plus additional labels coming from the Theme category in the current label spreadsheet created by Tom.

Labels for Status:

Help Wanted: Indicates issues where additional assistance is needed from the community. These issues may require collaboration from various contributors.
In Progress: Shows that someone is actively working on the issue or pull request. This label helps prevent duplication of effort by indicating that the item is already being addressed.
Blocked: Reserved for when an issue or pull request cannot progress further due to external dependencies or other factors. This label signals that the item is on hold until the blocking issue is resolved.
Closed: Denotes that the issue or pull request has been addressed, resolved, or incorporated into the project. It's used to signify that no further action is required.
Duplicate: Assigned to issues that are exact duplicates of other open issues. This label helps keep the issue tracker organized and avoids duplicate discussions.
Pending Review: Indicates that the issue or pull request is awaiting review by project maintainers or collaborators.
Needs More Information: Indicates issues that lack sufficient information for the project team to act upon. This label prompts the reporter to provide additional details.

Plus additional labels coming from the Status category in the current label spreadsheet.

Labels for PR:

WIP (Work in Progress): Indicates that the pull request is still in the early stages and not ready for review or merging. It's a signal that the contributor is actively working on it and seeking feedback.
Needs Review: Shows that the pull request is ready for review and feedback from other contributors or project maintainers.
Changes Requested: Used when reviewers have requested changes to be made in the pull request before it can be considered for merging.
On Hold: Indicates that the PR is temporarily on hold due to external factors or other considerations.
Documentation: Indicates that the pull request contains documentation changes or additions.
Testing: Indicates that the pull request includes changes related to testing or test cases.
Dependency Update: Used for pull requests that update the project's dependencies to newer versions.
Security: Applied to pull requests that address security-related issues or vulnerabilities.

Labels for Priority:

High: Denotes critical issues that require immediate attention and may be blocking progress.
Medium: Represents important issues that need to be addressed but are not as urgent as high-priority items.
Low: Indicates less critical issues that can be dealt with at a later stage when higher-priority tasks are completed.
Priority: Urgent: Similar to "High Priority," it highlights issues that demand immediate action.

Labels for Difficulty Level / Skill Level:

Good First Issue: Indicates issues suitable for newcomers to design or coding, providing a gentle introduction to the project. These issues have clear instructions, mentorship available, and require basic skills.
Intermediate: Identifies moderately challenging issues that require some experience and familiarity with the project. These issues require a deeper understanding of the codebase and may involve complex implementation.
Advanced: Highlights complex issues that require deep expertise and advanced skills. These issues may involve architectural changes, performance optimizations, or in-depth knowledge of specific technologies.
Up for Grabs: Indicates issues that are available for anyone in the community to pick up and work on. These issues are open for contribution and collaboration.

Labels for Development Module:

Frontend: Identifies issues related to the user interface and client-side development. These issues involve working on HTML, CSS, and JavaScript code that affects the user interface.
Backend: Focuses on issues related to server-side functionality and infrastructure. These issues involve working on server-side code, APIs, or data processing logic.
API: Highlights issues related to the application programming interface or integration with external systems. These issues involve working on API design, documentation, or compatibility.

Plus additional labels coming from the Module category in the current label spreadsheet.

Labels for Platform:

Windows: Indicates that the issue or feature is specific to the Windows operating system.
macOS: Denotes that the item is relevant to macOS users or macOS-specific environments.
Linux: Used for issues or features that pertain to Linux-based systems.
Cross-platform: Indicates that the issue or feature affects multiple platforms or has implications across different environments.
Server: Used when the issue or feature pertains to server environments or server-related components.

And here's the document where the above information can be found: https://docs.google.com/document/d/19LLxQxQNgELxSuxT8nwgoWEHMdlGjjnJO20zFsZpLak/edit

@Lydiaofficial can you make this document publicly available? I cannot access it.

thank you

You can try again now. The access has been fixed.

Thanks a lot for that!
Looking at your list, it is difficult for me to tell which of your items relate to existing labels (possibly renamed?) and which are new ones.

Also, for readability, it would help if your message was formatted with bullet points to separate each label (similarly to your google doc). It might also help if the label itself is formatted differently from its description, so that one can easily identify it. For instance:

  • Closed: Denotes that the issue or pull request has been addressed, resolved, or incorporated into the project. It's used to signify that no further action is required.

instead of:

Closed: Denotes that the issue or pull request has been addressed, resolved, or incorporated into the project. It's used to signify that no further action is required.

More generally I wonder how to discuss the proposed changes: should we make comments in your Google Doc on each label and its description? I can imagine it getting quite dense pretty quickly. Did you have a particular process in mind?

Hi @antonin_d, yes, you can go ahead and make comments on the document.

Okay, but it would help me if you could specify for each label in your list whether it exists already, whether it is obtained by renaming an existing one, or if it is a new label. Do you think you could add this information there?

Alright sure.

I can do that.