Standard for Public Code assessment

In my day job at the Foundation for Public Code I am one of the maintainers of the Standard for Public Code. As OpenRefine is used by academics, GLAM staff and others, it could arguably fit into the definition of what we call public code. Therefore, and also because OpenRefine is more of a community-based codebase than other codebases developed for public organizations, we thought it would be a useful exercise to assess OpenRefine against the Standard for Public Code. I'll paste in our first round of assessment, which @antonin_d also had a chance to chime in on. As you can see, there are still some question marks. Please comment if you see something that has been overlooked or misunderstood, or if you know of better links for the notes. (I guess this will also be an experiment in very long forum posts.)

Code in the open

☑ criterion met.

Meets Requirement Notes and links
N/A All source code for any policy in use (unless used for fraud detection) MUST be published and publicly accessible. Keep an eye open for where OpenRefine is used; such policies might exist at some libraries or universities.
Ok All source code for any software in use (unless used for fraud detection) MUST be published and publicly accessible. Github
Ok The codebase MUST NOT contain sensitive information regarding users, their organization or third parties. Given that it has been public for this long, any such information would likely have been noticed.
Ok Any source code not currently in use (such as new versions, proposals or older versions) SHOULD be published. Releases have been available since Jul 24, 2013.
N/A Documenting which source code or policy underpins any specific interaction the general public may have with an organization is OPTIONAL.

Bundle policy and source code

☑ criterion met.

Meets Requirement Notes and links
N/A The codebase MUST include the policy that the source code is based on. Keep an eye open for where OpenRefine is used; such policies might exist at some libraries or universities.
N/A If a policy is based on source code, that source code MUST be included in the codebase, unless used for fraud detection. See above.
N/A Policy SHOULD be provided in machine-readable and unambiguous formats. See above.
N/A Continuous integration tests SHOULD validate that the source code and the policy are executed coherently. See above.

Make the codebase reusable and portable

☐ criterion met.

Meets Requirement Notes and links
Ok The codebase MUST be developed to be reusable in different contexts. Can be run locally, no configuration required.
Ok The codebase MUST be independent from any secret, undisclosed, proprietary or non-open licensed software or services for execution and understanding.
Ok The codebase SHOULD be in use by multiple parties. Some examples: Google Scholar search results
Ok The roadmap SHOULD be influenced by the needs of multiple parties. No single roadmap, discussions collected under the roadmap tag in forum, explanation in docs
Ok The development of the codebase SHOULD be a collaboration between multiple parties. Most developers are either volunteers or grant funded.
N/A Configuration SHOULD be used to make source code adapt to context specific needs. No configuration needed.
Ok The codebase SHOULD be localizable. Documentation, translation platform
Ok Source code and its documentation SHOULD NOT contain situation-specific information.
Codebase modules SHOULD be documented in such a way as to enable reuse in codebases in other contexts. Work has started in the GitHub issue "Packages published on Maven Central", but is not quite complete.
Ok The software SHOULD NOT require services or platforms available from only a single vendor.

Welcome contributors

☐ criterion met.

Meets Requirement Notes and links
Ok The codebase MUST allow anyone to submit suggestions for changes to the codebase. Pull requests
Ok The codebase MUST include contribution guidelines explaining what kinds of contributions are welcome and how contributors can get involved, for example in a CONTRIBUTING file. CONTRIBUTING
Ok The codebase MUST document the governance of the codebase, contributions and its community, for example in a GOVERNANCE file. GOVERNANCE
The contribution guidelines SHOULD document who is expected to cover the costs of reviewing contributions. README or CONTRIBUTING could include a sentence or two explaining that volunteers review contributions on a best-effort basis.
Ok The codebase SHOULD advertise the committed engagement of involved organizations in the development and maintenance. Explained on the wiki but will be replaced by a page on the website
The codebase SHOULD have a publicly available roadmap. No single roadmap, discussions collected under the roadmap tag in forum, explanation in docs (an improvement to this page and linked from the README would be sufficient)
Ok The codebase SHOULD publish codebase activity statistics. GitHub pulse
Ok Including a code of conduct for contributors in the codebase is OPTIONAL. CODE OF CONDUCT

Make contributing easy

☑ criterion met.

Meets Requirement Notes and links
Ok The codebase MUST have a public issue tracker that accepts suggestions from anyone. Issues
Ok The documentation MUST link to both the public issue tracker and submitted codebase changes, for example in a README file. GitHub
Ok The codebase MUST have communication channels for users and developers, for example email lists. Forum
Ok There MUST be a way to report security issues for responsible disclosure over a closed channel. GitHub security advisory
Ok The documentation MUST include instructions for how to report potentially security sensitive issues. SECURITY

Maintain version control

☐ criterion met.

Meets Requirement Notes and links
Ok All files in the codebase MUST be version controlled. Git
All decisions MUST be documented in commit messages. There is a policy but it is not strongly enforced.
Every commit message MUST link to discussions and issues wherever possible. There is a policy but it is not strongly enforced.
Ok The codebase SHOULD be maintained in a distributed version control system. Git
Contribution guidelines SHOULD require contributors to group relevant changes in commits.
Ok Maintainers SHOULD mark released versions of the codebase, for example using revision tags or textual labels. Releases
Contribution guidelines SHOULD encourage file formats where the changes within the files can be easily viewed and understood in the version control system.
It is OPTIONAL for contributors to sign their commits and provide an email address, so that future contributors are able to contact past contributors with questions about their work.
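Since the two commit-message requirements above rest on a policy that "is not strongly enforced", a small script could at least measure how often commits link to an issue. The sketch below is a hypothetical helper, not something OpenRefine has today, and it assumes that a reference is either "#<number>" or a full GitHub issue/pull-request URL:

```python
import re
from typing import Iterable, List

# Assumption: an issue/PR is referenced either as "#<number>" or as a full
# GitHub issue/pull-request URL. Adapt this to the project's actual policy.
ISSUE_REF = re.compile(r"#\d+|https://github\.com/\S+/(?:issues|pull)/\d+")


def missing_issue_refs(messages: Iterable[str]) -> List[str]:
    """Return the subject line of each commit message that has no issue/PR reference."""
    flagged = []
    for message in messages:
        if not ISSUE_REF.search(message):
            # Report only the subject line to keep the output short.
            lines = message.splitlines()
            flagged.append(lines[0] if lines else "")
    return flagged
```

The messages could be collected with `git log --format=%B%x00` and split on the NUL byte. Note that GitHub's default merge-commit subject ("Merge pull request #… from …") already contains a reference, so a check like this says more about individual commits than about merges.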

Require review of contributions

☐ criterion met.

Meets Requirement Notes and links
Ok All contributions that are accepted or committed to release versions of the codebase MUST be reviewed by another contributor. Both a policy and branch protection
Ok Reviews MUST include source, policy, tests and documentation. Simple guide and Maintainer guidelines
Ok Reviewers MUST provide feedback on all decisions to not accept a contribution. Policy to answer all PRs
Ok The review process SHOULD confirm that a contribution conforms to the standards, architecture and decisions set out in the codebase in order to pass review. Maintainer guidelines
Ok Reviews SHOULD include running both the software and the tests of the codebase. Simple guide and Maintainer guidelines
Contributions SHOULD be reviewed by someone in a different context than the contributor. De facto mostly true, but there is no explicit policy beyond requiring that the reviewer is someone other than the person submitting the PR
Ok Version control systems SHOULD NOT accept non-reviewed contributions in release versions. Master branch is protected
Reviews SHOULD happen within two business days.
Performing reviews by multiple reviewers is OPTIONAL.

Document codebase objectives

☑ criterion met.

Meets Requirement Notes and links
Ok The codebase MUST contain documentation of its objectives, like a mission and goal statement, that is understandable by developers and designers so that they can use or contribute to the codebase. Opening paragraph of README
N/A Codebase documentation SHOULD clearly describe the connections between policy objectives and codebase objectives.
Documenting the objectives of the codebase for the general public is OPTIONAL.

Document the code

☐ criterion met.

Meets Requirement Notes and links
All of the functionality of the codebase, policy as well as source code, MUST be described in language clearly understandable for those that understand the purpose of the codebase.
Ok The documentation of the codebase MUST contain a description of how to install and run the software. Brief in README, detailed in user manual
Ok The documentation of the codebase MUST contain examples demonstrating the key functionality. Plenty of small examples in the docs and many tutorials at External Resources
Ok The documentation of the codebase SHOULD contain a high level description that is clearly understandable for a wide audience of stakeholders, like the general public and journalists. The opening sentence of README is okay.
Ok The documentation of the codebase SHOULD contain a section describing how to install and run a standalone version of the source code, including, if necessary, a test dataset. Installing
Ok? The documentation of the codebase SHOULD contain examples for all functionality.
The documentation SHOULD describe the key components or modules of the codebase and their relationships, for example as a high level architectural diagram. Architecture is explained, a diagram would help
There SHOULD be continuous integration tests for the quality of the documentation.
Including examples that make users want to immediately start using the codebase in the documentation of the codebase is OPTIONAL.

Use plain English

☐ criterion met.

Meets Requirement Notes and links
Ok All codebase documentation MUST be in English.
Ok All source code MUST be in English, except where policy is machine interpreted as code.
N/A All bundled policy not available in English MUST have an accompanying summary in English.
Any translation MUST be up to date with the English version and vice versa.
There SHOULD be no acronyms, abbreviations, puns or legal/non-English/domain specific terms in the codebase without an explanation preceding it or a link to an explanation.
Documentation SHOULD aim for a lower secondary education reading level, as recommended by the Web Content Accessibility Guidelines 2.
Providing a translation of any code, documentation or tests is OPTIONAL.

Use open standards

☐ criterion met.

Meets Requirement Notes and links
For features of the codebase that facilitate the exchange of data the codebase MUST use an open standard that meets the Open Source Initiative Open Standard Requirements.
Any non-open standards used MUST be recorded clearly as such in the documentation.
Any standard chosen for use within the codebase MUST be listed in the documentation with a link to where it is available.
Any non-open standards chosen for use within the codebase MUST NOT hinder collaboration and reuse.
If no existing open standard is available, effort SHOULD be put into developing one.
Open standards that are machine testable SHOULD be preferred over open standards that are not.
Non-open standards that are machine testable SHOULD be preferred over non-open standards that are not.

Use continuous integration

☐ criterion met.

Meets Requirement Notes and links
All functionality in the source code MUST have automated tests.
Contributions MUST pass all automated tests before they are admitted into the codebase.
The codebase MUST have guidelines explaining how to structure contributions.
The codebase MUST have active contributors who can review contributions.
Automated test results for contributions SHOULD be public.
The codebase guidelines SHOULD state that each contribution should focus on a single issue. CONTRIBUTING
Source code test and documentation coverage SHOULD be monitored.
Testing policy and documentation for consistency with the source and vice versa is OPTIONAL.
Testing policy and documentation for style and broken links is OPTIONAL.
Testing the software by using examples in the documentation is OPTIONAL.

Publish with an open license

☐ criterion met.

Meets Requirement Notes and links
Ok All source code and documentation MUST be licensed such that it may be freely reusable, changeable and redistributable. LICENSE
Ok Software source code MUST be licensed under an OSI-approved or FSF Free/Libre license. BSD 3-Clause
Ok All source code MUST be published with a license file.
Ok Contributors MUST NOT be required to transfer copyright of their contributions to the codebase.
All source code files in the codebase SHOULD include a copyright notice and a license header that are machine-readable.
Ok Having multiple licenses for different types of source code and documentation is OPTIONAL. Documentation is CC BY 4.0
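For the machine-readable license header requirement, one widely used convention is an SPDX short-form identifier line near the top of each file (the FSFE's REUSE specification builds on this and adds SPDX-FileCopyrightText for the copyright notice). The sketch below only detects an SPDX-License-Identifier line; the function name and the "first 10 lines" limit are assumptions for illustration, and a tool such as REUSE would do a more complete check:

```python
import re
from typing import Optional

# Matches an SPDX short-form identifier line, e.g.
# "// SPDX-License-Identifier: BSD-3-Clause" or "/* SPDX-License-Identifier: MIT */".
SPDX_LINE = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*(?:\*/)?\s*$")


def spdx_identifier(source: str, max_lines: int = 10) -> Optional[str]:
    """Return the SPDX license expression found near the top of a file, or None."""
    for line in source.splitlines()[:max_lines]:
        match = SPDX_LINE.search(line)
        if match:
            return match.group(1)
    return None
```

Running this over every source file and listing the ones that return None would give a rough measure of how far the codebase is from meeting the requirement.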

Make the codebase findable

☐ criterion met.

Meets Requirement Notes and links
Ok The name of the codebase SHOULD be descriptive and free from acronyms, abbreviations, puns or organizational branding.
Ok The codebase SHOULD have a short description that helps someone understand what the codebase is for or what it does.
Ok Maintainers SHOULD submit the codebase to relevant software catalogs. Snap Store, AlternativeTo, Repology
Ok The codebase SHOULD have a website which describes the problem the codebase solves using the preferred jargon of different potential users of the codebase (including technologists, policy experts and managers). https://openrefine.org/
Ok The codebase SHOULD be findable using a search engine by codebase name.
Ok The codebase SHOULD be findable using a search engine by describing the problem it solves in natural language. First hit on DuckDuckGo for "open source tool messy data"
Ok The codebase SHOULD have a unique and persistent identifier where the entry mentions the major contributors, repository location and website. Wikidata
The codebase SHOULD include a machine-readable metadata description, for example in a publiccode.yml file.
Ok A dedicated domain name for the codebase is OPTIONAL. openrefine.org
Ok Regular presentations at conferences by the community are OPTIONAL. Events page and many listed under External resources

Use a coherent style

☐ criterion met.

Meets Requirement Notes and links
The codebase MUST use a coding or writing style guide, either the codebase community's own or an existing one referred to in the codebase. Style is discussed in the technical reference; for Java source it includes formatting with mvn formatter:format.
Contributions SHOULD pass automated tests on style. only for tests
The style guide SHOULD include expectations for inline comments and documentation for non-trivial sections. Expectations on documentation, but no mentions of inline comments
Including expectations for understandable English in the style guide is OPTIONAL.

Document codebase maturity

☐ criterion met.

Meets Requirement Notes and links
Ok The codebase MUST be versioned. Releases
The codebase MUST prominently document whether or not there are versions of the codebase that are ready to use.
Codebase versions that are ready to use MUST only depend on versions of other codebases that are also ready to use. Not all libraries used are stable, for example Odfdom Java
The codebase SHOULD contain a log of changes from version to version, for example in the CHANGELOG. Each release on the GitHub releases page contains good notes, and there is a What's New overview, but no single CHANGELOG as such.
The method for assigning version identifiers SHOULD be documented. Looks like it might be semver, but not obviously documented as such.
It is OPTIONAL to use semantic versioning.
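Since it "looks like it might be semver", one way to find out is to run the existing release tags through the validation regex suggested on semver.org for SemVer 2.0.0. The helper below is a hypothetical sketch; it tolerates a leading "v" on tags, which is an assumption about tag naming rather than something confirmed for OpenRefine:

```python
import re
from typing import Iterable, List

# Regular expression suggested on semver.org for SemVer 2.0.0 (numbered-group variant).
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def non_semver_tags(tags: Iterable[str]) -> List[str]:
    """Return the tags that are not valid SemVer 2.0.0 versions (a leading "v" is tolerated)."""
    invalid = []
    for tag in tags:
        version = tag[1:] if tag.startswith("v") else tag
        if not SEMVER.match(version):
            invalid.append(tag)
    return invalid
```

The tag list could come from `git tag`. If the result is empty, documenting "we use semantic versioning" would be a small step; if not, the list shows which existing naming patterns the documentation would need to describe.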

I think we are OK on this by default, since the Git repository that stores our commits is actually on GitHub, which already has this optional policy built into its infrastructure. If we were to move to another provider, say GitLab or Apache, would we need some wording improvements in our CONTRIBUTING.md file? I'm confused on this one.

@thadguidry Yeah, platforms can really help. We haven't checked the complete history yet to see whether there are emails for all past commits; I think that is why we left it open for now. And if we move, it would be good both to add it to CONTRIBUTING and to make sure it is enforced before merging.
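Checking the history for missing emails could be scripted roughly like this. It is a sketch under two assumptions: that "has an email" means the author email contains an "@", and that only the author email (`%ae`) matters, not the committer email (`%ce`). GitHub noreply addresses would pass this check even though they may not be a usable contact:

```python
from typing import Iterable, List


def commits_without_email(log_lines: Iterable[str]) -> List[str]:
    """Return hashes of commits whose author email is missing or malformed.

    Each line is expected in the form "<hash> <author email>", as produced by
    `git log --format="%H %ae"`.
    """
    missing = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        commit_hash, _, email = line.partition(" ")
        if "@" not in email:
            missing.append(commit_hash)
    return missing
```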

As a follow-up question, where would be a good place to publish this assessment? If there were an ambition to go for full compliance, I would suggest linking it from somewhere in the Contributing to OpenRefine section, but since there isn't, I am not sure where to put it. It could possibly fit as a blog post. Any other ideas?