In my day job at the Foundation for Public Code I am one of the maintainers of the Standard for Public Code. Since OpenRefine is used by academics, GLAM staff and others, it could arguably fit the definition of what we call public code. For that reason, and because OpenRefine is more of a community-based codebase than other codebases developed for public organizations, we thought it would be a useful exercise to assess OpenRefine against the Standard for Public Code. I'll paste in our first round of assessment; @antonin_d also had a chance to chime in on it. As you can see, there are still some question marks. Please comment if you see something that has been overlooked or misunderstood, or if you know of better links for the notes. (I guess this will also be an experiment in very long forum posts.)
Code in the open
☑ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
N/A | All source code for any policy in use (unless used for fraud detection) MUST be published and publicly accessible. | Keep an eye on where OpenRefine is used; such policy might exist at some libraries or universities. |
Ok | All source code for any software in use (unless used for fraud detection) MUST be published and publicly accessible. | Github |
Ok | The codebase MUST NOT contain sensitive information regarding users, their organization or third parties. | Given that it has been public for this long. |
Ok | Any source code not currently in use (such as new versions, proposals or older versions) SHOULD be published. | Releases are available since Jul 24, 2013. |
N/A | Documenting which source code or policy underpins any specific interaction the general public may have with an organization is OPTIONAL. |
Bundle policy and source code
☑ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
N/A | The codebase MUST include the policy that the source code is based on. | Keep an eye on where OpenRefine is used; such policy might exist at some libraries or universities. |
N/A | If a policy is based on source code, that source code MUST be included in the codebase, unless used for fraud detection. | See above. |
N/A | Policy SHOULD be provided in machine-readable and unambiguous formats. | See above. |
N/A | Continuous integration tests SHOULD validate that the source code and the policy are executed coherently. | See above. |
Make the codebase reusable and portable
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | The codebase MUST be developed to be reusable in different contexts. | Can be run locally, no configuration required. |
Ok | The codebase MUST be independent from any secret, undisclosed, proprietary or non-open licensed software or services for execution and understanding. | |
Ok | The codebase SHOULD be in use by multiple parties. | Some examples: Google Scholar search results |
Ok | The roadmap SHOULD be influenced by the needs of multiple parties. | No single roadmap, discussions collected under the roadmap tag in forum, explanation in docs |
Ok | The development of the codebase SHOULD be a collaboration between multiple parties. | Most developers are either volunteers or grant funded. |
N/A | Configuration SHOULD be used to make source code adapt to context specific needs. | No configuration needed. |
Ok | The codebase SHOULD be localizable. | Documentation, translation platform |
Ok | Source code and its documentation SHOULD NOT contain situation-specific information. | |
| | Codebase modules SHOULD be documented in such a way as to enable reuse in codebases in other contexts. | Work has started through GitHub issue - Packages published on Maven Central but is not quite complete. |
Ok | The software SHOULD NOT require services or platforms available from only a single vendor. |
Welcome contributors
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | The codebase MUST allow anyone to submit suggestions for changes to the codebase. | Pull requests |
Ok | The codebase MUST include contribution guidelines explaining what kinds of contributions are welcome and how contributors can get involved, for example in a CONTRIBUTING file. | CONTRIBUTING |
Ok | The codebase MUST document the governance of the codebase, contributions and its community, for example in a GOVERNANCE file. | GOVERNANCE |
| | The contribution guidelines SHOULD document who is expected to cover the costs of reviewing contributions. | README or CONTRIBUTING could include a sentence or two explaining that volunteers review contributions on a best-effort basis. |
Ok | The codebase SHOULD advertise the committed engagement of involved organizations in the development and maintenance. | Explained on the wiki but will be replaced by a page on the website |
| | The codebase SHOULD have a publicly available roadmap. | No single roadmap, discussions collected under the roadmap tag in forum, explanation in docs (improving this page and linking to it from the README would be sufficient) |
Ok | The codebase SHOULD publish codebase activity statistics. | GitHub pulse |
Ok | Including a code of conduct for contributors in the codebase is OPTIONAL. | CODE OF CONDUCT |
Make contributing easy
☑ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | The codebase MUST have a public issue tracker that accepts suggestions from anyone. | Issues |
Ok | The documentation MUST link to both the public issue tracker and submitted codebase changes, for example in a README file. | GitHub |
Ok | The codebase MUST have communication channels for users and developers, for example email lists. | Forum |
Ok | There MUST be a way to report security issues for responsible disclosure over a closed channel. | GitHub security advisory |
Ok | The documentation MUST include instructions for how to report potentially security sensitive issues. | SECURITY |
Maintain version control
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | All files in the codebase MUST be version controlled. | Git |
| | All decisions MUST be documented in commit messages. | There is a policy but it is not strongly enforced. |
| | Every commit message MUST link to discussions and issues wherever possible. | There is a policy but it is not strongly enforced. |
Ok | The codebase SHOULD be maintained in a distributed version control system. | Git |
| | Contribution guidelines SHOULD require contributors to group relevant changes in commits. | |
Ok | Maintainers SHOULD mark released versions of the codebase, for example using revision tags or textual labels. | Releases |
| | Contribution guidelines SHOULD encourage file formats where the changes within the files can be easily viewed and understood in the version control system. | |
| | It is OPTIONAL for contributors to sign their commits and provide an email address, so that future contributors are able to contact past contributors with questions about their work. | |
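Since the two commit message requirements above are covered by a policy that is not strongly enforced, here is a rough sketch of how such a policy could be spot-checked. It assumes that issues and pull requests are referenced GitHub-style (for example `#123`); the function name and the sample messages are made up for illustration and are not taken from the OpenRefine repository.

```python
import re

# Assumption: issues are referenced GitHub-style, e.g. "#123" or "Fixes #123".
ISSUE_REF = re.compile(r"#\d+")


def missing_issue_link(messages):
    """Return the commit messages that contain no issue or PR reference."""
    return [msg for msg in messages if not ISSUE_REF.search(msg)]


# Hypothetical commit messages, not taken from the OpenRefine history.
sample_messages = [
    "Fix date parsing in importer (#101)",
    "Tidy up whitespace",
    "Closes #202: document the export options",
]

unlinked = missing_issue_link(sample_messages)
print(unlinked)
```

Real commit subjects could be fed in from something like `git log --format=%s`. A check like this only shows whether a reference is present, not whether the linked discussion actually documents the decision.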
Require review of contributions
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | All contributions that are accepted or committed to release versions of the codebase MUST be reviewed by another contributor. | Both a policy and branch protection |
Ok | Reviews MUST include source, policy, tests and documentation. | Simple guide and Maintainer guidelines |
Ok | Reviewers MUST provide feedback on all decisions to not accept a contribution. | Policy to answer all PRs |
Ok | The review process SHOULD confirm that a contribution conforms to the standards, architecture and decisions set out in the codebase in order to pass review. | Maintainer guidelines |
Ok | Reviews SHOULD include running both the software and the tests of the codebase. | Simple guide and Maintainer guidelines |
| | Contributions SHOULD be reviewed by someone in a different context than the contributor. | De facto mostly true, but there is no explicit policy beyond requiring that the reviewer be someone other than the person submitting the PR. |
Ok | Version control systems SHOULD NOT accept non-reviewed contributions in release versions. | Master branch is protected |
| | Reviews SHOULD happen within two business days. | |
| | Performing reviews by multiple reviewers is OPTIONAL. | |
Document codebase objectives
☑ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | The codebase MUST contain documentation of its objectives, like a mission and goal statement, that is understandable by developers and designers so that they can use or contribute to the codebase. | Opening paragraph of README |
N/A | Codebase documentation SHOULD clearly describe the connections between policy objectives and codebase objectives. | |
| | Documenting the objectives of the codebase for the general public is OPTIONAL. | |
Document the code
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
| | All of the functionality of the codebase, policy as well as source code, MUST be described in language clearly understandable for those that understand the purpose of the codebase. | |
Ok | The documentation of the codebase MUST contain a description of how to install and run the software. | Brief in README, detailed in user manual |
Ok | The documentation of the codebase MUST contain examples demonstrating the key functionality. | Plenty of small examples in the docs and many tutorials at External Resources |
Ok | The documentation of the codebase SHOULD contain a high level description that is clearly understandable for a wide audience of stakeholders, like the general public and journalists. | The opening sentence of README is okay. |
Ok | The documentation of the codebase SHOULD contain a section describing how to install and run a standalone version of the source code, including, if necessary, a test dataset. | Installing |
Ok? | The documentation of the codebase SHOULD contain examples for all functionality. | |
| | The documentation SHOULD describe the key components or modules of the codebase and their relationships, for example as a high level architectural diagram. | Architecture is explained, but a diagram would help. |
| | There SHOULD be continuous integration tests for the quality of the documentation. | |
| | Including examples that make users want to immediately start using the codebase in the documentation of the codebase is OPTIONAL. | |
Use plain English
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | All codebase documentation MUST be in English. | |
Ok | All source code MUST be in English, except where policy is machine interpreted as code. | |
N/A | All bundled policy not available in English MUST have an accompanying summary in English. | |
| | Any translation MUST be up to date with the English version and vice versa. | |
| | There SHOULD be no acronyms, abbreviations, puns or legal/non-English/domain specific terms in the codebase without an explanation preceding it or a link to an explanation. | |
| | Documentation SHOULD aim for a lower secondary education reading level, as recommended by the Web Content Accessibility Guidelines 2. | |
| | Providing a translation of any code, documentation or tests is OPTIONAL. | |
Use open standards
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
| | For features of the codebase that facilitate the exchange of data the codebase MUST use an open standard that meets the Open Source Initiative Open Standard Requirements. | |
| | Any non-open standards used MUST be recorded clearly as such in the documentation. | |
| | Any standard chosen for use within the codebase MUST be listed in the documentation with a link to where it is available. | |
| | Any non-open standards chosen for use within the codebase MUST NOT hinder collaboration and reuse. | |
| | If no existing open standard is available, effort SHOULD be put into developing one. | |
| | Open standards that are machine testable SHOULD be preferred over open standards that are not. | |
| | Non-open standards that are machine testable SHOULD be preferred over non-open standards that are not. | |
Use continuous integration
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
| | All functionality in the source code MUST have automated tests. | |
| | Contributions MUST pass all automated tests before they are admitted into the codebase. | |
| | The codebase MUST have guidelines explaining how to structure contributions. | |
| | The codebase MUST have active contributors who can review contributions. | |
| | Automated test results for contributions SHOULD be public. | |
| | The codebase guidelines SHOULD state that each contribution should focus on a single issue. | OpenRefine/CONTRIBUTING.md at master · OpenRefine/OpenRefine · GitHub |
| | Source code test and documentation coverage SHOULD be monitored. | |
| | Testing policy and documentation for consistency with the source and vice versa is OPTIONAL. | |
| | Testing policy and documentation for style and broken links is OPTIONAL. | |
| | Testing the software by using examples in the documentation is OPTIONAL. | |
Publish with an open license
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | All source code and documentation MUST be licensed such that it may be freely reusable, changeable and redistributable. | LICENSE |
Ok | Software source code MUST be licensed under an OSI-approved or FSF Free/Libre license. | BSD 3-Clause |
Ok | All source code MUST be published with a license file. | |
Ok | Contributors MUST NOT be required to transfer copyright of their contributions to the codebase. | |
| | All source code files in the codebase SHOULD include a copyright notice and a license header that are machine-readable. | |
Ok | Having multiple licenses for different types of source code and documentation is OPTIONAL. | Documentation is CC BY 4.0 |
Make the codebase findable
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | The name of the codebase SHOULD be descriptive and free from acronyms, abbreviations, puns or organizational branding. | |
Ok | The codebase SHOULD have a short description that helps someone understand what the codebase is for or what it does. | |
Ok | Maintainers SHOULD submit the codebase to relevant software catalogs. | Snap store, Alternative-to, Repology |
Ok | The codebase SHOULD have a website which describes the problem the codebase solves using the preferred jargon of different potential users of the codebase (including technologists, policy experts and managers). | https://openrefine.org/ |
Ok | The codebase SHOULD be findable using a search engine by codebase name. | |
Ok | The codebase SHOULD be findable using a search engine by describing the problem it solves in natural language. | First hit on DuckDuckGo for "open source tool messy data" |
Ok | The codebase SHOULD have a unique and persistent identifier where the entry mentions the major contributors, repository location and website. | Wikidata |
| | The codebase SHOULD include a machine-readable metadata description, for example in a publiccode.yml file. | |
Ok | A dedicated domain name for the codebase is OPTIONAL. | openrefine.org |
Ok | Regular presentations at conferences by the community are OPTIONAL. | Events page and many listed under External resources |
Use a coherent style
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
| | The codebase MUST use a coding or writing style guide, either the codebase community's own or an existing one referred to in the codebase. | Style is discussed in the technical reference; for Java source it includes linting with mvn formatter:format. |
| | Contributions SHOULD pass automated tests on style. | Only for tests |
| | The style guide SHOULD include expectations for inline comments and documentation for non-trivial sections. | Expectations on documentation, but no mention of inline comments |
| | Including expectations for understandable English in the style guide is OPTIONAL. | |
Document codebase maturity
☐ criterion met.
Meets | Requirement | Notes and links |
---|---|---|
Ok | The codebase MUST be versioned. | Releases |
| | The codebase MUST prominently document whether or not there are versions of the codebase that are ready to use. | |
| | Codebase versions that are ready to use MUST only depend on versions of other codebases that are also ready to use. | Not all used libraries are stable, for example Odfdom Java |
| | The codebase SHOULD contain a log of changes from version to version, for example in the CHANGELOG. | Each release on the GitHub releases page contains good notes, and there is a What's New page, but not a single changelog per se. |
| | The method for assigning version identifiers SHOULD be documented. | Looks like it might be semver, but not obviously documented as such. |
| | It is OPTIONAL to use semantic versioning. | |
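On the "looks like it might be semver" note: one way to turn that impression into something checkable is to test the release tags against a semantic versioning pattern. The sketch below is only an illustration. The regular expression is a simplified form of the semver grammar (MAJOR.MINOR.PATCH with an optional pre-release suffix, no build metadata), and the tag names are hypothetical examples, not the actual OpenRefine release tags.

```python
import re

# Simplified semantic versioning pattern: MAJOR.MINOR.PATCH with an optional
# pre-release suffix such as "-beta.2". Build metadata is not handled.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?$"
)


def is_semver(tag):
    """Return True if the tag looks like a semantic version (leading 'v' allowed)."""
    version = tag[1:] if tag.startswith("v") else tag
    return SEMVER.match(version) is not None


# Hypothetical tag names for illustration, not taken from the OpenRefine repository.
example_tags = ["1.2.3", "v2.0.0", "1.4-beta2", "1.4.0-beta.2", "2.1"]
report = {tag: is_semver(tag) for tag in example_tags}
print(report)
```

Tags that omit the patch number, such as the hypothetical `2.1` or `1.4-beta2`, fail the check. That is the kind of detail a documented versioning method would settle one way or the other.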