Tidy-ness and maintainability of our codebase

  1. Generally, behind the scenes, I have been trying to continue improving our tidy-ness and maintainability to help newcomers. However, there are several problem areas. Sonarlint is pretty good with telling us about "deprecated method" and "import never used" and "Potential null pointer access" and "unused values" and "Resource leak: Closeable value" and many other categories that need to be looked into a bit deeper. Many gaps within our code base from what I see (using VSCode with Sonarlint rules and filtering upon them and also GitHub copilot). Some are extreme outliers for deep branching conditions we missed, and others are simply never reached UNTIL a user has done specific matching conditions, like opening a bunch of very large ".tar.bz2" files specifically and the input stream does not get closed and the heap grows without GC.

  2. There are artifact methods left around unused which I don't know how to address, other than removing them entirely if we know we're not going to use them in the near future?

     /**
         * For Windows file paths that contain user IDs with non ASCII characters, those 
    characters might get replaced with
         * ?. We need to use the environment APPDATA value to substitute back the original 
    user ID.
         */
    String com.google.refine.RefineServer.fixWindowsUnicodePath(String path)
    
  3. After we passed through Java 9+ there remains a few other areas where Long, Integer, etc. might need some cleanup? I'm seeing these mostly were we init within test setups:

    Ex.

    assertEquals(new Long(12345), project.rows.get(1).cells.get(0).value);
    

    The constructor Long(long) is deprecated since version 9Java(67110271)
    java.lang.Long.Long(long value)
    Deprecated It is rarely appropriate to use this constructor. The static factory
    valueOf(long) is generally a better choice, as it is likely to yield significantly better
    space and time performance.

Thoughts on some of those problem/improvement areas?

I would say that work to migrate out of deprecated APIs, fixing resource leaks, security issues is always welcome. But while IDEs and external tooling can be of a big help for such tasks, those tools are not substitutes for a good understanding of the underlying issues, such as: why is this method deprecated? is this warning spurious? is this a cosmetic change or something with deeper implications? Why is this dead code still here?
In general, blindly applying suggestions from the IDE is not so helpful on its own.

If you are worried about specific problems with our code but are not sure about what to do about, it’s always worth opening an issue about it so we can discuss it. It’s worth making clear that it’s a concern you have simply by reading the code and is not related to any user-facing problem.

  1. There are artifact methods left around unused which I don’t know how to address, other than removing them entirely if we know we’re not going to use them in the near future?

Dead code should be removed.

  1. 
    String com.google.refine.RefineServer.fixWindowsUnicodePath(String path)
    

This was missed during the review of #3840 and should be deleted.

Thoughts on some of those problem/improvement areas?

Generally I'm dubious that cosmetic issues like this are a significant impediment to new developers. In the past I've cleaned up sections of code that I'm working on, separating cosmetic changes and functional changes to their own commits, but this has caused PRs to get rejected so I've stopped.

Tom