Extension Operations during Project load

I'm working on a fix for the RDF Transform extension and have come across an issue when a project is loading. I'm using the Project ID to manage some project specific settings in the extension.

During project load, the extension's registered operation constructors are being called directly via the Jackson @JsonCreator annotation rather than through the class's prescribed reconstruct() method to create the operation instances. This is an issue because the Jackson process does not pass the Project instance to the constructor.

The reconstruct method is described as:
static public AbstractOperation reconstruct(Project theProject, JSONObject jobj) throws Exception

The constructor is described as:
@JsonCreator public SaveRDFTransformOperation( @JsonProperty(RDFTransform.KEY) RDFTransform theTransform)

Since the @JsonCreator annotation also constructs the passed RDFTransform instance, that would be sufficient except that the RDFTransform needs the Project instance during construction (the Jackson process passes it as null). The alternative reconstruct() method would create a proper RDFTransform instance because it receives a Project instance.

Additionally, in a Change class, the class method load() does not receive a Project instance.

Is there a way to either have the Jackson process pass a Project ID via some key or force the project loading process to use the prescribed reconstruct() method as documented so that the project can be resolved?

NOTE: I can find no reference in the current OpenRefine code that calls a reconstruct() method.

In the HistoryEntry class, I found @JacksonInject("projectID") long projectID that might work.

EDIT:
Project's loadFromReader() creates an ObjectMapper copy that injects the Project, but then doesn't use it:

ObjectMapper mapper = ParsingUtilities.mapper.copy();
InjectableValues injections = new InjectableValues.Std().addValue("project", project);
mapper.setInjectableValues(injections);
...
try {
    OverlayModel overlayModel = ParsingUtilities.mapper.readValue(value, klass);

    project.overlayModels.put(modelName, overlayModel);
} catch (IOException e) {
    logger.error("Failed to load overlay model " + modelName);
}

Should
ParsingUtilities.mapper.readValue(value, klass);
be changed to
mapper.readValue(value, klass);
so that the InjectableValues for the project can be applied?
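To illustrate why the choice of mapper matters, here is a minimal, standalone sketch (class and field names are mine, not OpenRefine's) of how InjectableValues and @JacksonInject interact--the injected value is only visible to the mapper that carries the InjectableValues:

import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.InjectableValues;
import com.fasterxml.jackson.databind.ObjectMapper;

public class InjectionDemo {
    static class NeedsProjectID {
        final long iProjectID;
        final String strName;

        @JsonCreator
        NeedsProjectID(
                @JacksonInject("projectID") long iProjectID,
                @JsonProperty("name") String strName) {
            this.iProjectID = iProjectID;
            this.strName = strName;
        }
    }

    public static void main(String[] args) throws Exception {
        // A mapper carrying the injectable value, as loadFromReader() sets up...
        ObjectMapper mapper = new ObjectMapper();
        mapper.setInjectableValues(new InjectableValues.Std().addValue("projectID", 1234L));

        // Only this configured mapper can satisfy the @JacksonInject parameter;
        // a plain mapper throws because no injectable values are configured.
        NeedsProjectID obj = mapper.readValue("{\"name\":\"test\"}", NeedsProjectID.class);
        System.out.println(obj.iProjectID); // prints 1234
    }
}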

I'm still trying to familiarize myself with this part of the codebase, but in the meantime I'd like to make sure I'm understanding the issue properly:
An operation class in the RDF transform extension (specifically SaveRDFTransformOperation but potentially others also) requires a Project instance or a project ID when created. However, when OpenRefine is instantiating these operations, it's doing so through the Jackson deserialization method, which is not passing a project ID despite using the recommended JacksonInject annotation.
Does any of that sound incorrect?

I'll need to read more of the code surrounding loadFromReader before I feel comfortable commenting on your second post.

It would be useful to add code comments as you explore and come to an understanding. I'd be disappointed if you kept the insight you've gained to yourself, and I'd like to see that knowledge shared directly in our code.

Certainly! Much of what I've learned has been through existing code comments. Once I have confirmation that I understand the issue correctly I'll both open a GitHub issue and add any relevant comments to the code itself.

Indeed. I figured it out for the most part. The documentation for the process is, well, out-of-date. The current process relies on Jackson with @JsonCreator, @JsonProperty, @JsonAlias, @JsonSetter, @JsonGetter, and such. I've learned a lot about Jackson over the last few days.

Debug Note:
What helped me narrow down the issue was setting up a method with @JsonAnySetter to get the key-value pair so I could see what it was doing.
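For reference, the debug hook looked roughly like this (the method name is arbitrary, and an SLF4J "logger" field on the class is assumed); Jackson routes every unmatched key/value pair through it, which makes it easy to see exactly what the loader is handing your class:

import com.fasterxml.jackson.annotation.JsonAnySetter;

// ...inside the class being deserialized...
@JsonAnySetter
public void debugAnySetter(String strKey, Object value) {
    // Jackson calls this for every key it cannot match to an annotated
    // ctor parameter or setter...
    logger.info("Unmatched JSON key: {} = {}", strKey, value);
}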

Solution:
There are a number of solutions to the problem, but, in general, set up either a constructor or a static class method with @JsonCreator to "intercept" the Jackson process and catch the location of your key with @JsonProperty (and @JsonAlias as needed) in the JSON blob.

The old static public AbstractOperation reconstruct(Project theProject, JSONObject jobj) method can be used with the @JsonCreator on your Operation class, but the Project will be null (unless a @JacksonInject can be used--it currently can't). That is generally not a problem as the JSON blob should be a complete structure for your class used by the operation. See the Extension Hook below for issues.

Or...use @JsonCreator with @JsonProperty for each key parameter on an Operation constructor (ctor) that processes...in part or in whole...the JSON blob. If the ctor takes a JsonNode, it gets the whole blob and you can process it like the old reconstruct(). The ctor can alternatively take your bespoke class. In that case, Jackson hands processing off to the bespoke class's own ctor, which you set up with its own @JsonCreator and @JsonProperty for each key parameter...thereby subdividing the JSON blob into your selected portions. Again, you can use JsonNode parameters, bespoke classes, or simple String and Integer types for simple values.
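As a rough sketch of the JsonNode flavor (the class names here are illustrative, not the actual RDF Transform code): the ctor receives the whole blob and forwards it to the bespoke class's factory, much like the old reconstruct() did:

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.JsonNode;

public class MyOperation /* extends AbstractOperation -- required overrides omitted */ {

    // A stand-in for your bespoke class (e.g., RDFTransform) with its own factory...
    public static class MyTransform {
        static public MyTransform reconstruct(JsonNode jnodeTransform) {
            MyTransform theTransform = new MyTransform();
            // ...walk jnodeTransform and populate the instance...
            return theTransform;
        }
    }

    private final MyTransform theTransform;

    @JsonCreator
    public MyOperation(@JsonProperty("my-transform") JsonNode jnodeTransform) {
        // Jackson hands the whole blob to the ctor; subdivide it yourself,
        // or hand it to the factory as shown...
        this.theTransform = MyTransform.reconstruct(jnodeTransform);
    }
}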

Jackson will make a best-effort attempt to find ctors and setter/getter methods to satisfy the JSON load without annotations if your method names match your key names. However, it's better to be explicit.

As an example, see the RDF Transform GitHub project's SaveRDFTransformCommand, SaveRDFTransformOperation, and RDFTransform classes.

Full Disclosure:
The initial project I was using to test the process was fubar'ed. The data file had replaced my "rdf-transform" key with just "transform" for some reason, so it wasn't finding the key by the normal key name. On top of that, the value was "null", so I wasn't getting the results I was expecting. This was likely because the project was saved while the process was broken (I'm not sure how the key name changed). Visually inspecting the data file in the data zip helped debug the issue.

Finally, using a "good" project got me on the right path.

Load Process Notes:

  1. The project loading process loads the data file which holds (multiple) JSON blobs for your Operation classes. Presumably, each blob is a prior state. The load process works directly on your Operation class to load these states via Jackson (not its static reconstruct() method).
  2. Your own Command class that uses your Operation class will generally use the static reconstruct() method.
  3. Additionally, your history changes are loaded into your Change class(es) via a static load() method (old skool, not via Jackson):
    static public Change load(LineNumberReader theReader, Pool thePool)
    I think of Change classes as a kind of Operation class (they are not in OR, but they are essentially "change operations"). These blobs likely hold your bespoke class and could/should use a static reconstruct() method like:
    static public RDFTransform reconstruct(JsonNode jnodeTransform)
    in the bespoke class to make its own instances (it's really a factory pattern). These JSON blobs contain complete structures ("new" and "old" at the time of the change), so the Project is not required. In fact, a "reconstruct" process always implies a complete structure, so the Project shouldn't be required. See the Extension Hooks below for issues. (A rough sketch of this load() pattern follows this list.)
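The sketch (field names and the Change subclass are hypothetical; the "/ec/" end-of-change marker and the Pool parameter follow the built-in Change implementations, as far as I can tell):

import java.io.LineNumberReader;

import com.google.refine.history.Change;
import com.google.refine.util.ParsingUtilities;
import com.google.refine.util.Pool;

// ...inside a hypothetical RDFTransformChange class...
static public Change load(LineNumberReader theReader, Pool thePool) throws Exception {
    RDFTransform theNewTransform = null;
    RDFTransform theOldTransform = null;

    String strLine;
    // Read "field=value" lines until the end-of-change marker...
    while ((strLine = theReader.readLine()) != null && !"/ec/".equals(strLine)) {
        int iEqual = strLine.indexOf('=');
        String strField = strLine.substring(0, iEqual);
        String strValue = strLine.substring(iEqual + 1);

        // Each value is an embedded JSON blob: hand it to the bespoke factory...
        if ("new".equals(strField)) {
            theNewTransform = RDFTransform.reconstruct(ParsingUtilities.mapper.readTree(strValue));
        }
        else if ("old".equals(strField)) {
            theOldTransform = RDFTransform.reconstruct(ParsingUtilities.mapper.readTree(strValue));
        }
    }
    return new RDFTransformChange(theOldTransform, theNewTransform); // ...hypothetical ctor
}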

Extension Hooks:
If you are doing something EXTRA, like saving and loading data for your own extension's purposes, you may have both general and per-project data.

The Project (or Project ID) MUST be included in all extension-related calls so that per-project data management can be effective for extensions that keep data external to OR when appropriate. This means all classes and instances registered with OR's prescribed methods MUST have a Project or ProjectID parameter on those methods.

For instance, RDF Transform uses the cache/rdf-transform directory to store a Lucene engine holding ontology information, both in general and per project, for namespaces and their prefixes for RDF data. It also stores other related data files (general and per-project) in that directory.

Getting the Project to an extension process can be in the form of:

  1. a method call that takes a Project (or ProjectID)
  2. a Jackson @JacksonInject annotation with a "project" or "projectID" key (these two were found in the OR code).

I found that the "projectID" key is injected for the Operation class during the project load process:

/**
 * Constructor for deserialization via Jackson
 */
@JsonCreator
public SaveRDFTransformOperation(
    @JacksonInject("projectID") long iProjectID,
    @JsonProperty("op") String strOpCode,
    @JsonProperty(RDFTransform.KEY)
    @JsonAlias( { "rdf_transform", "rdfTransform" } )
        RDFTransform theTransform )
{ ... }

For Change classes, however, there is currently no way (that I could find) to get the Project or ProjectID in the load() method. The server-side process cannot determine what project is "active" for a given process. The client-side supplies it, and it is received on the server-side by the doPost() and doGet() methods in a Command class:

@Override
public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException { ... }
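Fleshed out a bit (this assumes the getProject() and respondException() helpers on OR's Command base class, which is how the built-in commands appear to resolve the project from the request):

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.refine.model.Project;

// ...inside your Command subclass...
@Override
public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    try {
        // Resolve the Project from the request's "project" parameter...
        Project theProject = getProject(request);
        long iProjectID = theProject.id;
        // ...the extension now has the Project / Project ID for its own bookkeeping...
    }
    catch (Exception ex) {
        respondException(response, ex);
    }
}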

Initialization Note:

  • This is extra credit material!

For extensions, the controller.js file is a server-side JavaScript executable. The server-side JavaScript engine is very limited. So, since it hooks to the Java engine anyway, it can be minimized by using an InitializationCommand class:

public class InitializationCommand extends Command

The controller.js file is reduced to:

var logger = Packages.org.slf4j.LoggerFactory.getLogger("RDFT:Controller");
var RDFTCmd = Packages.org.openrefine.rdf.command;

function init() {
    /*
     * Fool Butterfly:
     *      Make the extension's Initializer do all the heavy lifting instead of the
     *      limited server side JavaScript processor for this "controller.js".
     *      NOTE TO SELF: Doh, I should have seen this a long, long, LONG time ago.
     *
     *  Server-side Initialization Command...
     *    The InitializationCommand constructor calls its initialize() method...where all the magic happens.
     */
    logger("Initializing RDF Transform...");
    new RDFTCmd.InitializationCommand(module); // ...self register
}

function process(path, request, response) { ... } // ...required
function send(request, response, template, context) { ... } // ...if needed

Ideally, the InitializationCommand should return a boolean (maybe with a message) to OR via the controller.js script indicating whether it successfully initialized. OR could then post a client-side message for the extension on failure for the user. This would hint to look at the logs at the very least.

function init() {
    logger("Initializing RDF Transform...");
    const cmdInit = new RDFTCmd.InitializationCommand(module); // ...self register
    return cmdInit.getSuccess();
}
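On the Java side, that could look roughly like this (initialize() and getSuccess() are my own names, not an existing OR API, and the registration details are elided):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.refine.commands.Command;

import edu.mit.simile.butterfly.ButterflyModule;

public class InitializationCommand extends Command {
    private final static Logger logger = LoggerFactory.getLogger("RDFT:InitCmd");

    private boolean bSuccess = false;

    public InitializationCommand(ButterflyModule theModule) {
        try {
            this.initialize(theModule); // ...register commands, operations, overlays, etc.
            this.bSuccess = true;
        }
        catch (Exception ex) {
            logger.error("Initialization failed!", ex);
        }
    }

    private void initialize(ButterflyModule theModule) throws Exception {
        // ...the heavy lifting formerly done in controller.js...
    }

    public boolean getSuccess() {
        return this.bSuccess;
    }
}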

As an example, see the RDF Transform GitHub project's controller.js file and the InitializationCommand class for the full documentation.

I'd like to get some feedback on the above when anyone has some time. It should likely result in some issues posted for the OR code and documentation, but I'd like to make sure I've covered the issues correctly and haven't missed anything.

Thanks!


I'd like to get some feedback on the above when anyone has some time. It should likely result in some issues posted for the OR code and documentation, but I'd like to make sure I've covered the issues correctly and haven't missed anything.

I missed this because it was tagged "Help Desk" rather than "Development." I'll try to review that message, but if it's possible to summarize into a key question or two, that would help focus my attention during the review.

I'm using the Project ID to manage some project specific settings in the extension.

I can't, off the top of my head, think of any project specific settings in base OpenRefine, so this impedance mismatch could be part of the issue. Also, it wouldn't surprise me to find that Change is intentionally Project agnostic, which would be another significant disconnect.

Antonin is the one who wrote all the Jackson code and understands it the best, but this seems like it might be more an architectural question than a Jackson mechanics question.

Tom

p.s. Where possible, links to code snippets would be an awesome addition to messages like this so that they could be easily viewed in context

p.s. Looking at the forum, it appears that this thread is actually in the Development subforum. I'm not sure if the email notifications are messed up and adding the wrong subject tags or if it got moved after the thread was started.

I worked my way through that message and, although there are more detailed comments inline below, I think most of the difficulty is caused by starting from a bad premise.

Operations are intentionally Project independent, which is why they don't get passed a Project ID.
You probably want whatever information you were going to stash there to instead be saved in an OverlayModel that you create to hold it.

The documentation for the process is, well, out-of-date.

Documentation PRs to correct this situation gratefully accepted. A link to the documentation in question would help ground the discussion.

The current process relies on Jackson with @JsonCreator, @JsonProperty, @JsonAlias, @JsonSetter, @JsonGetter, and such. [...]

Yes, the JSON serialization/deserialization was switched to use Jackson back in 2018.

Solution:

There are a number of solutions to the problem, but, in general, set up either a constructor or a static class method with @JsonCreator to "intercept" the Jackson process and catch the location of your key with @JsonProperty (and @JsonAlias as needed) in the JSON blob.

The "problem" being how to pass the Project to an Operation? I don't think that's something that aligns with the internal architecture because the Javadoc explicitly says

An operation can be applied to different but similar projects.

Which strongly indicates that an Operation doesn't "belong" to a Project. The createProcess() and createHistoryEntry() methods take Project parameters which is another strong clue that Operations are Project independent.

The old static public AbstractOperation reconstruct(Project theProject, JSONObject jobj) method can be used with the @JsonCreator on your Operation class, but the Project will be null (unless a @JacksonInject can be used--it currently can't). That is generally not a problem as the JSON blob should be a complete structure for your class used by the operation. See the Extension Hook below for issues.

That method was added and then removed again in 2018, so references to it are, I believe, obsolete.

Load Process Notes:

  1. The project loading process loads the data file which holds (multiple) JSON blobs for your Operation classes. Presumably, each blob is a prior state. The load process works directly on your Operation class to load these states via Jackson (not its static reconstruct() method).

The project loads the rows of data, the history entries (not Operations) and a few other things. You can see the load process here.

  2. Your own Command class that uses your Operation class will generally use the static reconstruct() method.

I don't think I agree with this, but I'm not sure on what basis it is stated.

  3. Additionally, your history changes are loaded into your Change class(es) via a static load() method (old skool, not via Jackson):
    static public Change load(LineNumberReader theReader, Pool thePool)

The project data uses a custom serialization format, which contains embedded JSON, but none of this should be relevant or visible to extensions.

I think of Change classes as a kind of Operation class (they are not in OR, but they are essentially "change operations").

I wouldn't think of things that way. A Change includes a reference to an Operation, but they are two distinct things.

If you are doing something EXTRA, like saving and loading data for your own extensions purpose, you may have both general and per project data.

If you need extra data associated with the project, you might want to investigate implementing an OverlayModel like the Wikibase extension does here.
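As a rough illustration (the names are hypothetical and the registration call is from memory, so check it against the Wikibase extension), an overlay is just a Jackson-serializable class implementing OverlayModel:

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

import com.google.refine.model.OverlayModel;
import com.google.refine.model.Project;

public class MyExtensionOverlay implements OverlayModel {
    @JsonProperty("settings")
    private final String strSettings;

    @JsonCreator
    public MyExtensionOverlay(@JsonProperty("settings") String strSettings) {
        this.strSettings = strSettings;
    }

    @Override
    public void onBeforeSave(Project project) { /* flush anything pending */ }

    @Override
    public void onAfterSave(Project project) { }

    @Override
    public void dispose(Project project) { /* release ancillary resources */ }
}

Registered once at startup (via Project.registerOverlayModel(...), if I remember correctly), it gets serialized into and loaded from the project data automatically, so it travels with project exports.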

For instance, RDF Transform uses the cache/rdf-transform directory to store a Lucene engine holding ontology information, both in general and per project, for namespaces and their prefixes for RDF data. It also stores other related data files (general and per-project) in that directory.

This makes it sound like the project isn't self-contained and can't be exported, which doesn't sound desirable. For a cache it's fine, but any information critical to the project should be included in your OverlayModel so that it gets serialized with the project.

Initialization Note:
  • This is extra credit material!

For extensions, the controller.js file is a server-side JavaScript executable. The server-side JavaScript engine is very limited.

It uses the Rhino JavaScript engine, which is a complete, if somewhat older, JavaScript implementation.

So, since it hooks to the Java engine anyway, it can be minimized by using an InitializationCommand class:

public class InitializationCommand extends Command

The controller.js file is reduced to:

var logger = Packages.org.slf4j.LoggerFactory.getLogger("RDFT:Controller");
var RDFTCmd = Packages.org.openrefine.rdf.command;

function init() {
    /*
     * Fool Butterfly:
     *      Make the extension's Initializer do all the heavy lifting instead of the
     *      limited server side JavaScript processor for this "controller.js".

I'm not sure what this is trying to say, but whenever I see things like "Fool Butterfly," it sets off alarm bells. If the way that Butterfly handles the initialization is problematic, I'd rather see that addressed than "fooling" the system.

Since a typical controller.js file only contains a handful of extension registrations, neither performance nor advanced JavaScript syntax should be an issue.

Extension writers are, of course, free to organize things in whatever way makes sense to them, but using common patterns among all the extensions helps developers orient themselves.

We've discussed using a different, more declarative, extension registration, but since this is just a handful of lines of code, it hasn't been a priority.

Tom

Tom, thanks for the review. Very much appreciated! I'll try to address some of the issues.

Yes, the thread was moved.

I do expect I'll be writing up an Extensions HowTo to replace the prior, out-of-date documents such as Giuliano Tortoreto's OpenRefine Extension Doc and the great Owen Stephens' Writing an Extension to Add New GREL Functions to OpenRefine. It would be ideal to document it all in an OpenRefine space instead of a separate document--likely at OR's Writing extensions.

Even today, the documentation specifies the use of "reconstructors". See Writing extensions: Server-side: Operations and Writing extensions: Server-side: Overlay Models

Project Specific Settings:

To clarify, an extension may manage its own settings and data, including settings and data that apply to specific projects. Some of this information should certainly be stored in an Overlay and in any Change blobs for a specific project. I shouldn't need to store the Project information in these blobs since the Overlay and Change blobs are loaded for a specific project anyway.

Commands and Operations:

Operations, in general, are project independent. However, they are applied to a specific project. Since operations are associated with commands (i.e., commands carry out operations), knowing what project they are applied to can help an extension determine context for its operations used in a specific project. Certainly, the user selected a command that implements operations to be applied to the specific project...not every project. This is why the AbstractOperation base class has:

public Process createProcess(Project project, Properties options)

and

protected HistoryEntry createHistoryEntry(Project project, long historyEntryID)

and

protected String getBriefDescription(Project project)

Then, the Project is essential to operations.

When the project is loaded, reconstruction manages the history of those operations on a specific project. The Project parameter is NOT specifically required by the Jackson process the way it was with the older "reconstruct" method. However, the Project ID is available during reconstruction via:

@JacksonInject("projectID") long iProjectID

If @JacksonInject("projectID") is the way to fix this issue, then problem solved. It wasn't documented anywhere.

To be fair, a reconstruction should be "complete", i.e., the data loaded from OR should contain everything the Operation object needs to "completely" reconstruct itself. However, an extension may be doing more than what OR "needs"...that is why it is an extension! The JSON data may just contain reference data that an extension uses to "complete" its operations. The Project may be essential reference information for that reconstruction.

In my case, the RDF Transform Operations take a Project in their ctors. The related Command creates those Operations with the Project. The old reconstructor recreated the Operation object with the Project. The Jackson process had no way of knowing that, is undocumented, and essentially silently "broke" the reconstruction process...which is what caused the Full Disclosure failure.

Making the project information available universally is a low cost addition since every post or get from the client-side gives the server-side access to the project information. Purposely not making it available seems a bit heavy-handed. OR shouldn't be divining what an extension should or shouldn't be doing or needing in its own internal processes.

Changes:

Change objects are applied to or reverted from a project:

void apply(Project var1);
void revert(Project var1);

Then, why wouldn't the project be accessible in the static Change load() method? Again, like Operations, the Project should be available as reference data so that the extension can do whatever it needs to do. Currently, there is no work-around.

For RDF Transform, there is only one Change class for saving the RDF Transform template state (which is its Overlay data). So, it obviously needs the Project to do that:

... = (RDFTransform) theProject.overlayModels.get(RDFTransform.EXTENSION);
theProject.overlayModels.remove(RDFTransform.EXTENSION);
theProject.overlayModels.put(RDFTransform.EXTENSION, this.thePreviousTransform);

Does RDF Transform need the Project information in the static load() method? No. But it could if it were processing some ancillary change data about the project in the extension (or whatever else it wants to do). We can make esoteric arguments about why an extension should store that information in OR. But, why be pedantic about it? Give extension developers freedom to manage their own system. The Project information is essential to that freedom.

Overlays:

RDF Transform certainly uses an Overlay Model as seen above. It uses the overlay to store the RDF Transform template. However, the overlay model cannot store a Lucene database (well, it could, but that is a heavy lift). Its use is not necessarily critical to the project. It is a support function. It helps the user select class and property values used in the RDF Transform template. Some of the Lucene database entries are project agnostic while other entries reference the project specifically (adding and removing an RDF namespace, its prefix, and all the related ontology classes and properties to/from a project).

Storing all this in an Overlay would lead to large blobs that would need to be reconstructed anyway. Instead, the overlay holds reference information on how the ontology was found (URL, file, or none). A project's Lucene database can be reconstructed if the extension can get to the URL or load a cached, server-side, locally stored ontology file. If not, nothing is significantly impacted by its loss...just inconvenienced.

To share a project, an extension can package its own data, if needed, to augment an OR project export. Additionally, it could extend the OR project export if it were given the proper hooks (hint). A developer could just use the Project's directory to store additional content, but I'm not sure how that would be done ATM. I think it would definitely need the Project ID at a minimum.

Extension Initialization

When I was redeveloping the older RDF extension, I was attempting to manage the initialization process. I was having difficulty with many JS features (oddly, simple looping wasn't supported). This may be fixed in the newer Rhino JS.

It finally occurred to me that the server-side JS processor is just calling a bunch of Java classes and objects!!! So, why not just pass all control over to an Initialization Command object and stop the overhead insanity of transferring calls between the JS and Java systems? I agree that the code is relatively small and the performance impact is minimal. However, this version of a "common pattern" seems much cleaner, IMNSHO.

The comment "Fool Butterfly" may be a technically incorrect--I was just trying to indicate that Butterfly's use for executing the "controller.js" file is drastically minimized. The minimized "controller.js" file is a simplified, basic extension launcher. During initialization, it allows the developer to harness all the power of Java with no down side. It also aligns the initialization process via a Command class like all the other extension Command classes. To me, it feels more elegant than using a not-so "pure" JS solution for server-side initialization.

This could reduce (or eliminate?) dependence on Simile Butterfly (and its maintenance). Could OR replace it with the Java Method Server (JMS)?

I'm not sure how the function process(path, request, response) in the "controller.js" file would be impacted as I don't know how other extensions might use it. It seems to be used as a redirector. Could the functionality be moved to Command classes instead?

I'll try to find time to investigate more on Sunday or Monday, but briefly:

  • reconstruct() has been gone since 2018, along with load(), write(), and other artifacts of the org.json days. Arguably we should have made things fail in a more obvious way, but it's not coming back (and nothing calls any of these methods anymore).
  • @JacksonInject("project") is how the Project is communicated to classes whose constructors need it, such as HistoryEntry & OverlayModel, but I don't think I'd recommend using it for Operation if there's any other possible way to communicate the necessary information. AbstractOperation is clearly documented to be Project independent and has been that way since the original developer wrote the comment in 2010.
  • The Writing Extensions page hasn't been updated to reflect the information on Migrating Older Extensions page, so is out of date
  • there are a number of other errors, broken links, etc on the above two pages which need to be cleaned up
  • loops have been in Javascript since the beginning of time. Perhaps the particular flavor of iteration syntax that you wanted to use was more modern than the current Rhino implementation, but loops were almost certainly supported. You can find the compatibility matrix here (we currently use Rhino 1.7.15)

Tom

Tom,

Concerning the "loop" issue, I don't mean to be an ass about this, but I've been programming a multitude of languages for over 40 years...before ECMAScript existed (at the beginning of time). I'm fully aware of the ECMAScript evolution. I also understand the incredulity of the statement I made about loops. When I said loops failed, I meant ALL LOOPS...for, do while, while...FAILED. I tested all of them several different ways INCLUDING simple integer value loops. I couldn't believe it myself. The FACT is, at that time, they ALL FAILED. I can't change that FACT. The taste of bile in my mouth still haunts me.

At the time, something about the Server-Side ECMAScript implementation either limited or did not include loop processing. No examples for the Server-Side ECMAScript implementation for OR extensions demonstrated any use of loops. I speculate that may have been by design since the initialization process doesn't "need" all ECMAScript features for the rather simple initialization process...saving size, space, and symbolic tree processing.

At the time, instead of complaining about it, inspiration kicked in that it was calling Java anyway, so use Java. Problem solved.

@tfmorris is right - operation classes are meant to hold metadata about an action executed on a project, without being tied to that specific project.

@AtesComp if this is still an issue, I'd be interested to understand better what you need to solve here. If you can give us a bit more background on the original problem then maybe we can point you at solutions which don't require introducing such dependencies. Broadly speaking, I recommend two directions:

  • feeding the operation with the necessary information, which can be injected by the frontend in the operation parameters, or server-side in the Command associated with the operation (unless you're using apply-operations directly)
  • looking up the necessary information from the project at the Change level, when the operation gets executed on a specific project.

I hope it helps…

@tfmorris and @antonin_d,

I think I'm beginning to see our disconnect on this topic. I'll use a concrete example to demonstrate the issue.

@antonin_d, you are actually making my point precisely.
operation classes are meant to hold metadata about an action executed on a project, without being tied to that specific project.
That metadata is the issue. For an extension, the Project ID may be necessary metadata. The issue is about reconstructing extensive, so-called "metadata" that doesn't fit well directly in the "OR Project Data" model. I can elect to use a "metadata indicator" that helps reconstruct an extension data model without overloading the "OR Project Data" model. The Project ID is a metadata indicator for that process.

Points:

  • The OR devs believe "data" is the acutely scoped "OR Project Data". Contrary, when I mentioned data I mean the obtusely scoped "any data" including so called "metadata".
  • The OR devs believe Operations only apply to "OR Project Data". Contrary, when an extension dev creates an Operation, it may apply to any data, including an extension's ancillary side-data models. This ancillary side-data may be as large or larger than the "OR Project Data".

Concrete Example:

In RDF Transform, I construct a Jena DatasetGraph and transform all the "OR Project Data" into that DatasetGraph for a project. I am free to construct, apply, and revert Operations on that DatasetGraph and NOT on the "OR Project Data". In general, I export this data in the various RDF data formats.

  • Question: How does an Operation know which DatasetGraph it is intended for?
  • Answer: The Project ID

I understand, in the "OR Project Data"-centric sense, that an Operation constructor does not need a Project ID--it is constructed for any "OR Project Data". However, the fact that it is constructed multiple times when a specific project loads contradicts that assertion. That metadata is used to kick off an Operation constructor loading metadata directly tied to a specific project. Just because the Project ID isn't available during construction doesn't make it agnostic...it can/will get the Project ID at some time--it is required. I use it in the Operation constructor within a Command.
I don't think I can be any clearer on this subject.

  • Question: Why would the Operation constructor need to have the Project ID available?
  • Answer: The metadata does not contain an extension's ancillary dataset (in my case, a DatasetGraph which could be "large"). The Operation constructor can determine whether the DatasetGraph is already loaded (possibly by a prior extension Operation or Change instance) or not. If not, it loads it during this project load time. The extension can manage its own list of loaded DatasetGraphs by Project ID (see the sketch after this list). It absolutely needs the Project ID to load and use the proper DatasetGraph.
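A rough sketch of what I mean by that registry (a hypothetical class, not the actual RDF Transform code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphFactory;

public class DatasetGraphRegistry {
    static private final Map<Long, DatasetGraph> graphsByProjectID = new ConcurrentHashMap<>();

    // Get (or lazily create) the graph for a project; callable from anywhere
    // the Project ID is available (a Command, an injected Operation ctor, etc.)...
    static public DatasetGraph getOrLoad(long iProjectID) {
        return graphsByProjectID.computeIfAbsent(iProjectID,
                id -> DatasetGraphFactory.create()); // ...or rebuild it from cached data
    }

    static public void unload(long iProjectID) {
        graphsByProjectID.remove(iProjectID);
    }
}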

There are certainly other use cases that could benefit from having universal access to the Project ID at all times.

@antonin_d, you mentioned:
looking up the necessary information from the project at the Change level, when the operation gets executed on a specific project.

  • Question: Why can't I process extension data during apply() or revert() when the Project ID is given?
  • Answer: The issue isn't about processing the extension's data during apply() or revert(). It's about LOADING the data so that it can be processed by these methods on the appropriate data model. The appropriate time for that is during project load time when the Operation and Change instances are loaded.

The Project ID should be universally available everywhere so that an extension can access it for its own needs--even (and especially) during project load processes. Operation constructors can get it via the @JacksonInject. I have not yet looked into whether Change constructors can get it.

I hope the above makes it clear: it is not about when an Operation gets executed. It is about preparing an extension with its ancillary "data model" that augments the "OR Project Data" model so that an Operation can get executed.

Other Solutions:

During project load time, an Operation is not instanced once--it is instanced multiple times, once for every metadata JSON blob--and each instance is unique to its metadata, which is definitively tied to a project. So, having the Project ID value, or access to some parent or global object (the Command) that can serve the Project ID, has next to zero impact on the data structure of an Operation.

Having some base class getProjectID() method might be ideal.

Alternatively, if the Project ID is not available to Operation and Change class constructors, we need some mechanism to tell an extension that a project is loading at project load time...some project initialization method in an extension so that the extension can optionally manage its own internal processes and data related to the OR project. Maybe register a "static project initialization method"? It should likely be called earlier than the Operation and Change constructors so that it can prepare for them.

Thanks for expanding on this! From the general tone, I get the impression that you might be upset or frustrated, so I am not sure if my intervention here is actually a good idea? I'm happy to retreat if not. My intention here is to help you, by looking for a solution for your problem that is a good fit for OR's current architecture, but perhaps I come across differently?

In RDF Transform, I construct a Jena DatasetGraph, I transform all the "OR Project Data" in to that DatasetGraph for a project.

I imagine that you are doing this based on the project data at one point in the project history (the current one), right? And so you need to refresh the DatasetGraph every time an operation is done or undone, right?
I would imagine that this is only viable if the computation of the DatasetGraph is relatively quick (otherwise, by editing a single cell, you're triggering a heavy process again, which will be noticeable for the user). Or do I get this wrong?

If the computation of the DatasetGraph from the project data is quick enough, I wonder if it would be acceptable for you to compute it on demand, when it's needed for a particular project. You could still decide to cache it in some central store, and invalidate that cache when the id of the last history entry changes. You'd then be able to access this DatasetGraph via the central store by project id, any time a change is done or undone, or any command is run. Using this lazy loading approach, I imagine you wouldn't need to access the project id in an operation constructor, since there is no longer a need to prepare the ancillary data ahead of time. I'd be happy to give more details about this approach if you would like.
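To make that concrete, here is a small sketch of the kind of cache I have in mind (all names are hypothetical; the caller supplies the id of the latest history entry, which it can read from the project whenever a command or change runs):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class LazyGraphCache<G> {
    private static class Entry<G> {
        final long lastHistoryEntryID;
        final G graph;

        Entry(long lastHistoryEntryID, G graph) {
            this.lastHistoryEntryID = lastHistoryEntryID;
            this.graph = graph;
        }
    }

    private final Map<Long, Entry<G>> cache = new ConcurrentHashMap<>();

    // Recompute only when the project's latest history entry id has changed;
    // otherwise reuse the cached graph.
    public G get(long projectID, long lastHistoryEntryID, Supplier<G> compute) {
        Entry<G> entry = cache.get(projectID);
        if (entry == null || entry.lastHistoryEntryID != lastHistoryEntryID) {
            entry = new Entry<>(lastHistoryEntryID, compute.get());
            cache.put(projectID, entry);
        }
        return entry.graph;
    }

    public void invalidate(long projectID) {
        cache.remove(projectID);
    }
}

With something like this, an operation or change never needs to hold the project id itself; it just asks the cache when it runs against a concrete project.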

But if you know that you want the project id in the operation no matter what, you can also do that. I would say, you are totally free to tie an operation to a project. You are free to store any metadata as part of the operation's parameters. For instance, the project id could be stored as a field in the operation metadata. By doing that, you're making sure that that project id is available throughout the lifecycle of that Operation instance. You're also breaking the reusability of an operation on another project (via the Extract/Apply dialogs). But that's something you can do.

Maybe register a "static project initialization method"? It should likely be called earlier than the Operation and Change constructors so that it can prepared for them.

It's likely something that could be introduced, but wouldn't you also need this to be called every time the project data changes? In which order would we call each extension whenever that happens? Again, because those computations would need to remain pretty quick to keep OR usable, it makes me think that the approach I proposed above (with the on-demand computation of the DatasetGraph) would be workable as things stand.