Affects Version/s: Release 1.4, Release 1.5, Release 1.6
Fix Version/s: Release 1.6
Environment: Discovered on Ubuntu 12.04 with Oracle JDK 1.7.0_25 running Clojure 1.4. Test case produced on Windows 8 with Oracle JDK 1.7.0_09 running Clojure 1.4 and Clojure 1.5.1. Analysis seems to indicate that OS and Java version are not critical. Clojure 1.6 pre-release code has not been tested, but since clojure.lang.LockingTransaction has not changed since Clojure 1.4, it seems likely the defect is still present.
Using the Clojure 1.4 library strictly from Java code, a simple transaction dispatches an action to an Agent. When called from a simple driver, such as a unit test, where there is no interaction with the Clojure library/runtime (specifically, clojure.lang.RT), a ConcurrentModificationException is thrown from inside LockingTransaction.run() while it is iterating through the actions list, dispatching each action to its Agent after committing the transaction.
The circumstances under which this occurs are probably fairly rare, and a simple workaround exists (see the final paragraph), hence the "Minor" priority. Even so, it seems like it would not be very complicated to fix LockingTransaction to handle the actions list more safely.
Based on some debugging, here's what seems to be happening:
- Transaction A is run, dispatching action Z, which gets added, via LockingTransaction.enqueue(), to the actions list, which is a java.util.ArrayList<Agent.Action>.
- Transaction A completes and is successfully committed.
- LockingTransaction.run() does post-commit cleanup, freeing locks and putting a stop() to transaction A, which nulls the transaction's Info object reference.
- Notifications are sent and we start iterating the list of actions to be dispatched.
- The run() method calls Agent.dispatchAction(). Because the thread's transaction object is no longer considered to be "running" (due to the Info object being null) and no action is being processed on the thread (so its nested vector is null), the action is enqueue()-ed with the Agent.
- As part of the enqueue() process, the action is cons()-ed onto the Agent's ActionQueue. Here's where the unique circumstances come into play.
- At this point, we haven't really interacted with the Clojure runtime, specifically the clojure.lang.RT class, so its initialization machinery kicks in.
- Down in the depths, it executes transaction B to add a library to its list of loaded libraries.
- The still-existing-but-not-running thread-local transaction object fires up, runs and commits, with its existing, intact actions list, still containing action Z, enqueued during transaction A, which has not yet finished its post-commit process.
- The post-commit process for transaction B runs, including a nested attempt to dispatch action Z, again, which succeeds.
- The actions list is cleared before exiting the run() method.
- Upon returning way back up the stack to our not-quite-finished-post-processing transaction A, we continue iterating the now-cleared actions list, which promptly throws the ConcurrentModificationException.
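The sequence above can be reproduced in miniature without Clojure on the class path. The following is only a sketch of the mechanism, not the LockingTransaction code: the class ReentrantClearDemo, its fields, and the nestedRunDone flag (which stands in for the one-time clojure.lang.RT initialization re-entering run() on the same thread) are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

/** Hypothetical stand-in for the thread-local transaction object. */
final class ReentrantClearDemo {
    // Plays the role of the transaction's actions list.
    private final List<Runnable> actions = new ArrayList<>();
    // Assumption: models "RT has not been initialized yet" as a one-shot flag.
    private boolean nestedRunDone = false;

    void enqueue(Runnable action) {
        actions.add(action);
    }

    /** Mimics the post-commit phase: dispatch each action, then clear the list. */
    void run() {
        try {
            for (Runnable action : actions) {
                dispatch(action);
            }
        } finally {
            actions.clear();
        }
    }

    private void dispatch(Runnable action) {
        if (!nestedRunDone) {
            // Simulates transaction B: a nested run() on the same object,
            // which dispatches the same action and then clears the list.
            nestedRunDone = true;
            run();
        }
        action.run();
    }

    /** Runs the scenario and reports what happened. */
    static String demo() {
        ReentrantClearDemo tx = new ReentrantClearDemo();
        int[] count = {1};
        tx.enqueue(() -> count[0]++); // plays the role of action Z
        try {
            tx.run();
            return "no exception, count=" + count[0];
        } catch (ConcurrentModificationException e) {
            return "CME, count=" + count[0];
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "CME, count=3"
    }
}
```

The action runs twice (once in the nested run, once in the outer one), and the outer loop fails on its next iteration step because the list was cleared underneath its iterator.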
A quick perusal of the LockingTransaction code shows that the only interaction with the actions list is adding an item to it in the LockingTransaction.enqueue() method, iterating it in the post-processing section of run() and clearing it in the finally clause of that section, so it's easy to see how a transaction started by any of the action-dispatching machinery would cause problems. Any such activity in the actions themselves would not be an issue, since they'd occur on other threads, but the dispatch stuff all runs on the same thread. The few moving parts that occur in this code seem fairly safe, as long as the runtime, clojure.lang.RT, is already initialized, but if that occurs during this phase, all bets appear to be off.
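One way the actions list could be handled more safely is to take a private snapshot and clear the shared list before dispatching anything, so that a nested run() on the same thread finds nothing left to dispatch or clear. This is a sketch of that idea under the same simplified model, with hypothetical names throughout; it is not a patch against LockingTransaction and not necessarily the approach the Clojure maintainers would choose.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a run() that tolerates re-entrant calls. */
final class SnapshotDispatchDemo {
    private final List<Runnable> actions = new ArrayList<>();
    // Assumption: one-shot flag standing in for a nested transaction
    // triggered by runtime initialization during dispatch.
    private boolean nestedRunDone = false;

    void enqueue(Runnable action) {
        actions.add(action);
    }

    void run() {
        // Take ownership of the pending actions before dispatching any of them.
        List<Runnable> pending = new ArrayList<>(actions);
        actions.clear();
        for (Runnable action : pending) {
            dispatch(action);
        }
    }

    private void dispatch(Runnable action) {
        if (!nestedRunDone) {
            nestedRunDone = true;
            run(); // nested call sees an empty actions list
        }
        action.run();
    }

    /** Returns the final counter value after one transaction-like run. */
    static int demo() {
        SnapshotDispatchDemo tx = new SnapshotDispatchDemo();
        int[] count = {1};
        tx.enqueue(() -> count[0]++);
        tx.run();
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```

With the snapshot in place, the nested run neither re-dispatches the action nor invalidates the outer iterator, so the counter ends at 2 and no exception is thrown.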
The attached Java class can be compiled and run with just the Clojure 1.4 JAR on the class path. With the change described near the end of the file (comment one line and uncomment another), the Clojure 1.5.1 JAR can be used, instead, producing the same result.
A single Agent named count is created, holding an Integer value of 1. A transaction is run which dispatches an action (referred to as Z in the above description) that will increment the value of count to 2. Following this, another action is dispatched to count to enable monitoring the completion of the incrementing action. Lastly, the final value of count is printed before the application exits.
Running the class with no command-line arguments produces the above-mentioned exception and prints an incorrect final result, due to action Z being run a second time during the post-commit process for transaction B, as described in the list above. Running with any command-line argument triggers a simple workaround that just references a static value from the clojure.lang.RT class, which triggers class initialization before anything else happens, such that the exception is not thrown and the correct result is produced.
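The workaround relies on ordinary JVM behavior: a class's static initializer runs the first time the class is initialized, so forcing that to happen up front moves the initialization out of the post-commit dispatch. The attached class does this by reading a static value; the sketch below shows an alternative way to get the same effect using Class.forName with initialization enabled. The EagerInitDemo and HeavyRuntime names are hypothetical, and HeavyRuntime is only a stand-in for a class with significant static initialization such as clojure.lang.RT.

```java
/** Hypothetical demonstration of forcing class initialization early. */
final class EagerInitDemo {
    // Set by HeavyRuntime's static initializer so the effect is observable.
    static volatile boolean heavyInitialized = false;

    /** Stand-in for a class whose static initializer does real work. */
    static final class HeavyRuntime {
        static {
            heavyInitialized = true;
        }
    }

    /**
     * Loads and initializes the named class.
     * Returns false if the class is not on the class path.
     */
    static boolean forceInit(String className) {
        try {
            // The second argument (true) asks the JVM to run static initializers.
            Class.forName(className, true, EagerInitDemo.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Taking a class literal's name does not by itself initialize the class.
        boolean found = forceInit(HeavyRuntime.class.getName());
        System.out.println("found=" + found + ", initialized=" + heavyInitialized);
    }
}
```

With the Clojure JAR on the class path, calling forceInit("clojure.lang.RT") at the start of the driver would presumably have the same effect as the static-value reference in the attached class, though that has not been verified here.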