Clojure

ConcurrentModificationException thrown during action dispatching after commit in LockingTransaction.run()

Details

  • Type: Defect
  • Status: Closed
  • Priority: Major
  • Resolution: Declined
  • Affects Version/s: Release 1.4, Release 1.5, Release 1.6
  • Fix Version/s: Release 1.6
  • Component/s: None
  • Labels:
  • Environment:
  • Patch:
    Code
  • Approval:
    Vetted

Description

Summary

Using the Clojure 1.4 library strictly from Java code, a simple transaction dispatches an action to an Agent. When this runs from a simple driver, such as a unit test, that has not yet interacted with the Clojure library/runtime (specifically, clojure.lang.RT), a ConcurrentModificationException is thrown from inside LockingTransaction.run() while it iterates through the actions list, dispatching each action to its Agent after the transaction commits.

The circumstances under which this occurs are probably fairly rare, and a simple workaround exists (see the final paragraph of the Test Case section), hence the original "Minor" priority. Still, it seems like it would not be very complicated to fix LockingTransaction to handle the actions list more safely.

Analysis

Based on some debugging, here's what seems to be happening:

  1. Transaction A is run, dispatching action Z, which gets added, via LockingTransaction.enqueue(), to the actions list, which is a java.util.ArrayList<Agent.Action>.
  2. Transaction A completes and is successfully committed.
  3. LockingTransaction.run() does post-commit cleanup, freeing locks and putting a stop() to transaction A, which nulls the transaction's Info object reference.
  4. Notifications are sent and we start iterating the list of actions to be dispatched.
  5. The run() method calls Agent.dispatchAction(). Because the thread's transaction object is no longer considered to be "running" (due to the Info object being null) and no action is being processed on the thread (so its nested vector is null), the action is enqueue()-ed with the Agent.
  6. As part of the enqueue() process, the action is cons()-ed onto the Agent's ActionQueue. Here's where the unique circumstances come into play.
    1. At this point, we haven't really interacted with the Clojure runtime, specifically the clojure.lang.RT class, so its initialization machinery kicks in.
    2. Down in the depths, it executes transaction B to add a library to its list of loaded libraries.
    3. The still-existing-but-not-running thread-local transaction object fires up, runs and commits, with its existing, intact actions list, still containing action Z, enqueued during transaction A, which has not yet finished its post-commit process.
    4. The post-commit process for transaction B runs, including a nested attempt to dispatch action Z, again, which succeeds.
    5. The actions list is cleared before exiting the run() method.
  7. Upon returning way back up the stack to our not-quite-finished-post-processing transaction A, we continue iterating the now-cleared actions list, which promptly throws the ConcurrentModificationException.
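The sequence above can be reproduced without Clojure at all. The single-threaded Java sketch below is a hypothetical model, not Clojure's code: ReentrantClearDemo, dispatch(), postCommit() and the runtimeInitialized flag are invented stand-ins for LockingTransaction's actions list, Agent.dispatchAction() and the one-time RT initialization. The first dispatch runs a nested "transaction B" that reuses and clears the same ArrayList, so the outer loop's fail-fast iterator throws on its next call to next().

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical model of the bug; none of these names are Clojure's real classes.
public class ReentrantClearDemo {
    private final List<String> actions = new ArrayList<>();     // shared by "A" and nested "B"
    private final List<String> dispatched = new ArrayList<>();  // records every dispatch
    private boolean runtimeInitialized = false;                 // models RT's one-time init

    // Models Agent.dispatchAction(): the very first call triggers a nested transaction.
    private void dispatch(String action) {
        if (!runtimeInitialized) {
            runtimeInitialized = true;
            postCommit();                 // "transaction B" reuses the same actions list
        }
        dispatched.add(action);
    }

    // Models the post-commit section of run(): iterate and dispatch, clear in finally.
    private void postCommit() {
        try {
            for (String action : actions) {
                dispatch(action);
            }
        } finally {
            actions.clear();
        }
    }

    // Enqueues action Z during "transaction A", then runs A's post-commit loop.
    public String run() {
        actions.add("Z");
        try {
            postCommit();
            return "ok, dispatched " + dispatched;
        } catch (ConcurrentModificationException e) {
            return "CME, dispatched " + dispatched;
        }
    }

    public static void main(String[] args) {
        System.out.println(new ReentrantClearDemo().run());  // prints: CME, dispatched [Z, Z]
    }
}
```

With a single enqueued action, the outer iterator's cursor is 1 when the nested clear drops the size to 0, so hasNext() still returns true and next() detects the modification. The model also dispatches Z twice, matching the incorrect final value reported in the test case.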

A quick perusal of the LockingTransaction code shows that the actions list is touched in only three places: an item is added to it in the LockingTransaction.enqueue() method, it is iterated in the post-processing section of run(), and it is cleared in the finally clause of that section. It is therefore easy to see how a transaction started by any of the action-dispatching machinery would cause problems. Transactions run by the actions themselves would not be an issue, since actions execute on other threads, but the dispatch machinery all runs on the committing thread. The few moving parts in this code seem fairly safe as long as the runtime, clojure.lang.RT, is already initialized, but if that initialization occurs during this phase, all bets appear to be off.
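One way to handle the actions list more safely, offered here only as an assumption and not as the contents of clj-1260.diff, is to detach the pending actions from the shared list before dispatching them. In the standalone sketch below (all names hypothetical), postCommit() copies the list into a local variable, clears the shared list, and iterates the copy, so a nested transaction started during dispatch sees an empty list and neither re-dispatches Z nor invalidates the outer iteration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "snapshot, then clear, then dispatch" strategy.
public class SnapshotDispatchDemo {
    private final List<String> actions = new ArrayList<>();     // shared per-thread list
    private final List<String> dispatched = new ArrayList<>();  // records every dispatch
    private boolean runtimeInitialized = false;                 // models RT's one-time init

    private void dispatch(String action) {
        if (!runtimeInitialized) {
            runtimeInitialized = true;
            postCommit();                 // nested "transaction B" on the same thread
        }
        dispatched.add(action);
    }

    private void postCommit() {
        // Detach the pending actions first, so reentrant calls see an empty list.
        List<String> pending = new ArrayList<>(actions);
        actions.clear();
        for (String action : pending) {
            dispatch(action);             // iterates the private copy only
        }
    }

    public List<String> run() {
        actions.add("Z");
        postCommit();
        return dispatched;
    }

    public static void main(String[] args) {
        System.out.println(new SnapshotDispatchDemo().run());  // prints: [Z]
    }
}
```

The failure behavior matches clearing in a finally clause: if a dispatch throws, the remaining actions in the copy are dropped, just as the original clear would drop them.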

Test Case

The attached Java class can be compiled and run with just the Clojure 1.4 JAR on the class path. With the change described near the end of the file (comment out one line and uncomment another), the Clojure 1.5.1 JAR can be used instead, with the same result.

A single Agent named count is created, holding an Integer value of 1. A transaction is run which dispatches an action (referred to as Z in the above description) that will increment the value of count to 2. Another action is then dispatched to count so that completion of the incrementing action can be monitored. Lastly, the final value of count is printed before the application exits.

Running the class with no command-line arguments produces the above-mentioned exception and prints an incorrect final result, because action Z is run a second time as described in step 6.4. Running with any command-line argument triggers a simple workaround that just references a static value from the clojure.lang.RT class, which triggers class initialization before anything else happens, so the exception is not thrown and the correct result is produced.
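The workaround relies on Java class initialization: a class's static initializer runs once, on first active use, on the thread that triggers it. The sketch below models this with a hypothetical nested class FakeRuntime standing in for clojure.lang.RT; its static initializer performs a nested "transaction" over the shared list. Reading the non-constant static field MARKER up front, like the test's reference to a static value of RT, forces initialization while the list is still empty. Note that a static final field initialized with a literal is a compile-time constant, and reading one does not trigger initialization.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical model of the eager-initialization workaround.
public class EagerInitDemo {
    static final List<String> actions = new ArrayList<>();     // shared "actions" list
    static final List<String> dispatched = new ArrayList<>();  // records every dispatch

    // Stand-in for clojure.lang.RT; initialized on first active use.
    static class FakeRuntime {
        // Not a compile-time constant, so reading it forces class initialization.
        static final Object MARKER = new Object();

        static {
            // Nested "transaction": dispatch whatever is pending, then clear.
            try {
                for (String action : actions) {
                    dispatched.add(action);
                }
            } finally {
                actions.clear();
            }
        }

        static void enqueue(String action) {
            dispatched.add(action);
        }
    }

    public static String run(boolean eager) {
        if (eager) {
            Object unused = FakeRuntime.MARKER;  // the workaround: initialize up front
        }
        actions.add("Z");
        try {
            for (String action : actions) {
                FakeRuntime.enqueue(action);     // lazy case: initializer runs mid-iteration
            }
            return "ok, dispatched " + dispatched;
        } catch (ConcurrentModificationException e) {
            return "CME, dispatched " + dispatched;
        } finally {
            actions.clear();
        }
    }

    public static void main(String[] args) {
        // Mirrors the attached test: any argument enables the workaround.
        System.out.println(run(args.length > 0));
    }
}
```

Because a class is initialized only once per JVM, each scenario needs a fresh JVM, just as the attached test is run once without and once with an argument. With no arguments it prints `CME, dispatched [Z, Z]`; with any argument it prints `ok, dispatched [Z]`.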

Patch: clj-1260-fixws.diff

Screened by:

Attachments

  1. clj-1260.diff
    18/Oct/13 2:57 PM
    2 kB
    Guillermo Winkler
  2. clj-1260-fixws.diff
    19/Oct/13 5:32 PM
    2 kB
    Andy Fingerhut
  3. STMAgentInitBug.java
    12/Sep/13 3:59 AM
    2 kB
    Brandon Ibach

Activity

Alex Miller made changes -
  • Approval: Vetted
  • Priority: Minor → Major
  • Fix Version/s: Release 1.6
  • Labels: STM
Alex Miller made changes -
  • Priority: Major → Critical
Alex Miller made changes -
  • Priority: Critical → Major
Guillermo Winkler made changes -
  • Patch: Code
  • Attachment: clj-1260.diff
Andy Fingerhut made changes -
  • Attachment: clj-1260-fixws.diff
Alex Miller made changes -
  • Description: added the Patch and Screened by lines
Alex Miller made changes -
  • Resolution: Declined
  • Status: Open → Closed
