Clojure

partition-by holds references to head groups longer than necessary

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Completed
  • Affects Version/s: Release 1.2
  • Fix Version/s: Release 1.3
  • Component/s: None
  • Labels:
    None
  • Environment:
    Clojure 1.2
  • Patch:
    Code
  • Approval:
    Ok

Description

partition-by continues to hold references to items of the input sequence where it should not anymore. That is after (rest s) there are still references held to the head group of the partition-by output sequence.

Activity

Hide
Meikel Brandmeyer added a comment -

Fix for partition-by by realising the drop.

Show
Meikel Brandmeyer added a comment - Fix for partition-by by realising the drop.
Hide
Meikel Brandmeyer added a comment -

I think, I finally found a way to demonstrate the issue. In the following memrem is a function, which just greedy requests memory until it is exhausted. Since SoftReference}}s are guaranteed to be cleared before an {{OutOfMemoryError is thrown, we can use this to check whether a reference to an item is retained.

user=> (def r (SoftReference. (byte-array 100000000)))
#'user/r
user=> (def s (rest (partition-by alength (list (.get r) (byte-array 100000000) (byte-array 200000000)))))
#'user/s
user=> (memrem)
.
.
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user=> (prn (.get r))
#<byte[] [B@1cfb802>
nil
user=> (def s (seq s))
#'user/s
user=> (memrem)
.
.
.
.
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user=> (prn (.get r))
nil
nil

We place a large array in a SoftReference and call partition-by such, that the array is referenced by the first group. Then we take the rest of the output sequence. Since no reference to the sequence head is retained the array should be now free for garbage collection.

However, after the memrem the SoftReference still references the array, which means there is another reference to the array. And in fact, if we actually realise the rest, this reference is gone. And after another memrem round trip the SoftReference is cleared.

The problem is that the call to drop in partition-by is not realised immediately. Hence it keeps a reference to the head of the input sequence until it is realised. However there is no reason why the drop should not immediately be realised. count is eager and hence realises the run, which in turn realises further items of the input sequence. So the elements are realised anyway. Adding a call to seq to realise the drop hence does not harm laziness of partition-by, but removes the undesired reference to the head of the input sequence. (Alternatively one could use nthnext instead of (seq (drop...)).

user=> (def r (SoftReference. (byte-array 100000000)))
#'user/r
user=> (def s (rest (partition-by-fixed alength (list (.get r) (byte-array 100000000) (byte-array 200000000)))))
#'user/s
user=> (memrem)
.
.
.
.
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user=> (prn (.get r))
nil
nil

Note, how the SoftReference is already cleared already after call to rest.

Show
Meikel Brandmeyer added a comment - I think, I finally found a way to demonstrate the issue. In the following memrem is a function, which just greedy requests memory until it is exhausted. Since SoftReference}}s are guaranteed to be cleared before an {{OutOfMemoryError is thrown, we can use this to check whether a reference to an item is retained.
user=> (def r (SoftReference. (byte-array 100000000)))
#'user/r
user=> (def s (rest (partition-by alength (list (.get r) (byte-array 100000000) (byte-array 200000000)))))
#'user/s
user=> (memrem)
.
.
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user=> (prn (.get r))
#<byte[] [B@1cfb802>
nil
user=> (def s (seq s))
#'user/s
user=> (memrem)
.
.
.
.
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user=> (prn (.get r))
nil
nil
We place a large array in a SoftReference and call partition-by such, that the array is referenced by the first group. Then we take the rest of the output sequence. Since no reference to the sequence head is retained the array should be now free for garbage collection. However, after the memrem the SoftReference still references the array, which means there is another reference to the array. And in fact, if we actually realise the rest, this reference is gone. And after another memrem round trip the SoftReference is cleared. The problem is that the call to drop in partition-by is not realised immediately. Hence it keeps a reference to the head of the input sequence until it is realised. However there is no reason why the drop should not immediately be realised. count is eager and hence realises the run, which in turn realises further items of the input sequence. So the elements are realised anyway. Adding a call to seq to realise the drop hence does not harm laziness of partition-by, but removes the undesired reference to the head of the input sequence. (Alternatively one could use nthnext instead of (seq (drop...)).
user=> (def r (SoftReference. (byte-array 100000000)))
#'user/r
user=> (def s (rest (partition-by-fixed alength (list (.get r) (byte-array 100000000) (byte-array 200000000)))))
#'user/s
user=> (memrem)
.
.
.
.
java.lang.OutOfMemoryError: Java heap space (NO_SOURCE_FILE:0)
user=> (prn (.get r))
nil
nil
Note, how the SoftReference is already cleared already after call to rest.
Hide
Alexander Redington added a comment -

Tested the patch using the following snippet to compare HEAD and the patch:

(import java.lang.ref.SoftReference)
(def r (SoftReference. (byte-array 100000000)))
(def s (rest (partition-by alength (list (.get r) (byte-array 12500000) (byte-array 1250000)))))
(defn memrem [c]
      (recur (conj c (byte-array 12500000))))
(memrem ())
(prn (.get r))

Compared with Meikel's snippets, I lowered the array sizes in partition-by's victim to fit in my default VM args.

Attached patch appears to resolve the issue consistently when measured by this method.

Show
Alexander Redington added a comment - Tested the patch using the following snippet to compare HEAD and the patch:
(import java.lang.ref.SoftReference)
(def r (SoftReference. (byte-array 100000000)))
(def s (rest (partition-by alength (list (.get r) (byte-array 12500000) (byte-array 1250000)))))
(defn memrem [c]
      (recur (conj c (byte-array 12500000))))
(memrem ())
(prn (.get r))
Compared with Meikel's snippets, I lowered the array sizes in partition-by's victim to fit in my default VM args. Attached patch appears to resolve the issue consistently when measured by this method.
Hide
Fogus added a comment -

Ran through the logic of the change including the illustrative examples – the patch is very obscure, but cleverly done (the demonstration even more so). I considered adding a regression, but decided against it given that the nature of the illustrations are brittle with respect to the JVM settings. I would advise that a regression that does not rely on the generation of an OOME be eventually added, but that could come later unless RH objects.

Show
Fogus added a comment - Ran through the logic of the change including the illustrative examples – the patch is very obscure, but cleverly done (the demonstration even more so). I considered adding a regression, but decided against it given that the nature of the illustrations are brittle with respect to the JVM settings. I would advise that a regression that does not rely on the generation of an OOME be eventually added, but that could come later unless RH objects.
Hide
Stuart Sierra added a comment -

Patch is in incorrect format; please use "git format-patch" as per http://clojure.org/patches

Show
Stuart Sierra added a comment - Patch is in incorrect format; please use "git format-patch" as per http://clojure.org/patches
Hide
Meikel Brandmeyer added a comment -

Updated patch according git workflow.

Show
Meikel Brandmeyer added a comment - Updated patch according git workflow.
Hide
Stuart Sierra added a comment -

Patch applied.

Show
Stuart Sierra added a comment - Patch applied.

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: