<< Back to previous view

[CLJ-120] GC Issue 116: partition with pad Created: 17/Jun/09  Updated: 02/Dec/11  Resolved: 02/Dec/11

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Backlog

Type: Enhancement
Reporter: Anonymous Assignee: Stephen C. Gilardi
Resolution: Declined Votes: 0
Labels: None


 Description   
Reported by dimi...@gashinsky.com, May 09, 2009

;; A lot of times I needed a padding option on the partition. This is
;; my attempt to solve this problem. Any suggestions are welcome. I
;; hope this patch or something similar will make its way into the
;; core.

Some discussion that happened here:
http://groups.google.com/group/clojure/browse_frm/thread/6fcc1dd999a5ec02?tvc=1
was integrated into the patch.


 Comments   
Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

Converted from http://www.assembla.com/spaces/clojure/tickets/120
Attachments:
partition-with-pad.patch - https://www.assembla.com/spaces/clojure/documents/ctfCdww4qr3RbzeJe5afGb/download/ctfCdww4qr3RbzeJe5afGb
part.clj - https://www.assembla.com/spaces/clojure/documents/ckiilEyzqr3PTqeJe5afGb/download/ckiilEyzqr3PTqeJe5afGb
part.clj - https://www.assembla.com/spaces/clojure/documents/dmo9mEyzWr3QuceJe5aVNr/download/dmo9mEyzWr3QuceJe5aVNr
partition-with-pad-2.diff - https://www.assembla.com/spaces/clojure/documents/bolonky_8r3RY8eJe5afGb/download/bolonky_8r3RY8eJe5afGb
ticket-120.patch - https://www.assembla.com/spaces/clojure/documents/bnWYoqzBKr3RKeeJe5afGb/download/bnWYoqzBKr3RKeeJe5afGb

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

oranenj said: [file:ctfCdww4qr3RbzeJe5afGb]

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

chouser@n01se.net said: It looks to me like the 3- and 4-arg bodies could be combined resulting in less code and no significant loss of performance. A pad of nil could be treated the same as no pad supplied, which would be different from a numeric pad (including 0).

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: By "combined" do you mean that the 3 argument version should call the 4 argument version with an explicit pad of nil?

Currently, the 3 argument version does no padding. Instead it stops as soon as there are less than n args left:

user=> (partition 3 [1 2 3 4])
((1 2 3))
</code></pre>

In contrast, the 4 argument version with a pad of nil produces (untested):
<pre><code>user=> (partition 3 nil [1 2 3 4])
((1 2 3) (4 nil nil))

Interpreting nil passed to the 4 argument version as a request to get the behavior of the 3 argument version looks wrong to me. There would be no way to express both a desire for padding and that the padding value should be nil.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: Looking into the issue further, I see I made a mistake in my previous comment. The padding is given as a sequence, not a value. However, the key difference between the 3 and 4 arg versions remains. The 3 argument version never returns a "short" sequence at the end. The 4 arg version can.

Along the lines of your suggestion, we could combine the two by changing the 4 arg version to interpret a pad value of "[]" to mean "return a short sequence at the end if necessary" and a pad value of "nil" to mean "never return a short sequence". This would involve interpreting "nil" different from "the empty sequence" where in many other contexts they're equivalent. I'm not sure whether or not saving some code is a good trade for introducing that subtle difference.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

chouser@n01se.net said: I had also mistaken pad to be a number. I think you're right, producing different behavior when pad is an empty seq vs. nil is just asking for trouble. Thanks for taking a second look.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

importer said: (In [[r:e0e8326871983be5615f5c0bc9dbf66140c7017f]]) add optional pad argument to partition. Fixes #120

Signed-off-by: Chouser <chouser@n01se.net>

Branch: master

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

digash said: The original version was using nil but Rich suggested not do it that way.
For different revisions take a look at this gist http://gist.github.com/109002

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

chouser@n01se.net said: Thanks for that link. Your final solution (not using nil) is already committed, but we should get those tests into clojure-test once it's location has been settled.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

richhickey said: The new partition has this behavior:

user=> (partition 3 1 nil (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9) (8 9))
user=> (partition 3 1 [42] (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9) (8 9 42))

It seems to me that with a step of one it should never use the pad, and, more generally, once it has produced one partition containing the last element of the coll it should be done.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

richhickey said: An alternative rule is that every step within the supplied coll is yielded, padding as supplied, which would mean ending with:

(7 8 9) (8 9) (9)
(7 8 9) (8 9 42) (9 42)
Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: [file:ckiilEyzqr3PTqeJe5afGb]: proposed alternative (see my comment)

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: I've uploaded part.clj which is another possible alternative. It handles a step of 1 without using the pad. Some invariants in the partitions it produces for the 4 argument case:

  • for a step size of n, if you provide n - 1 objects in the padding sequence, all output sequences will be of size n
  • for a step size n, the output sequences will begin with elements at offsets 0, n, 2n, 3n, ... in the original sequence until no such element exists.

Some outputs:

user=> (part/partition 3 1 nil (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
user=> (part/partition 3 1 [42] (range 10))
((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 9))
---
user=> (part/partition 3 2 nil (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8))
user=>  (part/partition 3 2 [42] (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42))
user=> (part/partition 3 2 [42 43] (range 9))
((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42 43))
---
user=> (part/partition 3 4 nil (range 6))
((0 1 2) (4 5))
user=> (part/partition 3 4 nil (range 7))
((0 1 2) (4 5 6))
user=> (part/partition 3 4 nil (range 8))
((0 1 2) (4 5 6))
user=> (part/partition 3 4 nil (range 9))
((0 1 2) (4 5 6) (8))
---
user=> (part/partition 3 3 [42 43] (range 3))
((0 1 2))
user=> (part/partition 3 3 [42 43] (range 4))
((0 1 2) (3 42 43))
user=> (part/partition 3 3 [42 43] (range 5))
((0 1 2) (3 4 42))
user=> (part/partition 3 3 [42 43] (range 6))
((0 1 2) (3 4 5))
user=> (part/partition 3 3 [42 43] (range 7))
((0 1 2) (3 4 5) (6 42 43))
</code></pre>

Here's the relevant portion of the code:

<pre><code>  ([n step pad coll]
     (lazy-seq
      (when-let [s (seq coll)]
        (let [p (take n s)]
          (cond (= n (count p))
                (cons p (partition n step pad (drop step s)))
                (>= step (count p))
                (list (take n (concat p pad)))))))))

Please see part.clj for details and a test program.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: Ugh, the second "invariant" I listed doesn't hold for step = 1. It appears to hold for other step values. Perhaps part.clj will still be useful to someone in coming up with a better solution.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

richhickey said: I don't see why step of 1 should get special treatment. If the rule is the second one (yield partitions as long as step offsets are present in original coll), then

(part/partition 3 1 nil (range 10))
should end with:
(7 8 9) (8 9) (9)
Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: This one appears to work:

([n step pad coll]
     (lazy-seq
      (when-let [s (seq coll)]
        (cons
         (take n (concat s pad))
         (partition n step pad (drop step s)))))))
</code></pre>

<pre><code>user=> (run part)
n = 3, pad = nil
:max 5 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4) (4))
:max 6 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5) (5))
:max 7 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6) (6))
:max 8 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7) (7))
:max 9 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8) (8))

:max 5 :step 2 ((0 1 2) (2 3 4) (4))
:max 6 :step 2 ((0 1 2) (2 3 4) (4 5))
:max 7 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6))
:max 8 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7))
:max 9 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8))

:max 5 :step 3 ((0 1 2) (3 4))
:max 6 :step 3 ((0 1 2) (3 4 5))
:max 7 :step 3 ((0 1 2) (3 4 5) (6))
:max 8 :step 3 ((0 1 2) (3 4 5) (6 7))
:max 9 :step 3 ((0 1 2) (3 4 5) (6 7 8))

:max 5 :step 4 ((0 1 2) (4))
:max 6 :step 4 ((0 1 2) (4 5))
:max 7 :step 4 ((0 1 2) (4 5 6))
:max 8 :step 4 ((0 1 2) (4 5 6))
:max 9 :step 4 ((0 1 2) (4 5 6) (8))

n = 3, pad = [42 43]
:max 5 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 42) (4 42 43))
:max 6 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 42) (5 42 43))
:max 7 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 42) (6 42 43))
:max 8 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 42) (7 42 43))
:max 9 :step 1 ((0 1 2) (1 2 3) (2 3 4) (3 4 5) (4 5 6) (5 6 7) (6 7 8) (7 8 42) (8 42 43))

:max 5 :step 2 ((0 1 2) (2 3 4) (4 42 43))
:max 6 :step 2 ((0 1 2) (2 3 4) (4 5 42))
:max 7 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 42 43))
:max 8 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 42))
:max 9 :step 2 ((0 1 2) (2 3 4) (4 5 6) (6 7 8) (8 42 43))

:max 5 :step 3 ((0 1 2) (3 4 42))
:max 6 :step 3 ((0 1 2) (3 4 5))
:max 7 :step 3 ((0 1 2) (3 4 5) (6 42 43))
:max 8 :step 3 ((0 1 2) (3 4 5) (6 7 42))
:max 9 :step 3 ((0 1 2) (3 4 5) (6 7 8))

:max 5 :step 4 ((0 1 2) (4 42 43))
:max 6 :step 4 ((0 1 2) (4 5 42))
:max 7 :step 4 ((0 1 2) (4 5 6))
:max 8 :step 4 ((0 1 2) (4 5 6))
:max 9 :step 4 ((0 1 2) (4 5 6) (8 42 43))

nil
user=>
Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: [file:dmo9mEyzWr3QuceJe5aVNr]: improved version

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

richhickey said: This looks fine to me, do you want to make up a patch Steve?

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: [file:bolonky_8r3RY8eJe5afGb]: modified per comments

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: The patch includes and updated doc string, the new 4 argument case, and a change to the whitespace in the 3 argument case for consistent indentation with the 4 argument case (current emacs clojure-mode). I can provide a patch that doesn't touch the 3 argument case whitespace if desired.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

richhickey said: I think the doc should be even more explicit, something like:

... do not overlap. If no pad argument is supplied, will produce only complete partitions of size n, possibly not including items at the end if the size of coll is not a multiple of n. If the pad argument is supplied, will produce a partition at every offset present in the supplied collection, using the pad elements as necessary to pad trailing partitions up to n items each. If pad is nil or a collection containing fewer than n-1 items, ...

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: How about this:

"Returns a lazy seq of lazy subseqs of coll, each of (nominally) n
  items. The subseqs begin at offsets 0, step, 2*step, etc. in coll. If
  step is not supplied, it defaults to n yielding adjacent, non-overlapping
  subseqs. If pad is not supplied, produces only complete subseqs of n
  items, possibly not including some items at the end of coll. If pad is
  supplied, produces a subseq at every offset present in coll, using any
  available items from pad to pad shorter subseqs up to n items. If pad is
  a seq of at least n-1 items, produces only complete padded subseqs of n
  items. If pad is shorter (or nil) trailing padded subseqs may be
  shorter."

I adopted the mathematical terminology that the partition is the operation (the division into parts) and that the individual pieces are not "partitions", but something else: part, block, chunk, or as I propose here, subseq. I like subseq because it embodies succinctly the fact that the items from coll are always kept sequential.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

richhickey said: I'm now convinced we are cramming 2 functions into one, and would prefer to see this new functionality as a new function:

(take-subs n coll)
(take-subs n step coll)

'padding' isn't a necessary concept, as we have concat. take implies the possible partial subseqs. Also take-subs can yield a lazy seq of lazy seqs, but partition can't.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: [file:bnWYoqzBKr3RKeeJe5afGb]: take-subs + tests

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: OK, paddiing isn't necessary because the same thing can be accomplished by using concat to append a padding sequence of exactly n-1 items to coll before processing it with partition.

partition can't return lazy subseqs because it counts them which (in the general case) will realize them.

ticket-120.patch contains modified docs for partition, removed arity 4 case from partition, take-subs implemented, tests for take-subs based on tests for partition, enhanced one sub-test for partition.

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

digash said: I like the new take-subs, but I cannot figure out how to use concat instead of partition without counting it and realizing the whole sequence.
The initial motivation was to use partition with destructuring and creating matrix from a sequences. I do not see an easy way to reproduce this case with the new implementation.

The old implementation:
(for [[a b c] (partition 3 3 (repeat 0) [1 2 3 4])] [a b c]) ==> ([1 2 3] [4 0 0])

The new implementation:
(take 10 (let [coll [1 2 3 4] p 3] (take (/ (count coll) p) (for [[a b c] (take-subs p (concat coll (repeat 0)))] [a b c])))) ==> ([1 2 3] [4 0 0])

Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: Here are two options for how I would write the example using the most recent patch:

user=> (for [[a b c] (partition 3 (concat [1 2 3 4] (take 3 (repeat 0))))] [a b c])
([1 2 3] [4 0 0])
user=> (for [[a b c] (partition 3 (concat [1 2 3 4] [0 0 0]))] [a b c])
([1 2 3] [4 0 0])
user=> 
</code></pre>or in the general case:
<pre><code>user=> (defn padded-partition
[n pad coll]
(partition n (concat coll (take (dec n) pad))))
#'user/padded-partition
user=> (for [[a b c] (padded-partition 3 (repeat 0) [1 2 3 4])] [a b c])
([1 2 3] [4 0 0])
user=>
Comment by Assembla Importer [ 28/Sep/10 7:52 AM ]

scgilardi said: My first code example above was incorrect. For proper operation with all combinations of n and step, the concatenated padding seq needs to be exactly n-1 in length.

Here's the correction:

user=> (for [[a b c] (partition 3 (concat [1 2 3 4] (take 2 (repeat 0))))] [a b c])
([1 2 3] [4 0 0])
user=> (for [[a b c] (partition 3 (concat [1 2 3 4] [0 0]))] [a b c])
([1 2 3] [4 0 0])
user=>
Comment by Chouser [ 18/Nov/11 11:20 PM ]

partition has a pad argument now. Can this be closed?

Comment by Stuart Halloway [ 02/Dec/11 12:17 PM ]

partition has pad now

Generated at Thu Dec 18 21:22:52 CST 2014 using JIRA 4.4#649-r158309.